vload_half and vloada_half

Hi all,
The OpenCL spec has two versions of these built-in functions for the half type. The only difference I found is that they have different alignment requirements. Does that mean vloada_half() will have higher performance? And what is the purpose of this distinction? Thanks!

vload_halfn allows you to load a 1-, 2-, 4-, 8- or 16-component half vector (the values are converted to float on load), where the alignment requirement is only that p be aligned to a 16-bit boundary, i.e. the size of a scalar half.

vloada_halfn allows you to load a 1-, 2-, 4-, 8- or 16-component half vector, where the alignment requirement is that p be aligned to the size of the whole half vector, i.e. sizeof(half) * n bytes. Because the implementation can then assume a fully aligned vector access, vloada_halfn should, in most cases, give you better memory access performance than vload_halfn, which only guarantees scalar alignment.
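
To make the two alignment rules concrete, here is a minimal host-side sketch in plain C (not OpenCL C). The helper names ok_for_vload_half and ok_for_vloada_half and the sample addresses are made up for illustration. It only models the alignment rule as described above for n = 1, 2, 4, 8 or 16, and it does not call any OpenCL API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of one half-precision value (16 bits). */
#define HALF_BYTES 2u

/* Hypothetical helper: vload_halfn only needs the address to be
 * aligned to a single scalar half (2 bytes), whatever n is. */
static bool ok_for_vload_half(uintptr_t addr)
{
    return addr % HALF_BYTES == 0;
}

/* Hypothetical helper: vloada_halfn needs the address to be aligned
 * to the whole half vector, i.e. sizeof(half) * n bytes.
 * Assumes n is one of 1, 2, 4, 8 or 16. */
static bool ok_for_vloada_half(uintptr_t addr, unsigned n)
{
    return addr % (HALF_BYTES * n) == 0;
}
```

For example, an address such as 0x1004 is fine for vload_half4 (it is 2-byte aligned) but not for vloada_half4, which would need 8-byte alignment; 0x1008 satisfies both. A check like this could be useful on the host when you hand a kernel a pointer into the middle of a buffer and want to know which of the two variants is safe to use.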