Neon

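Concepts

（1）SIMD (single instruction, multiple data)

SIMD on ARM processors was originally built on the general-purpose registers and offered only a limited set of instructions. It later evolved into the Advanced SIMD extension (Neon), which is implemented across the ARM Cortex-A series processors.

Neon and VFP

If a processor supports both Neon and VFP, the two share the same register bank.

Neon instruction functions:

memory accesses

data copying between NEON and general purpose registers

data type conversion

data processing

Supported data types

8-bit, 16-bit, 32-bit and 64-bit signed and unsigned integers are supported, as are 32-bit single precision floating point elements and 8-bit or 16-bit polynomials.

Neon registers

Q0~Q15, 128-bit

D0~D31, 64-bit

The two sets are views of the same register bank: each Q register maps onto a pair of D registers.

Neon instructions all start with V, for example:

VADD.I16 q0, q1, q2

Development methods

（1）Write assembly directly

（2）Develop in C with Neon intrinsics: add the #include for the Neon intrinsics header, and add the following to the compile command: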

-mfpu=neon

（3）Automatic vectorization

（4）Using Neon optimized libraries

OpenMAX
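Method （2） can be illustrated with a minimal sketch. The intrinsic vaddq_s16 from arm_neon.h is the C-level counterpart of the VADD.I16 instruction on Q registers. The helper name add8_s16 is hypothetical, and the scalar branch is an assumption added only so that the example also builds on targets without Neon.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>   /* Neon intrinsics header */
#endif

/* Adds eight 16-bit lanes: dst[i] = a[i] + b[i] for i in 0..7.
 * add8_s16 is a hypothetical helper name used for illustration. */
void add8_s16(int16_t *dst, const int16_t *a, const int16_t *b)
{
    assert(dst != NULL && a != NULL && b != NULL);
#if defined(__ARM_NEON)
    int16x8_t va = vld1q_s16(a);          /* load 8 x int16 into a Q register */
    int16x8_t vb = vld1q_s16(b);
    int16x8_t vsum = vaddq_s16(va, vb);   /* on ARMv7 this typically maps to VADD.I16 qd, qn, qm */
    vst1q_s16(dst, vsum);                 /* store the 8 results */
#else
    /* Scalar fallback (assumption: only here so the sketch builds without Neon). */
    for (size_t i = 0; i < 8; ++i) {
        dst[i] = (int16_t)(a[i] + b[i]);
    }
#endif
}
```

With GCC for 32-bit ARM, a file like this would be compiled with the -mfpu=neon option shown above.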

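The number of elements that a Neon instruction processes at once is fixed, but in practice the data length is often not an integer multiple of that number. The usual solution is to handle the leftover elements separately, in one of the following ways.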

（1）Larger Arrays

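Pad the array up to a multiple of the vector size. Take care to initialize the padding elements so that they do not affect the final result.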

（2）Overlapping

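Some elements are processed twice, so this only works when processing an element a second time does not change the result.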

（3）Single Element processing
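The three leftover strategies can be sketched as follows for an element-wise float addition with 4 lanes per vector. All function names (add4_f32, round_up4, add_f32_single, add_f32_overlap) are hypothetical, and the scalar branch inside add4_f32 is an assumption so that the example also builds without Neon. For strategy （1） only the length computation is shown; the caller would allocate and initialize the padded buffers.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Processes one block of 4 floats: dst[0..3] = a[0..3] + b[0..3]. */
static void add4_f32(float *dst, const float *a, const float *b)
{
#if defined(__ARM_NEON)
    vst1q_f32(dst, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    for (size_t i = 0; i < 4; ++i) {
        dst[i] = a[i] + b[i];
    }
#endif
}

/* Strategy (1), Larger Arrays: length the buffers must be padded to. */
size_t round_up4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Strategy (3), Single Element processing: scalar loop for the tail. */
void add_f32_single(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        add4_f32(dst + i, a + i, b + i);
    }
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}

/* Strategy (2), Overlapping: the final block is shifted back so that it
 * ends exactly at n, recomputing up to 3 elements. Requires n >= 4 and
 * dst must not alias a or b, otherwise elements would be added twice. */
void add_f32_overlap(float *dst, const float *a, const float *b, size_t n)
{
    assert(n >= 4);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        add4_f32(dst + i, a + i, b + i);
    }
    if (i < n) {
        size_t last = n - 4;
        add4_f32(dst + last, a + last, b + last);
    }
}
```

Overlapping avoids the scalar tail loop but depends on the out-of-place form used here; an in-place accumulation would need one of the other two strategies.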

Alignment in Neon

Load and store addresses should be aligned to cache lines to allow more efficient memory access.

On Cortex-A8, a cache line is 16 words (64 bytes).
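One way to obtain cache-line-aligned buffers is the C11 function aligned_alloc. The following is a minimal sketch, assuming a 64-byte line as on Cortex-A8; the names CACHE_LINE_BYTES, alloc_cache_aligned and is_cache_aligned are hypothetical, and other cores may use a different line size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 16 words x 4 bytes, the Cortex-A8 line size quoted above. */
#define CACHE_LINE_BYTES 64u

static_assert((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1u)) == 0,
              "alignment must be a power of two");

/* Returns a buffer of at least `bytes` bytes aligned to a cache line,
 * or NULL on failure. Release it with free(). */
void *alloc_cache_aligned(size_t bytes)
{
    if (bytes > SIZE_MAX - (CACHE_LINE_BYTES - 1u)) {
        return NULL;                       /* rounding up would overflow */
    }
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded = (bytes + CACHE_LINE_BYTES - 1u) / CACHE_LINE_BYTES
                     * CACHE_LINE_BYTES;
    if (rounded == 0) {
        rounded = CACHE_LINE_BYTES;
    }
    return aligned_alloc(CACHE_LINE_BYTES, rounded);
}

/* Returns nonzero if p starts on a cache-line boundary. */
int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE_BYTES) == 0;
}
```

Rounding the size up to a whole number of lines also means the buffer does not share its last cache line with unrelated data.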

Avoid writing to the same area of memory, specifically the same cache line, from both ARM and NEON code.

To obtain best performance from hand-written NEON code, it is necessary to be aware of some underlying hardware features. In particular, the programmer should be aware of pipelining and scheduling issues, memory access behavior and scheduling hazards.
