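Concepts (1)
SIMD
Single instruction, multiple data (SIMD) on ARM processors originally operated on the general-purpose registers and offered only a limited set of instructions. It later evolved into the Advanced SIMD extension (Neon), which is available across the ARM Cortex-A series processors (as an optional unit on some cores, such as the Cortex-A9).
Neon and VFP
When a core supports both Neon and VFP, the two share the same register bank.
Neon instruction functions:
memory accesses
data copying between Neon and general-purpose registers
data type conversion
data processing
Supported data types
8-bit, 16-bit, 32-bit and 64-bit signed and unsigned integers; 32-bit single-precision floating-point elements; 8-bit and 16-bit polynomials.
Neon registers
Q0~Q15, 128-bit
D0~D31, 64-bit (each Q register aliases a pair of D registers)
Neon instructions all begin with V, for example:
VADD.I16 q0, q1, q2
Development methods
(1) Write assembly directly
(2) Use Neon C intrinsics: #include <arm_neon.h> and add the following to the compile command: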
-mfpu=neon
(3)automatic vectorization
(4)Using Neon optimized libraries
OpenMAX
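As a rough illustration of method (2), the sketch below adds two int16 arrays with intrinsics, mirroring the VADD.I16 q0, q1, q2 instruction shown earlier. The function name add_i16 is hypothetical, the length is assumed to be a multiple of 8, and the __ARM_NEON guard with a scalar fallback is an assumption added here so the file also builds on targets without Neon.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n int16 elements.
 * Assumes n is a multiple of 8, since one Q register holds eight
 * 16-bit lanes. */
void add_i16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
#if defined(__ARM_NEON)
    for (size_t i = 0; i < n; i += 8) {
        int16x8_t va = vld1q_s16(a + i);        /* load 8 lanes into a Q register */
        int16x8_t vb = vld1q_s16(b + i);
        vst1q_s16(dst + i, vaddq_s16(va, vb));  /* lane-wise add, as VADD.I16 does */
    }
#else
    /* Scalar fallback (an assumption of this sketch) for non-Neon builds. */
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
#endif
}
```

On a 32-bit ARM toolchain this would be built with the -mfpu=neon flag mentioned above; without it the compiler may not define __ARM_NEON and the scalar path is used instead.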
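The number of elements a Neon instruction processes at once is fixed, but in real applications the data length is often not a whole multiple of that count. The usual solution is to give the leftover elements special handling: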
(1)Larger Arrays
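Pad the array up to a multiple of the vector size. Take care to initialize the padding elements so that they do not affect the final result.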
(2)Overlapping
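Some elements are processed twice, so this only suits operations whose result does not change when an element is processed again.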
(3)Single Element processing
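Approaches (2) and (3) can be sketched as follows for an operation that doubles each int32 element, four lanes at a time. All function names are hypothetical, src and dst are assumed not to overlap, and the scalar fallback under the __ARM_NEON guard is an assumption added so the sketch builds on non-Neon targets too.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = 2 * src[i] for one block of 4 elements. */
static void double4(int32_t *dst, const int32_t *src)
{
#if defined(__ARM_NEON)
    int32x4_t v = vld1q_s32(src);
    vst1q_s32(dst, vaddq_s32(v, v));
#else
    for (int i = 0; i < 4; i++)
        dst[i] = src[i] * 2;
#endif
}

/* (3) Single element processing: vector loop, then a scalar tail. */
void double_tail(int32_t *dst, const int32_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        double4(dst + i, src + i);
    for (; i < n; i++)          /* at most 3 leftover elements */
        dst[i] = src[i] * 2;
}

/* (2) Overlapping: the final block is anchored at n - 4, so a few
 * elements are computed twice. This is safe here only because src and
 * dst are separate buffers, so recomputing gives the same value. */
void double_overlap(int32_t *dst, const int32_t *src, size_t n)
{
    if (n < 4) {                /* too short for even one full block */
        double_tail(dst, src, n);
        return;
    }
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        double4(dst + i, src + i);
    if (i < n)
        double4(dst + n - 4, src + n - 4);
}
```

An in-place variant (dst equal to src) would not work with the overlapping version, because the re-processed elements would be doubled a second time.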
Alignment in Neon
Load and Store addresses should be aligned to cache lines to allow more efficient memory access.
On the Cortex-A8, a cache line is 16 words (64 bytes).
Avoid writing to the same area of memory, specifically the same cache line, from both ARM
and NEON code.
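One way to obtain cache-line-aligned buffers is C11 aligned_alloc. The sketch below assumes a 64-byte line (16 words of 4 bytes, the Cortex-A8 figure); the helper name alloc_line_aligned and the CACHE_LINE_BYTES constant are hypothetical, and other cores may need a different value.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size: 16 words x 4 bytes, as on the Cortex-A8. */
#define CACHE_LINE_BYTES 64

/* Hypothetical helper: allocate room for count floats, starting on a
 * cache-line boundary. Returns NULL on failure; release with free(). */
float *alloc_line_aligned(size_t count)
{
    size_t bytes = count * sizeof(float);
    /* aligned_alloc expects the size to be a multiple of the alignment,
     * so round up to a whole number of cache lines (at least one). */
    size_t rounded = (bytes + CACHE_LINE_BYTES - 1)
                     / CACHE_LINE_BYTES * CACHE_LINE_BYTES;
    if (rounded == 0)
        rounded = CACHE_LINE_BYTES;
    return aligned_alloc(CACHE_LINE_BYTES, rounded);
}
```

Rounding the size up to whole cache lines also means the buffer does not share its last line with a neighbouring allocation, which helps with the advice about not touching the same line from both ARM and NEON code.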
To obtain best performance from hand-written NEON code, it is necessary to be aware of some
underlying hardware features. In particular, the programmer should be aware of pipelining and
scheduling issues, memory access behavior and scheduling hazards.