Lighting up OpenCV with Ne10 and NEON
Efficiency in image processing has always been important, and it is increasingly so when the processing is performed on embedded and mobile devices. The majority of mobile devices use Advanced RISC Machines (ARM) processors, but surprisingly, single instruction multiple data (SIMD) optimisations for this architecture are not yet common. This is especially true of open source frameworks and libraries running on these newer mobile ARM devices.
Intel has recently used its Streaming SIMD Extensions 2 (SSE2) instruction set to improve the efficiency of heavyweight functions within the Open Source Computer Vision (OpenCV) library. Unfortunately, the comparatively high power consumption of Intel processors means that x86 is not the dominant architecture for mobile devices. Image processing on these devices is now common due to their increasing camera capabilities (most modern smartphones contain an 8 megapixel camera), and many Android and iOS apps use the OpenCV library to perform it. OpenCV has numerous computationally intensive operations where SIMD is beneficial. Intel has identified and remedied these bottlenecks using SSE2, but most mobile devices (which run on ARM) do not benefit from that work. This matters a great deal on a mobile architecture, where efficiency and battery life are the major concerns.
The alternatives to Intel's SIMD instruction sets (SSE and AVX) on ARM are the NEON instruction set (released in 2009) and the Ne10 software library (2012). NEON contains similar, but not identical, vector instructions to SSE and AVX. The Ne10 library provides a set of commonly used vector operations, with each vector operation function built from clusters of pre-rolled NEON intrinsics. Ne10 therefore offers a higher level of abstraction: C/C++ floating point arrays can be manipulated directly, which shortens development time. But at what cost?
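To give a feel for what writing NEON intrinsics by hand involves, here is a minimal sketch of a saturating add over two buffers of 8-bit pixels. The function name add_pixels_saturate and the HAVE_NEON macro are hypothetical and chosen only for illustration; this is not code from OpenCV or Ne10. The sketch assumes a compiler that defines __ARM_NEON (or __ARM_NEON__) when NEON is available, and it falls back to a plain scalar loop everywhere else.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the compiler defines one of these macros when NEON is enabled. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: dst[i] = min(255, a[i] + b[i]) for 8-bit pixels. */
void add_pixels_saturate(uint8_t *dst, const uint8_t *a,
                         const uint8_t *b, size_t n)
{
    size_t i = 0;

#if HAVE_NEON
    /* Process 16 pixels per iteration in 128-bit NEON registers. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);      /* load 16 bytes from a */
        uint8x16_t vb = vld1q_u8(b + i);      /* load 16 bytes from b */
        vst1q_u8(dst + i, vqaddq_u8(va, vb)); /* saturating add, store */
    }
#endif

    /* Scalar tail, and the complete fallback on non-NEON builds. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

A library such as Ne10 would hide a loop of this kind behind a single function call on whole arrays, which is the trade-off between convenience and control that the question above refers to.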
End users are increasingly using their phones as media processing machines, but what can the app developer do to save the battery life of these devices? Using SIMD to improve computing efficiency, which in turn reduces the number of clock cycles spent, is an attractive option!
During this session we will compare SSE, NEON and Ne10 as means of applying SIMD optimisations to critical functions of the OpenCV library. We will discuss speedup factors and ease of use (from a programmer's perspective). We will also "briefly" examine auto-vectorisation and how different compilers stack up. Finally, we "might" have a demonstration of some simple image processing using our NEON-ised OpenCV framework for iOS and Android devices.
Beau is a PhD Candidate at the Australian National University and code-monkey at Open Parallel. He has an interest in OpenCL, OpenGL, OpenCV, creative naming conventions and image processing. He has developed for iOS, Android, OSX, Linux and a micro-CT machine. In his free time he likes to develop statistical imaging software for soil scientists.