TensorFlow Lite pulls throttle, adds speed as it puts OpenCL in the sidecar

The team behind the GPU inference engine of mobile deep learning framework TensorFlow Lite has finished experimenting with an OpenCL-based flavour for Android, which promises up to double the speed of its OpenGL counterpart.

The alternative backend has been part of the TensorFlow repository for about a year, so it has already seen a fair amount of testing. However, the engineers behind the addition waited until now to officially launch the engine, making it more visible to developers using TFLite on Android devices.

An inference engine is a component used to apply learned rules to extract new information, which is useful in environments where coming up with completely new rules isn’t feasible due to limited resources – as is often the case on mobile or embedded systems.
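To make the distinction concrete, here is a toy sketch of what "applying learned rules" means: the parameters below are hypothetical stand-ins for weights learned offline and shipped with an app, and inference simply evaluates them on new input without any on-device training.

```python
# Toy illustration of inference: apply already-learned parameters to new
# inputs. No training happens on the device, only evaluation.
weights = [0.8, -0.2]  # hypothetical values, learned offline
bias = 0.1

def infer(features):
    """Evaluate a tiny linear model on one input vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias

print(infer([1.0, 2.0]))  # 0.8 - 0.4 + 0.1, i.e. approximately 0.5
```

A real mobile model does the same thing at a much larger scale, which is why offloading that evaluation to the GPU pays off.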

The idea to give OpenCL a go was motivated by the fact that OpenCL was designed with the use of various accelerators in mind. The Open Graphics Library (OpenGL), which TensorFlow Lite normally uses when GPUs are roped into the inference process, only gained general compute capabilities comparatively late. As a consequence, its API carries the burden of staying backward compatible, which the team felt sometimes keeps it from getting the most out of a device's GPU.

Besides the easy accelerator use, OpenCL offers good profiling options which help to uncover potential for optimisation, supports 16-bit floating-point precision, and comes with constant memory, which has proven efficient in certain layers of a neural network. Making use of these features in an OpenCL backend saw the inference engine run twice as fast as the usual OpenGL solution, especially on the Adreno GPU series developed by Qualcomm.
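The bandwidth argument behind 16-bit floats is easy to demonstrate: a half-precision value occupies two bytes instead of four, at the cost of some precision. The sketch below uses Python's `struct` half-float format (`"e"`) purely as an illustration; it is not TFLite code.

```python
import struct

# FP16 ("e") uses half the bytes of FP32 ("f"): 2 vs 4. Halving tensor
# size halves memory traffic, a key win on bandwidth-bound mobile GPUs.
assert struct.calcsize("e") == 2
assert struct.calcsize("f") == 4

# The trade-off: FP16 keeps only ~3 decimal digits of precision, so a
# value picks up a small rounding error on the round trip.
value = 3.14159
fp16_roundtrip = struct.unpack("e", struct.pack("e", value))[0]
print(value, "->", fp16_roundtrip)
```

For many neural-network layers that rounding error is tolerable, which is why the OpenCL backend can exploit FP16 for speed.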

While this all sounds promising, the new backend has a major drawback: OpenCL isn't part of the standard Android distribution and might therefore not be available on every device. To work around that, the TFLite GPU delegate was fitted with a checking mechanism which employs OpenCL when it is found and falls back to OpenGL when it is not.
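The fallback pattern can be sketched as follows. This is a conceptual illustration only, not the actual TFLite delegate API; the function name is made up, and on a real Android device the probe amounts to checking whether an OpenCL runtime library (typically libOpenCL.so) can be loaded.

```python
import ctypes.util

def pick_gpu_backend(opencl_available: bool) -> str:
    """Mimic the delegate's choice: use OpenCL when present, else OpenGL."""
    return "OpenCL" if opencl_available else "OpenGL"

# Rough stand-in for the runtime probe: look for an OpenCL library on
# this machine. On Android the delegate performs an equivalent check.
has_opencl = ctypes.util.find_library("OpenCL") is not None
print("Selected backend:", pick_gpu_backend(has_opencl))
```

Because the check happens at runtime, app developers get the OpenCL speed-up where it exists without shipping separate builds for devices that only offer OpenGL.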