TensorFlow looks to get more frugal with resources ahead of 2.3 release


With the release candidate for version 2.3 of machine learning framework TensorFlow now available, data scientists and machine learning types get a last chance to provide input before the final release. The RC showcases new features to tackle input pipeline bottlenecks and to preprocess data.

The former is mostly realised through two experimental mechanisms in TensorFlow's tf.data module: snapshot and the tf.data service. Snapshot lets users persist the output of their preprocessing so it can be reused in subsequent runs, while the tf.data service distributes preprocessing across workers and produces data for parallel iterations over a data set. Both aim to cut resource consumption and speed up training.
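A minimal sketch of the snapshot mechanism, assuming the TensorFlow 2.3 API where it is exposed as tf.data.experimental.snapshot and applied with Dataset.apply. The doubling map is a hypothetical stand-in for expensive preprocessing, and the snapshot goes to a temporary directory. Because snapshot reads can be parallelised across shards, element order on a later pass is not guaranteed to match the first, so the values are compared after sorting.

```python
import tempfile

import tensorflow as tf

# Directory the snapshot files are written to (a throwaway one here).
snapshot_dir = tempfile.mkdtemp()


def expensive_preprocess(x):
    # Hypothetical stand-in for a costly preprocessing step.
    return x * 2


dataset = tf.data.Dataset.range(10).map(expensive_preprocess)
# Persist the preprocessed elements so later passes can reuse them.
dataset = dataset.apply(tf.data.experimental.snapshot(snapshot_dir))

# The first full pass runs the map and writes the snapshot to disk.
first_pass = sorted(int(v) for v in dataset.as_numpy_iterator())
# A later pass can be served from the finished snapshot instead.
second_pass = sorted(int(v) for v in dataset.as_numpy_iterator())
print(first_pass)
```

The snapshot is keyed on the input pipeline, so changing the preprocessing function should lead TensorFlow to write a fresh snapshot rather than reuse the stale one.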

If this doesn't help, developers will also find a memory profiler and a tracer for Python function calls in TensorFlow 2.3's profiler. These should make hunting down performance bottlenecks a bit easier, or at least give teams some clues about where to look to speed up their code.
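As a rough sketch, assuming the 2.3-era programmatic profiler API, the Python tracer can be switched on through tf.profiler.experimental.ProfilerOptions by setting python_tracer_level=1 (it defaults to 0). The train_step function below is a hypothetical placeholder workload; the resulting profile, including memory data, is meant to be inspected with TensorBoard's Profile plugin.

```python
import os
import tempfile

import tensorflow as tf

logdir = tempfile.mkdtemp()

# python_tracer_level=1 enables the Python tracer, which is off by default.
options = tf.profiler.experimental.ProfilerOptions(
    host_tracer_level=2, python_tracer_level=1, device_tracer_level=1
)


@tf.function
def train_step(x):
    # Hypothetical placeholder for a real training step.
    return tf.reduce_sum(tf.square(x))


tf.profiler.experimental.start(logdir, options=options)
for _ in range(3):
    train_step(tf.random.uniform((1000,)))
tf.profiler.experimental.stop()

# The collected profile is written below logdir for TensorBoard to load.
print(os.listdir(logdir))
```

Pointing TensorBoard at logdir should then show the trace viewer and memory profile for the captured steps.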

Another way to improve performance could be the use of TPUs. Since those are, like TensorFlow itself, Google-bred, the corresponding APIs, most notably tf.distribute.TPUStrategy, have received some special attention from the team and are now stable and ready for production. The company connection is also apparent in the fact that Libtensorflow packages are now published to Google Cloud Storage, with nightly builds available as well.

Meanwhile, the Keras team has also been busy improving its deep learning library, which is now developed solely under the TensorFlow umbrella. Among the enhancements is the much-highlighted experimental preprocessing layers API (tf.keras.layers.experimental.preprocessing). Intended as a replacement for the feature column API, it offers layers for typical preprocessing steps such as text vectorisation and data normalisation.
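A minimal sketch of two of these layers, assuming the TensorFlow 2.3 module path tf.keras.layers.experimental.preprocessing. Both learn their state from sample data through adapt(): Normalization picks up the mean and variance, TextVectorization builds a vocabulary. The toy numbers and sentences are made up for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing

# Normalization: adapt() computes mean and variance, call() standardises.
data = np.array([[0.0], [2.0], [4.0], [6.0]], dtype="float32")
norm = preprocessing.Normalization()
norm.adapt(data)
normalized = norm(data).numpy()

# TextVectorization: adapt() builds a vocabulary, call() maps words to ids.
texts = tf.constant([["the cat sat"], ["the dog sat down"]])
vectorizer = preprocessing.TextVectorization(output_mode="int")
vectorizer.adapt(texts)
# Shorter sequences are padded with 0 up to the longest one in the batch.
ids = vectorizer(texts).numpy()
print(normalized.ravel(), ids)
```

Because the learned state lives inside the layers, they can be placed at the front of a Keras model so that the exported model carries its own preprocessing.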

Keras also comes with new utilities for generating image, text, and time series datasets, as well as improvements to the image preprocessing and augmentation layers. Other new preprocessing layers turn continuous numerical features into categorical ones, build indexes for categorical values, and create new categorical features by crossing existing ones, among other things.
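To illustrate the time series utility, here is a small sketch assuming the 2.3 function tf.keras.preprocessing.timeseries_dataset_from_array. It slides a window of three consecutive values over a toy array of ten numbers; with targets=None the dataset yields only the input windows, batched four at a time.

```python
import numpy as np
import tensorflow as tf

data = np.arange(10)

# Windows of 3 consecutive values, stride 1, grouped into batches of 4.
dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    data, targets=None, sequence_length=3, batch_size=4
)

# Collect every batch into one array of windows: [0, 1, 2], [1, 2, 3], ...
windows = np.concatenate([batch.numpy() for batch in dataset])
print(windows)
```

Ten values and a window length of three leave eight complete windows, the last one being [7, 8, 9].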

While updating should be relatively straightforward, devs might need to check their code. In tf.data's C++ API, for example, the deprecated DatasetBase::IsStateful method has been removed in favour of DatasetBase::CheckExternalState, and IteratorBase::RestoreInternal, IteratorBase::SaveInternal, and DatasetBase::CheckExternalState are now pure virtual, so subclasses are expected to provide their own implementations. Those using Bazel to build TensorFlow will need to make sure they're working with version 3.1.0 or later.