Google intros helpful TensorFlow Recorder, but warns you’ll need to cough up for ‘huge datasets’

By Julia Schmidt

August 11, 2020

Just a couple of days after pushing out the latest TensorFlow release, Google has open-sourced a new tool for the machine learning framework that aims to push its TFRecord format forward.

TensorFlow Recorder is available on GitHub under the Apache License 2.0 and is meant to help create TFRecords from “images and labels in Pandas DataFrames or CSV files”.

According to Google Cloud AI engineers Mike Bernico and Carlos Ezequiel, the project has become necessary in a computer vision context, where data loading can take quite a while when the data is not formatted properly. As a consequence, resources aren’t used as efficiently as they could be, making an already time-consuming process even lengthier.

When using TensorFlow to build models for these kinds of use cases, the framework’s TFRecord format is one way to work around this bottleneck. It can be combined with approaches like prefetching, which fetches data for upcoming processing steps before it is needed, and interleaving for parallel processing, to reduce latency.
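
To make the prefetching idea concrete, here is a minimal, framework-free sketch in plain Python. It is not how TensorFlow implements it (there, the corresponding calls are `tf.data.Dataset.prefetch` and `tf.data.Dataset.interleave`). The names `prefetch` and `load_records` are hypothetical and exist only for illustration: a background thread keeps a small buffer filled with upcoming records while the consumer works on the current one.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable`, loading up to `buffer_size` items ahead
    in a background thread (illustrative sketch, not TensorFlow's code)."""
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in iterable:
                buf.put(item)  # blocks while the buffer is full
        finally:
            buf.put(_SENTINEL)  # always signal completion to the consumer

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buf.get()
        if item is _SENTINEL:
            return
        yield item


def load_records(n):
    """Hypothetical stand-in for reading and decoding records from disk."""
    for i in range(n):
        yield {"id": i}


# While the consumer handles one record, the producer is already loading the next.
records = list(prefetch(load_records(5), buffer_size=2))
print(len(records), "records consumed")
```

The bounded queue is the key design choice here: it caps memory use while still letting loading and processing overlap.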

To get there, the raw data has to be converted, which requires some work not everyone is willing to put in. This is where Bernico and Ezequiel hope TensorFlow Recorder will come in, providing users with a comparatively easy way to go from image/label sets to TFRecords with little additional code.

However, for now the tool will be most useful to those already familiar with Google’s portfolio, since Recorder expects the data to come in an image CSV format similar to the one AutoML Vision prefers. The team “hopes” to extend format support in the future, but since the project is open source now, this feels more like a call to users to maybe do their bit to add Pandas DataFrame conversion to the mix.
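
For readers unfamiliar with that layout, the following sketch shows what an AutoML Vision-style image CSV roughly looks like and how it could be checked before conversion. The column names (`split`, `image_uri`, `label`), the bucket paths, and the `parse_image_csv` helper are assumptions made for illustration, not part of TensorFlow Recorder’s API.

```python
import csv
import io

# Hypothetical sample: one row per image, as split, image URI, label.
SAMPLE_CSV = """TRAIN,gs://my-bucket/images/cat1.jpg,cat
VALIDATION,gs://my-bucket/images/dog1.jpg,dog
TEST,gs://my-bucket/images/cat2.jpg,cat
"""

ALLOWED_SPLITS = {"TRAIN", "VALIDATION", "TEST"}


def parse_image_csv(text):
    """Parse rows of the form split,image_uri,label into dictionaries,
    raising ValueError on malformed rows (illustrative helper)."""
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(row)}")
        split, image_uri, label = (field.strip() for field in row)
        if split not in ALLOWED_SPLITS:
            raise ValueError(f"line {line_no}: unknown split {split!r}")
        rows.append({"split": split, "image_uri": image_uri, "label": label})
    return rows


rows = parse_image_csv(SAMPLE_CSV)
print(f"{len(rows)} rows parsed")
```

Validating the split and field count up front is cheap and catches most formatting slips before any conversion job is started.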

Another caveat is that, according to its creators, the project as is wouldn’t scale to “huge datasets” of millions of images. Since those datasets can indeed be necessary for more complex computer vision tasks, though, TensorFlow Recorder can be connected to Google Cloud Dataflow, which should be better able to handle large amounts of data.

Of course, having this option in place is very helpful, but it again pushes users in the direction of one of Google’s commercial offerings, which seem to have become more and more present in the open source project of late. Other examples of this development are the continuing focus on TPU integration for speed improvements and some packages making their way into Google Cloud Storage – something users should at least be aware of.
