TensorFlow opens doors to privacy-aware machine learning

Two additions to the TensorFlow product family should help developers not only with hard-to-centralise data, but also with keeping sensitive data private: TensorFlow Federated and TensorFlow Privacy are now available on GitHub.

TensorFlow Federated (TFF) includes an implementation of federated learning, which lets users train shared models across a number of clients that keep their training data local and aggregate the locally computed model updates into an improved shared model afterwards. This means that, for example, private data can be used for training without ever being uploaded to a central store. It can also be useful in cases where data is hard to transport for resource reasons, edge devices being an obvious example.
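
Conceptually, one round of that process boils down to: send the current model to each client, let every client train on its own data, then average the results. Below is a minimal NumPy sketch of this federated averaging idea on a toy linear model; the function names are invented for illustration and this is not TFF’s actual API:

```python
import numpy as np

def local_update(weights, client_data, lr=0.1):
    """One client's local gradient steps on a linear model (illustration only)."""
    w = weights.copy()
    for x, y in client_data:
        grad = (w @ x - y) * x  # squared-error gradient for y ~ w.x
        w = w - lr * grad
    return w

def federated_round(weights, clients):
    """One round of federated averaging: train locally, average the models.
    Only model parameters travel; the raw data never leaves each client."""
    local_models = [local_update(weights, data) for data in clients]
    return np.mean(local_models, axis=0)

# Two clients, each holding its own private (x, y) samples of y = 2x.
clients = [
    [(np.array([1.0]), 2.0), (np.array([2.0]), 4.0)],
    [(np.array([3.0]), 6.0), (np.array([0.5]), 1.0)],
]
w = np.zeros(1)
for _ in range(20):
    w = federated_round(w, clients)
print(w)  # converges towards [2.]
```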

Since TensorFlow is, like it or not, advanced by Google, TFF was designed with the company’s federated learning experience in mind. Google uses the approach for things like predicting the next word in a smartphone’s virtual keyboard, in case you were wondering.

TFF is organised in layers: a high-level Federated Learning (FL) API for people to try the included federated training and evaluation, and the lower-level Federated Core (FC) API for experimenting with new federated algorithms. Implementations of the latter are something the project’s instigators would particularly like to see contributed, but extensions to existing ones or simply new federated datasets are appreciated as well.
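
To give a flavour of the lower-level layer, the Federated Core lets you declare computations over values that live on a set of clients. The snippet below follows the averaging example from the project’s own documentation; exact names may have shifted in later releases:

```python
import tensorflow as tf
import tensorflow_federated as tff

# A federated computation over float values placed on the clients.
@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):
    # Aggregate the client-side readings into a single server-side mean.
    return tff.federated_mean(sensor_readings)

# In simulation, a plain Python list stands in for the set of clients.
print(get_average_temperature([68.5, 70.3, 69.8]))  # ~69.53
```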

In the future, Google’s Alex Ingerman and Krzys Ostrowski would like to see runtimes for TFF on major device platforms and, aligned with the company’s Responsible AI Practices, “to integrate other technologies that help protect sensitive user data”.

One of those is the Python library TensorFlow Privacy (TFP), which also debuted in time for the TensorFlow Dev Summit in early March 2019. If used, it should offer “strong mathematical guarantees” that a model doesn’t remember any details of sensitive training data, such as your Gmail emails, but only encodes general patterns.

The approach behind this is called differential privacy. Roughly speaking, it ensures that out-of-the-ordinary details of a data set, which might identify or reveal information about the person the data belongs to, don’t leave a recognisable mark on the model during training.
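
For gradient-based training, this works out to the so-called DP-SGD recipe: clip each example’s gradient so that no single record can dominate an update, then add noise calibrated to that clipping bound. Here is a minimal NumPy sketch of the mechanism, not TFP’s actual implementation:

```python
import numpy as np

def privatized_gradient(per_example_grads, l2_norm_clip=1.0, noise_multiplier=1.1):
    """Clip each example's gradient to bound its influence, then add Gaussian
    noise scaled to that bound: the core mechanism behind DP-SGD."""
    clipped = [g * min(1.0, l2_norm_clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * l2_norm_clip, size=total.shape)
    return (total + noise) / len(per_example_grads)

# One out-of-the-ordinary example cannot dominate the averaged update.
grads = [np.array([0.2, -0.1]), np.array([5.0, 3.0])]
print(privatized_gradient(grads))
```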

To make it easy to use, model architectures and training procedures don’t have to be changed when standard TensorFlow mechanisms are in play, though new hyperparameters have to be set and slight code alterations might be necessary when using TFP. Tutorials to guide you through the process are available via the project’s GitHub repository.
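
In practice, those alterations amount to swapping in one of the library’s differentially private optimizers and keeping the loss per-example, so gradients can be clipped individually. The sketch below is modelled on the MNIST tutorial that shipped with the initial release; the toy model is invented for illustration, and the import path (then simply `privacy`, later `tensorflow_privacy`) and class names may differ in current versions:

```python
import tensorflow as tf
from privacy.optimizers.dp_optimizer import DPGradientDescentGaussianOptimizer

# A toy classifier standing in for a real model (TF 1.x-era graph code).
features = tf.placeholder(tf.float32, [None, 10])
labels = tf.placeholder(tf.int32, [None])
logits = tf.layers.dense(features, 2)

# Slight alteration: keep the loss vector-valued (one entry per example)
# so each example's gradient can be clipped before noise is added.
vector_loss = tf.losses.sparse_softmax_cross_entropy(
    labels=labels, logits=logits, reduction=tf.losses.Reduction.NONE)

# The new hyperparameters controlling the privacy/utility trade-off.
optimizer = DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,        # per-example gradient clipping bound
    noise_multiplier=1.1,    # noise scale relative to that bound
    num_microbatches=32,     # must evenly divide the batch size
    learning_rate=0.15)

train_op = optimizer.minimize(loss=vector_loss)
```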