Google open sources method to join datasets without gatecrashing privacy

Google has open sourced a method for secure multi-party computation that it reckons will allow organisations to work on confidential data sets while keeping individuals’ details encrypted – potentially making machine learning less of a privacy nightmare.

Private Join and Compute builds on the principles behind the Password Checkup extension Google released earlier this year, which relies on the private set intersection cryptographic protocol.

It aims to answer the question: “How can one party gain aggregated insights about the other party’s data without either of them learning any information about individuals in the datasets?”

Two technologies are used. The first, private set intersection, allows two parties to join their data sets and discover which identifiers they have in common; Private Join and Compute uses an oblivious variant that only marks the encrypted identifiers, without either party learning the identifiers themselves. This is combined with homomorphic encryption, which “allows certain types of computation to be performed directly on encrypted data without having to decrypt it first, which preserves the privacy of raw data”.
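
For the curious, the oblivious matching trick can be sketched in a few dozen lines of Python. What follows is a toy, Diffie-Hellman-style private set intersection of our own devising, in the same family as the construction Google describes: every name, parameter and key size here is illustrative, and the 128-bit toy group is nowhere near secure.

```python
import hashlib
import secrets

def is_probable_prime(n, rounds=20):
    """Miller-Rabin primality test (stdlib only)."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % sp == 0:
            return n == sp
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def gen_safe_prime(bits=128):
    """Find p = 2q + 1 with p and q both prime. Toy size: NOT secure."""
    while True:
        q = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(q) and is_probable_prime(2 * q + 1):
            return 2 * q + 1

P = gen_safe_prime()   # group modulus
Q = (P - 1) // 2       # prime order of the quadratic-residue subgroup

def hash_to_group(identifier: str) -> int:
    """Map an identifier into the order-Q subgroup: hash, then square."""
    h = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big") % P
    return pow(h, 2, P)

class Party:
    def __init__(self, identifiers):
        self.key = secrets.randbelow(Q - 1) + 1   # secret blinding exponent
        self.ids = identifiers

    def blind_own(self):
        """Send out H(x)^key for each identifier; raw values never leave."""
        return [pow(hash_to_group(x), self.key, P) for x in self.ids]

    def blind_other(self, blinded):
        """Re-blind the other party's values with our own key; returning a
        set hides the order (real protocols shuffle explicitly)."""
        return {pow(v, self.key, P) for v in blinded}

# Because (H(x)^a)^b == (H(x)^b)^a, identifiers held by both parties
# collide after double blinding, revealing only the size of the overlap.
client = Party(["alice@example.com", "bob@example.com", "carol@example.com"])
server = Party(["bob@example.com", "carol@example.com", "dave@example.com"])
double_client = server.blind_other(client.blind_own())
double_server = client.blind_other(server.blind_own())
print("intersection size:", len(double_client & double_server))   # -> 2
```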

Google says: “This combination of techniques ensures that nothing but the size of the joined set and the statistics (e.g. sum) of its associated values is revealed. Individual items are strongly encrypted with random keys throughout and are not available in raw form to the other party or anyone else.”
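
Additively homomorphic schemes such as Paillier are what make that encrypted sum possible. Here is a deliberately tiny Paillier sketch to show the mechanics; the hard-coded Mersenne primes and helper names are ours, purely for illustration, and bear no resemblance to production key sizes.

```python
import math
import secrets

# Toy Paillier keypair from two known Mersenne primes (2^31 - 1, 2^61 - 1).
# Fine for a demo; real deployments use primes of around 1536 bits or more.
p, q = (1 << 31) - 1, (1 << 61) - 1
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow(lam, -1, n)                                # lam^-1 mod n (Python 3.8+)

def encrypt(m: int) -> int:
    """Enc(m) = (1 + n)^m * r^n mod n^2, with fresh randomness r."""
    r = secrets.randbelow(n - 1) + 1
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Additive homomorphism: multiplying ciphertexts adds their plaintexts,
# so a party holding only ciphertexts can still total up the values.
spends = [12, 7, 30]
total = 1                                 # effectively an encryption of zero
for ct in (encrypt(v) for v in spends):
    total = (total * ct) % n2             # accumulate without decrypting
print(decrypt(total))                     # -> 49
```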

Google claims that “two parties can encrypt their identifiers and associated data, and then join them. They can then do certain types of calculations on the overlapping set of data to draw useful information from both datasets in aggregate.” But throughout, identifiers and associated data remain “fully encrypted and unreadable.”
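
Putting the two toy sketches above together gives a feel for the join-then-compute flow the quote describes. The scenario and every name below are hypothetical (it reuses Party, hash_to_group, encrypt, decrypt and the moduli from the earlier sketches), and it shows only the protocol’s shape, not Google’s implementation, which layers on shuffling and other protections.

```python
import random

# Hypothetical roles: an ad platform holds click identifiers, a shop holds
# purchase amounts. Goal: the shop learns only the total spend by clickers.
ad_platform = Party(["alice@example.com", "bob@example.com", "carol@example.com"])
purchases = {"bob@example.com": 12, "carol@example.com": 7, "dave@example.com": 30}
shop = Party(list(purchases))

# 1. The shop ships each row as (singly blinded id, encrypted spend), shuffled.
rows = [(pow(hash_to_group(x), shop.key, P), encrypt(v))
        for x, v in purchases.items()]
random.shuffle(rows)

# 2. The shop also double-blinds the ad platform's own identifiers.
double_clicks = shop.blind_other(ad_platform.blind_own())

# 3. The ad platform re-blinds the shop's ids, keeps the ciphertexts of the
#    rows that match, and multiplies them into one encrypted total it
#    cannot open (only the shop holds the Paillier key).
total = 1
for blinded_id, ct in rows:
    if pow(blinded_id, ad_platform.key, P) in double_clicks:
        total = (total * ct) % n2

# 4. The shop decrypts a single number: spend attributable to ad clickers.
print("intersection-sum:", decrypt(total))   # -> 19 (bob + carol)
```

Note the asymmetry: the ad platform learns only the intersection size, while the shop learns only the aggregate sum, which is the guarantee Google quotes above.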

According to Google’s paper on the topic, the research was prompted by the question of how to compute the aggregate conversion rate (or effectiveness) of advertising campaigns – a subject close to Google’s heart.

Google said it is exploring use cases including user security, obviously, aggregated ads measurement, also obviously, and collaborative machine learning. It postulated applications in areas where there are particular sensitivities about revealing details on individuals represented in data, such as public policy, or tracking diversity and inclusion. And of course, healthcare.

You can access the protocol and the heavyweight paper detailing the work via Google’s security blog.