Do you want rules with that? Data sharing is caring à la Microsoft AI

Microsoft has drafted three agreements on sharing and using data, and is asking the AI community for feedback on them.

Most devs who have ever trained a model will know how big a role high-quality data plays in getting a good outcome. But getting access to enough of it isn’t always a piece of cake, especially if you’re more on the research side of the spectrum and have to rely on partners to deliver before you can make any kind of progress.

And even then it isn’t always easy, because no matter where you get your data from, privacy and security must always be taken into account. To help with that and “remove barriers to data innovation”, as Microsoft puts it, the company has shared drafts for three data use agreements. Two of them cover the open (O-UDA) and the computational (C-UDA) use of data, while the third (DUA-OAI) is specifically set up to arrange data use for Open AI Model Development.

The DUA-OAI is designed to “share data for the limited purpose of training an AI model”, leaving O-UDA to govern the more general use of data under minimal obligations and C-UDA to help with making “data available to anyone for computational use purposes, such as artificial intelligence, machine learning, and text and data mining”. All documents come with an overview of the obligations data users and redistributors have to comply with, as well as a contemplated use case for clarification.

As Microsoft’s Corporate Vice President and Chief IP Counsel Erich Andersen points out in a blog post, the company’s proposals are meant to be less complex and better explained than the agreements currently available. The drafts are based on some precursors as well as the experience Microsoft gained in projects that made data sharing necessary.

“Sharing data between organizations can help address some of society’s biggest challenges”, Andersen states (as do all of the proposals). Good tools to realise that, however, “are often immature or non-existent”, leaving Microsoft itching to put a stake in the ground.

Nevertheless, cooperation is vital to end up with something more organisations are willing to consider (parallels to open source licenses come to mind in this context), which might be why Microsoft is looking for feedback to help “improve these agreements and to offer additional ones that cover a wide range of data sharing scenarios”.

To get the ball rolling, the first deadline for change proposals is October 1, 2019. The first finalised versions of all data use agreements are planned to land sometime in autumn.

The announcement comes only a couple of days after Microsoft informed the public about its partnership with, and investment in, artificial intelligence research company OpenAI, which could very well become one of the first contributors to the discussion.