Big Blue opens up hub for machine learning datasets

By Team Devclass

July 17, 2019

IBM has launched a repository of training datasets that data scientists can pick and mix from to train their deep learning and machine learning models.

The IBM Data Asset eXchange (DAX) is designed to complement the Model Asset eXchange it launched earlier this year, which offers researchers and developers models to deploy or train with their own data.

In a blog post announcing the data exchange, a quartet of IBM luminaries wrote: “Developers adopting ML models need open data that they can use confidently under clearly defined open data licenses.”

The datasets in question will be covered by the Linux Foundation’s Community Data License Agreement (CDLA) open data licensing framework to enable data sharing and collaboration – “where possible”.

DAX will also provide “unique access to various IBM and IBM Research datasets.” Big Blue has pledged to publish further datasets, and said “The datasets on DAX will integrate with IBM Cloud and AI services as appropriate.”

There are other ways to source data and models, with IBM’s announcement referencing GitHub and Kaggle, while PyTorch Hub launched as a model repository earlier this year.

IBM claimed DAX would be “unique in its high level of quality and curation”, as it would help developers build “end-to-end” deep learning workflows, and allow “developers to consume open data with confidence under clearly defined open data licenses.”

That might sound rather dull to developers used to skunkworks-like conditions, but as machine learning creeps across the enterprise, compliance and ethical practices become a bigger concern.

“The CODAIT team’s goal is to make it straightforward to use DAX and MAX assets in conjunction with IBM AI products as well as other hybrid, multicloud AI tooling,” the team said. That will presumably be a relief for developers who don’t want to lock themselves into IBM’s way of machine learning.

As of today, there are eight datasets on the exchange, including IBM’s Contracts Proposition Bank, which features text from IBM’s contracts, the NOAA Weather Data set for JFK Airport, and a set containing 100 randomly sampled discussion threads from Ubuntu Forums.
