GitLab has put data in its sights, launching a project to create a single platform to support the data science lifecycle just as its eponymous product supports the application lifecycle.
Dubbed Meltano – model, extract, load, transform, analyze, notebook, orchestrate – the project aims to do “data ops, data engineering, analytics, business intelligence, and data science [using] software development best practices including version control, CI, CD, and review apps.”
The promise is “business intelligence as code”, which will no doubt appeal to those steeped in DevOps, automation and the like who think data needs to be brought under their purview. Whether data veterans and newly minted data scientists feel they need this helping hand is another matter.
GitLab described its own challenges managing data and making predictions. “As is the case with many data teams, we currently do this with a series of steps and separate tools, and we’re not yet at the level of process and stability that is commonplace in software development.
“The idea of bringing best practices from software development to data analytics is a huge draw for the Data team at GitLab,” it said. “Ideally, all of our work could be done in open source tools, and could be version controlled, and we’d be able to track the state of the analytics pipeline from raw data to visualization.”
Senior Product Manager Joshua Lambert added, “As an open source tool, we think Meltano will make a big difference for teams without much money to invest in data analytics. It’s a new field for many organizations, and we want to do everything we can to make it easier for teams and business to access their data and make better decisions.”
Some in the software development and deployment world have suggested that data scientists and their machine learning and AI-focused brethren have yet to embrace the sort of discipline and efficiency that practices such as DevOps and CI/CD can bring.
For its part, GitLab has pitched Git-ification as the future for all sorts of collaborative fields. CEO Sid Sijbrandij has previously said Hollywood could learn a lot from GitLab and how its platform can be used for collaboration. He’s also said that it has customers in the legal and publishing sectors.
Prof Mark Whitehorn of Dundee University said, “Some data scientists will simply see it as a good idea, which I think it is. But some DSs pass on the job of productionising a DS solution to, say, data engineers. So those DSs may not even see it as a problem.”
He added, “In practice, if the ideas are poorly presented…the DSs may well not listen. Which is a shame. Even if you are not involved in deployment it makes a great deal of sense to understand the problems and the possible solutions.”
GitLab has published a formal announcement and request for contributions, and there is much more information at the Meltano repository.