Google’s cloud warehouse teaches data scientists new tricks

Data scientists who use BigQuery, Google’s cloud data warehouse, for analysis can now add machine learning to the mix.

BigQuery ML, new and still in beta, is a set of SQL extensions that should let users proficient in the query language build and deploy machine learning models. These models can then be used to perform predictive analytics on BigQuery data without having to move the data out of the warehouse, which can get tedious with large, distributed data sets. To help along the way, BQ ML sets smart default values and handles data transformations.
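To give an idea of what such SQL extensions look like, here is a minimal sketch that assembles a `CREATE MODEL` statement and a matching `ML.PREDICT` query as plain strings. The dataset, table, model and column names are made up for illustration, and the helper functions are not part of any Google library; actually running the statements would require a BigQuery client, which is left out here.

```python
from typing import Sequence


def create_model_sql(model: str, table: str, label: str, features: Sequence[str]) -> str:
    """Build a BigQuery ML statement that trains a logistic regression model."""
    columns = ", ".join(list(features) + [label])
    return (
        f"CREATE MODEL `{model}`\n"
        f"OPTIONS(model_type='logistic_reg', input_label_cols=['{label}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{table}`"
    )


def predict_sql(model: str, table: str, features: Sequence[str]) -> str:
    """Build a query that scores rows of `table` with a trained model."""
    columns = ", ".join(features)
    return (
        f"SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT {columns} FROM `{table}`))"
    )


# Hypothetical names, chosen only for this example.
FEATURES = ["country", "pageviews"]
train_stmt = create_model_sql("mydataset.purchase_model", "mydataset.visits", "purchased", FEATURES)
predict_stmt = predict_sql("mydataset.purchase_model", "mydataset.new_visits", FEATURES)

print(train_stmt)
print(predict_stmt)
```

Both training and prediction are expressed as queries, so the data never has to leave the warehouse.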

Since it’s still in development, pricing for BigQuery ML isn’t final yet; for the moment, charges are based on the data processed by each query. Customers on flat-rate pricing should be able to use their existing slots for BQ ML until 31 July 2019.

To get results more quickly when analysing large data sets, BigQuery now also includes a way to cluster tables by defining clustering keys. Rows with similar keys are bundled together, so that a query filtering on those keys does not have to scan the whole partition or table. This is supposed to make the process both faster and cheaper, since charges depend on the amount of data processed. The clustering feature is also still in beta, as is the new Sheets data connector, which gives data scientists direct access to data in Google Sheets, part of the company’s G Suite offering.
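Why bundling rows by key reduces the amount of data scanned can be shown with a toy model. The sketch below is not how BigQuery stores data internally; it simply assumes that rows live in fixed-size blocks, that each block records the smallest and largest key it contains, and that a query can skip any block whose key range rules out a match.

```python
def make_blocks(keys, block_size):
    """Split rows into fixed-size blocks and record each block's min/max key."""
    blocks = []
    for start in range(0, len(keys), block_size):
        chunk = keys[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return blocks


def rows_scanned(blocks, wanted_key):
    """Count the rows in all blocks whose key range could contain wanted_key."""
    return sum(len(b["rows"]) for b in blocks if b["min"] <= wanted_key <= b["max"])


# 1,000 rows with ten distinct keys, interleaved as they might arrive over time.
keys = [(i * 7) % 10 for i in range(1000)]

unclustered = make_blocks(keys, block_size=100)        # every block holds every key
clustered = make_blocks(sorted(keys), block_size=100)  # each block holds one key

print(rows_scanned(unclustered, 3))  # 1000
print(rows_scanned(clustered, 3))    # 100
```

In BigQuery itself, the keys are declared with a `CLUSTER BY` clause in the table definition rather than by sorting data by hand.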

Another addition is BigQuery GIS (GIS being short for geographic information system), which was developed together with the team behind Google Earth Engine. It provides functions and data types that follow the SQL/MM Spatial standard, which describes how to store, retrieve and process spatial data using SQL. PostGIS users should therefore have no problem getting familiar with it, and WKT (well-known text) and GeoJSON are supposed to be supported as well.
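For readers unfamiliar with the two formats, the following sketch writes the same point as WKT and as GeoJSON and places both in a distance query. The coordinates and helper names are illustrative assumptions; `ST_GEOGFROMTEXT`, `ST_GEOGFROMGEOJSON` and `ST_DISTANCE` are functions from BigQuery GIS, but the query is only built as a string here, not executed.

```python
import json


def point_to_wkt(lon: float, lat: float) -> str:
    """Encode a point as well-known text; WKT lists longitude first."""
    return f"POINT({lon} {lat})"


def point_to_geojson(lon: float, lat: float) -> str:
    """Encode a point as a GeoJSON geometry; coordinates are [lon, lat]."""
    return json.dumps({"type": "Point", "coordinates": [lon, lat]})


def distance_sql(wkt: str, geojson: str) -> str:
    """Build a query measuring the distance between a WKT and a GeoJSON geometry."""
    return (
        "SELECT ST_DISTANCE(\n"
        f"  ST_GEOGFROMTEXT('{wkt}'),\n"
        f"  ST_GEOGFROMGEOJSON('{geojson}')\n"
        ") AS metres"
    )


# Example coordinates (roughly Berlin and Hamburg), chosen for illustration.
berlin_wkt = point_to_wkt(13.405, 52.52)
hamburg_geojson = point_to_geojson(9.993, 53.551)

print(berlin_wkt)       # POINT(13.405 52.52)
print(hamburg_geojson)  # {"type": "Point", "coordinates": [9.993, 53.551]}
print(distance_sql(berlin_wkt, hamburg_geojson))
```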

For visualisation purposes, the collaboration also produced BigQuery Geo Viz, a tool to plot and style geospatial query results, which might turn out to be quite useful given the popularity of location-based applications. Projects interested in giving these add-ons a go despite their alpha status can contact Google to get whitelisted and receive the necessary documentation.