Amazon Redshift ML available: train and operate machine learning models on Redshift data


Amazon has released Redshift ML, an extension of its data warehouse service that makes it easier for data analysts and developers to create and train machine learning models using the data they already store in Redshift.

Redshift ML was announced as a preview at Amazon’s re:Invent conference last year. It takes advantage of Amazon SageMaker, a fully managed machine learning service, to build a machine learning model based on user data, without the user necessarily having to learn new tools or languages.

Redshift ML lets users write SQL statements to create and train machine learning models from their data, then use these models to make predictions. It does so by automating the legwork of exporting the training data from Redshift to a bucket in the Amazon S3 storage service, then starting the machine learning training process in SageMaker.

To create a machine learning model, developers can thus use a simple SQL query to specify the data required to train the model and the output value they want to predict, according to Danilo Poccia, chief evangelist (EMEA) at Amazon Web Services.

“For example, to create a model that predicts the success rate for your marketing activities, you define your inputs by selecting the columns that include customer profiles and results from previous marketing campaigns, and the output column you want to predict. In this example, the output column could be one that shows whether a customer has shown interest in a campaign,” he explained in a blog post.
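As a rough illustration of the kind of statement Poccia describes, the sketch below assembles a Redshift ML CREATE MODEL command as a string in Python. The clause layout (FROM, TARGET, FUNCTION, IAM_ROLE, SETTINGS with S3_BUCKET) follows the documented CREATE MODEL syntax as best we understand it; the helper name, table, columns, role ARN, and bucket are all hypothetical placeholders, not taken from Amazon's post.

```python
def build_create_model(model, table, feature_cols, target_col,
                       function, iam_role, s3_bucket):
    """Return a CREATE MODEL statement as a string.

    All identifiers passed in are illustrative; nothing here talks to Redshift.
    """
    # The training query selects the input columns plus the column to predict.
    columns = ", ".join([*feature_cols, target_col])
    return (
        f"CREATE MODEL {model}\n"
        f"FROM (SELECT {columns} FROM {table})\n"
        f"TARGET {target_col}\n"          # the output value to predict
        f"FUNCTION {function}\n"          # name of the SQL function created after training
        f"IAM_ROLE '{iam_role}'\n"        # role allowed to use S3 and SageMaker
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"  # bucket that receives the exported data
    )


if __name__ == "__main__":
    # Hypothetical marketing example: customer profile and campaign history in,
    # "did the customer respond" out.
    print(build_create_model(
        model="campaign_model",
        table="campaign_history",
        feature_cols=["age", "job", "previous_contacts"],
        target_col="responded",
        function="predict_response",
        iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
        s3_bucket="example-redshift-ml-bucket",
    ))
```

The printed statement would then be run against the cluster with whatever SQL client the team already uses.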

After the SQL command is run, Redshift ML securely exports the specified data from Redshift to the user’s S3 bucket and calls SageMaker’s Autopilot tool to prepare the data, select an appropriate pre-built algorithm, and apply the algorithm for model training. Users can also specify the algorithm to use.
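Specifying the algorithm is, as far as we can tell from the CREATE MODEL documentation, a matter of adding a MODEL_TYPE clause between IAM_ROLE and SETTINGS. The Python sketch below builds that one clause; the helper name is hypothetical, and the set of accepted values is recalled from the documentation and should be treated as illustrative rather than exhaustive.

```python
# Model types recalled from the AWS documentation; an assumption, possibly incomplete.
KNOWN_MODEL_TYPES = {"XGBOOST", "MLP", "LINEAR_LEARNER"}


def model_type_clause(model_type):
    """Return a MODEL_TYPE clause, rejecting names this sketch does not know."""
    name = model_type.strip().upper()
    if name not in KNOWN_MODEL_TYPES:
        raise ValueError(f"unrecognised model type: {model_type!r}")
    return f"MODEL_TYPE {name}"


if __name__ == "__main__":
    # The clause would sit after IAM_ROLE and before SETTINGS in CREATE MODEL.
    print(model_type_clause("xgboost"))
```

Leaving the clause out hands the choice of algorithm back to Autopilot, which is the default behaviour the article describes.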

Redshift ML handles all of the interactions between Redshift, S3, and SageMaker, including all the steps involved in training and compilation. When the model has been trained, Redshift ML uses SageMaker Neo to optimise the model for deployment and makes it available as a SQL function. The user can then simply use that SQL function to apply the machine learning model in queries, reports, and dashboards.
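Once training completes, the model is called like any other SQL function. The Python sketch below builds such a prediction query as a string; the function, table, and column names are hypothetical, and the arguments are assumed to be passed in the same order as the input columns used for training.

```python
def build_prediction_query(function, table, feature_cols, id_col):
    """Return a SELECT that applies the trained model's SQL function to each row."""
    # Arguments mirror the feature columns the model was trained on.
    args = ", ".join(feature_cols)
    return (
        f"SELECT {id_col}, {function}({args}) AS prediction\n"
        f"FROM {table};"
    )


if __name__ == "__main__":
    # Hypothetical example: score new prospects with the campaign model.
    print(build_prediction_query(
        function="predict_response",
        table="new_prospects",
        feature_cols=["age", "job", "previous_contacts"],
        id_col="customer_id",
    ))
```

Because the prediction is just another column in a SELECT, it can be filtered, joined, or fed to a dashboard in the same way as the rest of the query.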

Redshift ML is available in AWS regions in the US, Europe and Asia. When training a new model, users will be charged for the SageMaker Autopilot and S3 resources used by Redshift ML.