Skint but looking to get complex machine learning models into production? Serverless might be the answer

Webcast – Combining Serverless and BERT for accuracy and cost-effectiveness with the MCubed web lecture series

An old truism of machine learning holds that the more complex (and therefore the larger) a model is, the more accurate its predictions. And indeed, if you look into machine learning disciplines like natural language processing (NLP), it's the massive models built on architectures such as BERT or GPT that currently get practitioners swooning when it comes to precision.

Enthusiasm fades when it comes to productionising such models, however, as their sheer size turns deployment into quite a struggle. Not to mention the cost of setting up and maintaining the infrastructure needed to take them from research to production.

Reading this, avid followers of IT trends might now remember the emergence of serverless computing a couple of years ago. The approach promised large computing capabilities that could automatically scale up and down to satisfy changing demands and keep costs low. It also promised to free teams from the burden of looking after their own infrastructure, since it mostly came in the form of managed offerings.

Well, serverless hasn't gone anywhere since then, and at first glance it seems like an almost ideal solution. Dig deeper, however, and limitations on things like memory allocation and deployment package size stand in the way of making it a straightforward option. Interest in combining serverless and machine learning is growing, though, and with it the number of people working on ways to make BERT models and co fit provider specifications so they can be deployed serverlessly.
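To get a feel for why those limits bite, here is a minimal back-of-the-envelope sketch. The limit values and overhead figure are illustrative assumptions modelled on AWS Lambda's documented quotas (check your provider's current documentation), and the parameter counts are rough public figures for BERT-base — none of this comes from the Slido team's work.

```python
# Illustrative feasibility check: does a model artifact fit a typical
# serverless deployment-package limit? The numbers below are assumptions
# based on AWS Lambda's documented quotas, not universal constants.
PACKAGE_LIMIT_MB = 250  # example cap on the unzipped deployment package


def fits_serverless(model_size_mb: float, runtime_overhead_mb: float = 50) -> bool:
    """Rough check: model weights plus runtime libraries vs. the package cap."""
    return (model_size_mb + runtime_overhead_mb) <= PACKAGE_LIMIT_MB


# BERT-base: roughly 110M parameters at 4 bytes each ~ 440 MB of fp32 weights.
bert_base_mb = 110e6 * 4 / 1e6
print(fits_serverless(bert_base_mb))      # the full model blows the budget

# int8 quantisation cuts weight storage roughly 4x; distillation shrinks further.
print(fits_serverless(bert_base_mb / 4))  # a compressed variant can squeeze in
```

The arithmetic is the whole point: a stock BERT-base plus its runtime dependencies overshoots a 250 MB package cap by a wide margin, which is why techniques like quantisation and distillation come up whenever serverless NLP is discussed.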

To learn more about these developments, we'll welcome Marek Šuppa to episode 4 of our MCubed web lecture series for machine learning practitioners on December 2. Šuppa is head of data at Q&A and polling app Slido, where he and some colleagues spent the last year investigating ways to modify models for sentiment analysis and classification so that they can be used in serverless environments — without the dreaded performance degradation.

In his talk, Šuppa will speak a bit about his team's use case, the things that made them consider serverless, the obstacles they encountered along the way, and the approaches they found most promising for reaching production-appropriate latency in their deployments.

As usual, the webcast on December 2 will start at 11:00 UTC with a roundup of software development-related machine learning news, which will give you a couple of minutes to settle in before we dive into the topic of model deployment in serverless environments. We’d love to see you there — we’ll even send you a quick reminder on the day, just register here.

And if machine learning at large still seems exciting but a bit out of reach for you, we’re sure our introductory online workshop with Prof Mark Whitehorn on December 9 can help you get started. Head over to the MCubed website for more information and tickets.