Microsoft's Bing team reshape Google's BERT in their own Azure-powered image • DEVCLASS

By Team Devclass

July 18, 2019

Researchers at Microsoft’s Bing organisation have open sourced a brace of recipes for pre-training and fine-tuning BERT, the NLP model which Google itself open sourced in November 2018.

Google describes BERT as “the first deeply bidirectional, unsupervised language representation, pre-trained only using a plain text corpus” – the corpus in question being Wikipedia.

Wikipedia’s collective knowledge may be vast, and researchers at Microsoft said that “the broad applicability of BERT means that most developers and data scientists are able to use a pre-trained variant of BERT rather than building a new version from the ground up with new data.”

However, they continued, “it will not deliver best-in-class accuracy when crossing over to a new problem space.” For example, they suggest, a model for analysing medical notes needs a deep understanding of the medical domain, while processing legal documents needs training on, yes, legal documents.

Fine-tuning the model is not enough, they reason, and pre-training is in order. In addition, “users will need to change the model architecture, training data, cost function, tasks, and optimization routines. All these changes need to be explored at large parameter and training data sizes.”

The changes are “quite substantial”: BERT-large has 340 million parameters and was trained on 2.5 billion words from Wikipedia and 800 million words from BookCorpus. Unsurprisingly, Microsoft chose to do this using its own Azure Machine Learning service.

“To get the training to converge to the same quality as the original BERT release on GPUs was non-trivial,” wrote Saurabh Tiwary, Applied Science Manager at Bing. “To pre-train BERT we needed massive computation and memory, which means we had to distribute the computation across multiple GPUs. However, doing that in a cost effective and efficient way with predictable behaviors in terms of convergence and quality of the final resulting model was quite challenging.”
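To give a rough sense of why memory becomes a constraint at this scale, the sketch below estimates the training state for a 340 million parameter model. It is a back-of-the-envelope calculation, not Microsoft's own accounting: it assumes 32-bit floats and an Adam-style optimizer that keeps two extra values per parameter, and it ignores activations, which depend on batch size and sequence length and typically add considerably more. The function names are illustrative.

```python
# Back-of-the-envelope memory estimate for training BERT-large.
# Assumptions (not from the article): 32-bit floats and an Adam-style
# optimizer that stores two moment estimates per parameter.
# Activation memory is deliberately left out.

PARAMS = 340_000_000       # BERT-large parameter count quoted in the article
BYTES_PER_FLOAT32 = 4      # assumption: full-precision (fp32) training


def training_state_bytes(params: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> dict:
    """Return the bytes needed for weights, gradients and Adam moments."""
    weights = params * bytes_per_value
    gradients = params * bytes_per_value
    adam_moments = 2 * params * bytes_per_value  # first and second moments
    return {
        "weights": weights,
        "gradients": gradients,
        "adam_moments": adam_moments,
        "total": weights + gradients + adam_moments,
    }


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


estimate = training_state_bytes(PARAMS)
for name, size in estimate.items():
    print(f"{name:>13}: {to_gb(size):.2f} GB")
```

Under these assumptions the weights alone take about 1.36 GB, and weights, gradients and optimizer state together come to roughly 5.44 GB per model replica before a single activation is stored.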

The result is two recipes for pre-training and fine-tuning BERT using Azure’s Machine Learning service. The GitHub repo for the work includes a PyTorch Pretrained BERT package from Hugging Face, along with data preprocessing code which can be used on “Wikipedia corpus or other datasets for pretraining.” Raw and preprocessed English Wikipedia datasets and pre-trained models are also provided.

It also hosts an Azure Machine Learning service Jupyter notebook to launch pre-training, though the code, data, scripts and tooling can run in “any other training environment.”
