DeepSpeed and T-NLG – Microsoft’s way of giving computational complexity ZeRO thought • DEVCLASS

By Julia Schmidt

February 11, 2020

In a rush to blast through the restrictions set by current-day hardware, Microsoft has introduced a new generative language model to a select audience and shared a bit of the tech behind it with the open source community.

The new apple of the company’s AI eye is called T-NLG, short for Turing Natural Language Generation, and is part of Microsoft’s Project Turing. Why the project was named after the British luminary can be guessed; its official aim, however, is to “scale deep learning efforts at Microsoft to solve customer and business problems across various products, starting with Search”.

The baseline to improve on isn’t any old search, though, but Google’s web search, since the use case, getting a direct answer to a question posed, is something neither Ecosia nor DuckDuckGo tackles in the way presented in the Turing-NLG blog post.

If answering questions isn’t your cuppa, there’s plenty more to do with T-NLG: “T-NLG is a Transformer-based generative language model, which means it can generate words to complete open-ended textual tasks. In addition to completing an unfinished sentence, it can generate direct answers to questions and summaries of input documents.”

To get there, Microsoft trained a model with 17 billion parameters. For comparison, Facebook’s natural language processing project RoBERTa used “only” around 355 million params. This comes at a price, though, as Corby Rosset, applied scientist at Microsoft, points out in the project’s introduction: “Large models offer significant accuracy gains, but training billions to trillions of parameters frequently runs up against fundamental hardware limitations.”
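To put that hardware limitation into rough numbers, here is a back-of-the-envelope sketch. The 16 bytes per parameter (half-precision weights and gradients plus single-precision Adam state) and the 32 GB device size are assumptions chosen for illustration, not figures from Microsoft's announcement, and activations and buffers are ignored entirely.

```python
# Rough estimate of the memory needed for model states during training.
# Assumption (not from the article): mixed-precision training with Adam,
# i.e. per parameter
#   2 bytes fp16 weight + 2 bytes fp16 gradient
#   + 12 bytes fp32 weight copy, momentum and variance = 16 bytes.
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the size of all model states in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


T_NLG_PARAMS = 17_000_000_000   # figure quoted in the article
ROBERTA_PARAMS = 355_000_000    # figure quoted in the article
GPU_MEMORY_GB = 32              # hypothetical accelerator size

for name, n in [("RoBERTa", ROBERTA_PARAMS), ("T-NLG", T_NLG_PARAMS)]:
    needed = model_state_gb(n)
    print(f"{name}: {needed:.1f} GB of model states, "
          f"{needed / GPU_MEMORY_GB:.1f}x a {GPU_MEMORY_GB} GB device")
```

Under these assumptions the smaller model fits on a single device with room to spare, while the 17-billion-parameter one needs several devices' worth of memory for its model states alone.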

A workaround for this unsurprising limitation was implemented in the form of DeepSpeed, which is now openly available under an MIT license. The project’s repository touts it as a way to “train DL models with over a hundred billion parameters on current generation of GPU clusters, while achieving over 5x improvement in system performance compared to the state-of-art”.

DeepSpeed is meant to be used in concert with PyTorch, which might make it appealing to those already working with the deep learning library. It also comes with the much-highlighted ZeRO optimiser as one of its core features. ZeRO promises to “greatly reduce the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained”.

This is realised by partitioning the model states across processes, which supposedly saves a lot of memory compared to data parallelism approaches that replicate those states on every process. For better scalability, it can also be combined with model parallelism approaches, while DeepSpeed’s support for advanced hyperparameter tuning and large batch size optimisers helps with effectiveness.
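The partitioning idea can be illustrated with a toy, single-process simulation. This is a sketch under stated assumptions, not DeepSpeed's implementation: the "workers" are plain Python loops, the optimiser is SGD with momentum rather than Adam, gradients are assumed to be already averaged across workers, and all names (`shard_bounds`, `replicated_step`, `partitioned_step`) are hypothetical. It shows that each worker can keep only its slice of the optimiser state and still arrive at the same weights as a worker holding a full replica.

```python
from typing import List, Tuple

LR, MOMENTUM = 0.1, 0.9


def shard_bounds(n: int, workers: int) -> List[Tuple[int, int]]:
    """Split range(n) into `workers` contiguous, near-equal slices."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for rank in range(workers):
        end = start + base + (1 if rank < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def replicated_step(weights, grads, velocity):
    """Baseline: a single worker holds the full momentum buffer."""
    for i, g in enumerate(grads):
        velocity[i] = MOMENTUM * velocity[i] + g
        weights[i] -= LR * velocity[i]


def partitioned_step(weights, grads, shards, bounds):
    """Each worker updates only its slice, using its own momentum shard.

    Writing into the shared `weights` list stands in for the all-gather
    a real distributed setup would need to share updated parameters.
    """
    for rank, (start, end) in enumerate(bounds):
        local_velocity = shards[rank]
        for j, i in enumerate(range(start, end)):
            local_velocity[j] = MOMENTUM * local_velocity[j] + grads[i]
            weights[i] -= LR * local_velocity[j]


grads = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 3.0]  # made-up gradients
workers = 3
bounds = shard_bounds(len(grads), workers)

w_full = [1.0] * len(grads)
v_full = [0.0] * len(grads)
w_part = [1.0] * len(grads)
v_shards = [[0.0] * (end - start) for start, end in bounds]

for _ in range(3):
    replicated_step(w_full, grads, v_full)
    partitioned_step(w_part, grads, v_shards, bounds)

print("same weights:", w_full == w_part)
print("state per worker:", [len(s) for s in v_shards],
      "vs replicated:", len(v_full))
```

In the replicated case every worker would carry all seven momentum entries; in the partitioned case no worker carries more than three, and the saving grows with the number of workers.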

Unlike DeepSpeed, however, T-NLG isn’t quite ready for public consumption, since Microsoft has only released a private demo “to a small set of users within the academic community for initial testing and feedback”.

Maybe some of this feedback could address the explainability of the system’s output, or even go deeper into the energy needed to compute models as complex as this one, which has become quite a discussion point as some devs grow more aware of tech’s contributions to climate change.

But it probably won’t, and Bing will just start answering your question about the 2020 Oscar winners with people’s names instead of presenting you with some Mirror article.
