Fermyon previews new spin on Serverless AI via Wasm

By Tim Anderson

September 6, 2023

Fermyon, specialists in WebAssembly (Wasm) microservices, has introduced a new serverless AI platform, in association with Kubernetes hosting service Civo.

The service uses Large Language Models (LLMs) from Meta, Llama 2 and Code Llama (AI for coding), which are open source and free to use. “Using WebAssembly to run workloads, we can assign a fraction of a GPU to a user application just in time to execute an AI operation,” said CTO and co-founder Radu Matei in a post today.

The service was announced at the Civo Navigate event in London. Civo provides the GPU compute service that underlies Fermyon Serverless AI when it runs on Fermyon Cloud. The company claims that Serverless AI, now in private beta, makes AI apps affordable because it avoids the expense of “access to GPUs at $32/instance-hour and upwards.”

Other on-demand AI services exist but do not perform as well, the company said, because of slow start-up times, whereas “Fermyon Serverless AI has solved this problem by offering 50 millisecond cold start times.”

Fermyon’s approach rests on the efficiency of sandboxed Wasm code versus containers or VMs – a similar approach to that used by Cloudflare Workers, which use V8 Isolates, V8 being the JavaScript engine also used by Google Chrome and Node.js. The downside is that the sandboxing may be less secure than that offered by VMs.

Serverless AI will be a new component of the open source Spin project, a platform for Wasm microservices. Spin can run locally on a developer machine and be deployed to Fermyon’s own cloud hosting platform or elsewhere. Supported languages include Rust (the primary language), TypeScript, Python, TinyGo and C#. TinyGo is a version of Go which includes Wasm support as well as WASI (WebAssembly System Interface), which enables Wasm code to run outside the browser and is why Spin can support it. Note also that Go itself now has an experimental port for WASI.

There are some limitations in the preview. Specifically, users can make up to 75 inferencing requests per hour, and 200 embedding requests, where embeddings are a way of representing text data as a vector of numbers.
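To illustrate what representing text as a vector of numbers makes possible, here is a minimal Python sketch of similarity search over embeddings. The three-dimensional vectors and the sample texts are invented for illustration (real embeddings are produced by a model and have far more dimensions), and none of the names below come from the Spin SDK.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: made-up vectors standing in for model output.
stored = {
    "how to deploy a Wasm app": [0.9, 0.1, 0.0],
    "recipe for apple pie": [0.0, 0.2, 0.9],
}

def nearest(query_vec, store):
    """Return the stored text whose embedding is most similar to the query."""
    return max(store, key=lambda text: cosine_similarity(query_vec, store[text]))

# A made-up query vector that points roughly the same way as the first entry.
query = [0.8, 0.2, 0.1]
best = nearest(query, stored)
print(best)
```

Texts with similar meaning end up with vectors pointing in similar directions, which is why a comparison such as cosine similarity can be used to search and retrieve them.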

Matei said that developers using the Serverless AI preview will be able to run inferencing on the Llama 2 and Code Llama LLMs, generate sentence embeddings and store, search and retrieve them, cache responses in a built-in key/value database, and run “entire full stack serverless applications” using the service alongside other existing features of the Fermyon platform.
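Caching responses matters when inferencing requests are rate-limited, since a repeated prompt need not count against the quota. The following Python sketch shows the general pattern only: `fake_infer` is a hypothetical stand-in for a model call, and a plain dictionary stands in for the key/value database. It does not use or reproduce the Spin API.

```python
import hashlib

call_log = []  # records every prompt that actually reaches the "model"

def fake_infer(prompt):
    """Hypothetical stand-in for an LLM inferencing call."""
    call_log.append(prompt)
    return f"summary of: {prompt}"

def cached_infer(prompt, store, infer=fake_infer):
    """Return a cached response if present, otherwise infer and store it."""
    # Hash the prompt so keys have a fixed length regardless of prompt size.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in store:
        store[key] = infer(prompt)
    return store[key]

kv_store = {}  # assumption: a dict playing the role of the key/value database
first = cached_infer("Explain WebAssembly in one sentence", kv_store)
second = cached_infer("Explain WebAssembly in one sentence", kv_store)
print(len(call_log))  # the second call is served from the cache
```

A real application would also need to decide when cached entries expire, which this sketch leaves out.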

Developers can also run Serverless AI locally but can expect slow performance. Matei quoted delays of 20-30 seconds on an Apple M1 laptop, compared to 750 milliseconds using the cloud service, including the cold start time for the serverless endpoint.

Possible uses include text processing and summarization, chatbots, and generating code from natural language input.
