GraphPipe, released just hours ago by Oracle engineers, was born from the need for a consistent, efficient way to serve machine learning models.
The project comprises a network protocol and a collection of tools meant to help deploy machine learning models and to decouple them from framework-specific implementations without hurting performance.
People tend to forget that building and training a model is only the first step of machine learning. The result still has to be deployed before it can be of any use. Depending on where the model should run (on a GPU or a CPU, for example), this can become quite tricky. Clients are also needed to talk to the deployed models.
The project came about after the GraphPipe team surveyed the current machine learning landscape and found that network protocols for serving machine learning models were, most of the time, tied to the underlying model implementations. Not every framework offers a model server, and the external serving projects some frameworks rely on may not be architecturally suited to large amounts of data.
Even though the Open Neural Network Exchange (ONNX) already proposes a standard model format to solve the coupling issue, Oracle prefers to standardize the protocol instead, since simple conversion might not work for every existing model. Leaving the model implementation out of the equation when setting up communication between clients and back-end machine learning models could also lead to more maintainable results.
Going by the introductory blog post by Oracle's Vish Abrams, the company hopes to establish the project as a standard for model serving. Model data today is often tensor-like, yet frequently transmitted as JSON, a text-based format that is not necessarily efficient for data consisting mostly of floating-point numbers. Another option is protocol buffers, a serialization mechanism by Google, which leaves it up to the user to decide how to structure the data as well as other implementation details.
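The size overhead of text-based encoding is easy to demonstrate. The sketch below (not GraphPipe code, just an illustration of the general point) compares a JSON-encoded list of floats with the same values packed as raw 32-bit binary:

```python
import json
import struct

# A small "tensor" of 1000 floating-point values, flattened to a list.
values = [0.123456789] * 1000

# Text-based JSON encoding: every float is spelled out in decimal digits.
json_bytes = json.dumps(values).encode("utf-8")

# Raw binary encoding: exactly 4 bytes per 32-bit float.
binary_bytes = struct.pack("%df" % len(values), *values)

print(len(json_bytes), len(binary_bytes))  # JSON is several times larger
```

On top of the size difference, the JSON route also pays for parsing text back into numbers on every request, which is part of why binary formats such as protocol buffers and flatbuffers are attractive for tensor payloads.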
Flatbuffers, the message format GraphPipe uses and also a Google project, is similar but avoids a memory copy during deserialization. The GraphPipe flatbuffer definitions specify a request message that includes input tensors, input names, and output names. A tensor is a mathematical construct that describes linear relations between geometric vectors, scalars, and other tensors. The GraphPipe remote model returns one tensor per requested output name, and also provides metadata about the types and shapes it supports.
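The shape of that exchange can be sketched with plain Python dataclasses. These types and the `run_model` stand-in are hypothetical, mirroring only the fields the protocol description mentions (named input tensors, requested output names, one result tensor per output name), not the actual flatbuffer schema:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Tensor:
    shape: List[int]   # dimensions of the tensor
    dtype: str         # element type, e.g. "float32"
    data: List[float]  # flattened contents

@dataclass
class InferRequest:
    inputs: Dict[str, Tensor]  # named input tensors
    output_names: List[str]    # which outputs the caller wants back

@dataclass
class InferResponse:
    outputs: Dict[str, Tensor]  # one tensor per requested output name

def run_model(request: InferRequest) -> InferResponse:
    # Toy stand-in for a remote model: doubles the single input tensor
    # and returns the result under every requested output name.
    x = next(iter(request.inputs.values()))
    result = Tensor(shape=x.shape, dtype=x.dtype,
                    data=[v * 2 for v in x.data])
    return InferResponse(outputs={name: result
                                  for name in request.output_names})
```

A client would serialize an `InferRequest`-like message, send it to the server, and read back one tensor per name it asked for; in GraphPipe the wire format for both directions is flatbuffers rather than these Python objects.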
The tools included in the GraphPipe project are client libraries for querying models, examples for serving models from different machine learning frameworks, and a set of flatbuffer definitions. The latter serve as the foundation for the protocol described above.