Netflix’s Personalization Infrastructure team has open sourced its experimental Polynote notebook in a bid to bring reproducibility and polyglotism to machine learning researchers and data scientists.
It was developed to let the company’s machine learning researchers concentrate on their work by removing their frustrations with the limited code-editing tooling in notebooks, and to give them a way to use Scala (and Spark), SQL, and Python in the same environment.
Unlike other notebook environments such as Jupyter, Polynote wasn’t built on a read-eval-print loop (REPL). According to the project’s creators, this should help reduce surprises with notebook results. In the REPL approach, each evaluated expression and its result are recorded immutably and added to a global state that the next expression can use. Users, however, can execute cells in any order, so running one cell can affect others, and because that state is not really visible, results can be difficult to reproduce.
This is why the team decided to let Polynote construct the input state for a given cell from the cells above it that have already run, which makes the position of a cell significant and the result more predictable.
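The difference between the two models can be illustrated with a toy sketch in Python. This is not Polynote's implementation: the `cells` list and the helpers `run_repl_style` and `run_positional` are hypothetical names made up for illustration, and the model ignores everything a real kernel does beyond tracking variables.

```python
# Toy model only: NOT Polynote's actual implementation.
# A "notebook" is a list of cells; a cell's position is its index.
cells = [
    "x = 1",      # cell 0
    "y = x + 1",  # cell 1
    "x = 10",     # cell 2
]

def run_repl_style(order):
    """Run cells in the given order against one shared, mutable namespace."""
    state = {}
    for i in order:
        exec(cells[i], {}, state)
    return state

def run_positional(order):
    """Run cells in the given order, but give each cell only the values
    defined by cells that sit above it and have already run."""
    outputs = {}  # cell index -> names that cell defined when it last ran
    for i in order:
        input_state = {}
        for j in sorted(k for k in outputs if k < i):
            input_state.update(outputs[j])
        defined = {}
        # The copy of input_state is read-only context; new names land in `defined`.
        exec(cells[i], dict(input_state), defined)
        outputs[i] = defined
    return outputs

# Running cell 2 before cell 1 changes y in the shared-state model...
print(run_repl_style([0, 1, 2])["y"], run_repl_style([0, 2, 1])["y"])
# ...but not in the positional model, where cell 1 only sees cell 0.
print(run_positional([0, 1, 2])[1]["y"], run_positional([0, 2, 1])[1]["y"])
```

In the shared-state model the value of `y` depends on the order in which the cells happened to be executed, while in the positional model it depends only on where the cell sits in the notebook.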
Polynote’s level of insight into what a notebook is doing is also meant to help with reproducibility. The UI includes a number of helpers that show which cell and statement are currently running and which other jobs are active at any given time. It also displays the status of the kernel (idle and connected, busy, disconnected, or not started) and the values resulting from a cell’s execution.
Dependencies for each notebook can be set in a configuration section and are stored directly within the notebook. Exploring and visualising data is made easier by the inclusion of the open source libraries Vega and matplotlib, along with features such as a plot constructor, a data schema view, and a table inspector.
To make working with the new environment easier, Polynote provides interactive code completion, error highlighting, and a rich text editor for text cells that also allows LaTeX equations to be inserted. It also lets users write each cell in a different language and share variables between them, which helps if, for example, a data set is generated in a language different from the one used to do computations.
The project currently appears to support Scala, Python, and SQL, since those are the languages Netflix’s researchers use most. More might follow if the open source community sees the need to add any.