Vladimir Makarov, a software developer at Red Hat, has pulled the covers off his lightweight JIT compiler project MIR, which could offer an alternative to GCC- and LLVM-based implementations.
Makarov has considerable experience in building just-in-time (JIT) compilers, having been part of the team that brought JIT compilation to CRuby, the reference C implementation of the Ruby programming language. Due to time constraints, the team fell back on LLVM/GCC interfaces for that work.
This comes at a price, however: CRuby makes use of more than one language, which makes it tricky to optimise with the chosen setup. Function inlining, which reduces the number of method calls, loses much of its appeal if it bloats the headers and consequently slows things down. Together with the large size and sometimes slow compilation speed of LLVM/GCC-based JITs, this prompted Makarov to try another approach in his spare time.
MIR, or Medium Internal Representation, as Makarov’s project is dubbed, is meant to solve these issues by providing “a basis to implement fast and lightweight interpreters and JITs”. He has high hopes for the project, believing that using it for the JIT of the lightweight MRuby project, for example, “would help to expand Ruby usage from a mostly server market to the mobile and IoT markets”. However, MIR is designed to be universal and could therefore also spark interest outside the Ruby world.
MIR is strongly typed, “flexible enough” and aims at compiling 100 times faster, with a 100 times faster start-up, than GCC with -O2 activated. This is meant to be achieved by restricting the compiler to the most valuable optimisations, such as register allocation and instruction selection. Makarov also wants to optimise only frequently occurring cases, using algorithms that combine simplicity and performance. If all goes according to plan, the implementation should stay below 10K lines of C code and have no external dependencies.
In its current state, early as it may be, the project succeeds in sticking to those last conditions, after Makarov got his MIR interpreter running without third-party libraries. According to his introductory blog post, he is now at a stage where he is able to create MIR through an API or from MIR textual or binary representations. There are also ways to “interpret MIR code and generate AMD64 machine code in memory from MIR” as well as to “generate C code from MIR”.
To use the project in CRuby, however, a C-to-MIR compiler is needed, which Makarov says is “about 90 per cent done”. The compiler is quite standard, meaning it works in four passes. It doesn’t modify the ANSI standard grammar, and it uses a hand-written parsing expression grammar (PEG) parser, which is a bit slower than deterministic parsers but is simple and small, which fits the project’s brief.
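To illustrate what a hand-written PEG parser looks like in general, here is a minimal sketch in Python. It is not taken from Makarov’s compiler: the toy arithmetic grammar, the function names and the evaluate-while-parsing design are all assumptions made for this example. Each grammar rule becomes one function, and an ordered choice simply tries its alternatives in sequence from the same input position. That retrying is where the approach loses some speed compared with deterministic parsers, and it is also why the code stays so small.

```python
# Toy grammar (hypothetical, chosen for illustration only):
#   Expr   <- Term (('+' / '-') Term)*
#   Term   <- Factor ('*' Factor)*
#   Factor <- Number / '(' Expr ')'
#   Number <- [0-9]+
# Each function returns (value, next_position) on success or None on failure.
# Whitespace is not handled, to keep the sketch short.

def parse_number(text, pos):
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None  # no digits at this position
    return int(text[pos:end]), end

def parse_factor(text, pos):
    # Ordered choice: try Number first ...
    result = parse_number(text, pos)
    if result is not None:
        return result
    # ... then retry from the same position with '(' Expr ')'.
    if pos < len(text) and text[pos] == "(":
        inner = parse_expr(text, pos + 1)
        if inner is not None:
            value, after = inner
            if after < len(text) and text[after] == ")":
                return value, after + 1
    return None

def parse_term(text, pos):
    result = parse_factor(text, pos)
    if result is None:
        return None
    value, pos = result
    while pos < len(text) and text[pos] == "*":
        nxt = parse_factor(text, pos + 1)
        if nxt is None:
            break  # repetition ends; the operator is left unconsumed
        rhs, pos = nxt
        value *= rhs
    return value, pos

def parse_expr(text, pos):
    result = parse_term(text, pos)
    if result is None:
        return None
    value, pos = result
    while pos < len(text) and text[pos] in ("+", "-"):
        op = text[pos]
        nxt = parse_term(text, pos + 1)
        if nxt is None:
            break  # repetition ends; the operator is left unconsumed
        rhs, pos = nxt
        value = value + rhs if op == "+" else value - rhs
    return value, pos

def parse(text):
    """Parse and evaluate the whole input, or raise ValueError."""
    result = parse_expr(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError("syntax error in %r" % text)
    return result[0]
```

Calling `parse("(2+3)*4")` evaluates the expression while parsing it. A real C front end would build a syntax tree instead and would need far more rules, but the one-function-per-rule shape would be the same.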
An LLVM IR-to-MIR compiler is also in the works, which is supposed to produce more optimised code when time isn’t an issue. Examples of using MIR, as well as benchmarks, can be found in the now open project repository. Makarov, however, warns against using it for serious work, since it is still in its early stages and will probably change a lot.