The Azul-led CRaC (Coordinated Restore at Checkpoint) project aims to greatly improve the startup time of Java applications by restoring them from a cached image of an already running process, and last week it hit an important milestone: the general availability of OpenJDK 17 with CRaC support.
There are snags, though. One is that CRaC's requirements mean many Java applications and frameworks will not work with it unless modified. Another is that it depends on a Linux feature called Checkpoint/Restore In Userspace (CRIU), undermining a core feature of Java, namely that it runs cross-platform.
Azul deputy CTO Simon Ritter told DevClass: “We submitted CRaC to OpenJDK as an idea for a project … and they’ve included it as one of the many projects they have. The reason it isn’t in the mainstream yet, and I’m not even sure it will make it into the mainstream, is that we rely on the CRIU technology underneath on Linux, and there isn’t an equivalent that we’re aware of on Windows or on Mac. So you lose that cross-platform functionality … I don’t think it’s going to make it into the mainstream any time soon.”
The problem addressed by CRaC is an important one. Java applications require a runtime, the JVM (Java Virtual Machine), so application startup is relatively slow: the JVM has to load and initialize before the application code runs. This might not matter for long-running applications, but in a microservices world, where containers running parts of an application, or serverless functions, are constantly starting and stopping, it makes a big difference. It is the same issue AWS addressed with its Lambda SnapStart feature, introduced at the re:Invent conference late last year.
“CRaC allows a running application to pause, snapshot its state, and store it for later use – even on a different machine. It saves the full context of the application process as an image, including its state and memory,” the Azul docs explain.
Could fast startup be achieved another way, by using GraalVM to compile to native code? The problem, Ritter told us, is that code compiled by the JIT (just-in-time) compiler in OpenJDK performs a lot better than a GraalVM native image. He noted two reasons for this: “With GraalVM, parts of what the JVM does have to be compiled into the code, so they use the Substrate VM, a project that was started back in the Sun days, and that provides the functionality that you would get normally in the JVM for things like garbage collection. The garbage collection that the Substrate VM uses is not as sophisticated as you can get with the OpenJDK and that will lead to better performance with your application running on the JVM versus a native image.
“The other big thing with JIT compilation is speculative optimization where you look at how the code has run up until now and then optimize based on the assumption that it’s going to continue in the same way. We see that literally 50 percent of the performance improvements we get from JIT compilation are down to speculative optimizations, and you just can’t do that in a native compiled image.”
CRaC is a smart solution, but as the docs note, it cannot checkpoint an application while files are open or network sockets are active. “CRaC implementation checks for open files and sockets at the checkpoint. The checkpoint is aborted if one is found, also, an exception is thrown with a description of the file name or socket address.” Applications frequently do hold open files and sockets, so an unmodified Java application or library cannot simply be expected to work.
“You don’t have to do anything particularly clever,” said Ritter. “There are two methods, beforeCheckpoint and afterRestore … the idea is that you’ll be able to close file descriptors, close network connections … and then start up and get afterRestore called, and that’s when you open files again, create network connections and so on.”
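The pattern Ritter describes can be sketched in Java. In the CRaC API the two hooks belong to a Resource interface (org.crac.Resource in the standalone org.crac library, jdk.crac.Resource in the CRaC JDK builds), each method receives a Context argument, and objects are registered with Core.getGlobalContext().register(...). So that the sketch compiles and runs on any JDK, it substitutes hypothetical stand-ins: CheckpointResource for the real interface and SimulatedRuntime for the checkpoint machinery, which, like CRaC, aborts if a tracked file is still open. LogFile and app.log are likewise illustrative names. Treat it as an illustration under those assumptions, not Azul's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class CracResourceSketch {

    // Hypothetical stand-in for the CRaC callback interface. The real one is
    // org.crac.Resource, whose methods also receive a Context argument.
    interface CheckpointResource {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    // Hypothetical simulated runtime: it calls the hooks and, like CRaC,
    // aborts the checkpoint if any tracked file is still open.
    static final class SimulatedRuntime {
        final List<String> openHandles = new ArrayList<>();
        final List<String> events = new ArrayList<>();
        private final List<CheckpointResource> resources = new ArrayList<>();

        void register(CheckpointResource r) { resources.add(r); }

        void checkpointRestore() throws Exception {
            for (CheckpointResource r : resources) r.beforeCheckpoint();
            if (!openHandles.isEmpty()) {
                throw new IllegalStateException(
                        "checkpoint aborted, still open: " + openHandles.get(0));
            }
            events.add("checkpoint"); // the process image would be saved here
            events.add("restore");    // ...and later restored, possibly elsewhere
            for (CheckpointResource r : resources) r.afterRestore();
        }
    }

    // Close the descriptor before the checkpoint, reopen it after restore.
    static final class LogFile implements CheckpointResource {
        private final SimulatedRuntime rt;
        private final String path;
        private final boolean cooperative;

        LogFile(SimulatedRuntime rt, String path, boolean cooperative) {
            this.rt = rt;
            this.path = path;
            this.cooperative = cooperative;
            open();
        }

        private void open() {
            rt.openHandles.add(path);
            rt.events.add("open " + path);
        }

        @Override
        public void beforeCheckpoint() {
            if (!cooperative) return; // simulates a library unaware of CRaC
            rt.openHandles.remove(path);
            rt.events.add("close " + path);
        }

        @Override
        public void afterRestore() {
            open();
        }
    }

    // Runs one checkpoint/restore cycle and returns the recorded events.
    static List<String> run(boolean cooperative) throws Exception {
        SimulatedRuntime rt = new SimulatedRuntime();
        rt.register(new LogFile(rt, "app.log", cooperative));
        rt.checkpointRestore();
        return rt.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(true));
        try {
            run(false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints [open app.log, close app.log, checkpoint, restore, open app.log] for the cooperative resource, then "checkpoint aborted, still open: app.log" for the one that never closes its file. The second case shows why unmodified libraries can block a checkpoint.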
Java applications are often developed on Windows or Mac but are usually deployed to Linux, which suggests that the lack of cross-platform support is not a deal-breaker for everyone.