How Python 3.11 is gaining performance at the cost of ‘a bit more memory’

One of the most keenly anticipated new features in the forthcoming Python 3.11 (scheduled for October) is better performance. “Python 3.11 is up to 10-60 percent faster than Python 3.10,” state the release notes.

How is this being done? Python 3.11 is the first release to benefit from a project called Faster CPython, where CPython is the standard version of the interpreter.

Faster CPython is a project funded by Microsoft, whose members include Python inventor Guido van Rossum, Microsoft senior software engineer Eric Snow, and Mark Shannon – who is under contract to Microsoft as tech lead for the project.

A session scheduled for the EuroPython event to be held in Dublin in July centers on some of the changes that enable the speed-up. Shannon will describe the “adaptive specializing interpreter” in Python 3.11, which is set out in PEP (Python Enhancement Proposal) 659. The PEP describes a technique called specialization which, Shannon explains, “is typically done in the context of a JIT [just in time] compiler, but research shows specialization in an interpreter can boost performance significantly.”

The interpreter identifies code that can benefit from specialization: “once an instruction in a code object has executed enough times, that instruction will be ‘specialized’ by replacing it with a new instruction that is expected to execute faster for that operation,” states the PEP. The speed-up can be “up to 50 percent.”
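The replacement the PEP describes can be observed from Python itself. What follows is a minimal sketch rather than anything taken from the PEP: the function names add_pairs and opnames are made up for illustration, and it relies on the adaptive keyword that dis.get_instructions gained in Python 3.11. On older interpreters it falls back to the plain listing, so the two lists are identical. The names of the specialized instructions differ between releases, which is why no output is shown.

```python
import dis
import sys


def add_pairs(pairs):
    """Hypothetical hot function: the same integer additions on every call."""
    total = 0
    for a, b in pairs:
        total += a + b
    return total


def opnames(func, adaptive):
    """Return the opcode names for func, specialized ones if requested."""
    # The `adaptive` keyword only exists in Python 3.11 and later.
    if sys.version_info >= (3, 11):
        instructions = dis.get_instructions(func, adaptive=adaptive)
    else:
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


# Generic bytecode, as originally compiled.
before = opnames(add_pairs, adaptive=False)

# Run the function enough times for the interpreter to consider it "hot".
for _ in range(1000):
    add_pairs([(1, 2), (3, 4)])

# Bytecode as it currently stands, including any specialized instructions.
after = opnames(add_pairs, adaptive=True)

# Instructions that only appear once the code has warmed up.
changed = sorted(set(after) - set(before))
print(changed)
```

On Python 3.11 or later the printed list would be expected to contain variants of the generic instructions that have been narrowed to the operand types actually seen, although which ones appear depends on the interpreter version.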

In his preview of the talk, Shannon also lists other changes that contribute to the speed-up: consecutively allocated execution frames, zero-cost try-except, a more regular object layout, and lazily created object dictionaries.

When Devclass spoke to Python Steering Council member and core developer Pablo Galindo about the new Memray memory profiler, he described how the Python team is using Microsoft’s work in 3.11.

“One of the things we are doing is that we’re making the interpreter faster,” he said, “but also it’s going to use a bit more memory, just a bit, because most of these optimizations have some kind of cost in memory, because we need to store some stuff to use it later, or because we have an optimized version but sometimes someone needs to request a non-optimized version for debugging, so we need to store both.”

Galindo explained that memory management is critical to performance. Python “has its own memory allocator that is not the system allocator,” he said. That is not because “we know better how to allocate the memory.” Rather, it is because the system allocator has to be generic, whereas the Python interpreter knows how it will use the memory.

One of the tricks is to reduce the number of calls to the system allocator in favour of allocating a bigger chunk. “Let me have a big chunk of memory, I will use the different parts of it and release it in one go when I finish, because otherwise it’s going to be very slow,” he said.
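The “big chunk” idea Galindo describes can be illustrated with a toy model. This is a sketch only, and it is not how CPython's own allocator is written: the Arena class and its methods are hypothetical names, and a bytearray stands in for memory obtained from the system. It makes one large request up front, hands out small pieces of it, and gives everything back in a single step.

```python
class Arena:
    """Toy allocator: one big request up front, many small hand-outs."""

    def __init__(self, size):
        # The single "big chunk" requested from the underlying system.
        self._buffer = bytearray(size)
        self._view = memoryview(self._buffer)
        self._offset = 0

    @property
    def used(self):
        """Number of bytes handed out so far."""
        return self._offset

    def allocate(self, nbytes):
        """Hand out the next nbytes of the chunk without a new system request."""
        if self._offset + nbytes > len(self._buffer):
            raise MemoryError("arena exhausted")
        block = self._view[self._offset:self._offset + nbytes]
        self._offset += nbytes
        return block

    def release_all(self):
        """Mark the whole chunk as reusable in one go."""
        self._offset = 0


arena = Arena(1024)
first = arena.allocate(16)
second = arena.allocate(32)
first[:5] = b"hello"          # the blocks are writable slices of the chunk

used_before_release = arena.used
arena.release_all()
print(used_before_release, arena.used)
```

Each call to allocate is just an offset bump inside memory the program already holds, which is the saving Galindo points to: the expensive request is made once rather than once per object.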

Responding to recent Python 3.11 speed tests, one developer said that “this may be the first Python 3 that will actually be faster (about 5 percent) than Python 2.7. We’ve waited 12 years for this.”