Cython 3.0: The next generation of Python at the speed of C

Jason Li
Sr. Software Development Engineer
Skilled Angular and .NET developer, team leader for a healthcare insurance company.
August 05, 2023


Python has a reputation as one of the most convenient, well-equipped, and genuinely useful programming languages, but raw execution speed is not among its strengths. Cython is a superset of Python that compiles to C. Depending on the task, the performance gain can range from a few percent to several orders of magnitude. For work bound by Python's native object types, the speedups won't be large. For numerical operations, or any operations that don't involve Python's own internals, the gains can be substantial. With Cython, you can work around or even move past many of Python's native limitations without giving up its simplicity and convenience.

Python to C compilation

Python programmes can call directly into C modules. Those modules may be general-purpose C libraries or libraries built specifically to work with Python. Cython generates the second kind of module: C libraries that talk to Python's internals and can be bundled with existing Python code.
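As a rough illustration of the first kind of call, a general-purpose C library can be reached from plain Python through the standard-library ctypes module. This is a different mechanism from Cython, shown here only because it needs no compiler. The sketch assumes a POSIX system on which the C runtime can be loaded, and c_string_length is a hypothetical helper name.

```python
import ctypes
import ctypes.util

# Locate the C runtime. If find_library() returns None, CDLL(None) on POSIX
# falls back to the symbols already loaded into this process (libc included).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_string_length(text):
    """Return the UTF-8 byte length of `text` as measured by C's strlen()."""
    return libc.strlen(text.encode("utf-8"))


print(c_string_length("Cython"))  # 6
```

Every call here still crosses the Python/C boundary through ctypes at run time, whereas Cython compiles the call site itself to C.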

By design, Cython code looks a lot like Python code. If you feed the Cython compiler a Python programme (Python 2.x and Python 3.x are both accepted), Cython will take it as-is, but none of its native accelerations will come into play. If you add type annotations in Cython's special syntax, however, Cython can substitute fast C equivalents for slow Python objects. Cython's approach is incremental: a developer can start with an existing Python application and speed it up through small, targeted changes rather than rewriting the whole application.

This approach matches the nature of software performance problems in general. In most programmes, the great majority of CPU-intensive work is concentrated in a few hot spots, a version of the Pareto principle, also known as the "80/20" rule. As a result, only a small portion of the code in a Python programme needs to be optimised for performance. You can convert those hot spots to Cython incrementally and get the performance gains where they matter most. The rest of the programme can stay in Python with no extra work.
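Finding the hot spots comes before converting them. One possible way to do that, sketched here with the standard-library profiler, is to rank functions by cumulative time. The functions parse_config, sum_of_squares, main, and profile_report are made up for the illustration.

```python
import cProfile
import io
import pstats


def parse_config():
    """Cheap bookkeeping work that is not worth optimising."""
    return {"limit": 200_000}


def sum_of_squares(limit):
    """Tight numeric loop: a typical hot spot and a candidate for Cython."""
    total = 0
    for i in range(limit):
        total += i * i
    return total


def main():
    config = parse_config()
    return sum_of_squares(config["limit"])


def profile_report(func, top=5):
    """Run `func` under cProfile; return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_report(main)
print(report)
```

In the printed table, sum_of_squares should account for nearly all of the time spent inside main, which marks it as the function to move to Cython first.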

Cython’s primary objective

Cython's primary objective is to make it easier to write C extensions for Python, whether for speed or to provide convenient interfaces to C libraries. Cython 3 updates and modernises Cython thoroughly. It drops support for older Python versions, makes Python 3 syntax and semantics the default, adds support for more recent Python features, and makes the "pure Python mode" more useful.

Pure Python mode makes it possible to run Python's own code analysis and linting tools on Cython code. Cython originally had its own idiosyncratic syntax, a cross between Python and C type declarations, which made Cython code hard to troubleshoot with Python tooling. Over time, Cython introduced an alternative syntax, pure Python mode, that is fully consistent with standard Python syntax. All Cython features, including those for calling external C libraries, are now available in pure Python mode.
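A minimal sketch of what pure Python mode can look like follows. The cython.int and cython.double type names and the cython.compiled flag come from Cython's "cython" module; the function mean_square is made up for the example. Two choices are assumptions made so the file also runs where Cython is not installed: the guarded import and the deferred annotations. A real project would normally import cython unconditionally, and how a given Cython release treats the guarded import is not something this sketch establishes.

```python
from __future__ import annotations  # annotations stay unevaluated in plain Python

try:
    import cython  # ships with the Cython package
except ImportError:  # assumption: fallback so the sketch runs without Cython
    cython = None


def mean_square(n: cython.int) -> cython.double:
    """Average of i*i for i in range(n), written in ordinary Python syntax."""
    i: cython.int
    x: cython.double
    total: cython.double = 0.0
    for i in range(n):
        x = i            # widen to double before multiplying
        total += x * x
    return total / n if n else 0.0


# cython.compiled is False when the module runs as ordinary Python.
is_compiled = bool(cython and cython.compiled)
print(mean_square(4), is_compiled)
```

Because the file is valid Python, linters and type checkers can read it directly, and the same source is what would be handed to the Cython compiler.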

Besides writing fast code, one of the main uses of Cython is calling external C libraries from Python code. Because Cython code compiles to C, calling C functions directly from it is straightforward.

NumPy support

NumPy support is another important area of development. Cython and NumPy have long been compatible, so you can write Cython functions that hook directly and natively into NumPy functions and data structures. Cython 3 adds the ability to write NumPy ufuncs directly in Cython. Among other things, this makes it quick and simple to apply a numerical function written in Cython to the entire contents of a NumPy data structure.

NumPy speeds up number-crunching in Python while preserving Python's flexibility and ease of use. It gives Python users a very fast toolkit for working with data in matrices.

Even NumPy is sometimes not fast enough on its own. If you want to perform transformations on NumPy matrices that aren't part of NumPy's API, a common fallback is to iterate over the matrix in Python, which throws away all of the performance benefits of using NumPy in the first place. Fortunately, Cython offers a better way to work directly with NumPy data.

Limited API

Cython's internals have also been updated to keep up with ongoing changes to Python's internals. For example, Python's "limited API" provides a guaranteed-stable subset of its C API, intended for exactly the kind of work Cython often does to integrate with the Python interpreter. Cython 3 has initial, and growing, support for the limited API.

Cython 3's first pre-releases appeared about three years ago, around the time of Python 3.8. No firm completion date or target version was set for Cython 3. Even so, the Cython development team encouraged widespread use of the alpha and beta releases, and the project kept pace with new Python features and internal changes throughout.

Cython's benefits

Beyond speeding up code you have already written, Cython offers several other benefits.

Improved performance when using external C libraries

Python packages like NumPy wrap C libraries in Python interfaces to make them easy to use. However, going back and forth between Python and C through those wrappers can slow things down. Cython lets you talk to the underlying libraries directly, without Python in the way.

You can use both C and Python memory management.

If you use Python objects, they are memory-managed and garbage-collected just as in regular Python. You can also create and manage your own C-level structures and use malloc/free to work with them. Just remember to clean up after yourself.

You can opt for safety or speed as needed.

By default, Cython performs runtime checks for common problems that arise in C, such as out-of-bounds access on an array; decorators and compiler directives control those checks. As a result, C code generated by Cython is generally much safer than hand-written C, though possibly at the cost of raw performance.

If you are confident you won't need those checks at runtime, you can disable them for additional speed, either across an entire module or only on selected functions.

Cython also gives native access to Python structures that use the buffer protocol, which exposes data held in memory directly, without intermediate copying. Cython's memoryviews let you work with those structures at high speed and with the level of safety appropriate to the task. The raw data underlying a Python bytes string, for example, can be read directly (fast) instead of going through the Python runtime (slow).
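The buffer protocol can be tried out from plain Python with the built-in memoryview type. This is not Cython's typed memoryview, but it rests on the same protocol and shows the no-copy behaviour described above. The variable names are illustrative only.

```python
import array

# A bytearray exposes its memory through the buffer protocol.
payload = bytearray(b"hello cython")
view = memoryview(payload)

# Slicing a memoryview yields a new view onto the same memory, not a copy.
tail = view[6:]
tail[0] = ord("C")           # writes through to `payload`
print(payload)               # bytearray(b'hello Cython')

# The same applies to typed numeric buffers such as array.array.
samples = array.array("d", [1.0, 2.0, 3.0, 4.0])
numbers = memoryview(samples)
numbers[1] = 20.0            # in-place update of the underlying doubles
print(samples.tolist())      # [1.0, 20.0, 3.0, 4.0]
print(numbers.format, numbers.itemsize, numbers.nbytes)  # d 8 32
```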

Cython C code can benefit from releasing the GIL.

Python's Global Interpreter Lock (GIL) synchronises threads within the interpreter, protecting access to Python objects and managing contention for resources. But the GIL has been widely criticised as an obstacle to better Python performance, particularly on multicore systems.

If a section of code performs a long-running operation without touching any Python objects, you can mark it with the "with nogil:" directive so that it runs without the GIL. That frees the Python interpreter to do other things in the meantime and lets Cython code make use of multiple cores (with additional work).

Sensitive Python code can be hidden using Cython.

Python modules are trivially easy to decompile and inspect, but compiled binaries are not. Before distributing a Python application to end users, you can compile some of its modules with Cython to protect them from casual snooping.

Note, however, that such obfuscation is a side effect of Cython's capabilities, not one of its intended uses. A sufficiently dedicated or determined person can still decompile or reverse-engineer a binary. It is also never a good idea to hide sensitive information or tokens in a binary, because a simple hex dump will often reveal them.
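To show how little effort such a scan takes, here is a small sketch in the spirit of the Unix "strings" utility. The byte blob is a hypothetical stand-in for a compiled module, the token value is invented, and printable_strings is a made-up helper name.

```python
import re

# Hypothetical stand-in for a compiled module: binary noise around an
# embedded credential. The token value is invented for illustration.
fake_binary = (
    b"\x7fELF\x02\x01"
    + b"\x00\x90\xff" * 20
    + b"API_TOKEN=abc123-not-real"
    + b"\x00\xc3\x00" * 20
)


def printable_strings(blob, min_length=6):
    """Return runs of printable ASCII that are at least `min_length` bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, blob)]


print(printable_strings(fake_binary))  # ['API_TOKEN=abc123-not-real']
```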

Cython-compiled modules can be shared.

If you are building a Python package for distribution, whether internally or through PyPI, Cython-compiled components can be bundled with it. Those components can be pre-compiled for specific machine architectures, although a separate Python wheel is needed for each one. Failing that, the user can compile the Cython code as part of the installation, provided a C compiler is available on the target machine.

Conclusion

Python code that Cython can't translate fully into C is converted into a series of C calls into Python's internals. This amounts to taking the Python interpreter out of the execution loop, which by default gives code a modest speedup of 15 to 20 percent. That figure is a best case: in some situations, you may see no performance improvement at all. Measure performance before and after to see what has actually changed.
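One simple way to run a before-and-after comparison is the standard-library timeit module. In the sketch below, baseline and candidate are hypothetical stand-ins: in practice, candidate would be the Cython-compiled version of the same function, imported from the built extension module.

```python
import math
import timeit


def baseline(values):
    """Original implementation: explicit Python loop."""
    total = 0.0
    for value in values:
        total += value * value
    return total


def candidate(values):
    """Stand-in for the optimised version under test."""
    return float(sum(value * value for value in values))


def best_time(func, data, repeat=5, number=20):
    """Best-of-`repeat` wall time in seconds for `number` calls of func(data)."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number))


data = [float(i) for i in range(10_000)]

# Check results first: a faster function that returns something else is a bug.
assert math.isclose(baseline(data), candidate(data))

before = best_time(baseline, data)
after = best_time(candidate, data)
print(f"baseline: {before:.4f}s  candidate: {after:.4f}s  ratio: {before / after:.2f}x")
```

Taking the minimum of several repeats reduces noise from other processes; the ratio will vary from machine to machine.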