Implementing Python for .Net

Author:	Duncan Booth
Contact:	duncan.booth@suttoncourtenay.org.uk

Abstract

Python is portable across many platforms, and there are two implementations: the original one written in C, and Jython for the Java platform. So why is there still no implementation for .Net, and can we ever expect to see one?


1 History

1.1 Activestate study

Mark Hammond and Greg Stein created the Python for .NET implementation between early 1999 and July 2000. The work was performed under contract to Microsoft.

The aim of this study was to show that it was possible to have compiled Python programs fully supported within the .Net framework.

Conclusions of the study included the following:

There is no support for some of the features of .NET that other frameworks will require, such as custom attributes, PInvoke or ASP.NET.

There are some interesting issues to be solved here, but the main problem is deciding on a suitable mechanism for, say, custom attributes. The syntax (or syntaxes) proposed in PEP 318 might provide a useful pointer.

Related topics are the mismatches between Python and .NET in class/instance semantics, module/package semantics, and exception systems.

The study was largely based on modelling the semantics of the C-Python¹ API in .NET, and it was based on Python 2.1. Modelling the behaviour on new-style classes, and basing the runtime on a class hierarchy rather than a set of functions, seems to reduce the mismatch. Inevitably, though, there will always be some semantic differences.

The speed of the current system is so low as to render the current implementation useless for anything beyond demonstration purposes. This speed problem applies to both the compiler itself, and the code generated by the compiler. Given that part of the appeal of Python programming is a quick edit-compile-run cycle, the speed issues severely limit the utility of Python on this platform. Some of the blame for this slow performance lies in the domain of .NET internals and Reflection::Emit, but some of it is due to the simple implementation of the Python for .NET compiler.

In fact, although there are many inefficiencies in the simple implementation, the overwhelming bottleneck is a result of the Reflection::Emit libraries. The reference to the speed of the compiler is also a red herring: the compiler works through a COM interface layer, and a full implementation of Python on .NET would, I hope, be capable of hosting the compiler itself. (Besides, the compiler isn't actually all that slow.)

There is no support for some Python features that some programs will require. Examples include: string formatting, long integers, complex numbers, standard library, etc.

The study didn't implement these features because they were not needed to complete the study. There is, however, no barrier to implementing them.

Overall, these conclusions are somewhat misleading. Later in the paper we read:

Only a small amount of effort has gone into analysing the performance of the runtime, mainly due to the lack of performance analysis tools available for .NET. Without such tools, making performance related changes is fruitless, as the effectiveness is difficult to measure.

Not withstanding the tuning of the runtime system, the simple existence of the runtime accounts for much of our performance problem. When simple arithmetic expressions take hundreds or thousands of Intermediate Language instructions (via the Python runtime) to complete, performance will always be a struggle.

Even with performance analysis tools it is hard to measure where Python code is spending its time. This is because the bottleneck occurs on every function call, so its cost is spread evenly across the run. Even if you identify a hotspot, no amount of optimisation there makes more than a small difference to the overall runtime.

The bit about arithmetic is another red herring. If we decide that integer arithmetic needs optimisation, polymorphism can reduce an integer add to two virtual method calls and a native integer add. The implementation provided by the Activestate study performs a series of tests for different operand types, and while this is necessary at some level to support the full semantics required for arithmetic, it is easy enough to short-circuit the tests for a few common cases.

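using System;

class PyObject {
    // General cases: the full Python coercion rules (elided here).
    public virtual PyObject __add__(PyObject p) { throw new NotImplementedException(); }
    public virtual PyObject __radd__(PyObject p) { throw new NotImplementedException(); }

    // Fast-path entry: the left operand was a Python int.
    // By default, box it and fall back on the general case.
    public virtual PyObject __radd__(int v) {
        return this.__radd__((PyInt)v);
    }
}

class PyInt : PyObject {
    public readonly int value;
    public PyInt(int v) { value = v; }
    public static explicit operator PyInt(int v) { return new PyInt(v); }

    // int + anything: hand the raw int to the right operand.
    public override PyObject __add__(PyObject p) {
        return p.__radd__(this.value);
    }

    // int + int: one native add, promoting to a long on overflow.
    // PyLong (not shown) is assumed to convert explicitly from long.
    public override PyObject __radd__(int v) {
        try {
            return (PyInt)checked(this.value + v);
        } catch (OverflowException) {
            return (PyLong)((long)this.value + v);
        }
    }
}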

Class and instance semantics

Python supports multiple inheritance. However, since the .NET study came out, Python has introduced new-style classes which, while they still support multiple inheritance, have a restriction: they cannot inherit from more than one builtin type.
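The C-Python restriction is easy to see in a modern interpreter (Python 3 syntax, where every class is new-style). Combining two ordinary classes, or one builtin type with ordinary classes, works. Combining two builtin types whose instances have different memory layouts, such as `int` and `str`, raises `TypeError` as soon as the class statement runs. The class names below are purely illustrative:

```python
class A:
    pass

class B:
    pass

# Two ordinary bases: fine.
class C(A, B):
    pass

# One builtin base mixed with an ordinary class: also fine.
class MyInt(A, int):
    pass

def try_two_builtins():
    """Return C-Python's error message for a class with two builtin
    bases whose instance layouts differ, or None if it is accepted."""
    try:
        class Bad(int, str):
            pass
    except TypeError as exc:
        return str(exc)
    return None

error = try_two_builtins()
print(error)
```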

A similar restriction could be imposed on .NET classes: a Python class could use multiple inheritance provided that it inherits from only a single .NET base class.
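One way to express that rule in Python is a metaclass that checks a class's bases as the class statement runs. This is a sketch under assumptions: `ClrMeta`, the `__clr_type__` marker and the stand-in classes are all hypothetical, and a subclass of a wrapped .Net class simply inherits its marker, so several bases that share one .Net ancestor are still accepted:

```python
class ClrMeta(type):
    """Hypothetical metaclass: any number of ordinary Python bases,
    but at most one distinct .Net base class."""

    def __new__(mcls, name, bases, namespace):
        # Stand-ins for .Net types carry a (hypothetical) __clr_type__
        # attribute; subclasses inherit the same value.
        clr_types = {b.__clr_type__ for b in bases if hasattr(b, "__clr_type__")}
        if len(clr_types) > 1:
            raise TypeError("%s cannot inherit from more than one .Net class: %s"
                            % (name, ", ".join(sorted(clr_types))))
        return super().__new__(mcls, name, bases, namespace)


class Form(metaclass=ClrMeta):      # stands in for a wrapped .Net class
    __clr_type__ = "System.Windows.Forms.Form"

class Timer(metaclass=ClrMeta):     # another wrapped .Net class
    __clr_type__ = "System.Timers.Timer"

class Logging:                      # an ordinary Python mixin
    def log(self, msg):
        return "[%s] %s" % (type(self).__name__, msg)

# Accepted: one .Net base class plus a Python mixin.
class MainWindow(Logging, Form):
    pass

# Rejected: two different .Net base classes.
rejected = None
try:
    class Confused(Form, Timer):
        pass
except TypeError as exc:
    rejected = str(exc)
```

Because `MainWindow` mixes a plain class with a `ClrMeta` class, Python picks `ClrMeta` as its metaclass automatically, so the check also runs for classes that never mention it.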

Type declarations

Type declarations, or type inference, would help with speed. I hope that sometime soon the PyPy project will deliver a wonderful new parser and compiler for Python that performs all the necessary optimisations (and I believe in the tooth fairy!). However, I don't believe this is needed to make a .NET Python competitive with C-Python. Of course, if our target is to beat C#, we might need some help.

Another reason why some form of type declaration would be useful is to allow full interaction with the other .NET languages. It is to be hoped that PEP 318 will provide a basis for introducing statements that have a declarative effect.
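As an illustration of how a PEP 318 decorator could carry declarative information, here is a hypothetical `clr_signature` decorator. C-Python would simply store the types on the function object. The assumption is that a .Net compiler could read them to emit a strongly typed method that other .Net languages can call:

```python
def clr_signature(*arg_types, returns=None):
    """Hypothetical PEP 318 decorator that records .Net parameter and
    return types on the function object."""
    def decorate(func):
        expected = func.__code__.co_argcount
        if len(arg_types) != expected:
            raise TypeError("%s: expected %d argument types, got %d"
                            % (func.__name__, expected, len(arg_types)))
        # C-Python ignores these; a .Net compiler could use them to emit a
        # strongly typed method signature.
        func.clr_arg_types = arg_types
        func.clr_return_type = returns
        return func
    return decorate


@clr_signature(int, int, returns=int)
def add(a, b):
    return a + b
```

The function still behaves normally in C-Python (`add(2, 3)` returns 5). Only the recorded metadata is new.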

1.2 Python for .Net

Python for .Net, written by Brian Lloyd, is a bridge that connects Python 2.3 to the CLR. It doesn't attempt to make Python a true .Net language running under the CLR, but it does deliver, today, much of what Python on .Net would offer.

What you can do

You can run existing Python scripts, and you can use any existing Python libraries.

You can import CLR classes and access their methods and attributes. You can't subclass a CLR class (or at least, I couldn't get it to work). It partly runs under Mono, but Mono needs to improve significantly before it can fully support Python for .Net.

What you cannot do

You cannot use a Python class from the CLR. You cannot script ASP.Net pages using Python.

There is an interesting issue concerning the difference between value and reference types. Specifically, if you have an array of value types, you cannot modify elements by subscripting into the array from Python:

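# Assumes CLR has been imported and Point is a .Net value type
# (for example System.Drawing.Point).
items = CLR.System.Array.CreateInstance(Point, 3)
for i in range(3):
    items[i] = Point(0, 0)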

items[0].X = 1 # won't work!!

Instead you have to write something like this:

items = CLR.System.Array.CreateInstance(Point, 3)
for i in range(3):
    items[i] = Point(0, 0)

# This _will_ work. We get 'item' as a boxed copy of the Point
# object actually stored in the array. After making our changes
# we re-set the array item to update the bits in the array.

item = items[0]
item.X = 1
items[0] = item

I expect that the same issue will arise with a direct implementation of Python under .Net as well. If Python evaluates the partial expression items[0] to a reference, it will implicitly box the value at that point, and any changes made to the boxed copy will be lost.
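The effect can be modelled in plain Python. In this sketch, `Point` and `ValueArray` are hypothetical stand-ins. The array hands out and stores copies of its elements, mimicking what boxing and unboxing do to a .Net struct, so it shows why the one-line update is silently lost while the read-modify-write version sticks:

```python
import copy

class Point:
    """Stand-in for a .Net value type: a small mutable struct."""
    def __init__(self, x=0, y=0):
        self.X = x
        self.Y = y

class ValueArray:
    """Simulates an array of value types: every read returns a copy
    (like boxing) and every write copies the value back in (like
    unboxing into the array slot)."""
    def __init__(self, length, factory):
        self._slots = [factory() for _ in range(length)]

    def __getitem__(self, i):
        return copy.copy(self._slots[i])

    def __setitem__(self, i, value):
        self._slots[i] = copy.copy(value)

items = ValueArray(3, Point)

items[0].X = 1          # updates a temporary copy, which is then discarded
lost = items[0].X       # still 0

item = items[0]         # take a copy,
item.X = 1              # change it,
items[0] = item         # and write it back
kept = items[0].X       # now 1
```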

How fast is it?

I haven't done extensive timing tests, but it isn't fast. A simple assignment into a CLR hashtable is about 100 times slower than the equivalent assignment into a Python dictionary. On this basis the original Activestate Python port really flies, but of course the comparison is unfair, as any actual processing done in Python will run at normal C-Python speed (which would not be the case for a natively ported Python).

Today, when you use Python to drive COM objects, you must try to avoid unnecessary calls through the COM layer. Exactly the same holds true for Python for .Net. You can run code on either side of the bridge at full speed, but try to avoid crossing that bridge more often than absolutely necessary.
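A toy model makes the advice concrete. `RemoteTable` below is a hypothetical object on the far side of a bridge (think of a CLR Hashtable driven from Python) that counts how many calls cross over. Its batched `set_items` method is an assumption, standing in for any API that accepts a whole batch of work in a single call:

```python
class RemoteTable:
    """Hypothetical object on the far side of a language bridge.
    It counts how many calls cross the bridge."""
    def __init__(self):
        self.crossings = 0
        self._data = {}

    def set_item(self, key, value):
        self.crossings += 1            # one crossing per item
        self._data[key] = value

    def set_items(self, pairs):
        self.crossings += 1            # one crossing for the whole batch
        self._data.update(pairs)

def fill_chatty(table, pairs):
    for key, value in pairs.items():
        table.set_item(key, value)

def fill_batched(table, pairs):
    # Do the work on the Python side, then cross the bridge once.
    table.set_items(pairs)

pairs = {n: n * n for n in range(1000)}
chatty, batched = RemoteTable(), RemoteTable()
fill_chatty(chatty, pairs)
fill_batched(batched, pairs)
print(chatty.crossings, batched.crossings)   # 1000 1
```

Both tables end up with the same contents; only the number of trips across the bridge differs.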

1.3 My story

Around July last year I finally got around to re-reading the Activestate paper. It struck me that I didn't agree with some of the conclusions (although I didn't realise at the time where the real issues lay), so I got hold of a copy of the code and started working on it.

It took quite a bit of work just to get it to compile properly on .Net 1.0: the study had been done on a beta of .Net, and there had been significant changes in the final release. Once I got it running, I found that it ran the Pystone benchmark about 25 times slower than C-Python.

The original runtime was modelled on C-Python's runtime, and I started off by refactoring it into a more object-oriented style. I also made some of the obvious micro-optimisations to try to speed things up. This immediately resulted in a surprise: nothing I tried made more than a few percent difference to the running speed, and profiling the code didn't reveal any particular hotspots.

After some more experimenting I concluded that the big bottleneck was the use of the reflection classes to model Python function calls, and that the only solution was to generate code at runtime to perform function calls instead of relying on reflection.

My plan:

1. Get the .Net compiler running at a reasonable speed.

2. Add the most important of the missing libraries and runtime support.

3. Try to play catch-up with C-Python's feature creep.

Unfortunately, events intervened to prevent me fulfilling this plan.

1.4 IronPython

In December 2003, Jim Hugunin posted a message to a .Net mailing list claiming to have a version of Python running under .Net. His initial benchmarks showed that, for some operations at least, he was able to run faster than C-Python.

Unfortunately, at that time he was unwilling to share any of the code.

Since then he has given a talk on IronPython at PyCon (March this year), and has another one planned for July this year. Sadly though, he still hasn't got to the stage where he is happy to let other people in on the project.

Benchmarks should be taken with a pinch of salt, but the claim is that IronPython-0.2 is 1.4x faster than Python-2.3 on the standard Pystone benchmark. Whether or not you find Pystone a useful predictor of final application speed, the fact remains that this makes IronPython a contender to be as fast as, or faster than, C-Python.

The IronPython paper compares the speed over a wide range of operations, showing that in some areas IronPython is much faster than C-Python, and in other areas slower. It may be that the areas where IronPython is faster indicate where C-Python could be optimised. For example, setting a global variable in C-Python involves a dictionary lookup, but there is really no reason why this should be the case. A fully compiled .Net Python is a better tool for trying out different code generation strategies here than C-Python, where you are limited in the optimisations that can be made without breaking bytecode compatibility.

The areas where IronPython is slower are certainly areas where it could be optimised, but that shouldn't be the immediate focus. The concern now should be to fill out the libraries and turn it from a research project into a viable product.

2 How to do the impossible

2.1 The speed issue

The original Activestate project blamed the poor speed on:

1. .Net internals

2. Reflection.Emit

3. The simple nature of the implementation.

Some of this is true. By moving to an object-oriented model, and by modelling the runtime library on Python's semantics, we can expect a reasonable improvement.

The problem with delegates

The real bottleneck, though, is the use of the reflection library on every attribute lookup or method call. Using reflection to find a method is slow, but the real hit comes when you try to call a method using the object that was returned. A call made through the reflection classes is at least 100 times slower than a direct call or a call through a delegate. Creating a delegate from the method object is also possible, but is just as slow if you only use the delegate once. There might be a benefit if the delegate could be cached and reused, but there are other problems with delegates.
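In Python terms, caching and reusing a delegate corresponds to a call-site cache: do the expensive lookup once per receiver type, keep the result, and call it directly from then on. A minimal sketch with a hypothetical `CallSite` class, assuming methods are always found on the type. It ignores instance attributes and classes modified after the first call, both of which a real runtime would have to handle:

```python
class CallSite:
    """Per-call-site cache: the slow lookup runs once per receiver type,
    and every later call goes straight to the cached function."""

    def __init__(self, name):
        self.name = name
        self.cache = {}       # receiver type -> function found for it
        self.lookups = 0      # how many times the slow path ran

    def __call__(self, obj, *args):
        cls = type(obj)
        func = self.cache.get(cls)
        if func is None:
            self.lookups += 1
            func = getattr(cls, self.name)   # stands in for the reflection lookup
            self.cache[cls] = func
        return func(obj, *args)


class Greeter:
    def greet(self, who):
        return "hello, " + who

site = CallSite("greet")
replies = [site(Greeter(), "world") for _ in range(100)]
print(site.lookups)   # 1
```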

The most significant callable objects in Python are: functions, unbound methods, bound methods, and type objects.

.Net doesn't have pointers to functions²; instead it uses delegates. Delegates are cool: they can be combined to give multicast delegates, and any delegate may also be called asynchronously, allowing for interesting threaded applications. However, delegates also have major limitations.

1. You can create a delegate analogous to a Python bound method, i.e. one that refers to a non-static method of a specific object instance.

2. You can create a delegate analogous to a Python function, i.e. one that points to a static method with no associated instance.

But: you cannot create a delegate analogous to an unbound method, i.e. one where we have a reference to a non-static method, but no specific instance.

And: you cannot create a delegate analogous to a type object, i.e. one that creates an object. In .Net, a call to a constructor uses a different instruction sequence from a call to an ordinary method, and there is no way to wrap such a call in a delegate without providing a factory function.
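For comparison, here are the four kinds of Python callable in Python 3 syntax, where an "unbound method" is simply the function fetched from the class. The class and function names are illustrative, and the comments note which kinds have a direct .Net 1.x delegate counterpart according to the list above:

```python
class Counter:
    def __init__(self, start=0):
        self.value = start

    def bump(self, n):
        self.value += n
        return self.value

def double(x):
    return 2 * x

c = Counter(10)

bound = c.bump           # bound method: maps onto an instance delegate
plain = double           # plain function: maps onto a static-method delegate
unbound = Counter.bump   # no instance: no direct .Net 1.x delegate equivalent
factory = Counter        # type object: no delegate without a factory method

print(bound(1))          # 11
print(plain(21))         # 42
print(unbound(c, 1))     # 12
print(factory(5).value)  # 5
```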

N.B. .Net Python needs no equivalent of C-Python's native methods, since all Python code is compiled to native functions.

Function objects

In the model I tried to use, a def statement creates a PyFunction object that will call the original code. (This is another reason why delegates on their own are not sufficient: Python functions have writable attributes, so we need something more than a delegate.) The PyFunction object stores the default arguments, and handles all processing of default and keyword arguments.

It is easy to overlook the fact that, although in Python it looks as though all calls are completely dynamic, some things about a function call are fixed.

When calling a function, you cannot generally tell in advance how many arguments the function will accept, nor what their names are. However, at the point of call you can almost always tell how many arguments you are passing.

When a function is entered, it doesn't know how many arguments it was given when called, nor which were positional and which keyword. However, every function actually receives a fixed number of arguments, and there is no distinction between positional and keyword.

This function always receives 3 arguments no matter how it is called:

def foo(a=1, b=2, c=3): pass

This function receives exactly two arguments, p and q:

def bar(*p, **q): pass

It makes sense, therefore, to compile the code for these functions to accept exactly the expected number of arguments, and to leave it to the function object to perform the required mapping from the arguments actually provided by the caller.
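C-Python's own `inspect` module can show the fixed set of arguments a function body receives once the caller's positional, keyword and default arguments have been mapped onto it. The helper `received` is hypothetical. `inspect.signature`, `Signature.bind` and `BoundArguments.apply_defaults` are standard library calls:

```python
import inspect

def foo(a=1, b=2, c=3): pass

def bar(*p, **q): pass

def received(func, *args, **kwargs):
    """Return the fixed set of parameters the body of func actually sees,
    after positional, keyword and default arguments have been mapped."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)

print(received(foo))              # {'a': 1, 'b': 2, 'c': 3}
print(received(foo, 10, c=30))    # {'a': 10, 'b': 2, 'c': 30}
print(received(bar, 1, 2, x=3))   # {'p': (1, 2), 'q': {'x': 3}}
```

However the call is written, `foo` always ends up with exactly three values and `bar` with exactly two.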

If PyFunction actually implements several overloads, then we can minimise the work done on a function call:

foo(3)  # Python

expands to the equivalent of:

foo.__call__(3)

The PyFunction3 class (the function-object class for three-parameter functions such as foo) implements this as:

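PyObject __call__(PyObject arg1)
{
    // Only 'a' was passed: take 'b' and 'c' from the stored defaults.
    return this.__call__(arg1, defaults[1], defaults[2]);
}

PyObject __call__(PyObject arg1, PyObject arg2, PyObject arg3)
{
    // Exactly three arguments: call the compiled code directly.
    return this.fDelegate(arg1, arg2, arg3);
}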

and the function foo was actually compiled as:

static PyObject foo(PyObject a, PyObject b, PyObject c)
{
    return PyNone;
}
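The same design can be modelled in Python as a sketch. `PyFunction3` and `foo_body` here are hypothetical Python stand-ins. The model builds one entry point per argument count when the function object is created, supplies defaults itself, and only ever calls the body with exactly three arguments. Keyword arguments are left out to keep it short:

```python
class PyFunction3:
    """Model of a function object whose compiled body always takes
    exactly three arguments; the object supplies any missing defaults."""

    def __init__(self, body, defaults):
        d = tuple(defaults)   # one default per parameter, as for foo
        # One entry point per argument count, built once when the def
        # statement runs rather than worked out on every call.
        self._entry = {
            0: lambda: body(d[0], d[1], d[2]),
            1: lambda a: body(a, d[1], d[2]),
            2: lambda a, b: body(a, b, d[2]),
            3: body,
        }

    def __call__(self, *args):
        # A .Net compiler would choose the overload at the call site, since
        # it knows how many arguments it is passing. Python has no
        # overloading, so the model looks the entry point up instead.
        entry = self._entry.get(len(args))
        if entry is None:
            raise TypeError("expected at most 3 arguments, got %d" % len(args))
        return entry(*args)


def foo_body(a, b, c):
    """Stands in for the compiled body of foo; returns what it received."""
    return (a, b, c)

foo = PyFunction3(foo_body, (1, 2, 3))
print(foo(3))         # (3, 2, 3)
print(foo(3, 4, 5))   # (3, 4, 5)
```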

We can handle bound methods in exactly the same way, using a PyBoundMethod class instead of a PyFunction class. However, there is no way to use delegates to bind an instance to a method at runtime without using the reflection classes, so we need another mechanism for unbound methods.

The only answer is to use automatically generated static methods. For example, given this .Net class:

class X
{
    public int foo() { ... }
    public void bar(int iVal) { ... }
}

At runtime, when we first try to use an instance of X in Python, we can create an anonymous wrapper class:

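class Wrap_X
{
    // Static, so a delegate to it needs no bound instance:
    // the Python 'self' arrives as an ordinary first argument.
    static PyObject foo(PyObject self)
    {
        X target = (X)((PyExternalObject)self).getObject();
        return (PyObject)target.foo();
    }

    static PyObject bar(PyObject self, PyObject iVal)
    {
        X target = (X)((PyExternalObject)self).getObject();
        target.bar((int)iVal);
        return PyNone;
    }
}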

At the same time, the code can create a dictionary mapping the method names 'foo' and 'bar' onto PyUnboundMethod1 and PyUnboundMethod2 instances (for the one- and two-argument static wrappers respectively).

Similar wrapper methods may be created for properties, along with factory methods to wrap the constructors.
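Python can imitate this generate-once approach with `exec`, which here plays the part that Reflection.Emit would play in the .Net runtime. In this sketch the classes `X` and `External` (standing in for PyExternalObject) and the function `generate_wrapper` are hypothetical. The method names and argument counts are supplied by hand, where a real runtime would read them once through reflection:

```python
class X:
    """Stand-in for the .Net class X."""
    def __init__(self):
        self.total = 0

    def foo(self):
        return self.total

    def bar(self, iVal):
        self.total += iVal

class External:
    """Stand-in for PyExternalObject: wraps a foreign object."""
    def __init__(self, obj):
        self._obj = obj

    def getObject(self):
        return self._obj

def generate_wrapper(name, arity):
    """Compile, at runtime, a wrapper taking 'self' plus exactly `arity`
    arguments that unwraps self and calls the named method directly."""
    params = ", ".join("a%d" % i for i in range(arity))
    src = ("def {name}(self{sep}{params}):\n"
           "    return self.getObject().{name}({params})\n"
           ).format(name=name, sep=", " if params else "", params=params)
    namespace = {}
    exec(src, namespace)
    return namespace[name]

# Built once, when X is first used: method name -> generated wrapper.
x_methods = {"foo": generate_wrapper("foo", 0),
             "bar": generate_wrapper("bar", 1)}

obj = External(X())
x_methods["bar"](obj, 5)
print(x_methods["foo"](obj))   # 5
```

The generated functions have fixed parameter lists, so each call does no general-purpose argument processing, which mirrors the static wrappers in Wrap_X.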

2.2 Optimising Globals

Global variables are one area where C-Python's implementation seems to be missing a trick. In C-Python, every reference to a global variable involves a dictionary lookup to find the variable. Every reference to a builtin involves two dictionary lookups: one to check that there is no global variable hiding the builtin, and then one to find the builtin.

Guido van Rossum has already stated that he would like to optimise some references to builtins at compile time, so it may become acceptable for some things you can currently do with global variables to be illegal in the future.

Setting a global should not involve a dictionary lookup

Direct assignments to global variables should not involve a dictionary lookup. We know when we are assigning to a global, and we know which global is being assigned, so we can simply store the value in a static variable in the module.

Of course, we still have to support setting a global from outside the module, and setting it indirectly through globals().

Accessing a global should not involve a dictionary lookup

As with assignment, if we know the variable we are accessing we can pick it straight up from the static variable. There may be an issue if the global could be unset or masking a builtin of the same name, so we may need more code than a simple load of a variable, but not much more.

globals() dictionary

We have to allow access to the global variables through the dictionary interface, but this doesn't have to be a real dictionary. It could be a subclass of dictionary which directly reads and writes the variables for known globals, and falls back on a dictionary only when accessing variables not referred to directly in the program source.
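Here is a sketch of such a globals() object, assuming the compiler has turned each directly referenced global into a field of a per-module object. `ModuleStatics` (using `__slots__` for the fields) and `GlobalsView` are hypothetical names. Rather than a dict subclass, the view is a `collections.abc.MutableMapping`, which keeps every access path under its control:

```python
from collections.abc import MutableMapping

class ModuleStatics:
    """Stand-in for the static fields a compiler might emit for the
    globals a module refers to directly (here: x and counter)."""
    __slots__ = ("x", "counter")

class GlobalsView(MutableMapping):
    """globals() replacement: known names live in static fields,
    anything else falls back on an ordinary dict."""

    def __init__(self, statics):
        self._statics = statics
        self._known = tuple(type(statics).__slots__)
        self._extra = {}

    def __getitem__(self, name):
        if name in self._known:
            try:
                return getattr(self._statics, name)
            except AttributeError:          # field exists but is unset
                raise KeyError(name) from None
        return self._extra[name]

    def __setitem__(self, name, value):
        if name in self._known:
            setattr(self._statics, name, value)
        else:
            self._extra[name] = value

    def __delitem__(self, name):
        if name in self._known:
            if not hasattr(self._statics, name):
                raise KeyError(name)
            delattr(self._statics, name)
        else:
            del self._extra[name]

    def __iter__(self):
        for name in self._known:
            if hasattr(self._statics, name):
                yield name
        yield from self._extra

    def __len__(self):
        return sum(1 for _ in self)


statics = ModuleStatics()
g = GlobalsView(statics)
statics.x = 42          # what compiled code does: a direct store
print(g["x"])           # 42
g["counter"] = 1        # an indirect write lands in the static field
print(statics.counter)  # 1
g["other"] = "spam"     # unknown names use the fallback dict
```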

Optimising builtins

References to builtins may be optimised at compile time, but only when there is no global variable of the same name which could be hiding the builtin. If access to a builtin is optimised, it would also seem prudent to ensure that the module's globals() dictionary throws an exception on any attempt to mask the builtin at runtime.
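The guard itself is small. This sketch assumes a hypothetical `GuardedGlobals` dict subclass that is told which builtins the compiler has optimised. Only subscription assignment is checked here, so a complete version would also need to cover `update()`, `setdefault()` and similar methods:

```python
class GuardedGlobals(dict):
    """Module globals that refuse to let a new name hide a builtin the
    compiler has already optimised."""

    def __init__(self, optimised_builtins, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._protected = frozenset(optimised_builtins)

    def __setitem__(self, name, value):
        if name in self._protected:
            raise TypeError("cannot mask optimised builtin %r" % name)
        super().__setitem__(name, value)


g = GuardedGlobals({"len", "range"})
g["total"] = 0                  # ordinary globals are unaffected
try:
    g["len"] = lambda s: 0      # would silently break compiled code
except TypeError as exc:
    print(exc)                  # cannot mask optimised builtin 'len'
```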

Readonly globals

This is a bit more speculative, and might break too much of Python's dynamic nature for some people, but it would also be possible to optimise access to some global values that are set only once. For example, an imported module is rarely reassigned, so (provided there is no obvious reuse of the variable) you could optimise access to the module's attributes, and use the same mechanism as for builtins to reject any attempt to modify the variable indirectly.

However, this might be a step too far for some of the Python community. I can think of rare situations where I have resorted to rebinding a name in a different module where the name was set once by an import statement (this was to stub out the module for unit tests), but it isn't generally a good practice.

3 The Red Queen's Race

"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"

"I'd rather not try, please!" said Alice. "I'm quite content to stay here--only I am so hot and thirsty!"

Python is a moving target. It will take a lot of running to move from the Activestate implementation to a full Python 2.1 implementation, then more running to add list comprehensions and generators, and by the time we get there, generator comprehensions and function decorators will be in the language as well.

[1]	I am using the term C-Python here to refer to the specific implementation of Python written in C.

[2]	Actually, .Net does have pointers to functions, but in managed code the only thing you can do with them is create delegates, and then only by using two fixed instruction sequences. You can also retrieve a function pointer converted to an integer, or convert an integer function pointer back into a delegate, but only by using the reflection classes, and there are no speed benefits over using those classes directly.