Author: | Duncan Booth |
---|---|
Contact: | duncan@rcp.co.uk |
Abstract
Sometimes Python on its own isn't quite enough. Maybe you want to access a library available only in C or C++, maybe you want to script an existing C/C++ application, or maybe you just want an easier way to test your C/C++ code.
This paper introduces the Python C API, and also looks at more sophisticated interfaces such as SWIG, SIP and Boost.
This is an overview of some of the different ways that Python code may be linked to C or C++ code.
There are several reasons why one might wish to extend Python in C or C++, such as:
- Calling functions in an existing library
- Adding a new builtin type to Python
- Optimising inner loops in code
- Exposing a C++ class library to Python
- Embedding Python inside a C/C++ application
This paper sets out to catalogue the options, not to propose a single solution. The interfaces described here have each addressed a different problem, and there is probably no single solution that will fit every requirement.
The Python C API gives you the maximum flexibility. If you use any of the other techniques described in this paper, you will probably have to learn at least some of the C API as well.
You can also use the C API to embed Python in a C program.
Writing directly to the C API can be complex.
The main examples for the Python C API are found in the source of Python itself. The easiest way to add a new type is to take a type of similar complexity and edit the source. Unfortunately the internal types of Python haven't been written with their use as C API tutorials uppermost in the developers' minds.
This example shows how to call one of the Windows functions that is not wrapped by Mark Hammond's win32 extensions.
This module contains everything we need to call the file version functions. There are two functions to be exposed here:
GetFileVersionInfo()
Reads the version information embedded in the file and returns a data block containing the text.
VerQueryValue()
Returns a version information item from the data block returned by the first call, as a Unicode string. However, if the name of the version item is \VarFileInfo\Translation it returns a list of pairs of numbers instead.
(Don't ask me why a C function can return either a string or a list of numbers, but it does.)
The test code shows how we expect the functions to be used. Unfortunately the test is rather system specific. If you try to run these tests on anything other than Windows 2000 the expected results will need editing.
    import unittest, sys, os
    try:
        import win32ver
    except ImportError:
        sys.exit("Import of win32ver failed: use setup.py to install it")

    # These strings are rather system dependent.
    EXPECTEDSTRINGS = [
        ("OriginalFilename", "REGEDIT.EXE"),
        ("ProductVersion", "5.00.2134.1"),
        ("FileVersion", "5.00.2134.1"),
        ("FileDescription", "Registry Editor"),
        ("Comments", None),
        ("InternalName", "REGEDIT"),
        ("ProductName", "Microsoft(R) Windows (R) 2000 Operating System"),
        ("CompanyName", "Microsoft Corporation"),
        ("LegalCopyright", "Copyright (C) Microsoft Corp. 1981-1999"),
        ("LegalTrademarks", None),
        ("PrivateBuild", None),
        ("SpecialBuild", None)]

    TESTFILE = os.path.join(os.environ['windir'], 'regedit.exe')

    class Win32VerTest(unittest.TestCase):
        def setUp(self):
            self.info = win32ver.GetFileVersionInfo(TESTFILE)
            assert(self.info is not None)

        def tearDown(self):
            pass

        def testLCList(self):
            '''Retrieve language codepair list'''
            # Calling VerQueryValue with no path should return a list
            # of language, codepage pairs.
            lclist = win32ver.VerQueryValue(self.info)
            self.assertEqual(lclist, [(1033, 1200)])

        def testValues(self):
            '''Retrieve version strings'''
            lclist = win32ver.VerQueryValue(self.info)
            block = u"\\StringFileInfo\\%04x%04x\\" % lclist[0]
            for s, expected in EXPECTEDSTRINGS:
                value = win32ver.VerQueryValue(self.info, block+s)
                self.assertEqual(value, expected)

    if __name__=='__main__':
        unittest.main()
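The sub-block name that testValues builds can be tried without the extension installed. This is a minimal sketch, assuming the (1033, 1200) language and code page pair that the test expects on the author's system; the helper name string_block is hypothetical and not part of win32ver.

```python
def string_block(language, codepage):
    """Build the StringFileInfo sub-block prefix for one (language, codepage) pair."""
    # Each number is rendered as four lower-case hex digits, language first.
    return "\\StringFileInfo\\%04x%04x\\" % (language, codepage)

# (1033, 1200) is the pair the test expects: US English, Unicode.
prefix = string_block(1033, 1200)
print(prefix + "ProductVersion")   # \StringFileInfo\040904b0\ProductVersion
```

The resulting string is what would be passed as the second argument to VerQueryValue.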
Although you can write your own makefile, for a simple C extension it is much easier just to use distutils. A script like the one below will compile and install your extension or build an archive for distribution. setup.py contains:
    # This setup.py will compile and install the win32ver extension
    # Use:
    #    setup.py install
    #
    # In Python 2.2 the extension is copied into
    #    <pythonhome>/lib/site-packages
    # in earlier versions it may be put directly in the python directory.
    from distutils.core import setup, Extension

    setup (name = "win32ver",
           version = "1.0",
           maintainer = "Duncan Booth",
           maintainer_email = "duncan@rcp.co.uk",
           description = "Win32 Version Information",
           ext_modules = [Extension('win32ver',
                                    sources=['win32ver.cpp'],
                                    libraries=['version'])]
    )
    # end of file: setup.py
The C (or in this case C++) source file listed in setup.py is compiled and installed by using the command "setup.py install". Other options include "setup.py sdist" to produce a zip file for source distribution, or "setup.py bdist_wininst" to generate a Windows installer for the binary .pyd file.
The file (win32ver.cpp) starts with header files. In this case we need both Windows and Python:

    // Simple interface to Microsoft's Win32 File Version functions.
    //
    #include <windows.h>
    #include <malloc.h>
    #include "Python.h"

I prefer to lay out the code with the docstring above the function.

    char win32ver_GetFileVersionInfo__doc__[] =
        "GetFileVersionInfo(filename) -- Read file version information\n"
        "returns versioninfo as a string";
GetFileVersionInfo is the easy function. It actually uses two Windows functions, as there is a separate function to find the size of the block to be returned. PyArg_ParseTuple does the hard job of converting the Python arguments into a C string, and Py_BuildValue converts the data buffer back into a Python string to return the result. Many library functions can be wrapped easily with just these two functions.
The other thing we have to do is handle errors. This is done using PyErr_SetString, specifying an exception type and a descriptive string used as the argument to the exception. In this case the existing MemoryError exception seemed suitable. In other situations you might want to create your own exception, which you do by calling PyErr_NewException and PyModule_AddObject in the initialisation function at the bottom of the file.
    PyObject *
    win32ver_GetFileVersionInfo(PyObject *self, PyObject *args)
    {
        PyObject *result = NULL;
        char *filename;

        if (!PyArg_ParseTuple(args, "s", &filename))
            return NULL;

        // read version info here.
        DWORD handle = 0;
        DWORD size = GetFileVersionInfoSize(filename, &handle);
        if (size != 0)
        {
            char *buffer = static_cast<char *>(calloc(1, size));
            if (buffer == NULL)
            {
                PyErr_SetString(PyExc_MemoryError,
                    "win32ver: Out of memory");
                return NULL;
            }

            if (GetFileVersionInfo(filename, 0, size, buffer) == 0)
            {
                free(buffer);
                PyErr_SetString(PyExc_MemoryError,
                    "win32ver: Cannot read version info");
                return NULL;
            }

            result = Py_BuildValue("s#", buffer, size);
            free(buffer);
            return result;
        }

        Py_INCREF(Py_None);
        return Py_None;
    }
The VerQueryValue wrapper is rather more complex, as it has to return different result types according to the argument, and one of those results is a list of varying length.
To return a list we can use Py_BuildValue to create an empty list (or we could use PyList_New for exactly the same effect), and then PyList_Append to add each item to the list. This is exactly the same way as we might build a list in Python (before the days of list comprehensions), and in fact most operations using the C API closely follow the pattern of equivalent Python code.
It is important always to consider whether reference counts are correctly maintained. When we build a list it is created with a reference count of 1. We build a tuple and that also has a reference count of 1. Appending the tuple to the list increments the reference count on the tuple, so we must decrement it before we overwrite the variable next time round the loop. If we encounter an error, then decrementing the reference on the list will automatically release all the tuples that were already created.
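The effect of appending on reference counts can be observed from Python itself with sys.getrefcount. This is only a sketch of the behaviour on CPython; a plain object() stands in for the (language, codepage) tuple, and only the differences between counts are meaningful.

```python
import sys

item = object()                      # stands in for the (language, codepage) tuple
before = sys.getrefcount(item)

items = []
items.append(item)                   # the list now holds its own reference
after_append = sys.getrefcount(item)

del items                            # destroying the list releases that reference
after_delete = sys.getrefcount(item)

print(after_append - before, after_delete - before)
```

On CPython the count rises by one after the append and returns to its starting value once the list is destroyed, which is why the C code must release its own reference to each tuple after PyList_Append.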
    char win32ver_VerQueryValue__doc__[] =
        "VerQueryValue(versioninfo, [subblock]) -- Returns version\n"
        "information from the block retrieved by GetFileVersionInfo\n"
        "If subblock is omitted, it returns a list of tuples\n"
        "(language,codepage)\n";

    PyObject *
    win32ver_VerQueryValue(PyObject *self, PyObject *args)
    {
        char *buffer;
        Py_UNICODE *subblock = NULL;
        unsigned int buflen;

        if (!PyArg_ParseTuple(args, "s#|u", &buffer, &buflen, &subblock))
            return NULL;

        if (subblock == NULL)
        {
            PyObject *res = Py_BuildValue("[]");
            if (res == NULL)
                return NULL;

            // Read language code and codepage.
            struct LANGANDCODEPAGE {
                WORD wLanguage;
                WORD wCodePage;
            } *lpTranslate;
            unsigned int cbTranslate = 0;   // stays 0 if the query fails

            // Read the list of languages and code pages.
            // The A (ANSI) variant takes a plain char string.
            VerQueryValueA(buffer,
                "\\VarFileInfo\\Translation",
                (LPVOID*)&lpTranslate,
                &cbTranslate);

            for (unsigned int i = 0;
                 i < (cbTranslate/sizeof(struct LANGANDCODEPAGE)); i++)
            {
                PyObject *lc = Py_BuildValue("(ii)",
                    lpTranslate[i].wLanguage,
                    lpTranslate[i].wCodePage);
                if (lc == NULL || PyList_Append(res, lc) < 0)
                {
                    // Method failed. Release res.
                    Py_XDECREF(lc);  // release the tuple if it was created
                    Py_DECREF(res);  // Destroys the partly constructed object
                    return NULL;     // Append should have raised the error.
                }
                Py_DECREF(lc);  // release our reference to the tuple.
            }
            return res;
        }
        else // subblock was specified.
        {
            Py_UNICODE *value;
            unsigned int size;
            if (VerQueryValueW(buffer, subblock, (LPVOID*)&value, &size) != 0)
            {
                PyObject *res = Py_BuildValue("u", value);
                return res;
            }
        }

        Py_INCREF(Py_None);
        return Py_None;
    }
You need to make the functions visible to Python by creating an array of PyMethodDef objects. For each function you specify its name, a pointer to the actual function, the style of argument passing (METH_VARARGS, METH_KEYWORDS, METH_NOARGS), and a docstring. The array is null terminated.
    static PyMethodDef win32ver_functions[] = {
        {"GetFileVersionInfo", (PyCFunction)win32ver_GetFileVersionInfo,
            METH_VARARGS, win32ver_GetFileVersionInfo__doc__},
        {"VerQueryValue", (PyCFunction)win32ver_VerQueryValue,
            METH_VARARGS, win32ver_VerQueryValue__doc__},
        {NULL, NULL, 0, NULL}
    };
Every C module must have a module initialization function. This has to create a new module object which is usually done by calling Py_InitModule3. You specify an initial list of functions to be included in the module, but if you want to include other objects this has to be done by manually adding them with the PyModule_AddObject function.
    /* module entry-point (module-initialization) function */
    void
    initwin32ver(void)
    {
        /* Create the module and add the functions */
        PyObject *m = Py_InitModule3("win32ver", win32ver_functions,
                                     "Win32 Version Info");
    }
SWIG bridges C code to Python by means of an interface file. It makes it very easy to wrap simple C APIs.
Global functions are wrapped as new Python built-in functions. For example:
    %module example
    int fact(int n);
creates a built-in function example.fact(n) that works exactly like you think it does:
    >>> import example
    >>> print example.fact(4)
    24
    >>>
Global variables are exposed by SWIG as attributes of an object called cvar. For example:
    // SWIG interface file with global variables
    %module example
    ...
    extern int My_variable;
    extern double density;
    ...
This can be used in Python like this:
    >>> import example
    >>> # Print out value of a C global variable
    >>> print example.cvar.My_variable
    4
    >>> # Set the value of a C global variable
    >>> example.cvar.density = 0.8442
    >>> # Use in a math operation
    >>> example.cvar.density = example.cvar.density*1.10
SWIG 1.3 can also wrap C structures and C++ classes. It does this by creating a Python proxy stub class and a set of static C functions that perform the mapping.
For example, if you have a class like this:
    class Foo {
    public:
        int x;
        int spam(int);
        ...
then SWIG transforms it into a set of low-level procedural wrappers. For example:
    Foo *new_Foo() {
        return new Foo();
    }
    void delete_Foo(Foo *f) {
        delete f;
    }
    int Foo_x_get(Foo *f) {
        return f->x;
    }
    void Foo_x_set(Foo *f, int value) {
        f->x = value;
    }
    int Foo_spam(Foo *f, int arg1) {
        return f->spam(arg1);
    }
These wrappers can be found in the low-level extension module (e.g., _example). Using these wrappers, SWIG generates a high-level Python proxy class like this:
    import _example

    class Foo(object):
        def __init__(self):
            self.this = _example.new_Foo()
            self.thisown = 1
        def __del__(self):
            if self.thisown:
                _example.delete_Foo(self.this)
        def spam(self,arg1):
            return _example.Foo_spam(self.this,arg1)
        x = property(_example.Foo_x_get, _example.Foo_x_set)
This class merely holds a pointer to the underlying C++ object (.this) and dispatches methods and member variable access to that object using the low-level accessor functions. From a user's point of view, it makes the class work normally:
    >>> f = example.Foo()
    >>> f.x = 3
    >>> y = f.spam(5)
The main drawback to SWIG is that as soon as you exceed the simple types understood by the SWIG library for argument passing and results, you start having to write code. For example the SWIG documentation suggests that to wrap a function:
void set_transform(Image *im, double m[4][4]);
you should define additional inline helper functions to create and access the C array:
    %inline %{
    /* Note: double[4][4] is equivalent to a pointer to an array double (*)[4] */
    double (*new_mat44())[4] {
        return (double (*)[4]) malloc(16*sizeof(double));
    }
    void free_mat44(double (*x)[4]) {
        free(x);
    }
    void mat44_set(double x[4][4], int i, int j, double v) {
        x[i][j] = v;
    }
    double mat44_get(double x[4][4], int i, int j) {
        return x[i][j];
    }
    %}
My past experience with SWIG has been that I end up having to write a lot of interface code, using a mix of SWIG and the Python C API. It is also error-prone, as the responsibility for allocating and freeing the object is now exposed at the Python level, and Python programmers shouldn't have to care about such things.
SWIG documentation suggests that more complex situations should be handled using typemaps. This results in a strange mix of C, Python API and macro expansions. e.g.:
    %module outarg

    // This tells SWIG to treat a double * argument with name
    // 'OutValue' as an output value. We'll append the value
    // to the current result which is guaranteed to be a List
    // object by SWIG.

    %typemap(argout) double *OutValue {
        PyObject *o, *o2, *o3;
        o = PyFloat_FromDouble(*$1);
        if ((!$result) || ($result == Py_None)) {
            $result = o;
        } else {
            if (!PyTuple_Check($result)) {
                PyObject *o2 = $result;
                $result = PyTuple_New(1);
                PyTuple_SetItem($result,0,o2);
            }
            o3 = PyTuple_New(1);
            PyTuple_SetItem(o3,0,o);
            o2 = $result;
            $result = PySequence_Concat(o2,o3);
            Py_DECREF(o2);
            Py_DECREF(o3);
        }
    }

    int spam(double a, double b, double *OutValue, double *OutValue);
I would recommend SWIG for simple things, but once I see SWIG code approaching the above I would simply write the C API code myself. In this example the spam function returns an int and two OutValues. This means the typemap code is expanded twice, and you probably have to look at the expanded code at least once while figuring out why it doesn't work. Compare this with a single call to Py_BuildValue.
return Py_BuildValue("(idd)", result, out1, out2);
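What the typemap does each time it is expanded can be modelled in a few lines of Python. This is a sketch of the accumulation logic only, not SWIG output; the function name append_output and the sample values are hypothetical.

```python
def append_output(result, value):
    """Model of the argout typemap: fold one more output value into result."""
    if result is None:
        # Nothing returned so far: the output value becomes the result.
        return value
    if not isinstance(result, tuple):
        # A single earlier value: promote it to a 1-tuple first.
        result = (result,)
    # Concatenate, as PySequence_Concat does in the typemap.
    return result + (value,)

# spam() returns an int and has two OutValue arguments, so the typemap
# body runs twice. The numbers are made up for illustration.
result = 7
for out in (1.5, 2.5):
    result = append_output(result, out)
print(result)   # (7, 1.5, 2.5)
```

The final tuple has the same shape as the one the Py_BuildValue call with the "(idd)" format produces in a single step.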
One advantage that SWIG may have in certain situations is that it supports languages other than Python. If your interface is sufficiently simple, then once you have wrapped a library for use by Python it is also usable by Perl and Tcl.
SIP is a tool used to generate C++ interface code for Python. It was used to build PyQt (the Python bindings for the Qt toolkit) and PyKDE.
It doesn't appear to be specifically tied to Qt, but it doesn't seem to have gained general acceptance with the rest of the Python community.
An example of SIP code is given here, taken from a tutorial written by Donovan Rebbechi [DR]:
    %Module String

    %HeaderCode
    #include <string>
    %End

    namespace std
    {
    class string
    {
    public:
        string();
        string(const char*);
        string(const std::string&);

        bool empty();
        int length();
        int size();
        void resize(int, char);
        void resize(int);
        int capacity();
        void reserve(int =0);

        std::string& append(const std::string&);
        std::string& append(const std::string&, int, int);
        std::string& append(const char*);
        std::string& append(const char*, int);
        std::string& append(int, char);

        std::string& insert(int, const std::string&);
        std::string& insert(int, const std::string&, int, int);
        std::string& insert(int, const char*);
        std::string& insert(int, const char*, int);
        std::string& insert(int, int, char);

        int find(const std::string&, int = 0);
        int find(const char*, int, int);
        int find(const char*, int = 0);
        int find(char, int = 0);

        int rfind(const std::string&, int = -1);
        int rfind(const char*, int, int);
        int rfind(const char*, int = -1);
        int rfind(char, int = -1);

        int find_first_of(const std::string&, int = 0);
        int find_first_of(const char*, int, int);
        int find_first_of(const char*, int = 0);
        int find_first_of(char, int = 0);

        int find_first_not_of(const std::string&, int = 0);
        int find_first_not_of(const char*, int, int);
        int find_first_not_of(const char*, int = 0);
        int find_first_not_of(char, int = 0);

        int find_last_of(const std::string&, int = 0);
        int find_last_of(const char*, int, int);
        int find_last_of(const char*, int = 0);
        int find_last_of(char, int = 0);

        int find_last_not_of(const std::string&, int = 0);
        int find_last_not_of(const char*, int, int);
        int find_last_not_of(const char*, int = 0);
        int find_last_not_of(char, int = 0);

        std::string substr(int = 0, int = -1);

        PyMethod __str__
    %MemberCode
        const char* s;
        std::string* ptr;
        ptr = (std::string*)sipGetCppPtr((sipThisType *) a0, sipClass_std_string );

        if (ptr == NULL)
            return NULL;

        s = ptr->c_str();

        /* Python API reference, P40 */
        return PyString_FromString(s);
    %End

        PySequenceMethod __getitem__
    %MemberCode
        std::string* ptr;
        ptr = (std::string*)sipGetCppPtr((sipThisType *) a0, sipClass_std_string );

        if (ptr == NULL)
            return NULL;

        if (a1 >= ptr->length()) {
            /* Python API Reference, Chapter 4 */
            PyErr_SetString (PyExc_IndexError, "string index out of range");
            return NULL;
        } else
            /* Python API reference, P23
               Extending and Embedding the Python Interpreter, 1.3-7 P8-11 */
            return Py_BuildValue("c", ptr->at(a1));
    %End
    }; /* class string */
    }; /*namespace std */
The Boost Python library is probably the richest of the wrapper interfaces available. It is designed to wrap C++ interfaces as unobtrusively as possible. It includes support for:
- References and Pointers
- Globally Registered Type Coercions
- Automatic Cross-Module Type Conversions
- Efficient Function Overloading
- C++ to Python Exception Translation
- Default Arguments
- Keyword Arguments
- Manipulating Python objects in C++
- Exporting C++ Iterators as Python Iterators
- Documentation Strings
Here is a simple example of Boost taken from its tutorial:
Say we have the following class we want to use from Python:
    struct World
    {
        void set(std::string msg) { this->msg = msg; }
        std::string greet() { return msg; }
        std::string msg;
    };

We can expose this to Python by writing a corresponding Boost.Python C++ Wrapper:

    #include <boost/python.hpp>
    using namespace boost::python;

    BOOST_PYTHON_MODULE(hello)
    {
        class_<World>("World")
            .def("greet", &World::greet)
            .def("set", &World::set)
        ;
    }

The wrapper is written using C++; however, the extensive use of macros and templates means this may not be immediately obvious to the uninitiated reader.
This exposes the member functions greet and set, but msg isn't accessible to Python, even though it is public within C++. After building our module, we may use our class World in Python. Here's a sample Python session:
    >>> import hello
    >>> planet = hello.World()
    >>> planet.set('howdy')
    >>> planet.greet()
    'howdy'
Weave takes a totally different approach from the systems described above. Instead of letting you build a library in C or C++ which you then import as a Python module, Weave lets you put C code directly inline in your Python program. For example:
    def c_int_binary_search(seq,t):
        # do a little type checking in Python
        assert(type(t) == type(1))
        assert(type(seq) == type([]))

        # now the C code
        code = """
               #line 29 "binary_search.py"
               int val, m, min = 0;
               int max = seq.length() - 1;
               PyObject *py_val;
               for(;;)
               {
                   if (max < min )
                   {
                       return_val = Py::new_reference_to(Py::Int(-1));
                       break;
                   }
                   m = (min + max) /2;
                   val = py_to_int(PyList_GetItem(seq.ptr(),m),"val");
                   if (val < t)
                       min = m + 1;
                   else if (val > t)
                       max = m - 1;
                   else
                   {
                       return_val = Py::new_reference_to(Py::Int(m));
                       break;
                   }
               }
               """
        return inline(code,['seq','t'])
If all you want to do is use C to speed up some inner loop calculations then Weave may be all you need. Another package that does a similar job is [PyInline].
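For comparison, the inline C code in the Weave example is an ordinary binary search over a sorted list of integers. A pure-Python sketch of the same algorithm might look like this; the function name py_int_binary_search and the sample data are illustrative and do not come from Weave.

```python
def py_int_binary_search(seq, t):
    """Return the index of t in the sorted list seq, or -1 if absent."""
    lo, hi = 0, len(seq) - 1
    while lo <= hi:
        m = (lo + hi) // 2
        val = seq[m]
        if val < t:
            lo = m + 1       # target is in the upper half
        elif val > t:
            hi = m - 1       # target is in the lower half
        else:
            return m
    return -1

data = [1, 3, 5, 7, 9, 11]
print(py_int_binary_search(data, 7), py_int_binary_search(data, 4))   # 3 -1
```

A version like this is a convenient reference against which to check the results of the inlined C version.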
Pyrex takes yet another approach. It is a language that uses Python-style syntax and C types, and compiles into a C extension module for Python. If you want to code directly to the C API, then you often find yourself mentally working out how you would write something in Python and translating that into a sequence of API function calls; Pyrex does that for you automatically.
Pyrex will translate almost any Python code into C code which makes the equivalent set of calls to the C API. It also lets you declare parameters and variables to have C data types. It provides automatic conversions from the Python types to C types.
A Pyrex example:
    def primes(int kmax):
        cdef int n, k, i
        cdef int p[1000]
        result = []
        if kmax > 1000:
            kmax = 1000
        k = 0
        n = 2
        while k < kmax:
            i = 0
            while i < k and n % p[i] <> 0:
                i = i + 1
            if i == k:
                p[k] = n
                k = k + 1
                result.append(n)
            n = n + 1
        return result
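The same algorithm in plain Python, without the C type declarations, shows how little the Pyrex version differs from ordinary Python code. This is a sketch for comparison rather than a translation produced by Pyrex.

```python
def primes(kmax):
    """Return the first kmax primes (at most 1000), by trial division."""
    kmax = min(kmax, 1000)      # the Pyrex version is limited by its C array
    found = []
    n = 2
    while len(found) < kmax:
        # n is prime if no previously found prime divides it
        if all(n % p != 0 for p in found):
            found.append(n)
        n += 1
    return found

print(primes(10))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

In the Pyrex version the cdef declarations let the inner loop run on C integers and a C array instead of Python objects.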
This is how you define a new C coded builtin type using Pyrex:
    cdef class Spam:

        cdef int amount

        def __new__(self):
            self.amount = 0

        def get_amount(self):
            return self.amount

        def set_amount(self, new_amount):
            self.amount = new_amount

        def describe(self):
            print self.amount, "tons of spam!"
Although Pyrex was initially aimed at simply speeding up Python inner loops, it does allow you to call external C functions from the Pyrex code. This means that you can use it to write wrappers and gain easy access to third party functions. There is a drawback, though: C header files cannot be used directly, so the definitions of types and functions contained in header files have to be translated into a form usable by Pyrex.
 | Strengths | Weaknesses |
---|---|---|
C API | Wrapping C functions; embedding; optimising time critical code; flexibility | Complex; poor support for C++ |
SWIG | Ease of use; wrapping C functions; common interface with Perl and Tcl/Tk | Poor support for C++; non-trivial mappings need to use C API |
SIP | Wrap C++ classes | Little used outside PyQt |
Boost | Excellent support for C++ | |
Weave | Optimising time critical code | |
Pyrex | Optimising time critical code; wrap C functions; write extension classes; close to Python syntax | Not yet a mature product |
[SWIG] |
[SIP] |
[DR] | Programming With Sip -- Some Examples; Donovan Rebbechi; http://www.pegasus.rutgers.edu/~elflord/unix/siptute/ |
[BOOST] | Boost.Python; Dave Abrahams; http://www.boost.org/libs/python/doc/ |
[SciPy] | Weave; http://scipy.org |
[PyInline] | http://pyinline.sourceforge.net/ |
[Pyrex] | Pyrex home page; http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ |