========================================
Scripting C with Python
========================================

:Author: Duncan Booth
:Contact: duncan.booth@suttoncourtenay.org.uk
:Abstract: How to benefit from Python even when developing in C

.. meta::
   :description: Scripting C with Python

.. sectnum::
   :depth: 2

.. contents::

Why mix C and Python?
=====================

The title of this paper could mean many things to different people. You
might want to add a scripting language to an existing C program. You
might instead be developing new code in C or C++ and want to take
advantage of Python to speed up development of the parts of the code
that are not speed critical.

One of the real opportunities to benefit from mixing C and Python comes
with test-driven development. The dynamic nature of Python makes it much
easier to write test scripts that exercise your C code. If this also
makes it easier to use your C code from the Python world at a later date,
that can only be an advantage.

When should you mix them?
-------------------------

It varies, but given a choice I would say you should start using Python
a week or a month before you cut any C code. Even if your final target is
a standalone C library, writing it first in Python is a great way to get
the algorithms and code structures clear. Once you have a clean Python
version of the code you can write an equivalent C version and avoid many
of the usual pitfalls.

Sadly, real life isn't like that, so you will probably end up linking to
an existing codebase where earlier (possibly dubious) design decisions
make your life harder.

Embedding or Extending
----------------------

There are two fundamental ways to structure a program that uses both C
and Python:

Embedding
    Python may be embedded in a C program. This may be used to add
    scripting support to an existing program.
    Embedding is comparatively simple (just initialise the interpreter
    and call some Python), but it almost always involves Extending as
    well: after all, why embed an interpreter in your program if you
    can't interact with the program from inside the interpreter?

Extending
    This extends the Python interpreter by adding a module written in C
    but callable from Python. Typically you do this because a C library
    exists that does something which Python doesn't; because you need
    something to run faster than you can manage in Python; or because you
    are embedding Python in your application and want to give the Python
    code access to the program's data and operations.

For an example of a program that both embeds and extends Python, Windows
users might like to look at JASC Paintshop Pro version 8.

C API or toolkit (or both)
==========================

So you want to call C from Python? In no particular order:

1) Is there a pre-existing library that does the job? Don't reinvent
   the wheel.

2) Do you just want to call some C functions from a dynamic link
   library? Look at `ctypes`, a Python package that creates and
   manipulates C data types in Python and calls functions in dynamic
   link libraries/shared DLLs. It allows you to wrap these libraries in
   pure Python. It works on Windows, Linux, Mac OS X and other systems;
   the non-Windows ports require that your machine is supported by
   libffi.

3) Do you prefer writing in C or in Python? If the latter, look at
   `Pyrex`. Look at `Pyrex` even if you prefer C: it will make your life
   easier.

4) Do you need support for other languages, such as Perl and Tcl? If
   so, look at `SWIG`. I prefer Pyrex, but not everyone does.

5) Do you need to link to C++? Then `Boost.Python` is the tool to
   investigate.

.. [ctypes] ctypes home page; http://starship.python.net/crew/theller/ctypes/
.. [SWIG] SWIG home page; http://www.swig.org
.. [BOOST] Boost.Python; Dave Abrahams; http://www.boost.org/libs/python/doc/
.. [Pyrex] Pyrex home page; http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

ctypes Example
--------------

I think it is worth giving an example of `ctypes` here. In the spirit of
not reinventing any wheels, this example was taken from the ctypes wiki,
where it was written by Jack Trainor. It shows how to use ctypes to
access the Windows clipboard::

    from ctypes import *
    from ctypes import wintypes
    from win32con import CF_TEXT, GHND

    # Declare handle and pointer types explicitly: ctypes assumes a C int
    # by default, which truncates handles and pointers on 64-bit Windows.
    OpenClipboard = windll.user32.OpenClipboard
    OpenClipboard.argtypes = [wintypes.HWND]
    EmptyClipboard = windll.user32.EmptyClipboard
    GetClipboardData = windll.user32.GetClipboardData
    GetClipboardData.argtypes = [wintypes.UINT]
    GetClipboardData.restype = wintypes.HANDLE
    SetClipboardData = windll.user32.SetClipboardData
    SetClipboardData.argtypes = [wintypes.UINT, wintypes.HANDLE]
    SetClipboardData.restype = wintypes.HANDLE
    CloseClipboard = windll.user32.CloseClipboard
    GlobalLock = windll.kernel32.GlobalLock
    GlobalLock.argtypes = [wintypes.HANDLE]
    GlobalLock.restype = c_void_p
    GlobalAlloc = windll.kernel32.GlobalAlloc
    GlobalAlloc.argtypes = [wintypes.UINT, c_size_t]
    GlobalAlloc.restype = wintypes.HANDLE
    GlobalUnlock = windll.kernel32.GlobalUnlock
    GlobalUnlock.argtypes = [wintypes.HANDLE]

    def GetClipboardText():
        text = b""
        if OpenClipboard(None):
            hClipMem = GetClipboardData(CF_TEXT)
            if hClipMem:
                p = GlobalLock(hClipMem)
                if p:
                    text = string_at(p)   # copy up to the terminating NUL
                    GlobalUnlock(hClipMem)
            CloseClipboard()
        return text

    def SetClipboardText(text):
        buffer = create_string_buffer(text)   # bytes plus a terminating NUL
        bufferSize = sizeof(buffer)
        hGlobalMem = GlobalAlloc(GHND, bufferSize)
        lpGlobalMem = GlobalLock(hGlobalMem)
        memmove(lpGlobalMem, addressof(buffer), bufferSize)
        GlobalUnlock(hGlobalMem)
        if OpenClipboard(None):
            EmptyClipboard()
            SetClipboardData(CF_TEXT, hGlobalMem)
            CloseClipboard()

    if __name__ == '__main__':
        print(GetClipboardText())                        # display last text clipped
        SetClipboardText(b"[Clipboard text replaced]")   # replace it
        print(GetClipboardText())                        # display new clipboard

Using ctypes lets you call C functions (at least those exported from a
DLL) and also manipulate C data types directly from Python. The danger,
of course, in common with any linking to the C world, is that you can
potentially corrupt memory if you get your calls wrong.

Pyrex example
=============

Here is the same code as the ctypes example, but written using Pyrex.
The most obvious thing is that the two look pretty similar. The Pyrex
code needs an extra compilation step, so I created a 'setup.py' file and
use 'distutils' to do the compilation. I also extracted the test to a
separate file, partly because I always keep unit tests in a separate
file, but also because Pyrex programs cannot be run directly: they always
need either Python or C code to act as the main program.

File setup.py::

    # Build using the command:
    #     setup.py build_ext --inplace
    #
    from distutils.core import setup
    from distutils.extension import Extension
    from Pyrex.Distutils import build_ext

    setup(
        name = 'clipboard',
        ext_modules=[
            Extension("clipboard", ["clipboard.pyx"], libraries=["USER32"]),
        ],
        cmdclass = {'build_ext': build_ext}
    )

Unit test in 'test.py'::

    import unittest, sys
    try:
        import clipboard
    except ImportError:
        sys.exit("Import of clipboard failed: use 'setup.py build_ext --inplace' to build it")

    class ClipboardTest(unittest.TestCase):
        def testSetThenGet(self):
            text = 'testing 1, 2, 3'
            clipboard.SetClipboardText(text)
            self.assertEqual(text, clipboard.GetClipboardText())
            clipboard.SetClipboardText('')
            self.assertNotEqual(text, clipboard.GetClipboardText())

    if __name__=='__main__':
        unittest.main()

And finally, 'clipboard.pyx', which compiles to 'clipboard.pyd'::

    ctypedef unsigned long DWORD
    ctypedef unsigned int UINT
    ctypedef void *HANDLE
    ctypedef HANDLE HWND

    cdef extern from "memory.h":
        void *memcpy(void *, void *, DWORD)

    cdef extern from "windows.h":
        int OpenClipboard(HWND hWndNewOwner)
        int EmptyClipboard()
        HANDLE GetClipboardData(UINT uFormat)
        HANDLE SetClipboardData(UINT uFormat, HANDLE hMem)
        int CloseClipboard()
        void *GlobalLock(HANDLE hMem)
        HANDLE GlobalAlloc(UINT uFlags, DWORD dwBytes)
        int GlobalUnlock(HANDLE hMem)

    from win32con import CF_TEXT, GHND

    def GetClipboardText():
        cdef HANDLE hClipMem
        cdef char *p
        text = ""
        if OpenClipboard(0):
            hClipMem = GetClipboardData(CF_TEXT)
            p = GlobalLock(hClipMem)
            if p:
                text = p
                GlobalUnlock(hClipMem)
            CloseClipboard()
        return text

    def SetClipboardText(text):
        return _SetClipboardText(text, len(text))

    cdef _SetClipboardText(char *text, int textlen):
        cdef HANDLE hGlobalMem
        cdef void *lpGlobalMem
        hGlobalMem = GlobalAlloc(GHND, textlen+1)
        lpGlobalMem = GlobalLock(hGlobalMem)
        memcpy(lpGlobalMem, text, textlen+1)
        GlobalUnlock(hGlobalMem)
        if OpenClipboard(0):
            EmptyClipboard()
            SetClipboardData(CF_TEXT, hGlobalMem)
            CloseClipboard()

The definition block could be extracted to a Pyrex header. Sadly Pyrex
cannot yet cope with C header files automatically, so you do have to
rewrite the function prototypes.

I split the SetClipboardText function into two to show how you can
define a function (the cdef one) which is not visible to Python.
'_SetClipboardText' is a plain C function: it can be called from other
Pyrex code in the module, and (if declared public) from a pure C source
file, but it cannot be called from Python. I could instead have
declared::

    def SetClipboardText(char *text):
        ...

This would have been callable from Python, and would have automatically
converted the string to a char*, but I would then have had either to call
strlen to find the length, or to use len(text), which would have involved
an extra conversion back to a Python object.

Errors & Exceptions
===================

Whatever mechanism you use to link to your C code, you should convert any
errors it returns into Python exceptions. Similarly, if linking to C++
code you need to convert C++ exceptions to Python exceptions (and of
course the reverse applies when you return from Python back to C/C++).

From 'ctypes' or 'Pyrex' you can simply raise a Python exception. When
you are writing directly against the Python C API and detect that a
function you have called has raised an exception (usually signalled by a
NULL or -1 return value), you should clean up any temporary objects you
have created and then propagate the error by returning an error code.

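
As a sketch of the ctypes route mentioned above, the code below converts
a C-style -1 return code into a Python exception with a foreign
function's `errcheck` hook, so Python callers never see the raw error
code. It assumes a POSIX C library that exports `access()` and sets
`errno` on failure, so it will not run on Windows as written; the hook
name `check_minus_one` is illustrative and not part of ctypes.

```python
import ctypes
import ctypes.util
import os

# Assumption: a POSIX C library exporting
#   int access(const char *path, int mode);
# which returns -1 and sets errno on failure.
# find_library() may return None; CDLL(None) then searches the symbols
# already loaded into the Python process, which normally include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def check_minus_one(result, func, args):
    """errcheck hook (illustrative name): map a -1 return to OSError."""
    if result == -1:
        err = ctypes.get_errno()  # errno saved by ctypes right after the call
        raise OSError(err, os.strerror(err), args[0].decode())
    return result


access = libc.access
access.argtypes = [ctypes.c_char_p, ctypes.c_int]
access.restype = ctypes.c_int
access.errcheck = check_minus_one
```

With this in place `access(b"/", os.F_OK)` returns 0, while a call on a
missing path raises `FileNotFoundError`, because `OSError` picks the
subclass that matches the errno value. The same hook can be attached to
every wrapped function that follows the same error convention.
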

Obviously if you are running code which was not originally called from
Python, then at some level you will have to handle the error in C code,
but normally if your C code was called from Python you simply let errors
propagate upwards until they are handled in the interpreter. If you want
to mark an error as handled, you must call `PyErr_Clear()`.

One of the pitfalls that people tend to fall into is failing to check the
error code returned by a Python API function. If you don't check for an
error, and instead let your function run to completion and return a
non-error result, then the error will only be noticed the next time the
interpreter checks for an error. This could be somewhere totally
unrelated to the original problem. So: always check return codes.

Pyrex makes this easy: you can declare in the function prototype whether
the C function is capable of setting a Python exception, and Pyrex will
then propagate it automatically.

Memory allocation
=================

Everything in Python is an object. This means that a lot of the work in
any C code accessing Python will inevitably involve allocating and
releasing objects. This is an area where it is easy to make mistakes, and
is one of the reasons that toolkits such as Pyrex are so highly
recommended: they give you access to C code, but ensure that memory
allocation and reference counting are handled automatically.

When you are passed an object that you want to use within your C code,
you can use it safely, provided that you don't hold onto the object after
your function returns, *and* provided you don't call any C API functions
that could potentially release the object. Objects which are passed as
parameters to your code are safe: they cannot be freed until your
function returns. Objects which you access by any other means might not
be safe.

For example, say your function is passed a list of objects.
If you call back into Python code (which can happen without your
realising it if, for example, you release an object whose class defines
a destructor), then the list could be mutated and all the objects it
contained might be freed.

Borrowing a reference, i.e. using an object without first incrementing
its reference count because you 'know it has to be safe', is very
dangerous. If in doubt, increment the reference count on every object
before you use it, and decrement the reference count only when you have
finished with it.

Every C/C++ programmer feels duty bound to implement their own scheme for
memory management. At the first ACCU conference I attended (4 years ago)
I was amazed by the time and effort the top C++ developers expended on
the question of memory management: how to shave a few clock cycles off
each allocation; how to ensure that memory isn't leaked. This
preoccupation is a major headache when bridging from Python to existing C
code.

Consider 'subversion', an open source version control system which is
rapidly gaining ground. It has good support for Python scripting through
a wrapper generated with SWIG, but the job cannot have been made easier
by a peculiar memory API which requires that you first create your own
subdivision of the heap, then call the API functions, then release all
memory allocated by those functions in a single call. This means
subversion C code never has to worry about releasing memory it has
allocated, but how do you reconcile a scheme like this with Python's
reference counting?

The simplest model to wrap is one where the caller preallocates memory
for a function result. In this case you simply allocate the memory
directly in the Python object, or save a pointer to it, and release it
when the Python object is destroyed. Any other scheme where you can
explicitly release memory blocks can be implemented in a similar manner.

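
The two models just described can be sketched in Python with ctypes.
`ResultBuffer` covers the caller-preallocates case: the memory lives
inside a ctypes object owned by the Python wrapper, so it is released
when the wrapper is destroyed. `OwnedBlock` covers the explicit-release
case: it assumes a hypothetical C library that hands back a pointer plus
a matching free function, and uses `weakref.finalize` to call that
function exactly once. `fill_result` is a hypothetical stand-in for a C
function that writes into a caller-supplied buffer; it uses the real
`ctypes.memmove`.

```python
import ctypes
import weakref


class ResultBuffer:
    """Caller-preallocated result memory owned by a Python object.

    The block is allocated by ctypes as part of this object, so it is
    freed automatically when the object is destroyed.
    """

    def __init__(self, size):
        self._buf = ctypes.create_string_buffer(size)  # zero-filled

    @property
    def address(self):
        # Raw pointer value to pass to a C function that fills the buffer.
        return ctypes.addressof(self._buf)

    @property
    def size(self):
        return ctypes.sizeof(self._buf)

    def value(self):
        # Bytes up to the first NUL, copied into a Python bytes object.
        return self._buf.value


class OwnedBlock:
    """Tie an explicitly released C block to a Python object's lifetime.

    ptr and free_func are assumed to come from a hypothetical C library
    that allocates blocks and provides a matching release function.
    """

    def __init__(self, ptr, free_func):
        self.ptr = ptr
        # finalize() calls free_func(ptr) at most once: either when close()
        # is called or when this object is garbage collected.
        # free_func must not be a bound method of self, or self never dies.
        self._finalizer = weakref.finalize(self, free_func, ptr)

    def close(self):
        self._finalizer()
        self.ptr = None  # later accesses can check for a released block


def fill_result(buf, data):
    # Hypothetical stand-in for a C function such as
    #   void get_result(char *dest, size_t destlen);
    # Leave room for the terminating NUL that value() relies on.
    if len(data) >= buf.size:
        raise ValueError("result does not fit in the buffer")
    src = ctypes.create_string_buffer(data)
    ctypes.memmove(buf.address, ctypes.addressof(src), len(data))
```

For example, `buf = ResultBuffer(64)` followed by
`fill_result(buf, b"hello")` leaves `buf.value()` equal to `b"hello"`.
Setting `ptr` to `None` in `close()` is one way to provide the extra
"is this data still alive?" check that wrapped pointers need.
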

If the library insists on doing its own memory management, you may have
to copy the data out of C-allocated blocks of indeterminate lifetime, or
make your Python objects hold pointers to C data structures and set the
pointers to NULL when the data no longer exists. This requires extra
checks whenever the data is accessed.

A library that makes regular callbacks with pointers to C data structures
can have very complex requirements when mapping to Python. The wxPython
wrapper around wxWindows creates a Python wrapper object for each C
object that is to be exposed to Python, but since the user can set Python
attributes directly on these objects, it has to use a weak-reference
scheme to ensure that every time it wraps something like a window, it
uses the same Python wrapper.

The final option is simply to expose the C pointers directly to your
Python code. This is what happens if you use either SWIG or ctypes. If
you do this, try to isolate the C pointers in one small area of the code.
Normally pure Python code cannot crash the process, and you want to
minimise the opportunities for that promise to be broken.

Threading
=========

Right up there with memory allocation for the headaches it can cause is
linking to multi-threaded C libraries.

Python maintains a single global mutex in the interpreter, the Global
Interpreter Lock ('GIL'). Any interpreted code must hold the GIL before
it can run. When you call C code, you can choose either to release the
GIL, allowing other Python threads to run until you reclaim it, or to
continue to hold it. Normally the best choice depends on how long the C
code takes to run, but if the C code can generate an asynchronous
callback on a different thread you may be forced to release the GIL:
otherwise the callback can deadlock waiting for a lock that your thread
never gives up.

After reclaiming the GIL you can access Python data structures again, but
be aware that they may have changed while your code did not hold the GIL.
So don't depend on cached values, and don't try to borrow references if
you plan to release the GIL.

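
ctypes makes the choice described above when a library is loaded:
functions called through `ctypes.CDLL` release the GIL for the duration
of the call, while functions called through `ctypes.PyDLL` keep holding
it. The sketch below times four threads that each spend 0.2 seconds
inside C. It assumes a POSIX C library exporting `usleep()`, so it will
not run on Windows as written; the helper names are illustrative.

```python
import ctypes
import ctypes.util
import threading
import time

# Assumption: a POSIX C library exporting int usleep(useconds_t usec).
# find_library() may return None; CDLL/PyDLL(None) then use the symbols
# already loaded into the Python process.
LIBC_NAME = ctypes.util.find_library("c")


def load_usleep(hold_gil):
    """Return usleep loaded via PyDLL (GIL held) or CDLL (GIL released)."""
    loader = ctypes.PyDLL if hold_gil else ctypes.CDLL
    usleep = loader(LIBC_NAME).usleep
    usleep.argtypes = [ctypes.c_uint]
    usleep.restype = ctypes.c_int
    return usleep


def time_parallel_calls(c_func, n_threads=4, micros=200_000):
    """Call c_func(micros) from n_threads threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=c_func, args=(micros,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

`time_parallel_calls(load_usleep(hold_gil=False))` should finish in a
little over 0.2 seconds because the calls overlap, whereas the
`hold_gil=True` version cannot finish in under 0.8 seconds, since each
call keeps the lock for its whole duration. ctypes also handles the
callback direction: a Python function wrapped with `CFUNCTYPE` acquires
the GIL itself when C code calls it, from whichever thread.
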

Callback functions should always claim the GIL on entry, and release it
on exit. The catch is that you need a thread-specific structure (a
Python thread state) to do this, and if the C library can create its own
threads this may not be easy to obtain.

Conclusion
==========

Calling C code from Python is easy.