========================================
Scripting C with Python
========================================

:Author: Duncan Booth
:Contact: duncan.booth@suttoncourtenay.org.uk
:Abstract:
        How to benefit from Python even when developing in C

.. meta::
        :description: Scripting C with Python

.. sectnum::    :depth: 2

.. contents::

Why mix C and Python?
=====================

The title of this paper could mean many things to different people.
It could be that you want to add a scripting language into an existing
C program, or it might be that you are developing some new code in C
or C++ and you want to take advantage of Python to speed up the
development of the non-speed-critical parts of the code.

One of the real opportunities to get benefit from mixing C and Python
comes when you do test driven development. The dynamic nature of
Python makes it much easier to write test scripts that will exercise
your C code. If it also makes it easier to use your C code from the
Python world at a later date that can only be an advantage.

When should you mix them?
-------------------------

It varies, but given a choice I would say you start using Python a
week or a month before you cut any C code. Even if your final target
is a standalone C library, writing it first in Python is a great way
to get algorithms and code structures clear. Once you have a clean
Python version of the code you can write an equivalent C version and
avoid many of the usual pitfalls.

Sadly, real life isn't like that, so you probably end up linking to an
existing codebase where earlier (possibly dubious) design decisions
will make your life harder.

Embedding or Extending
----------------------

There are two fundamental ways to structure a program that uses both C
and Python:

Embedding

        Python may be embedded in a C program. This may be used to add
        scripting support to an existing program. Embedding is
        comparatively simple (just initialise the interpreter, and
        call some Python), but almost always involves Extending as
        well: after all, why embed an interpreter in your program if
        you can't interact with the program from inside the
        interpreter?

Extending

        This extends the Python interpreter by adding a module written
        in C but callable from Python. Typically you do this because a
        C library exists that does something which Python doesn't;
        because you need something to run faster than you can manage
        in Python; or because you are embedding Python in your
        application and want to give the Python code access to the
        program's data and operations.

For an example of a program that both embeds and extends Python,
Windows users might like to look at Jasc Paint Shop Pro version 8.

C API or toolkit (or both)
==========================

So you want to call C from Python? In no particular order:

1) Is there a pre-existing library that does the job? Don't reinvent
   the wheel.

2) Do you just want to call some C functions from a dynamically linked
   library? Look at `ctypes` [ctypes]_, which is a Python package to
   create and manipulate C data types in Python, and to call functions
   in dynamic link libraries/shared libraries. It allows wrapping
   these libraries in pure Python.

   It works on Windows, Linux, MacOS X and other systems. The latter
   require that your machine is supported by libffi.

3) Do you prefer writing in C or in Python? If the latter, then look
   at `Pyrex` [Pyrex]_. Look at `Pyrex` even if you prefer C: it will
   make your life easier.

4) Do you need support for other languages, such as Perl and Tcl? If
   so, look at `SWIG` [SWIG]_. I prefer Pyrex, but not everyone does.

5) Do you need to link to C++? Then `Boost.Python` [BOOST]_ is the
   tool to investigate.

.. [ctypes] http://starship.python.net/crew/theller/ctypes/

.. [SWIG] http://www.swig.org

.. [BOOST] Boost.Python; Dave Abrahams;
        http://www.boost.org/libs/python/doc/

.. [Pyrex] Pyrex home page;
        http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

ctypes Example
--------------

I think it is worth giving an example of `ctypes` here, and in the
spirit of not reinventing any wheels this example is taken from the
ctypes wiki and was written by Jack Trainor. It shows how to use ctypes
to access the Windows clipboard::

    from ctypes import *
    from win32con import CF_TEXT, GHND

    OpenClipboard = windll.user32.OpenClipboard
    EmptyClipboard = windll.user32.EmptyClipboard
    GetClipboardData = windll.user32.GetClipboardData
    SetClipboardData = windll.user32.SetClipboardData
    CloseClipboard = windll.user32.CloseClipboard
    GlobalLock = windll.kernel32.GlobalLock
    GlobalAlloc = windll.kernel32.GlobalAlloc
    GlobalUnlock = windll.kernel32.GlobalUnlock
    memcpy = cdll.msvcrt.memcpy

    def GetClipboardText():
         text = ""
         if OpenClipboard(c_int(0)):
             hClipMem = GetClipboardData(c_int(CF_TEXT))
             GlobalLock.restype = c_char_p
             text = GlobalLock(c_int(hClipMem))
             GlobalUnlock(c_int(hClipMem))
             CloseClipboard()
         return text

    def SetClipboardText(text):
         buffer = c_buffer(text)
         bufferSize = sizeof(buffer)
         hGlobalMem = GlobalAlloc(c_int(GHND), c_int(bufferSize))
         GlobalLock.restype = c_void_p
         lpGlobalMem = GlobalLock(c_int(hGlobalMem))
         memcpy(lpGlobalMem, addressof(buffer), c_int(bufferSize))
         GlobalUnlock(c_int(hGlobalMem))
         if OpenClipboard(0):
             EmptyClipboard()
             SetClipboardData(c_int(CF_TEXT), c_int(hGlobalMem))
             CloseClipboard()

    if __name__ == '__main__':
         print GetClipboardText()                            # display last text clipped
         SetClipboardText("[Clipboard text replaced]")       # replace it
         print GetClipboardText()                            # display new clipboard

Using ctypes lets you call C functions (at least those exported from a
DLL) and also manipulate C data types directly from Python. The danger,
of course, in common with any linking to the C world, is that you can
potentially corrupt memory if you get your calls wrong.
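
The clipboard example is tied to Windows. As a smaller, portable
sketch of the same idea, the code below calls the C runtime's
'strlen'. The library lookup ('msvcrt' on Windows, the running process
elsewhere) and the helper name 'c_strlen' are assumptions made for
illustration, and the code targets Python 3. Declaring 'argtypes' and
'restype' lets ctypes reject a badly typed call before it reaches C,
which removes one of the easier ways to get a call wrong.

```python
import ctypes
import os

# Assumption: msvcrt on Windows; elsewhere CDLL(None) exposes the symbols
# already loaded into the process, which include the C library.
libc = ctypes.CDLL("msvcrt") if os.name == "nt" else ctypes.CDLL(None)

# Describe the prototype: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the length in bytes of 'text' as measured by C's strlen."""
    return libc.strlen(text.encode("utf-8"))
```

Calling 'c_strlen("hello")' goes through the C function, while
'libc.strlen(42)' is refused with 'ctypes.ArgumentError' instead of
handing a bogus pointer to C.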

Pyrex Example
-------------
Here is the same code as the ctypes example, but written using Pyrex.
The most obvious thing is that the two look pretty similar. The
Pyrex code needs an extra compilation step, so I created a 'setup.py'
file and used 'distutils' to do the compilation. I also extracted the
test to a separate file, partly because I always have unit tests in a
separate file, but also because Pyrex programs cannot be run directly:
they must always have either Python or C code to act as the main
program.

File setup.py::

    # Build using the command:
    #    setup.py build_ext --inplace
    #
    from distutils.core import setup
    from distutils.extension import Extension
    from Pyrex.Distutils import build_ext

    setup(
      name = 'clipboard',
      ext_modules=[ 
        Extension("clipboard",
            ["clipboard.pyx"], libraries=["USER32"]),
        ],
      cmdclass = {'build_ext': build_ext}
    )

Unit test in 'test.py'::

    import unittest, sys

    try:
        import clipboard
    except ImportError:
        sys.exit("Import of clipboard failed: use 'setup.py build_ext --inplace' to install it")

    class ClipboardTest(unittest.TestCase):
        def testSetThenGet(self):
            text = 'testing 1, 2, 3'
            clipboard.SetClipboardText(text)
            self.assertEquals(text, clipboard.GetClipboardText())
            clipboard.SetClipboardText('')
            self.assertNotEquals(text, clipboard.GetClipboardText())


    if __name__=='__main__':
        unittest.main()

And finally, 'clipboard.pyx', which compiles to 'clipboard.pyd'::

    ctypedef unsigned long DWORD
    ctypedef unsigned int UINT
    ctypedef void *HANDLE
    ctypedef HANDLE HWND

    cdef extern from "memory.h":
        void *memcpy(void *, void *, DWORD)

    cdef extern from "windows.h":
        int OpenClipboard(HWND hWndNewOwner)
        int EmptyClipboard()
        HANDLE GetClipboardData(UINT uFormat)
        HANDLE SetClipboardData(UINT uFormat, HANDLE hMem)
        int CloseClipboard()
        void *GlobalLock(HANDLE hMem)
        HANDLE GlobalAlloc(UINT uFlags, DWORD dwBytes)
        int GlobalUnlock(HANDLE hMem)

    from win32con import CF_TEXT, GHND

    def GetClipboardText():
        cdef HANDLE hClipMem
        cdef char *p
        text = ""
        if OpenClipboard(<HWND>0):
            hClipMem = GetClipboardData(CF_TEXT)
            p = <char *>GlobalLock(hClipMem)
            if p:
                text = p
            GlobalUnlock(hClipMem)
            CloseClipboard()
        return text

    def SetClipboardText(text):
        return _SetClipboardText(text, len(text))

    cdef _SetClipboardText(char *text, int textlen):
        cdef HANDLE hGlobalMem
        cdef void *lpGlobalMem

        hGlobalMem = GlobalAlloc(GHND, textlen+1)
        lpGlobalMem = GlobalLock(hGlobalMem)
        memcpy(lpGlobalMem, text, textlen+1)
        GlobalUnlock(hGlobalMem)
        if OpenClipboard(<HWND>0):
            EmptyClipboard()
            SetClipboardData(CF_TEXT, hGlobalMem)
            CloseClipboard()

The definition block could be extracted to a Pyrex header. Sadly Pyrex
cannot yet cope with C header files automatically so you do have to
rewrite the function prototypes.

I split the SetClipboardText function into two to show how you can
define a function (the cdef one) which is not visible to Python.
'_SetClipboardText' could be called from a pure C source file, but it
cannot be called from Python. I could instead have declared::

    def SetClipboardText(char *text):
        ...

This would have been callable from Python, and would have
automatically converted the string to a char*, but I would then have
had either to call strlen to find the length, or to use len(text),
which would have involved an extra conversion back to a Python
object.

Errors & Exceptions
===================

Whatever mechanism you use to link to your C code, you should convert
any errors returned into Python exceptions. Similarly if linking to
C++ code you need to convert C++ exceptions to Python exceptions (and
of course the reverse applies if you return from Python back to
C/C++).

From 'ctypes' or 'Pyrex' you can just raise a Python exception.
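
With ctypes, one way to do that conversion is an 'errcheck' hook,
which ctypes calls with the raw result of every call to the function.
The sketch below assumes a POSIX system and Python 3; the helper names
'raise_on_failure' and 'remove_directory', and the choice of 'rmdir'
as the C function, are purely illustrative.

```python
import ctypes
import os

# Assumption: POSIX. use_errno=True asks ctypes to keep a private copy of
# errno that can be read with ctypes.get_errno() after each call.
libc = ctypes.CDLL(None, use_errno=True)


def raise_on_failure(result, func, args):
    """errcheck hook: turn a C-style -1 return into a Python OSError."""
    if result == -1:
        code = ctypes.get_errno()
        raise OSError(code, os.strerror(code))
    return result


# int rmdir(const char *path)
libc.rmdir.argtypes = [ctypes.c_char_p]
libc.rmdir.restype = ctypes.c_int
libc.rmdir.errcheck = raise_on_failure


def remove_directory(path):
    """Remove an empty directory, raising OSError if the C call fails."""
    libc.rmdir(os.fsencode(path))
```

The caller never sees the -1: a failed call surfaces as an ordinary
'OSError' carrying the errno value that the C library reported.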

When you use the C API directly and detect that a function you have
called has raised an exception, you should clean up any temporary
objects you have created and then propagate the error by returning an
error code. Obviously if you are running code which was not originally
called from Python, then at some level you will have to handle the
error in C code, but normally if your C code was called from Python
you simply let errors propagate upwards until they are handled in the
interpreter.

If you want to mark an error as handled, you must call
`PyErr_Clear()`.

One of the pitfalls that people tend to fall into is failing to check
an error code on a Python API function. If you don't check for an
error, and instead let your function run to completion and return a
non-error result then the error will be caught the next time the
interpreter checks for an error. This could be somewhere totally
unrelated to the original problem. So: always check return codes.
Pyrex makes this easy: you can declare in the function prototype that
the C function is capable of setting a Python exception and Pyrex will
then propagate it automatically.

Memory allocation
=================

Everything in Python is an object. This means that a lot of the work
in any C code accessing Python will inevitably involve allocating and
releasing objects. This is an area where it is easy to make mistakes,
and is one of the reasons that toolkits such as Pyrex are so highly
recommended: they give you access to C code, but ensure that the
memory allocation and reference counting is handled automatically.

When you are passed an object that you want to use within your C code,
you can use it safely, provided that you don't hold onto the object
after your function returns, *and* provided you don't call any C API
functions that could potentially release the object.

Objects which are passed as parameters to your code are safe: they
cannot be freed until your function returns. Objects which you access
by any other means might not be safe. For example, say your function
is passed a list of objects. If you call back into Python code (which
can happen without your realising if, for example, you release an
object whose class has a destructor), then the list could be mutated
and all the objects it contained might be freed.

Borrowing a reference, i.e. using an object without first incrementing
its reference count because you 'know it has to be safe', is very
dangerous. If in doubt, increment the reference count on every object
before you use it, and decrement the reference count only when you
have finished.
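
The hazard can be imitated in pure Python. In this sketch, which
relies on CPython's immediate reference counting, a weak reference
stands in for a borrowed C pointer and an ordinary variable stands in
for a reference taken with 'Py_INCREF'. The class and function names
are invented for the example.

```python
import weakref


class Item:
    """Placeholder object; a plain class instance supports weak references."""


def mutate_while_using(items):
    """Report which objects survive when the list is emptied mid-call."""
    borrowed = weakref.ref(items[0])   # like using items[0] without INCREF
    owned = items[1]                   # like Py_INCREF(items[1])
    watch_owned = weakref.ref(owned)

    items.clear()                      # e.g. a destructor mutates the list

    borrowed_alive = borrowed() is not None
    owned_alive = watch_owned() is not None
    del owned                          # like Py_DECREF once finished
    return borrowed_alive, owned_alive
```

Called as 'mutate_while_using([Item(), Item()])', the first flag
reports that the borrowed object died as soon as the list let go of
it, while the owned one was still alive. In C the borrowed pointer
would by then be dangling.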

Every C/C++ programmer feels duty bound to implement their own scheme
for memory management. At the first ACCU conference I attended (4
years ago) I was amazed by the time and effort expended by the top C++
developers on the question of memory management: how to shave a few
clock cycles off each allocation; how to ensure that memory isn't
leaked. This preoccupation is a major headache when bridging from
Python to existing C code.

Consider Subversion, an open source version control system which is
rapidly gaining ground. It has good support for Python scripting with
a wrapper generated with SWIG, but the job cannot have been made
easier by a peculiar memory API which requires that you first create
your own subdivision of the heap, then call the API functions, then
release all memory allocated by those functions in a single call. This
means Subversion C code never has to worry about releasing memory it
has allocated, but how do you reconcile a scheme like this with
Python's reference counting?

The simplest model to wrap is one where the caller preallocates memory
for a function result. In this case you simply allocate the memory
directly in the Python object, or save a pointer to it, and release it
when the Python object is destroyed. Any other scheme where you can
explicitly release memory blocks can be implemented in a similar
manner.
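
With ctypes the preallocated case might look like the sketch below:
'create_string_buffer' makes a Python object that owns the memory, so
the block is released when that object is. A POSIX 'getcwd' and
Python 3 are assumed, and 'c_getcwd' is an illustrative name.

```python
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)   # assumption: POSIX

# char *getcwd(char *buf, size_t size); NULL is returned on failure.
libc.getcwd.argtypes = [ctypes.c_char_p, ctypes.c_size_t]
libc.getcwd.restype = ctypes.c_void_p


def c_getcwd(size=4096):
    """Ask C to write the current directory into memory that Python owns."""
    buf = ctypes.create_string_buffer(size)   # freed along with 'buf'
    if not libc.getcwd(buf, size):
        code = ctypes.get_errno()
        raise OSError(code, os.strerror(code))
    return os.fsdecode(buf.value)
```

No explicit free is needed: once 'c_getcwd' returns, nothing refers to
the buffer any more and Python reclaims it.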

If the library insists on doing its own memory management, you may
have to copy the data from C allocated blocks of indeterminate
lifetime, or make your Python objects hold pointers to C data
structures and set the pointers to NULL when the data no longer
exists. This requires extra checks whenever the data is to be
accessed.
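
The second approach might be sketched as follows, using the C
library's 'malloc' and 'free' as a stand-in for a library that owns
its own memory. POSIX and Python 3 are assumed, and the class 'CBlock'
and its methods are hypothetical.

```python
import ctypes

libc = ctypes.CDLL(None)   # assumption: POSIX
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


class CBlock:
    """Python object holding a pointer to memory owned by C code."""

    def __init__(self, size):
        self._size = size
        self._ptr = libc.malloc(size)
        if not self._ptr:
            raise MemoryError("malloc failed")

    def _live_pointer(self):
        # The extra check that every access has to make.
        if self._ptr is None:
            raise ValueError("C data has already been released")
        return self._ptr

    def write(self, data):
        if len(data) > self._size:
            raise ValueError("data does not fit in the block")
        ctypes.memmove(self._live_pointer(), data, len(data))

    def read(self, count):
        # string_at copies, so the result outlives the C block.
        return ctypes.string_at(self._live_pointer(), min(count, self._size))

    def release(self):
        """Call when the C side discards the data; safe to call twice."""
        if self._ptr is not None:
            libc.free(self._ptr)
            self._ptr = None
```

After 'release()' any further 'read' or 'write' raises 'ValueError'
instead of touching freed memory.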

A library that makes regular callbacks with pointers to C data
structures can have very complex requirements when mapping to Python.
The wxPython wrapper around wxWindows creates a Python wrapper object
for each C object that is to be exposed to Python, but since the user
can set Python attributes directly on these objects it has to use a
weak-reference scheme to ensure that every time it wraps something
like a window, it uses the same Python wrapper.

The final option is simply to expose the C pointers directly to your
Python code. This is what happens if you use either SWIG or ctypes.
If you do this then try to isolate the C pointers in one small area of
the code. Usually Python can guarantee not to crash the process, and
you want to minimise opportunities for that promise to be broken.

Threading
=========
Right up there with memory allocation for the headaches it can cause
is linking to multi-threaded C libraries.

Python maintains a single global mutex (the 'GIL') in the interpreter.
Any interpreted code must hold the mutex before it can run.

When you call C code, you can choose whether to release the GIL,
allowing other Python threads to run until you reclaim it, or to
continue to hold the GIL. Normally the best choice is based on how
long the C code takes to run, but if the C code can generate an
asynchronous callback on a different thread you may be forced to
release the GIL.
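
ctypes exposes the same choice: functions loaded through 'CDLL'
release the GIL for the duration of the call, while those loaded
through 'PyDLL' keep it. The sketch below assumes POSIX ('usleep'), a
standard CPython 3 build with a GIL, and an arbitrary delay; the
function name 'time_two_sleepers' is invented.

```python
import ctypes
import threading
import time

# Assumption: POSIX. CDLL releases the GIL around each call; PyDLL keeps it.
releasing = ctypes.CDLL(None)
holding = ctypes.PyDLL(None)
for lib in (releasing, holding):
    lib.usleep.argtypes = [ctypes.c_uint]
    lib.usleep.restype = ctypes.c_int

DELAY = 0.2   # seconds that each thread spends inside C


def time_two_sleepers(lib):
    """Run two threads that each sleep in C; return the elapsed seconds."""
    microseconds = int(DELAY * 1_000_000)
    threads = [
        threading.Thread(target=lib.usleep, args=(microseconds,))
        for _ in range(2)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start
```

I would expect 'time_two_sleepers(releasing)' to take roughly one
delay, because both threads sleep at once, and
'time_two_sleepers(holding)' to take roughly two, because the second
thread cannot run until the first gives the GIL back.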

After reclaiming the GIL you can access Python data structures again,
but be aware that they may have changed during the time when your code
did not hold the GIL, so don't depend on cached values, and don't try
to borrow references if you plan to release the GIL.

Callback functions should always claim the GIL on entry, and release
it on exit. The catch is that you need a thread specific structure to
do this, and if the C library can create its own threads this may not
be easy to obtain.
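
ctypes does this bookkeeping for you: a callback built with
'CFUNCTYPE' acquires the GIL before the Python function runs and
releases it afterwards. The sketch below uses the C library's 'qsort'
as the caller. The library lookup and the names 'compare_ints' and
'c_sorted' are assumptions, and the code targets Python 3.

```python
import ctypes
import os

# Assumption: msvcrt on Windows, the running process's symbols elsewhere.
libc = ctypes.CDLL("msvcrt") if os.name == "nt" else ctypes.CDLL(None)

INT_P = ctypes.POINTER(ctypes.c_int)
COMPARE = ctypes.CFUNCTYPE(ctypes.c_int, INT_P, INT_P)

# void qsort(void *base, size_t nmemb, size_t size, compare function)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, COMPARE]
libc.qsort.restype = None


def compare_ints(left, right):
    """Called from C; ctypes holds the GIL while this Python code runs."""
    return (left[0] > right[0]) - (left[0] < right[0])


def c_sorted(values):
    """Sort a sequence of ints with C's qsort and a Python comparison."""
    array = (ctypes.c_int * len(values))(*values)
    callback = COMPARE(compare_ints)   # keep a reference for the whole call
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), callback)
    return list(array)
```

The call to 'qsort' itself runs with the GIL released, since it goes
through 'CDLL'; each time C invokes the comparison, ctypes claims the
GIL, runs 'compare_ints', and releases it again.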

Conclusion
==========

Calling C code from Python is easy.