Thursday, October 18, 2007

Shed Skin and Parallel Python

Shed Skin is an experimental Python-to-C++ compiler. Parallel Python allows for clean and simple parallelization of Python processes over multiple cores. Wouldn't it be cool if we could combine the two? Of course, I wouldn't be writing this if I hadn't tried. Here's how to do it.

Create an extension module with Shed Skin, containing the function(s) you'd like to run in parallel. For example, we might use the partial sum function from the Parallel Python website:

def part_sum(start, end):
    ..
    return sum

For Shed Skin's type inference to work, part_sum must be called at least once from somewhere in the module, so that the compiler sees concrete argument types:

if __name__ == '__main__':
    part_sum(1, 1)
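Put together, a complete meuk.py could look roughly like the sketch below. The loop body here is only a stand-in of my own (it sums the integers in the range); it is not the function from the Parallel Python website, whose body is elided above.

```python
# meuk.py -- hypothetical module contents; the loop body is illustrative only.

def part_sum(start, end):
    # Stand-in body: sum the integers in the half-open range [start, end).
    total = 0
    for x in range(start, end):
        total += x
    return total

if __name__ == '__main__':
    # Calling the function once gives the compiler concrete argument
    # types (here two ints) to infer from.
    part_sum(1, 1)
```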

Creating an extension module is simple (suppose the module is named meuk.py):

ss -e meuk
make

Because Parallel Python expects pure-Python code, we must call our compiled function via a pure-Python wrapper:

def part_sum(start, end):
    import meuk
    return meuk.part_sum(start, end)
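To show where the wrapper fits, here is a minimal sketch of submitting it to Parallel Python. The helper names split_range and parallel_sum are hypothetical, and the sketch assumes that partial sums over adjacent sub-ranges can simply be added together. It uses pp.Server and its submit method; calling the returned job object waits for the result.

```python
def part_sum(start, end):
    # Pure-Python wrapper; the import runs inside the worker process.
    import meuk
    return meuk.part_sum(start, end)

def split_range(start, end, parts):
    """Split [start, end) into `parts` contiguous (lo, hi) chunks."""
    step, extra = divmod(end - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        hi = lo + step + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks

def parallel_sum(start, end, workers=2):
    # Imported here so the helpers above can be used without pp installed.
    import pp
    job_server = pp.Server(ncpus=workers)
    jobs = [job_server.submit(part_sum, chunk)
            for chunk in split_range(start, end, workers)]
    # Calling a job object blocks until that worker's result is available.
    return sum(job() for job in jobs)
```

With the extension module built, something like parallel_sum(0, 1000000, workers=2) would then run one chunk per worker.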

In order for Parallel Python to find our extension module (at least on my Ubuntu system), we must issue this in advance:

export PYTHONPATH=$PWD
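The same path setup can also be done from Python, before any workers are started. This is only a sketch: with_pythonpath is a hypothetical helper, and it assumes the worker processes inherit the environment of the process that launches them.

```python
import os

def with_pythonpath(directory, env):
    """Return a copy of `env` with `directory` prepended to PYTHONPATH."""
    new_env = dict(env)
    existing = new_env.get("PYTHONPATH", "")
    if existing:
        new_env["PYTHONPATH"] = directory + os.pathsep + existing
    else:
        new_env["PYTHONPATH"] = directory
    return new_env

# Apply to the current process; do this before creating any workers.
os.environ.update(with_pythonpath(os.getcwd(), os.environ))
```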

And there you have it. Here are some timings:

no extension module, 1 worker: 11.3 seconds
no extension module, 2 workers: 6.2 seconds
extension module, 1 worker: 0.6 seconds
extension module, 2 workers: 0.3 seconds
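Read as speedups relative to the first line, these numbers work out to roughly 1.8x for a second worker, about 19x for compiling, and about 38x for both combined. The small script below just redoes that arithmetic from the timings above.

```python
# Timings from the list above, in seconds, keyed by (variant, workers).
timings = {
    ("no extension module", 1): 11.3,
    ("no extension module", 2): 6.2,
    ("extension module", 1): 0.6,
    ("extension module", 2): 0.3,
}

# Speedup of each configuration relative to pure Python on one worker.
baseline = timings[("no extension module", 1)]
speedups = dict((key, baseline / secs) for key, secs in timings.items())

for (variant, workers), factor in sorted(speedups.items()):
    print("%s, %d worker(s): %.1fx" % (variant, workers, factor))
```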

3 comments:

Anonymous said...

Just want to comment that since Parallel Python 1.5 (stable), the local directory is included in the module search path, so the PYTHONPATH export is no longer required.

Regards,
Vitalii

srepmub said...

hello vitalii,

thanks for letting me know! I will update the documentation.


mark.

Unknown said...

I just love when you have a question and it gets answered like this. The perfect post.

Thanks for the help. Just started playing with shedskin and this was critical.