Friday, June 29, 2007

Shed Skin 0.0.22

after being bogged down with work for a few weeks, I got back to development again. the trigger was a proof-of-concept patch sent in by Harri Pasanen for generating extension modules. previously I was waiting for someone else to fully tackle this, but his patch was quite simple, and I found out it easily works under mingw, too. so I generalised things a bit, and released it as part of version 0.0.22. building an extension module is now as simple as running 'ss -e ..' followed by 'make' (note that it doesn't work under OSX yet).

there are some limitations to the way this works, though:

- only builtin scalars and containers can be passed/returned (int, float, str, list, tuple, dict, set)
- arguments and returned objects are completely copied/converted at call/return time (i.e., including contained objects)
- global variables are considered constant, and are converted at module initialization time
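
the copy semantics in the second limitation can be mimicked in plain Python. the sketch below is only an illustration, not Shed Skin's actual conversion code: `call_with_copy` and `append_item` are hypothetical names, and `copy.deepcopy` merely stands in for the conversion that happens at call and return time.

```python
import copy

def call_with_copy(func, *args):
    # hypothetical helper: deep-copy the arguments on the way in and the
    # result on the way out, mimicking full conversion at call/return time
    converted_args = [copy.deepcopy(a) for a in args]
    return copy.deepcopy(func(*converted_args))

def append_item(lst):
    # mutates its argument; behind the copying boundary the caller never sees this
    lst.append([99])
    return lst

original = [[1], [2]]
result = call_with_copy(append_item, original)

# the caller's list is untouched, and even nested objects are separate copies
print(original)
print(result)
print(result[0] is original[0])
```

in other words, a function that is meant to modify a list in place has to return the modified list instead, and the caller has to use that return value.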

consider this simple program, mod_name.py:

some_var = [1,2,3]

def some_func(x):
    return 2*x

if __name__ == '__main__':
    some_func(1) # obviously, this is needed for type inference to work

to compile this down to an extension module, simply use the new '-e' command-line parameter:

ss -e mod_name
make

that's it. now the program can be used from an arbitrary python program/prompt:

>>> import mod_name
>>> dir(mod_name)
>>> mod_name.some_var
>>> mod_name.some_func(1)

I'm hoping someone else can add support for custom classes and find out how to get this to work under OSX!

11 comments:

garyrob said...

This is great news, although I'd like to see the solution on OS X.

One question. One advantage of writing a C extension is that it allows you to release the GIL, do some cpu-intensive work on multiple cores, then grab the GIL and return to normal python which can only use one core.

If Shed Skin could allow that to happen, it would be a substantial reason to use it instead of, say, Pyrex for python extensions (last I checked Pyrex doesn't let you release the GIL).

Is that a possibility?

James said...

Fantastic!!!! This is very useful!!!

Fuzzyman said...

Hey - this is *great* (although I'd like to hear the answer about releasing the GIL).

One nitpick - don't forget to include a link in comp.lang.python.announce posts... :-)

Anyway - congratulations, and thank you. With this and CarbonPython, Python is looking faster by the minute...

Luis said...

I love it! This is absolutely fantastic!!!
I can't believe I can write extension modules that easily...
Keep up the good work and thank you!!!

srepmub said...

@garyrob:

thanks for this insight.

I guess if it's possible to release the GIL for other C extensions, why not for Shedskin-generated extensions?

I'd be very interested in people experimenting with this.

garyrob said...

My understanding is that when sets, etc. are passed into an ss extension, they are converted to other, ss-native structures, so the original Python containers are never modified in the extension. So, there is no way the Python objects would be endangered by the extension.

Is that right?

But I'd guess the big issue in using ss without the GIL would be that ss extensions aren't designed to be reentrant, and so if two threads executed the same ss extension at once they would interfere with each other... is that right? Even so, it would be good to be able to run the extension in a thread while other Python threads are doing other things.

srepmub said...

yes, currently everything is converted, so the only problem should be reentrancy in the extension itself.

so I guess if there are no global datastructures that are modified, nothing is shared and everything should be reentrant?
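
to make the "nothing is shared" condition concrete, here is a minimal pure-Python sketch. it is an assumption-laden illustration only: `pure_work` and `run_in_threads` are hypothetical names, nothing here is generated by Shed Skin, and plain Python threads like these still hold the GIL. it only shows that a function touching nothing but its arguments and locals gives the same results when called from several threads at once.

```python
import threading

def pure_work(n):
    # touches only its argument and local variables, so concurrent
    # calls have nothing in common that they could corrupt
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_in_threads(func, inputs):
    # hypothetical helper: call func once per input, each call in its own
    # thread, and collect the results in input order
    results = [None] * len(inputs)

    def worker(idx, value):
        results[idx] = func(value)

    threads = [threading.Thread(target=worker, args=(i, v))
               for i, v in enumerate(inputs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

inputs = [10, 100, 1000]
threaded = run_in_threads(pure_work, inputs)
sequential = [pure_work(n) for n in inputs]
print(threaded == sequential)
```

a function that appends to a module-level list or updates a module-level counter would not satisfy this condition, and would need a lock or a redesign before being run this way.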

srepmub said...

btw, about OSX:

it's probably quite easy to get this to work, but I don't have OSX to play with.

garyrob said...

"so I guess if there are no global datastructures that are modified," I think that's right, but I can't guarantee we're not missing something! ;)

usagi said...

Hi Mark,

Congratulations on this new release with such an important feature. I would like to point you to a blog post of mine comparing set operations in different implementations of Python:

http://pyinsci.blogspot.com/2007/08/set-implementation-performance.html

I found out that the C++ generated from my test script was slower than the pure python implementation! You may want to take a look at your set implementation.

You should add this blog to the Planet Python blog aggregator!

srepmub said...

thanks flavio,

after adding some simple (long overdue) optimizations, your test script runs about 40% faster here (mac mini intel core duo). could you please run your test script again using Shedskin CVS..?

I'll look closer at the remaining speed difference later (it's still about 3 times slower than CPython here), but I think it's because hash_set does not have union etc. methods.. not sure I can do much about that, without completely replacing std::hash_set..

does anyone know if std::unordered_set in future C++ versions will support union and such?
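
for reference, this is roughly what a union looks like when the underlying container has no union method of its own. the Python sketch below is only an assumption about the general shape of such a fallback (`union_by_insertion` is a hypothetical name), not the code Shed Skin actually generates: every element of both inputs is hashed and inserted one at a time.

```python
def union_by_insertion(a, b):
    # build the result by inserting elements one by one, as a container
    # without a native union operation would have to do
    result = set()
    for item in a:
        result.add(item)
    for item in b:
        result.add(item)
    return result

left = {1, 2, 3}
right = {3, 4}
combined = union_by_insertion(left, right)

# same contents as the builtin operator, just built element by element
print(combined == (left | right))
```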