Sunday, May 02, 2010

Shedskin versus Psyco for 131 Project Euler Solutions

(Shedskin is an experimental (restricted) Python-to-C++ compiler.)

After releasing 0.4 I received a link to a blog post comparing PyPy, Unladen Swallow and Psyco for a set of over 100 project euler solutions. Of course, I decided to ask the author for the source code of these solutions, to have a go with Shedskin. It turned out to be a great test set for spotting performance flaws, and it even convinced me to add 64-bit integer support (shedskin -l), though I'm still not sure how generally useful that is.

Out of 204 solutions in total, I was able to get 131 to compile and run correctly (using the new 64-bit support), at a total of about 5,000 lines (sloccount). In some cases, I had to work around a Shedskin limitation, such as that mixing None and integers or nested functions are not allowed, but these were all very simple changes. While in many cases, it would have been easy to optimize things for Shedskin, I didn't make a single change other than to get things to compile. The other 73 solutions either needed arbitrary-precision arithmetic or use some unsupported module (mostly 'gmpy', whatever that is).

After optimizing Shedskin here and there (though only in generally useful ways, nothing really specific to these tests), I compared the run-time of Shedskin with that of Psyco on these 131 solutions. The graph below shows the relative speedup of Shedskin versus Psyco, taking the shortest time of three runs for each. Shedskin is faster in 109 cases out of 131, and if we take the cumulative time (730 versus 1330 seconds), 45% faster in total. Many of the 22 cases where Psyco wins deal with dictionaries, and I think Shedskin could also win most of these with a better dict implementation (any volunteers?).



A reason why the results may be somewhat flattering for Shedskin, is that quite a few solutions run in only a fraction of a second, and Psyco may suffer from a longer startup time (though I wouldn't know). On the other hand, many of the solutions are quite memory-intensive and spend a lot of time in the python builtins, so Shedskin often doesn't get a chance to shine. I think with a few manual changes to the solutions, many would run much faster using both Psyco and Shedskin, but with a larger speedup for Shedskin.

Unfortunately, Project Euler being semi-competitive and all, I cannot share the solutions. But I will mail the timings for each solution to the discussion group. Anyway, here's the setup I used to calculate the graph:

- shedskin -l (SVN 2010-5-1)
- g++ 4.4.3 -O3 -fprofile-generate/use -march=native -mtune=native -fomit-frame-pointer
- q9550s CPU
- each solution is run three times with Psyco, then three times with Shedskin, with a time.sleep(2) after each run
- the fastest time for each is taken.

So again, if anyone would be interested in rewriting the dict implementation, based on the CPython code (instead of relaying to the STL), that would be great. FFAO did this before for the set implementation, and that helped a lot.