Sunday, May 02, 2010

Shedskin versus Psyco for 131 Project Euler Solutions

(Shedskin is an experimental (restricted) Python-to-C++ compiler.)

After releasing 0.4 I received a link to a blog post comparing PyPy, Unladen Swallow and Psyco for a set of over 100 project euler solutions. Of course, I decided to ask the author for the source code of these solutions, to have a go with Shedskin. It turned out to be a great test set for spotting performance flaws, and it even convinced me to add 64-bit integer support (shedskin -l), though I'm still not sure how generally useful that is.

Out of 204 solutions in total, I was able to get 131 to compile and run correctly (using the new 64-bit support), at a total of about 5,000 lines (sloccount). In some cases, I had to work around a Shedskin limitation, such as that mixing None and integers or nested functions are not allowed, but these were all very simple changes. While in many cases, it would have been easy to optimize things for Shedskin, I didn't make a single change other than to get things to compile. The other 73 solutions either needed arbitrary-precision arithmetic or use some unsupported module (mostly 'gmpy', whatever that is).

After optimizing Shedskin here and there (though only in generally useful ways, nothing really specific to these tests), I compared the run-time of Shedskin with that of Psyco on these 131 solutions. The graph below shows the relative speedup of Shedskin versus Psyco, taking the shortest time of three runs for each. Shedskin is faster in 109 cases out of 131, and if we take the cumulative time (730 versus 1330 seconds), 45% faster in total. Many of the 22 cases where Psyco wins deal with dictionaries, and I think Shedskin could also win most of these with a better dict implementation (any volunteers?).

A reason why the results may be somewhat flattering for Shedskin, is that quite a few solutions run in only a fraction of a second, and Psyco may suffer from a longer startup time (though I wouldn't know). On the other hand, many of the solutions are quite memory-intensive and spend a lot of time in the python builtins, so Shedskin often doesn't get a chance to shine. I think with a few manual changes to the solutions, many would run much faster using both Psyco and Shedskin, but with a larger speedup for Shedskin.

Unfortunately, Project Euler being semi-competitive and all, I cannot share the solutions. But I will mail the timings for each solution to the discussion group. Anyway, here's the setup I used to calculate the graph:

- shedskin -l (SVN 2010-5-1)
- g++ 4.4.3 -O3 -fprofile-generate/use -march=native -mtune=native -fomit-frame-pointer
- q9550s CPU
- each solution is run three times with Psyco, then three times with Shedskin, with a time.sleep(2) after each run
- the fastest time for each is taken.

So again, if anyone would be interested in rewriting the dict implementation, based on the CPython code (instead of relaying to the STL), that would be great. FFAO did this before for the set implementation, and that helped a lot.


Jean-Paul Calderone said...

Hi Mark,

I think you've graphed these results in a slightly misleading way (only "slightly" because I think your results *do* basically support your conclusion). By plotting psyco time over shedskin time, you've maximized the visual impact of the results that are favorable to shedskin and minimized the visual impact of results favorable to psyco. If you plot the reciprocal (shedskin over psyco) I think you'll see what I mean - the resulting graph will be a lot less spectacular.

A better way to plot these results would graph a datapoint where psyco is 2x faster than shedskin and a datapoint where shedskin is 2x faster than psyco at equal distances from a visual baseline, rather than putting the former only half as far from the baseline as the latter as your graph here does.

srepmub said...

thanks. yeah, that would probably be better. but I think most people will be able to interpret the current graph just fine..

Togge said...

The 'gmpy' seems to be a "multiprecision arithmetic" (see with implementation both in C++ and in Python.

Which brings me to a question: Is it generally easier to make Shedskin support modules implemented in C++ or is Python better?

(My guess is "it depends"... :))

Joel said...

Could you redo the graph with a logarithmic scale? It would be much fairer.

srepmub said...

sorry, I'm not interested in changing the graph, because it's perfectly clear IMO. if you'd like to make your own graph and blog about it, you can use the individual timings I posted to the discussion group.

Alex said...

@Togge, please _don't_ use the extremely ancient and obsolete version of gmpy that's on sourceforge -- is supposed to eventualy become the official site, and for the moment it points to the very up-to-date repository. (sourceforge won't let me in any more so I've given up on closing that ancient project, for now).

Anyway, yes, gmpy is a general multiprecision library for Python using GMP, the Gnu Multiprecision Library (or alternatively MPIR in gmpy's recent versions).