Thursday, December 16, 2010

Shed Skin 0.7, Type Inference Scalability

I have just released version 0.7 of Shed Skin, a (restricted) Python-to-C++ compiler. The new version comes with lots of minor fixes and some optimizations, as well as a new Windows package. The Windows package was upgraded to a recent version of MinGW, which means it now includes GCC 4.5. I hope to package each new release for Windows again from now on.

There are also two nice new examples, one of which is over 1,000 lines (sloccount), for a total of 52 examples. Both are ray tracers, and it's always nice to look at the output of ray tracers. The larger one has a GUI and uses the multiprocessing module in combination with a Shed Skin-generated extension module to do its heavy lifting. The other uses a much slower but more realistic ray tracing technique, with very nice results. Thanks to Eric Uhrhane and Jonas Wagner for sharing their programs!

I'd like to take the opportunity to talk a bit about type inference scalability. This graph shows the time Shed Skin needs for each of the current examples (seconds versus sloccount; the point on the right is the C64 emulator). I should mention that two examples are left out, because of a non-termination issue - that is, Shed Skin aborts after a certain number of iterations. The reason I haven't looked into fixing this yet is that both examples compile and run fine after Shed Skin aborts, because the type analysis is already precise enough at that point (though it takes some time to get there). Precisely because of that, I don't think the issue would be hard to fix.

So while the current type analysis is obviously not perfect, I think this graph shows that type inference may not be as hard as some people would have you believe, at least for statically restricted code (the restriction does make it easier!). And I'm sure a team of programmers, or someone smarter than me, could do a much better job.

For those with some knowledge of the type inference literature, I'd like to explain at this point how Shed Skin achieves this scalability. Who knows, it might be useful to people starting out in this area. It is actually quite simple conceptually (but don't think you can implement this correctly in a few weeks or even months for a language as large as Python!).

Shed Skin combines Ole Agesen's Cartesian Product Algorithm (CPA) with the data-polymorphic part of John Plevyak's Iterative Flow Analysis (IFA), both published in 1996 if I'm correct, and combined as suggested by Agesen. Because CPA has a tendency to explode for larger programs (usually in combination with the first, very imprecise iteration of IFA), Shed Skin 0.6 adds an incremental approach to the mix. It simply repeats its analysis for increasingly larger versions of a program, starting from essentially nothing and slowly adding to the call graph until the whole program is analyzed. This largely prevents the CPA from exploding.
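To make this a little more concrete, here is a toy sketch of the incremental loop in plain Python. It is not Shed Skin's actual code: the call graph representation, the 'analyze' callback and the step size are made-up stand-ins, just to show the shape of the idea.

    # Toy sketch of the incremental strategy (not Shed Skin's actual code).
    # Rerun the expensive CPA/IFA analysis over an ever-growing slice of the
    # program instead of over the whole program at once.

    def incremental_analysis(call_graph, entry, analyze, step=5):
        """call_graph: dict mapping each function to the functions it calls;
        analyze: callback standing in for one full CPA/IFA pass."""
        active = set([entry])                # start from (almost) nothing
        while True:
            result = analyze(active)         # CPA/IFA over the current slice only
            frontier = set()                 # calls that leave the analyzed slice
            for func in active:
                frontier |= call_graph[func]
            frontier -= active
            if not frontier:
                return result                # whole reachable program covered
            # grow the analyzed program a little and iterate; each pass starts
            # from types that are already mostly stable, which keeps the
            # cartesian product of argument types from exploding
            active |= set(sorted(frontier)[:step])

    # toy usage: a tiny call graph and a dummy 'analysis'
    graph = {'main': set(['parse', 'render']), 'parse': set(),
             'render': set(['shade']), 'shade': set()}
    print(incremental_analysis(graph, 'main', analyze=lambda funcs: sorted(funcs)))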

Saturday, October 16, 2010

Shed Skin 0.6

I have just released version 0.6 of Shedskin, an experimental (restricted-)Python-to-C++ compiler.

Most importantly, this release finally comes with a substantial scalability improvement. Instead of analyzing a whole program at once, Shedskin now repeatedly analyzes an ever larger version of the program, starting from essentially nothing, until the whole program is analyzed. It turns out this is much easier than analyzing a program as a whole. The result is that Shedskin now easily scales to several test programs of over 1,000 lines (sloccount), and needs much less memory to analyze them.

I'd really like to receive programs that Shedskin 0.6 is unable to analyze within a reasonable amount of time, to get an idea of the actual extent of the improvement, and to be motivated to improve things further. It's hard to stay motivated when all the programs you know of already work well!

Interestingly, because of the incremental analysis, it also became possible to add a progress bar to Shedskin. It will slow down as we go to 100 percent, and be somewhat inaccurate if parts of a program turn out to be unused, but I for one couldn't be happier with it.

Another area that has seen substantial improvements is the support for generating extension modules. For example, basic (un)pickling of compiled classes should now work, and inheritance and overloading should now be better exposed to the outside world.
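As a rough illustration of the workflow (the module, class and file names are my own example, not taken from the release): a module is compiled with 'shedskin -e' and 'make', and the result can then be imported and pickled from regular CPython.

    # matrix.py -- ordinary (restricted) Python; compiled as an extension module
    # with: shedskin -e matrix.py && make
    # (the module and class are a made-up example, not from the release)

    class Matrix:
        def __init__(self, n):
            self.n = n
            self.values = [0.0] * (n * n)

        def trace(self):
            total = 0.0
            for i in range(self.n):
                total += self.values[i * self.n + i]
            return total

    if __name__ == '__main__':
        # example usage from which Shed Skin infers the types
        print(Matrix(3).trace())

    # afterwards, from regular CPython:
    #   import pickle, matrix
    #   m = matrix.Matrix(3)
    #   s = pickle.dumps(m)         # basic pickling of a compiled class
    #   m2 = pickle.loads(s)        # ...and unpickling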

The most important bugfix is probably the corrected module/line number information for warnings and errors. It turns out the '#{' feature messed up line numbers, and with code spanning several files, Shedskin would often display the wrong module as well. This should all be fixed now.

More details about the release can be found in the release notes.

I would like to thank the following people for contributing to this release: Hakan Ardo, Thomas Spura, Eric Uhrhane, Saravanan Shanmugham, HartsAntler and Douglas McNeil, as well as Plonjers, for sending in the nice test case that triggered the scalability improvement.

update: Just after the release, Danny Milosavljevic sent in a great 50th example program, a Commodore 64 emulator! The emulator itself is a work in progress, but it boots and you can start typing commands. After modifying it a bit so it compiles out-of-the-box with 0.6, I added it to the 0.6 example tarball.

Sunday, August 08, 2010

Shed Skin 0.5

I have just released Shed Skin 0.5, an optimizing (restricted) Python-to-C++ compiler. Looking back at the release notes, I'm happy to see contributions from more people than ever. I'd also like to thank David Ripton again, for sending me his Project Euler solutions (see my previous post), and Jason Ye, for sending in so many issues.

The new release supports 64-bit integers (shedskin -l), which was necessary to get many of the Project Euler solutions to work. I'm wondering if this should perhaps become the default, since by default we want to stay as close to Python as possible. But I don't think it's really useful that often, and it will probably be slower in many cases, so we'll see.
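To give an idea of when -l matters, here is a toy example of my own (not one of the Euler solutions): the result does not fit in 32 bits, so without 64-bit integers the compiled version would overflow where CPython happily keeps going.

    # toy illustration of why 64-bit integers (shedskin -l) can matter;
    # my own example, not one of the actual Euler solutions
    def sum_of_cubes(n):
        total = 0
        for i in range(1, n + 1):
            total += i * i * i
        return total

    print(sum_of_cubes(10000))   # 2500500025000000, far beyond 32-bit range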

In comparing the Project Euler solutions with Psyco, it became clear that Shed Skin did not perform very well for certain types of iteration. I found out that exceptions in C++ are extremely slow, greatly slowing down e.g. set iteration. The new release avoids throwing StopIteration most of the time, that is, inside the builtins.
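The gist of the change, written as plain Python for readability (the real code is generated C++, and the names below are mine): instead of catching a thrown StopIteration on every step, the builtin loops ask the iterator directly whether it is exhausted.

    # sketch of the idea only; the actual code is generated C++ and these
    # names are made up for illustration
    class ToyIter:
        def __init__(self, items):
            self.items = list(items)
            self.pos = 0

        def next_exc(self):              # exception-based step: slow in C++
            if self.pos == len(self.items):
                raise StopIteration
            value = self.items[self.pos]
            self.pos += 1
            return value

        def next_flag(self):             # flag-based step, avoiding exceptions
            if self.pos == len(self.items):
                return False, 0          # exhaustion signalled without throwing
            value = self.items[self.pos]
            self.pos += 1
            return True, value

    it = ToyIter([1, 2, 3])
    ok, value = it.next_flag()
    while ok:
        print(value)
        ok, value = it.next_flag()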

Having based the set implementation on CPython's code before, FFAO took up my invitation to do the same for the dict implementation, and quickly implemented it. I think Shed Skin was able to beat Psyco on about 5 more Euler solutions thanks to this new implementation.

Random number generation (without using shedskin -r) should also be much faster now.

Andy Miller sent in patches to add basic support for MSVC (shedskin -v). There is, of course, still no recent Windows package for Shed Skin. If anyone would like to volunteer, it should be easy to base a 0.5 package on the 0.3 version for Windows, though it would be nice to use a more recent version of MinGW.

Douglas McNeil sent in patches to add support for __future__.print_function and generator methods (as opposed to non-method functions). Thomas Spura optimized printing somewhat, though there is still some room for optimization there. Finally, Michael Elkins sent in some improvements for the socket module (which he originally wrote).
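For reference, a generator method is simply a generator defined on a class rather than at module level, something along these lines (example mine):

    # example of a generator method (as opposed to a plain generator function);
    # the class and numbers are just an illustration
    class Fibs:
        def __init__(self, limit):
            self.limit = limit

        def generate(self):              # generator *method*
            a, b = 0, 1
            while a < self.limit:
                yield a
                a, b = b, a + b

    print(list(Fibs(100).generate()))    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]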

More details about the new release can be found in the release notes.

Sunday, May 02, 2010

Shedskin versus Psyco for 131 Project Euler Solutions

(Shedskin is an experimental (restricted) Python-to-C++ compiler.)

After releasing 0.4 I received a link to a blog post comparing PyPy, Unladen Swallow and Psyco on a set of over 100 Project Euler solutions. Of course, I decided to ask the author for the source code of these solutions, to have a go with Shedskin. It turned out to be a great test set for spotting performance flaws, and it even convinced me to add 64-bit integer support (shedskin -l), though I'm still not sure how generally useful that is.

Out of 204 solutions in total, I was able to get 131 to compile and run correctly (using the new 64-bit support), at a total of about 5,000 lines (sloccount). In some cases, I had to work around a Shed Skin limitation, such as mixing None and integers or using nested functions, neither of which is allowed, but these were all very simple changes. While in many cases it would have been easy to optimize things for Shedskin, I didn't make a single change other than to get things to compile. The other 73 solutions either needed arbitrary-precision arithmetic or use some unsupported module (mostly 'gmpy', whatever that is).
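To give an impression of the kind of workaround involved (my own example, not one of the actual solutions): where a function returned either an integer or None, a sentinel integer does the job.

    # my own example of a typical workaround, not one of the actual solutions

    # original, rejected by Shed Skin: the return type would mix int and None
    # def first_factor(n):
    #     for d in range(2, n):
    #         if n % d == 0:
    #             return d
    #     return None

    def first_factor(n):
        for d in range(2, n):
            if n % d == 0:
                return d
        return -1               # sentinel instead of None keeps the type a plain int

    print(first_factor(91))     # 7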

After optimizing Shedskin here and there (though only in generally useful ways, nothing really specific to these tests), I compared the run-time of Shedskin with that of Psyco on these 131 solutions. The graph below shows the relative speedup of Shedskin versus Psyco, taking the shortest time of three runs for each. Shedskin is faster in 109 cases out of 131, and if we take the cumulative time (730 versus 1330 seconds), 45% faster in total. Many of the 22 cases where Psyco wins deal with dictionaries, and I think Shedskin could also win most of these with a better dict implementation (any volunteers?).



A reason why the results may be somewhat flattering for Shedskin is that quite a few solutions run in only a fraction of a second, and Psyco may suffer from a longer startup time (though I wouldn't know). On the other hand, many of the solutions are quite memory-intensive and spend a lot of time in the Python builtins, so Shedskin often doesn't get a chance to shine. I think that with a few manual changes to the solutions, many would run much faster under both Psyco and Shedskin, but with a larger speedup for Shedskin.

Unfortunately, Project Euler being semi-competitive and all, I cannot share the solutions. But I will mail the timings for each solution to the discussion group. Anyway, here's the setup I used to calculate the graph:

- shedskin -l (SVN 2010-5-1)
- g++ 4.4.3 -O3 -fprofile-generate/use -march=native -mtune=native -fomit-frame-pointer
- q9550s CPU
- each solution is run three times with Psyco, then three times with Shedskin, with a time.sleep(2) after each run (see the rough reconstruction after this list)
- the fastest time for each is taken.
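For what it's worth, the timing loop amounted to roughly the following (a reconstruction, not the exact script; the file and binary names are placeholders):

    # rough reconstruction of the timing loop, not the exact script used;
    # 'euler001_psyco.py' and './euler001' are placeholder names
    import subprocess
    import time

    def best_of_three(cmd):
        times = []
        for _ in range(3):
            start = time.time()
            subprocess.call(cmd)
            times.append(time.time() - start)
            time.sleep(2)                # let the machine settle between runs
        return min(times)                # the fastest of three runs is kept

    psyco_time = best_of_three(['python', 'euler001_psyco.py'])
    shedskin_time = best_of_three(['./euler001'])      # shedskin-compiled binary
    print('speedup: %.2fx' % (psyco_time / shedskin_time))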

So again, if anyone would be interested in rewriting the dict implementation, based on the CPython code (instead of delegating to the STL), that would be great. FFAO did this before for the set implementation, and it helped a lot.

Wednesday, March 31, 2010

Shed Skin 0.4

I have just released Shed Skin 0.4, an experimental (restricted-)Python-to-C++ compiler. Thanks again especially to Jeremie Roquet for helping out.

The biggest 'improvement' of this release is perhaps that Windows is no longer supported. Windows users are encouraged to take over maintainership of the MinGW version, or to upgrade to a 'real' operating system, such as Ubuntu, and live in software freedom.

Other improvements include support for generator expressions, real boolean support (instead of integers that print as 0 and 1), support for the 'key' argument of 'min' and 'max', another useful type inference scalability improvement, and support for heapq.{merge, nsmallest, nlargest}. Please see the release notes for the full list of changes.
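A small, contrived example of my own touching a few of these at once (a generator expression, the 'key' argument of max and heapq.nlargest); whether every combination compiles is of course best verified against the release itself:

    # contrived example of my own, exercising a few newly supported features
    import heapq

    words = ['shed', 'skin', 'compiler', 'python', 'ray']

    longest = max(words, key=len)               # 'key' argument of max
    total = sum(len(w) for w in words)          # generator expression
    top2 = heapq.nlargest(2, words, key=len)    # heapq.nlargest

    print('%s %d %s' % (longest, total, top2))  # compiler 25 ['compiler', 'python']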

Four new example programs were also added, for a total of 47 programs:

- pylife, a game of life implementation based on the wonderful hashlife algorithm (David Bau)
- a nice a-star implementation, with a pygame frontend (John Eriksson)
- a simple game of life implementation that gave the type analysis some trouble (Francesco Frassinelli)
- a second genetic algorithm (Stavros Korokithakis)

Please go ahead and try it out, and let me know about any problems. I'm also always very happy to receive new example/test programs to play with, especially if type inference fails (or doesn't terminate) for them!

Wednesday, January 13, 2010

Shed Skin 0.3

After 5 years of work, I'm proud to announce Shed Skin 0.3, an experimental (but obviously restricted) Python-to-C++ compiler. Looking over the release notes, I'm convinced this must be the best release so far. I would like to thank especially Jeremie Roquet, for several major contributions to this release, and Thomas Spura, for reorganizing the codebase. Joris van Rantwijk provided the great new 'maximum weighted matching' example, and he and Mike Schrick sent in minor but useful patches.

For me, these are the highlights of the release (a small snippet exercising a few of them follows the list):

- support for 'itertools', 'heapq' and 'csv', bringing the total set of (mostly) supported library modules to 20
- 4 new example/test programs, bringing the total to 44 examples, at over 10,000 lines in total (sloccount)
- improved type inference scalability (while greatly simplifying the code)
- support for 'map', 'filter' and 'reduce' (yes, finally, go wild.. :-))
- support for the 'with' statement
- support for the 'key' argument to 'sorted' (and 'list.sort')
- reduced the compiler core by about 100 lines (again)
- a distutils-based setup.py (talk about 'finally'!)
- several important optimizations (indexing, slicing, adding 1-length sequences)
- an unusually large number of bug fixes
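
Here is the small snippet mentioned above (my own example, in Python 2 like the rest of Shed Skin's input; the file name and numbers are made up):

    # small combined example of my own: the 'with' statement, the 'key'
    # argument to sorted, and map/filter/reduce
    with open('numbers.txt', 'w') as f:                    # 'with' statement
        f.write('3 1 4 1 5 9 2 6')

    with open('numbers.txt') as f:
        numbers = [int(x) for x in f.read().split()]

    evens = filter(lambda n: n % 2 == 0, numbers)          # filter
    squares = map(lambda n: n * n, evens)                  # map
    product = reduce(lambda a, b: a * b, numbers, 1)       # reduce

    print(sorted(numbers, key=lambda n: -n))               # 'key' argument to sorted
    print(squares)                                         # [16, 4, 36]
    print(product)                                         # 6480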

Please go ahead and try it out, and start sending in those issues. I'm also always very happy to receive new example/test programs to play with, especially if type inference fails (or doesn't terminate) for them!