Shed Skin - A (restricted) Python-to-C++ Compiler: 2006

Friday, December 29, 2006

Approaching 0.1

After 0.0.15 I have quickly released 0.0.16. It essentially adds frozenset and fixes some bugs, as reported by users of 0.0.15. The next release is coming along nicely too, and should be ready within about a week. Here's a list of changes:

-added frozenset
-time.sleep now works on WIN32
-constant-string expressions and __doc__ attributes are made into nice C++ comments
-added --nowrap optimization option to ss.py (disables checking for negative indices)
-several minor bug-fixes reported by users of 0.0.15
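
To illustrate what --nowrap trades away: Python lets a negative index count from the end of a sequence, so indexing code normally has to check for negative values and wrap them. The sketch below mimics that check in plain Python. The helper name wrap_index is made up for illustration and is not taken from the Shed Skin sources.

```python
def wrap_index(i, length):
    """Map a possibly negative Python index to a non-negative offset."""
    if i < 0:  # the kind of check that --nowrap is described as disabling
        i += length
    return i


values = [10, 20, 30]
last = values[wrap_index(-1, len(values))]   # same element as values[-1]
first = values[wrap_index(0, len(values))]   # non-negative indices pass through
```

A program that never uses negative indices gains nothing from the check, which is why skipping it can be a reasonable optimization for such programs.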

Because I have a huge amount of spare time coming up to work on Shed Skin, I have started to think about what is necessary to do a 0.1 release. These are currently the most important shortcomings:

-support for iterators, generators and generator expressions

The first two seem mostly a matter of how to do this nicely in C++, and the last one seems easy.
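
For reference, the three features look like this in plain Python. This only illustrates the constructs that would need translating; the class and function names are invented, and nothing here says how the C++ side would look.

```python
class Countdown:
    """Iterator: an object providing __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__  # Python 2 spelling of the same method


def squares(n):
    """Generator: produces values lazily with yield."""
    for i in range(n):
        yield i * i


iterated = list(Countdown(3))
generated = list(squares(4))
total = sum(x * x for x in range(4))  # generator expression
```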

-support for tuples t of len > 2 and with different types of elements

Again, this seems mostly a matter of how to do this nicely in C++. I'm looking forward to finding out how.
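
A small example of the kind of tuple meant here, as I read the item above. The record below has three elements of different types (str, int, float), so it cannot be treated as a sequence with a single element type. All names and values are illustrative.

```python
def describe(record):
    # Unpacking gives each element its own static type: str, int, float.
    name, age, height = record
    return "%s is %d years old and %.2f m tall" % (name, age, height)


person = ("alice", 30, 1.5)
summary = describe(person)
```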

-scalability of the type analysis

Because of all the details involved in writing a Python compiler, I still haven't focused on this too much. The basic techniques seem to work very well, but I have many ideas yet to work out to improve scalability. Based on my work so far, I'm convinced it is possible to quickly analyze most programs up to a few thousand lines, even without any form of profiling.

-two-way integration with CPython

There may be some progress on this front, as another developer has decided to look into using boost.python for this purpose. It's rather exciting to see small fragments of this work already. If anything, it has made me optimistic about having something like this work in the near future.

-some optimizations: strings (replace std::string), indexing (remove wrap-around checks by proving an index expression is always positive), heap allocation (transform heap allocation into stack- and static preallocation, when possible)

While not essential for a 0.1 release, an important goal of Shed Skin is to generate really fast code. The last optimization in particular would be a great master's thesis subject.. :-)

Friday, December 08, 2006

Shed Skin 0.0.15

After being distracted by my work for a few months, I'm finally back to Shed Skin development. It's interesting how the same boring programming work is more fun when you don't get paid for it! Anyway, I hope to work on Shed Skin full-time for at least the next three months, which should lead to several interesting improvements. I'm not sure exactly which those will be yet, as I never really plan ahead much.

For starters, I updated Shed Skin to work with the newly released Python version 2.5. As part of this, I added support for the new 'any' and 'all' built-in functions and for conditional expressions. The new Shed Skin release (0.0.15) also comes with support for several previously unsupported library functions. IIRC, these are:

os.path.{split, splitext, islink, isdir, isfile, exists}
os.{stat, lstat, rename, chdir}
stat.*
fnmatch.{fnmatch, fnmatchcase}
random.{seed, sample}
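
A short sketch of the Python 2.5 additions (any, all, conditional expressions) together with a few of the library functions listed above. The file names are invented, and the example only shows standard Python behavior; it does not demonstrate that every call compiles under 0.0.15. List comprehensions are used instead of generator expressions on purpose.

```python
import fnmatch
import os.path
import random

names = ["song1.mp3", "song2.mp3", "cover.jpg"]

# any/all: builtins introduced in Python 2.5
has_audio = any([fnmatch.fnmatch(n, "*.mp3") for n in names])
all_audio = all([fnmatch.fnmatch(n, "*.mp3") for n in names])

# conditional expression, also introduced in Python 2.5
label = "pure audio" if all_audio else "mixed"

# os.path.split and os.path.splitext
directory, filename = os.path.split("/music/song1.mp3")
stem, extension = os.path.splitext(filename)

# random.seed and random.sample
random.seed(42)
picked = random.sample(names, 2)
```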

Because quite a few library modules are now (partially) supported, I moved all of them to a separate 'lib' directory. Because the ugly '_' in filenames is not needed anymore, this makes it much easier to add to the set of libraries supported by Shed Skin (hint, hint!). If you are interested in doing so, please have a look at the updated README file.

The os.path.* and stat.* support was added by running Shed Skin over (slightly modified) pure Python implementations, taken from the PyPy project. I am getting more and more optimistic about this approach: it increases the number of supported library functions, helps me locate bugs and motivates me to fix them, all at the same time. The next module I may look into is the 're' module, which should help me locate a few more bugs, and help me add support for other modules that make use of it. Please let me know if you're interested in helping to increase the number of supported libraries, as there's probably lots of low-hanging fruit.

Most of the newly supported functionality was added in anticipation of compiling an interesting program I found. After messing around with Amarok, Gtkpod, Gnu-pod and what not, I finally found a program to reprogram the database on Apple's latest iPod shuffle (the 15.5 gram 1 GB one :D) that actually worked. Needless to say, it was written in Python.. :-) One interesting application is to put it on the shuffle itself, so you can run it from any computer. The downside is the dependency on CPython. Looking through the code, at about 600 lines, it looked like a potential victim for Shed Skin. The author agreed, and hopefully Shed Skin-generated code will be on many iPods in the near future :-) Here's the link to the program:

http://shuffle-db.sourceforge.net/

Saturday, August 12, 2006

Shed Skin 0.0.14, 1600-line program

hi all (two readers :-))

it's been a while, but I've been hard at work improving Shedskin. I actually got paid for about a month, to support a certain 1600-line program. it compiles fine now, so that means it's a new record :-) I'm hoping very much to find another job like this, so if your boss might be interested in paying me to do a 'cheap' translation of some Python program to C++, please let me know. I don't need a lot of money to support myself :-)

okay, so what's new in this release:
-string formatting has been hugely improved, so most combinations of flags and types should give the same result as in Python now
-several new imports are supported now: getopt.getopt, cStringIO.StringIO, string.*, os.{getenv, getcwd}, and shedskin-specific (typed) versions of struct.{pack, unpack}: struct.{pack_ints, unpack_ints}, that may be useful.
-many, many bugfixes, resulting from debugging a 1600-line program :-)

interestingly, getopt.getopt is supported by taking a pure Python implementation (in this case, from the PyPy project) and compiling it to C++ :-) in the future, I think this technique can be used to support several other modules (possibly 're'), helping me locate bugs in Shedskin and increase the number of supported libraries at the same time.
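
for readers who haven't used it, this is what getopt.getopt does in standard Python. the option letters and arguments are invented for the example.

```python
import getopt

# A made-up command line: two short options, one long option, one positional.
argv = ["-v", "-o", "out.cpp", "--level=2", "input.py"]

# "vo:" means -v takes no value and -o takes one; "level=" is a long option with a value.
opts, rest = getopt.getopt(argv, "vo:", ["level="])
```

opts comes back as a list of (option, value) pairs and rest holds the remaining positional arguments.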

Thursday, July 13, 2006

shortest sudoku solver

bearophile sent me a link to this really cool program:

def r(a):i=a.find('0');~i or exit(a);[m
in[(i-j)%9*(i/9^j/9)*(i/27^j/27|i%9/3^j%9/3)or a[j]for
j in range(81)]or r(a[:i]+m+a[i+1:])for m in'%d'%5**18]
from sys import*;r(argv[1])

apparently it's a sudoku solver (couldn't you guess?). here's more information about it:
ShortestSudokuSolver

unfortunately, it mixes integers and strings ((i-j).. or a[j]) and arbitrary-size arithmetic (5**18), both of which are currently not supported by shed skin. the arbitrary-size arithmetic could be supported in this case, but I rewrote it for now. here's a version that works with shed skin CVS:

def r(a):
 i=a.find('0')
 if not ~i: print a; exit()
 [m in [a[j] for j in range(81) if not (i-j)%9*(i/9^j/9)*(i/27^j/27|i%9/3^j%9/3)] or r(a[:i]+m+a[i+1:]) for m in '3814697265625']
from sys import *;r(argv[1])

it's 211 characters, versus the 178 of the one above, so that's not too bad :) and it becomes 18 times faster here.
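
the dense expression in both versions is a peer test: for cells i and j of the 81-character board string, (i-j)%9*(i/9^j/9)*(i/27^j/27|i%9/3^j%9/3) is zero exactly when the two cells share a column, a row or a 3x3 box. the sketch below spells that out with named helpers (the names are invented here, not taken from either program) and compares it with the original arithmetic, written with // so it also runs on Python 3. it also checks that '%d'%5**18 is the digit string used in the rewrite.

```python
def golfed(i, j):
    """The original expression, with // standing in for Python 2 integer division."""
    return (i - j) % 9 * (i // 9 ^ j // 9) * (i // 27 ^ j // 27 | i % 9 // 3 ^ j % 9 // 3)


def are_peers(i, j):
    """True when cells i and j share a row, a column or a 3x3 box."""
    row_i, col_i = divmod(i, 9)
    row_j, col_j = divmod(j, 9)
    same_box = row_i // 3 == row_j // 3 and col_i // 3 == col_j // 3
    return row_i == row_j or col_i == col_j or same_box


# The golfed value is zero exactly for peer cells, over all 81 x 81 pairs.
agree = all((golfed(i, j) == 0) == are_peers(i, j)
            for i in range(81) for j in range(81))

# ** binds tighter than %, so this formats the integer 5**18.
digits = '%d' % 5**18
```

the point of 5**18 is only that its decimal form contains every digit from 1 to 9, so iterating over it tries each candidate digit (some more than once).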

basic pygame support

http://mark.dufour.googlepages.com/fysphun.png

this shows an interactive graphical program working with shed skin :-) the user can drag around points, and the connected 'bodies' move around realistically. to get this to work, I had to add basic pygame support (drawing points, lines, event handling). doing this was an interesting experience - I now feel it should not be too difficult to autogenerate bindings to many libraries, based on 1) a simple type model (manually written, see *_.py) and 2) the results of type inference. because my job requires me to implement/bridge some library calls, maybe I can be paid to work on this. hmm..

Monday, July 03, 2006

Shed Skin 0.0.11, give me money

I have just released Shed Skin 0.0.11. It contains several important fixes again (see test 162). Most notably, lambda support is greatly improved (especially in an OO setting), and casting of incomplete types is re-enabled (so you can do e.g. ()+(5,), a = [[1]]; a = [[]] and such).
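
The expressions in question, written out. In each case one operand has no element type of its own (an empty tuple or an empty list), so a static compiler has to take the element type from the other use. The variable names are only for illustration.

```python
# An empty tuple joined with a tuple of int: the result is a tuple of int.
t = () + (5,)

# The same variable first holds a list of lists of int, then a list
# containing an empty list whose element type is not visible locally.
a = [[1]]
a = [[]]
a[0].append(2)
```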

In other news, I will probably be paid on a temporary basis this summer, to support a certain 1600-line program. Please let me know if you have a Python program/prototype that you'd like to have converted to C++, and I can probably tell you how much time this would take me.

Thursday, June 15, 2006

Shed Skin 0.0.10, Summer of Code 2006

I have just released Shed Skin 0.0.10. besides several important bug fixes (see test 161), it contains many new error messages for unsupported features and dynamic (sub)types. this should make it much easier to try out Shed Skin and work around basic problems. please try it out and let me know about any problems/successes ^^

in other news, a Shed Skin proposal has again been accepted in this year's Summer of Code. the student will investigate memory optimizations more deeply than I have done for my thesis, and implement both a stack allocation and static preallocation technique. we expect to start discussing the topic on the shed skin mailing list in the coming weeks.

finally, phillip hassey (also a Python SoC mentor) suggested an interesting idea: to automatically create *_.py files from pure C++ library header files, so compiled code can directly use many C++ libraries. this shouldn't be too difficult (the header files contain all type information), and has very interesting potential.. please let me know if you'd like to work on this!

Wednesday, May 17, 2006

pystone/richards benchmarks, upcoming 0.0.9

I have been working over the past few days to get the pystone and richards benchmarks compiling well. as they do now, I added them to the test set. both are very simple from a type inference perspective, but they did help me uncover and fix several minor and a few major issues.

the speedups on my computer are about 10 for pystone and 185 (!) for richards. the latter is probably due to the fact that richards is heavily OO, and C++ compilers know how to efficiently implement that! :-)

I hope to release a 0.0.9 version within a few weeks, with these and some other changes. there is one problem that I have observed a few times now, that I would like to fix before that. sometimes, type information is lost during inference, so that results are incomplete.. I think I know what causes the problem.

Wednesday, May 03, 2006

Shed Skin 0.0.8, new website, Google SoC/Thesis?

I have just released Shed Skin 0.0.8. for this version, I removed about 1000 lines (mostly memory optimizations - so the compiler is now less than 6000 lines!), cleaned up stuff a bit (it's still a monolithic file though), added/completed more string methods and applied many minor bugfixes and several more error messages, based on Bearophile's list of known bugs. thanks man! :-)

I also created a simple Shed Skin 'homepage' and modified the README, to better introduce Shed Skin to people. please modify any links to my blog or the sourceforge site to this page - see the link on the right. please let me know if you think I should change something.

now that the source code is becoming pretty clean, and there are many largish test programs that run well (see Section 5 of my thesis!), the time seems right to invite other people to join the project, and look into some important aspects I don't have enough time for/interest in. there are three important things that can be investigated relatively separately:

-I removed my simple memory optimizations (turning heap allocation into stack- and static preallocation). this is a fascinating subject, with a lot of existing techniques coming from the Java community. as can be seen from my thesis, it can really help performance as well. I just never had the time to properly investigate it.

-SS currently uses the bloody C++ STL string type, which makes it really slow for string-intensive programs. it would be really nice to have a more efficient (preferably OO) string type, possibly using Psyco-like techniques. since I never really use strings much, I do not have enough interest in this myself, but I recognize the importance.

-integration of Python code and compiled code remains a hassle. currently, a lot of manual work is needed to provide 'bindings'. it would be great to somehow have a (semi-)automated process, to enable compiled code to at least use the standard library, and to be able to easily call compiled code from Python programs.

if you are interested in any of these three topics, note that the deadline for the Google Summer of Code 2006 is in about a week. since SS got accepted last year, and there will probably be more slots for Python this year, this might be worth a try! let me know, and we can cook up a proposal together.

the first topic (memory optimization) is also a great topic for doing a Master's/PhD Thesis. unfortunately, Robert could not find a mentor for this. please let me know if you are interested, or you know of a compiler-savvy (Master/PhD) student that might be interested!

Tuesday, April 25, 2006

Master's Thesis

I haven't been putting as much effort as I'd like into SS development lately, because of writing my Master's Thesis. I have added a link to the resulting document to the links on the right. There are some interesting performance measurements inside. For a benchmark set of 16 programs, SS typically results in a speedup factor of 2-40 versus Psyco, 12 on average, and 2-220 versus CPython, 45 on average.

As for the future, I've decided to drop all memory optimizations (stack and static preallocation), since I never really put much thought into them, and they only help 'marginally' (since I did not do them well :P) I'm also discontinuing my 'char type' work, and any ideas about automatically connecting to the standard library. This is so that I can focus on the core type inference and code generation stuff.. Which I hope to continue soon :-)

Friday, January 27, 2006

Shed Skin 0.0.6

Here goes 0.0.6.. :)

With the help of Bearophile, I optimized list comprehensions and iteration a bit further. Programs are now also compiled together with the builtins, which can greatly improve performance in some cases. For example, the Pythonchess speed test engine now becomes about 22 times faster on my computer. Please check it out, and let me know about any problems you encounter.

For 0.0.7, I am working on a 'char' type. That is, let the compiler use the C++ 'char' type whenever programs are working with strings of length one. This dramatically improves the speed of programs that use strings of length 1 and/or 'string[index]' a lot. If you'd like to try it out, keep a watch on CVS, because I will commit this change soon, and let me know about any problems.
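
As an illustration of the kind of code this targets, here is a small made-up function that does nothing but handle strings of length one via 'string[index]'. It is ordinary Python and says nothing about how the 'char' type is implemented.

```python
def count_vowels(text):
    count = 0
    for i in range(len(text)):
        c = text[i]  # always a string of length one
        if c in "aeiou":
            count += 1
    return count


vowels = count_vowels("shed skin")
```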

Saturday, January 14, 2006

0.0.6 Update

Hello there,

Just to let anyone interested know that SS development is alive and kicking; expect me to release 0.0.6 within about a week! Unfortunately, I did not find the time to work on a connection with arbitrary external libraries. There have been many small improvements, though, relative to 0.0.5.9. I also added two new largish programs:

-a stripped-down version of Pythonchess, thanks to Jyrki, the author! (360 lines)
-yet another sudoku solver, that would cause older versions of SS to choke badly (178 lines)

The former becomes about 9 times faster on my PC, but there are probably some simple optimizations I can do to improve this further. The latter really gave the type inferencer a hard time, so I added some simple heuristics to 'guess' types, for use as a starting point of the analysis. This greatly reduced the analysis time for many other tests.

Btw, the raytracer is 65 times faster on my PC at home (X2 4800+), instead of the 40 I measured in China :-)

Shed Skin now correctly compiles about 6000 lines of unit tests. Included in these are the following non-trivial programs:

-satisfiability solver 1
-satisfiability solver 2
-min/max othello player
-neural network simulator
-sudoku solver 1
-sudoku solver 2
-sudoku solver 3
-convex hull
-voronoi
-mandelbrot
-n-queens
-the pygmy raytracer
-tic-tac-toe on arbitrary-size boards
-linear algebra routines
-simple genetic algorithm
-conway game of life
-pythonchess speed test engine

I'm starting to become pretty confident the compiler will work well in general. The speedups are also pretty good in general, but I think it can be much improved still, by tweaking code generation and the C++ versions of the Python builtins. I could really use a hand here! :D

(update: Would anyone like to try and get SS to run using, say, vc++?)