Sunday, September 18, 2005

Shed Skin 0.0.2: Easy Windows/OSX Installation

Shed Skin 0.0.2 is up on SourceForge. It should install easily on Windows 2000/XP and on OSX. Please give it a try and let me know if there are still some problems.

If you would like to help me improve Shed Skin, please send me small code snippets, preferrably extracted from real-life use cases, that the compiler has problems with.

Thanks to everyone who has helped me out, especially Khalid and Luis on python-list, and Denis de Leeuw Duarte right here in the street :-)

Update: 0.0.3 has also been released, with improved support for builtin functions, bug fixes and a Windows package of only 3 MB.

Update: We're on a roll, with 0.0.4. Many improvements again; added several interesting test cases (thanks to bearophile!): convex hull, n-queens problem, pascal triangle and ascii mandelbrot.

Saturday, September 10, 2005

Announcement

First release of Shed Skin, a Python-to-C++ compiler.


After nine months of hard work, I am proud to introduce my baby to the world: an experimental Python-to-C++ compiler. It can convert many Python programs into optimized C++ code, without any user intervention such as adding type declarations. It uses rather advanced static type inference techniques to deduce type information by itself. In addition, it determines whether deduced types may be parameterized, and if so, it generates corresponding C++ generics. Based on deduced type information, it also attempts to convert heap allocation into stack and static preallocation (falling back to libgc in case this
fails.)

The compiler was motivated by the belief that in many cases it should be possible to automatically deduce C++ versions of Python programs, enabling users to enjoy both the productivity of Python and the efficiency of C++. It works best for Python programs written in a relatively static C++-style, in essence enabling users to specify C++ programs at a higher level.

At the moment the compiler correctly handles 124 unit tests, six of which are serious programs of between 100 and 200 lines:

-an othello player
-two satisfiability solvers
-a japanese puzzle solver
-a sudoku solver
-a neural network simulator

Unfortunately I am just a single person, and much work remains to be done. At the moment, there are several limitations to the type of Python programs that the compiler accepts. Even so, there is enough of Python left to be able to remain highly productive in many cases. However, for most larger programs, there are probably some minor problems that need to be fixed first, and some external dependencies to be implemented/bridged in C++.

With this initial release, I hope to attract other people to help me locate remaining problems, help implement external dependencies, and in the end hopefully even to contribute to the compiler itself. I would be very happy to receive small programs that the compiler does or should be able to handle. If you are a C++ template wizard, and you would be interested in working on the C++ implementation of builtin types, I would also love to get in contact with you. Actually, I'd like to talk to anyone even slightly interested in the compiler, as this would be highly motivating to me.

The source code is available at the following site. Please check the README for simple installation/usage instructions. Let me know if you would like to create ebuild/debian packages.

Sourceforge site: http://shedskin.sourceforge.net
Shed Skin blog: http://shed-skin.blogspot.com

Should you reply to this mail, please also reply to me directly. Thanks!


Credits

Parts of the compiler have been sponsored by Google, via its Summer of Code program. I am very grateful to them for keeping me motivated during a difficult period. I am also grateful to the Python Software Foundation for chosing my project for the Summer of Code. Finally, I would like to thank my university advisor Koen Langendoen for guiding this project.


Details

The following describes in a bit more detail various aspects of the compiler. Before seriously using the compiler, please make sure to understand especially its limitations.

Main Features

-very precise, efficient static type inference (iterative object contour splitting, where each iteration performs the cartesian product algorithm)
-stack and static pre-allocation (libgc is used as a fall-back)
-support for list comprehensions, tuple assignments, anonymous funcs
-generation of arbitrarily complex class and function templates (even member templates, or generic, nested list comprehensions)
-binary tuples are internally analyzed
-some understanding of inheritance (e.g. list(dict/list) becomes list(iter(A)))
-hierarchical project support: generation of corresponding C++ hierarchy, including (nested) Makefiles; C++ namespaces
-annotation of source code with deduced types
-builtin classes, functions (enumerate, sum, min, max, range, zip..)
-polymorphic inline caches or virtual vars/calls (not well tested)
-always unbox scalars (compiler bails out with error if scalars are mixed with pointer types)
-full source code available under the MIT license

Main Limitations/TODO's

-Windows support (I don't have Windows, sorry)
-reflection (getattr, hasattr), dynamic inheritance, eval, ..
-mixing scalars with pointer types (e.g. int and None in a single variable)
-mixing unrelated types in single container instance variable other than tuple-2
-holding different types of objects in tuples with length >2; builtin 'zip' can only take 2 arguments.
-exceptions, generators, nested functions, operator overloading
-recursive types (e.g. a = []; a.append(a))
-expect some problems when mixing floats and ints together
-varargs (*x) are not very well supported; keyword args are not supported yet
-arbitrary-size arithmetic
-possible non-termination ('recursive customization', have not encountered it yet)
-profiling will be required for scaling to very large programs
-combining binary-type tuples with single-type tuples (e.g. (1,1.0)+(2,))
-unboxing of small tuples (should form a nice speedup)
-foreign code has to be modeled and implemented/bridged in C++
-some builtins are not implemented yet, e.g. 'reduce' and 'map'

Friday, September 02, 2005

SoC: finished!

C++ is fantastic. It was very easy to integrate a garbage collector into the compiler: I only needed to add some simple code to a single header file, and it worked. I've also made grateful use of template specializations in redesigning the tuple implementation. In addition to these larger features, I also implemented a huge amount of smaller features over the last two weeks. Apart from several theoretical niceties, such as generation of complex generics, the compiler can now actually handle several relatively large programs as well. In all, I think I have reached all SoC milestones, and much more.

There are now 6 larger programs in the unit test set, written without any static compilation in mind (by friends, or by me, before I wanted to build a python compiler,) that the compiler now handles fine. In total they weigh in at 879 lines of Python code, which the compiler transforms in total into 1749 lines of C++ code (excluding header files, macro expansion, builtin functions, ..) These programs are:

-a simple satisfiability solver (110 lines, complicated list comprehensions)
-a japanese puzzle solver (180 lines, uses some real polymorphism)
-a more advanced satisfiability solver (170 lines)
-a not-too-bad othello player (123 lines, does much with binary tuples)
-a neural network simulator (95 lines, uses classes, badly polluted first iteration)
-a sudoku solver (177 lines)

The compiler generally takes less than 20 seconds to analyze each program on my old athlon XP 1800+. The speedup is in general larger than a factor of 10, often much more.

In fact, I would like to invite any adventurous soul at this point to try out the compiler, and send in anything that doesn't work but should (please, as small as possible), or anything non-trivial that does work (so I can add it to the test set.)

Note that there are currently several important limitations to the type of Python code that is supported:

-Of course, don't use reflection, eval or weird run-time polymorphism that cannot be easily mapped to C++.
-Do not import libraries. Only Set is currently supported. I am working on a simple system that will allow users to add C++ versions of libraries to the compiler.
-Do not mix None and integer types. The compiler will bail out with an error. Use zero or any other special value instead. Do not mix scalar with pointer types. Try to keep things uniform. This is usually more elegant anyway. However, it is often useful to mix types in binary tuples, and the compiler supports this.
-But do not use tuples with different element types with a length greater than 2. Binary tuples with differently types elements can also not be used in addition, etc. 'Zip' only supports two arguments.
-Declaring things before use may sometimes be necessary at this point. For example, classes. I hope to fix this soon.
-Using varargs and kwargs will probably end in tragedy.
-In general, program elegantly, like you would in C++, but without all the type declarations.
-The compiler has not been tested on programs larger than 250 lines.

I still need to fix a non-termination problem I encountered, but this won't occur for most programs under 250 lines. I also need to solve many smaller issues. However, the compiler should work pretty well for many programs. If you're interested, by all means, please try it out and let me know how it went. For any larger program, there are probably a few minor issues or missing builtin functions that I would gladly fix, if I have a use case.

The compiler is currently only available via CVS. I hope to release a packaged version in about a month. Please see the README with instructions on how to use it. In any case, it won't work under Windows. Let me know if you would like to debug this.. :-)