Saturday, September 10, 2005

Announcement

First release of Shed Skin, a Python-to-C++ compiler.


After nine months of hard work, I am proud to introduce my baby to the world: an experimental Python-to-C++ compiler. It can convert many Python programs into optimized C++ code, without any user intervention such as adding type declarations. It uses rather advanced static type inference techniques to deduce type information by itself. In addition, it determines whether deduced types may be parameterized, and if so, it generates corresponding C++ generics. Based on deduced type information, it also attempts to convert heap allocation into stack and static preallocation (falling back to libgc in case this
fails.)

The compiler was motivated by the belief that in many cases it should be possible to automatically deduce C++ versions of Python programs, enabling users to enjoy both the productivity of Python and the efficiency of C++. It works best for Python programs written in a relatively static C++-style, in essence enabling users to specify C++ programs at a higher level.

At the moment the compiler correctly handles 124 unit tests, six of which are serious programs of between 100 and 200 lines:

-an othello player
-two satisfiability solvers
-a japanese puzzle solver
-a sudoku solver
-a neural network simulator

Unfortunately I am just a single person, and much work remains to be done. At the moment, there are several limitations to the type of Python programs that the compiler accepts. Even so, there is enough of Python left to be able to remain highly productive in many cases. However, for most larger programs, there are probably some minor problems that need to be fixed first, and some external dependencies to be implemented/bridged in C++.

With this initial release, I hope to attract other people to help me locate remaining problems, help implement external dependencies, and in the end hopefully even to contribute to the compiler itself. I would be very happy to receive small programs that the compiler does or should be able to handle. If you are a C++ template wizard, and you would be interested in working on the C++ implementation of builtin types, I would also love to get in contact with you. Actually, I'd like to talk to anyone even slightly interested in the compiler, as this would be highly motivating to me.

The source code is available at the following site. Please check the README for simple installation/usage instructions. Let me know if you would like to create ebuild/debian packages.

Sourceforge site: http://shedskin.sourceforge.net
Shed Skin blog: http://shed-skin.blogspot.com

Should you reply to this mail, please also reply to me directly. Thanks!


Credits

Parts of the compiler have been sponsored by Google, via its Summer of Code program. I am very grateful to them for keeping me motivated during a difficult period. I am also grateful to the Python Software Foundation for chosing my project for the Summer of Code. Finally, I would like to thank my university advisor Koen Langendoen for guiding this project.


Details

The following describes in a bit more detail various aspects of the compiler. Before seriously using the compiler, please make sure to understand especially its limitations.

Main Features

-very precise, efficient static type inference (iterative object contour splitting, where each iteration performs the cartesian product algorithm)
-stack and static pre-allocation (libgc is used as a fall-back)
-support for list comprehensions, tuple assignments, anonymous funcs
-generation of arbitrarily complex class and function templates (even member templates, or generic, nested list comprehensions)
-binary tuples are internally analyzed
-some understanding of inheritance (e.g. list(dict/list) becomes list(iter(A)))
-hierarchical project support: generation of corresponding C++ hierarchy, including (nested) Makefiles; C++ namespaces
-annotation of source code with deduced types
-builtin classes, functions (enumerate, sum, min, max, range, zip..)
-polymorphic inline caches or virtual vars/calls (not well tested)
-always unbox scalars (compiler bails out with error if scalars are mixed with pointer types)
-full source code available under the MIT license

Main Limitations/TODO's

-Windows support (I don't have Windows, sorry)
-reflection (getattr, hasattr), dynamic inheritance, eval, ..
-mixing scalars with pointer types (e.g. int and None in a single variable)
-mixing unrelated types in single container instance variable other than tuple-2
-holding different types of objects in tuples with length >2; builtin 'zip' can only take 2 arguments.
-exceptions, generators, nested functions, operator overloading
-recursive types (e.g. a = []; a.append(a))
-expect some problems when mixing floats and ints together
-varargs (*x) are not very well supported; keyword args are not supported yet
-arbitrary-size arithmetic
-possible non-termination ('recursive customization', have not encountered it yet)
-profiling will be required for scaling to very large programs
-combining binary-type tuples with single-type tuples (e.g. (1,1.0)+(2,))
-unboxing of small tuples (should form a nice speedup)
-foreign code has to be modeled and implemented/bridged in C++
-some builtins are not implemented yet, e.g. 'reduce' and 'map'

10 comments:

Baczek said...

Sounds like Starkiller, except that it exists! Nice job! Didn't have the time to force it to work on win32, though; and there's _nothing_ about gc -- where to get it from?

srepmub said...

To d. mellis:

Wow, thank you so much for the detailed explanation about getting the compiler to run under OSX! I'll try to add some of the things you did to the compiler, so it will be easier for other people.

I'm not sure what the problem with the unit tests is though.. :-/

Btw, I would be very interested to see the code you're planning on compiling.

srepmub said...

to baczek:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

It says here it should work under Windows. I can't try this for myself for at least a few more weeks. Please let me know if this works for you or not.

srepmub said...

to baczek:

http://mail.python.org/pipermail/python-list/2005-September/298697.html

here's someone who got the compiler to run under windows. not all unit tests pass, but almost. let me know if this works for you or not. I may be easier to do with DJGPP than MINGW, but I cannot test this for myself.

Baczek said...

Ok, I've downloaded Boehms gc 6.6 and it works on Win XP SP1 with mingw 3.4.2 (compiled as dll.)

I had to patch ss a little bit - changed win32 compiler to g++, removed -lgccpp (everything seems to work, but I have no idea if it indeed does :)), changed rm to del. I'm now running unit.py and not seeing many failures (one segfault currently.) The whole thing is slow, but that's mainly due to g++ being slow as hell on win32.

I'll update as soon as it finishes.

Baczek said...

It's done:

*** tests failed: 2
[(60, '__class__ and __name__ attributes'), (85, 'ifa: mixing strings and lists of strings in the same list')]

BTW, I patched the gc with patches provided in the message you linked to.

Here are gdb backtraces of both crashes (and btw, you're not returning a value from main:)):

test #60: (this doesn't look very helpful)
Starting program: C:\python-stuff\shedskin-0.0.1/test.exe

Program received signal SIGSEGV, Segmentation fault.
0x0040161d in __test::__main() () at test.cpp:24
24 if ((__type(new str("2")))->__eq__(__type(new str("3")))) {
(gdb) bt
#0 0x0040161d in __test::__main() () at test.cpp:24
#1 0x00401757 in main () at test.cpp:36
(gdb)

test 85: (this one seems better)
(gdb) run
Starting program: C:\python-stuff\shedskin-0.0.1/test.exe

Program received signal SIGSEGV, Segmentation fault.
0x00419d59 in __gnu_cxx::__normal_iterator<__shedskin::str**, std::vector<__shedskin::str*, gc_allocator<__shedskin::str*> > >::__normal_iterator(__shedskin::st
r** const&) (this=0x22fe94, __i=@0x8)
at C:/mingw/bin/../lib/gcc/mingw32/3.4.2/../../../../include/c++/3.4.2/bits/stl_iterator.h:603
603 __normal_iterator(const _Iterator& __i) : _M_current(__i) { }
(gdb) bt
#0 0x00419d59 in __gnu_cxx::__normal_iterator<__shedskin::str**, std::vector<__shedskin::str*, gc_allocator<__shedskin::str*> > >::__normal_iterator(__shedskin::str** const&) (this=0x22fe94, __i=@0x8)
at C:/mingw/bin/../lib/gcc/mingw32/3.4.2/../../../../include/c++/3.4.2/bits/stl_iterator.h:603
#1 0x00445af8 in std::vector<__shedskin::str*, gc_allocator<__shedskin::str*> >::begin() (this=0x8)
at C:/mingw/bin/../lib/gcc/mingw32/3.4.2/../../../../include/c++/3.4.2/bits/stl_vector.h:314
#2 0x00445c3d in std::vector<__shedskin::str*, gc_allocator<__shedskin::str*> >::operator[](unsigned) (this=0x8, __n=1)
at C:/mingw/bin/../lib/gcc/mingw32/3.4.2/../../../../include/c++/3.4.2/bits/stl_vector.h:462
#3 0x00415a2e in __shedskin::list<__shedskin::str*>::__setitem__(int, __shedskin::str*) (this=0x0, i=1, e=0x0) at builtin_.hpp:703
#4 0x004016b5 in __test::__main() () at test.cpp:17
#5 0x00401927 in main () at test.cpp:34
(gdb) p __i
$1 = (struct str ** const&) @0x8: Cannot access memory at address 0x8
#1 0x00445af8 in std::vector<__shedskin::str*, gc_allocator<__shedskin::str*> >::begin() (this=0x8)
at C:/mingw/bin/../lib/gcc/mingw32/3.4.2/../../../../include/c++/3.4.2/bits/stl_vector.h:314
314 begin() { return iterator (this->_M_impl._M_start); }
(gdb) p this
$2 = (vector<__shedskin::str*,gc_allocator<__shedskin::str*> > * const) 0x8
(gdb) up
#2 0x00445c3d in std::vector<__shedskin::str*, gc_allocator<__shedskin::str*> >::operator[](unsigned) (this=0x8, __n=1)
at C:/mingw/bin/../lib/gcc/mingw32/3.4.2/../../../../include/c++/3.4.2/bits/stl_vector.h:462
462 operator[](size_type __n) { return *(begin() + __n); }
(gdb) up
#3 0x00415a2e in __shedskin::list<__shedskin::str*>::__setitem__(int, __shedskin::str*) (this=0x0, i=1, e=0x0) at builtin_.hpp:703
703 units[i] = e;
(gdb) p e
$3 = (struct str *) 0x0
(gdb) p i
$4 = 1
(gdb)

srepmub said...

thanks baczek!

I followed up khalid's instructions and noted the same two failing unit tests. I have fixed both, and hope to release an easy-to-install Windows installation package for Shed Skin before long. Today I'm going to optimize the process for OSX..


anyway, thanks!
mark.

Unknown said...
This comment has been removed by a blog administrator.
Valery said...

I was looking for skin type advice:) But it was an interesting reading:)

Rajeshwar Pilgulwar said...

is there any way to convert shell scripts into python scripts?