Thursday, December 15, 2005

Shed Skin 0.0.5.9

I have just released Shed Skin 0.0.5.9. It's almost where I want it to be
for 0.0.6. What remains to be coded is some kind of connection to the
standard library (probably a simple one at first: working only for
'opaque handlers'). I also want to improve cases where ints and floats
are mixed, since this is quite common. Some major changes:

-basic exception handling

(support for user-defined exceptions, and for some builtins such as ValueError,
KeyError and AssertionError; this will allow iterator objects to be
implemented later on; a small combined example of the first three items follows the list below)

-some basic inheritance

(no multiple inheritance yet, or other weird stuff; this enabled me to add
the 340-line pygmy raytracer to the test set. It becomes about 40
times faster here..)

-keyword arguments

(this has not been tested very well yet - let me know about any problems)

-many missing minor 'set' features, thanks to reports by bearophile

(it should be practically complete now. The whole set of builtins is
nearing completion :D Time to start optimizing stuff..)

-many, many bugfixes again, again mostly thanks to bearophile
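
To make the first three items above concrete, here is a small, made-up example (hypothetical code, not taken from the Shed Skin test set and untested against this release) of the kind of constructs these changes target:

class ParseError(Exception):              # user-defined exception class
    pass

class Token:                              # simple single inheritance
    def __init__(self, text):
        self.text = text

class Number(Token):
    def value(self):
        return int(self.text)

def parse(text, strict=True):             # called with a keyword argument below
    if not text:
        if strict:
            raise ParseError('empty input')
        return 0
    return Number(text).value()

print(parse('123'))                       # 123
print(parse('', strict=False))            # 0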

I added four new 'big' programs to the test set that (with some minor
modifications) now work: a tic-tac-toe player for arbitrary-size
boards, a simple genetic algorithm, a linear algebra program and the
aforementioned raytracer. Finally, the README has been improved again,
though it's still pretty bad. After the release of 0.0.6 I plan to
create a web site with performance comparisons between CPython, Psyco
and Shed Skin (perhaps even PyPy :P), because quite a few interesting
programs compile now. Shed Skin is usually a lot faster than Psyco, except when Python builtins are the bottleneck. I guess some more STL magic is required there..

Keep those small code fragments that fail coming!

Saturday, December 10, 2005

Shed Skin 0.0.6 Status Update

Just a small note to anyone interested: I am hard at work getting Shed Skin version 0.0.6 ready before the end of the year. It already supports an almost unmodified OO raytracer of about 350 lines (pygmy, with a speedup of about 40 on this Sempron 2200+), as well as some other larger new programs. Simple inheritance has been improved a lot. I am currently working on exception handling, which will probably be supported well in 0.0.6. Thanks to bearophile, a huge number of bugs have additionally been
fixed. I will probably release a 0.0.5.8 or 9 within the coming week.

Please let me know if you have interesting use cases of inheritance and/or exceptions; I would be glad to look into any problems. In that case, please ask me for the latest version of the compiler first (I have no access to CVS at the moment.)

Btw, I will probably be working soon on some method of accessing much of the not-too-weird standard library functionality, but I'm not sure whether I will manage to get this working well before 0.0.6.

Wednesday, November 30, 2005

Shed Skin 0.0.5.1 Released

I have just released Shed Skin 0.0.5.1. It contains many small bug
fixes, but more importantly it enables GC again (argh! :P). This should
make some of bearo's tests run a lot faster. As per his suggestion, I
also copy-pasted float hashing from CPython, which should make his
dict_speed test run much faster. I also made some improvements for
programs consisting of multiple modules. See test 139 for a partial
overview of the smaller bug fixes.

As usual, Shed Skin can be downloaded from shedskin.sourceforge.net.

Monday, November 07, 2005

0.0.5: Major Bugfix Release

Hi all,

I have just released Shed Skin 0.0.5. It fixes many bugs and adds many minor features to the Python builtins, most notably the 'set' class. There have also been some optimizations on the C++ side. Finally, the README now better explains the compiler's limitations, and a TODO file has been added containing open bugs.

I would like to invite anyone to try out this new version, especially if there were problems with 0.0.4. If you encounter a bug or missing feature, please send me a small code fragment that exhibits the problem, so I can fix it or add it to the TODO. If you are a C++ programmer, please consider helping out on the C++ side, by sending in patches to improve the C++ implementation of the Python builtins!

For 0.0.6 (a feature release!), the following things are on my list:

- Basic inheritance support
- Automating the connection to the Python standard library
- Better error messages for things SS doesn't handle (Finally?!)

(This release has taken way too long, because I did not have much time this month. 0.0.6 might also take a while, since I will be living in China for the next seven weeks and will be writing my thesis, among other things. I expect to release it a few days after I get back, which would be just before new year.)

Thursday, October 13, 2005

Two weeks off

There are lots of fixes in the pipeline for Shed Skin 0.0.5. Unfortunately, I somehow managed to give away my PC, and I planned some sort of vacation for the next two weeks. So it's going to be very hard to release a new version anytime soon. I'm still very motivated about the compiler, and there's a whole list of things I'm dying to fix, but it's hard to do without my own computer.. :)

Things planned for the next release:

- lots and lots of bug fixes for the builtin types
- several optimizations in the resulting C++ code (e.g. for indexing)
- error messages for anything the compiler doesn't support
- builtin set type (in addition to sets.Set), plus most of its methods

Sunday, September 18, 2005

Shed Skin 0.0.2: Easy Windows/OSX Installation

Shed Skin 0.0.2 is up on SourceForge. It should install easily on Windows 2000/XP and on OSX. Please give it a try and let me know if there are still some problems.

If you would like to help me improve Shed Skin, please send me small code snippets, preferably extracted from real-life use cases, that the compiler has problems with.

Thanks to everyone who has helped me out, especially Khalid and Luis on python-list, and Denis de Leeuw Duarte right here in the street :-)

Update: 0.0.3 has also been released, with improved support for builtin functions, bug fixes and a Windows package of only 3 MB.

Update: We're on a roll, with 0.0.4. Many improvements again; added several interesting test cases (thanks to bearophile!): convex hull, the n-queens problem, Pascal's triangle and an ASCII Mandelbrot.

Saturday, September 10, 2005

Announcement

First release of Shed Skin, a Python-to-C++ compiler.


After nine months of hard work, I am proud to introduce my baby to the world: an experimental Python-to-C++ compiler. It can convert many Python programs into optimized C++ code, without any user intervention such as adding type declarations. It uses rather advanced static type inference techniques to deduce type information by itself. In addition, it determines whether deduced types may be parameterized, and if so, it generates corresponding C++ generics. Based on deduced type information, it also attempts to convert heap allocation into stack and static preallocation (falling back to libgc in case this
fails.)

The compiler was motivated by the belief that in many cases it should be possible to automatically deduce C++ versions of Python programs, enabling users to enjoy both the productivity of Python and the efficiency of C++. It works best for Python programs written in a relatively static, C++-like style, in essence enabling users to specify C++ programs at a higher level.

At the moment the compiler correctly handles 124 unit tests, six of which are serious programs of between 100 and 200 lines:

-an othello player
-two satisfiability solvers
-a Japanese puzzle solver
-a sudoku solver
-a neural network simulator

Unfortunately I am just a single person, and much work remains to be done. At the moment, there are several limitations to the type of Python programs that the compiler accepts. Even so, there is enough of Python left to be able to remain highly productive in many cases. However, for most larger programs, there are probably some minor problems that need to be fixed first, and some external dependencies to be implemented/bridged in C++.

With this initial release, I hope to attract other people to help me locate remaining problems, help implement external dependencies, and in the end hopefully even to contribute to the compiler itself. I would be very happy to receive small programs that the compiler does or should be able to handle. If you are a C++ template wizard, and you would be interested in working on the C++ implementation of builtin types, I would also love to get in contact with you. Actually, I'd like to talk to anyone even slightly interested in the compiler, as this would be highly motivating to me.

The source code is available at the following site. Please check the README for simple installation/usage instructions. Let me know if you would like to create ebuild/debian packages.

Sourceforge site: http://shedskin.sourceforge.net
Shed Skin blog: http://shed-skin.blogspot.com

Should you reply to this mail, please also reply to me directly. Thanks!


Credits

Parts of the compiler have been sponsored by Google, via its Summer of Code program. I am very grateful to them for keeping me motivated during a difficult period. I am also grateful to the Python Software Foundation for choosing my project for the Summer of Code. Finally, I would like to thank my university advisor Koen Langendoen for guiding this project.


Details

The following describes various aspects of the compiler in a bit more detail. Before seriously using the compiler, please make sure you understand its limitations in particular.

Main Features

-very precise, efficient static type inference (iterative object contour splitting, where each iteration performs the cartesian product algorithm)
-stack and static pre-allocation (libgc is used as a fall-back)
-support for list comprehensions, tuple assignments, anonymous funcs
-generation of arbitrarily complex class and function templates (even member templates, or generic, nested list comprehensions)
-binary tuples are internally analyzed
-some understanding of inheritance (e.g. list(dict/list) becomes list(iter(A)))
-hierarchical project support: generation of corresponding C++ hierarchy, including (nested) Makefiles; C++ namespaces
-annotation of source code with deduced types
-builtin classes, functions (enumerate, sum, min, max, range, zip..)
-polymorphic inline caches or virtual vars/calls (not well tested)
-always unbox scalars (compiler bails out with error if scalars are mixed with pointer types)
-full source code available under the MIT license

Main Limitations/TODO's

-Windows support (I don't have Windows, sorry)
-reflection (getattr, hasattr), dynamic inheritance, eval, ..
-mixing scalars with pointer types (e.g. int and None in a single variable)
-mixing unrelated types in single container instance variable other than tuple-2
-holding different types of objects in tuples with length >2; builtin 'zip' can only take 2 arguments.
-exceptions, generators, nested functions, operator overloading
-recursive types (e.g. a = []; a.append(a))
-expect some problems when mixing floats and ints together
-varargs (*x) are not very well supported; keyword args are not supported yet
-arbitrary-size arithmetic
-possible non-termination ('recursive customization', have not encountered it yet)
-profiling will be required for scaling to very large programs
-combining binary-type tuples with single-type tuples (e.g. (1,1.0)+(2,))
-unboxing of small tuples (should form a nice speedup)
-foreign code has to be modeled and implemented/bridged in C++
-some builtins are not implemented yet, e.g. 'reduce' and 'map'

Friday, September 02, 2005

SoC: finished!

C++ is fantastic. It was very easy to integrate a garbage collector into the compiler: I only needed to add some simple code to a single header file, and it worked. I've also made grateful use of template specializations in redesigning the tuple implementation. In addition to these larger features, I also implemented a huge number of smaller features over the last two weeks. Apart from several theoretical niceties, such as generation of complex generics, the compiler can now actually handle several relatively large programs as well. In all, I think I have reached all the SoC milestones, and much more.

There are now 6 larger programs in the unit test set, written without any static compilation in mind (by friends, or by me before I planned to build a Python compiler), that the compiler now handles fine. In total they weigh in at 879 lines of Python code, which the compiler transforms into 1749 lines of C++ code (excluding header files, macro expansion, builtin functions, ..) These programs are:

-a simple satisfiability solver (110 lines, complicated list comprehensions)
-a Japanese puzzle solver (180 lines, uses some real polymorphism)
-a more advanced satisfiability solver (170 lines)
-a not-too-bad othello player (123 lines, does much with binary tuples)
-a neural network simulator (95 lines, uses classes, badly polluted first iteration)
-a sudoku solver (177 lines)

The compiler generally takes less than 20 seconds to analyze each program on my old Athlon XP 1800+. The speedup is generally more than a factor of 10, often much more.

In fact, I would like to invite any adventurous soul at this point to try out the compiler, and send in anything that doesn't work but should (please, as small as possible), or anything non-trivial that does work (so I can add it to the test set.)

Note that there are currently several important limitations to the type of Python code that is supported:

-Of course, don't use reflection, eval or weird run-time polymorphism that cannot be easily mapped to C++.
-Do not import libraries. Only Set is currently supported. I am working on a simple system that will allow users to add C++ versions of libraries to the compiler.
-Do not mix None and integer types; the compiler will bail out with an error. Use zero or some other special value instead (see the small sketch after this list). More generally, do not mix scalar and pointer types; try to keep things uniform, which is usually more elegant anyway. However, it is often useful to mix types in binary tuples, and the compiler supports this.
-But do not use tuples with different element types and a length greater than 2. Binary tuples with differently typed elements also cannot be used in addition, etc. 'zip' only supports two arguments.
-Declaring things before use may sometimes be necessary at this point. For example, classes. I hope to fix this soon.
-Using varargs and kwargs will probably end in tragedy.
-In general, program elegantly, like you would in C++, but without all the type declarations.
-The compiler has not been tested on programs larger than 250 lines.
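
As a small sketch of the None/integer restriction mentioned above (hypothetical code, not from the test set): instead of letting a variable hold both None and an int, pick a sentinel integer so the type stays uniform:

NO_SCORE = -1                     # sentinel integer instead of None

def best_score(scores):
    best = NO_SCORE               # 'best' only ever holds ints
    for s in scores:
        if best == NO_SCORE or s > best:
            best = s
    return best

print(best_score([3, 7, 5]))      # 7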

I still need to fix a non-termination problem I encountered, but this won't occur for most programs under 250 lines. I also need to solve many smaller issues. However, the compiler should work pretty well for many programs. If you're interested, by all means, please try it out and let me know how it went. For any larger program, there are probably a few minor issues or missing builtin functions that I would gladly fix, if I have a use case.

The compiler is currently only available via CVS. I hope to release a packaged version in about a month. Please see the README for instructions on how to use it. In any case, it won't work under Windows. Let me know if you would like to debug this.. :-)

Saturday, August 20, 2005

Countdown

Alright! After almost 8 months of more than full-time hacking, the end is in sight. Everything is looking great. I'm hoping to finish the final three features of my SoC todo list today, except for integrating a GC, which I hope to do tomorrow. The three features are: support for nested tuple assignments, e.g. a,(b,c) = expr (this may also occur inside for loops/list comprehensions); bootstrapping some useful builtins (enumerate, zip, min, max); and supporting untyped constructors that do not flow to any instance variable assignment, e.g. in 'return []' we may want the empty list to become a list(str), because there is, for example, a 'return ['u']' in the same function.
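
As a tiny, hypothetical illustration of that last feature (the names are made up, not from the test set):

def names(ok):
    if not ok:
        return []        # untyped constructor: [] flows to no instance variable..
    return ['u']         # ..so unify it with this return: both become list(str)

print(names(True))       # ['u']
print(names(False))      # []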

The last thing I would like to do is compile a simple neural network simulator that I wrote a while ago, as the final test case before a 0.1 release. It will require writing C++ versions of some external calls (e.g. math.e, random.random).

In all, it seems there are no hard problems left as far as my aims go. Of course I could easily work another few years on the problem: supporting more and more flexible code, extending many features (e.g. internally typed ternary tuples..), working on extreme scalability (by profiling/maintaining object contours), and so on. For now I only intend to perform several long-overdue cleanups over the following months, as well as (hopefully) adding programs written by other people to the unit test set and fixing any problems they may have (again, as far as my aims go.)

Hmm, I just realized I practically haven't had a single day off over the last 8 months.. Maybe it's time for that vacation now..

Tuesday, August 09, 2005

Getting Close

It took me some time to implement everything required by the Othello player. What is nice about it is that it pushes many things a bit further again, relative to the other larger test programs. No large features were required, and no major bugs were uncovered, but I did have to fix many small problems ('the devil is in the details'). For example, constructs such as 'a = b = c', '[.. for .. for ..]' and '(1,2) in [(1,2)]' were not yet well supported, and conversion of None to 0-pointers did not work well with parametric code. In the end, the resulting C++ player now plays about 25 times faster than the Python version. Pre-allocating some 'constant' lists will bring this to around 50.

As I see it, there are now only 5 relatively major features still to be implemented to reach all milestones for the SoC. These are: iterative splitting of user classes (this might work already), more aggressive merging of object contours containing (object contours with) only simple types, generating method templates (C++ function templates inside class templates), integration with a garbage collector and fixing a problem I discovered in the static preallocation code. I think these should not take me more than 10 days to implement. Other than these, there are still many minor issues of course. Maybe (hopefully) I can spend the last 10 days mopping up some minor problems that will need attention before being able to do a decent release.

Sunday, August 07, 2005

Back On the Job

Okay, after a few short days off (alright, of writing other Python programs.. :P) and preparing for and meeting with my university advisor, I'm back on the job. Over the last few days, I tried to teach my girlfriend Python by showing her, step by step, how to write a not-too-bad Othello player in under 100 lines. I don't think she could write something like that herself now, but I had a lot of fun :-)

I guess the first thing I will do today is to try and get this Othello player to compile. Programs like this one are perfect test-cases for the compiler: they often require some minor feature additions, which is nice and gives a feeling of progress, and generally uncover a few minor bugs, which are easily traced for such small programs. It will also be a good finger exercise to get going again.. There is still quite some work to do.. !

Saturday, July 30, 2005

C++

C++ makes for a great target language, with its intricate generics and its plethora of other low-level features. We can both succinctly map a high-level language such as Python to its feature set, and leave further optimization to be handled by decades old, mature C++ compilers. In many ways, C++ is really as flexible as Python. The problem is that this flexibility is often hard to achieve, and often results in unreadable code. It's nicer I guess to have a compiler write the C++ code for you :-) Consider the following example:
def flatsie(iter):
    return [(b,a) for (a,b) in iter]

flatsie([(1,2.1)])
flatsie({(2,3.141): [1,2,3]})
flatsie({(11,4.1): None})
flatsie(((7.7,1),))
There are some interesting things going on here. If we would like to compile this example to C++, the list comprehension has to be compiled to a generic function. But as the list comprehension iterates over both lists/tuples and dictionaries, it is not immediately clear how to write this nicely in C++, because dict is implemented using two template variables, as opposed to one for list and tuple. C++ provides an elegant solution: make iterable classes inherit from a pyiter class with a single template variable, and in case of dict, map the right template variable to it. Finally, we would also like to statically preallocate the result of the list comprehension, but since this is a generic list, we cannot define it globally. C++ again has the solution: use the 'static' keyword inside the function template, to create a parameterizable 'global' variable.

As of today, this example compiles cleanly, without breaking any of the other 95 unit tests. As there are only two types of tuples involved, the iterative analysis nicely ends up with two tuple object contours. These tuples are also modeled internally, or the list comprehensions would cause precision loss. Tuples of length greater than two are not yet internally modeled, but this would be a straight-forward generalization. In order to get here, I had to implement generic list comprehensions, and as a result the template milestones now look almost completed. In any case, it's pretty cool to see the compiler output the following working C++ code :D

template [class A, class B] static inline list[tuple2[B, A] *]
*list_comp_0(list[tuple2[B, A] *] *result,
pyiter[tuple2[A, B] *] *iter) {
A a;
B b;
tuple2[A, B] *__0;
result->clear();

FOR_IN(__0,iter)
TUPLE_ASSIGN2(a,b,__0,0);
result->append(new tuple2[B, A](b, a));
END_FOR_IN
return result;
}

int main() {
flatsie(new list[tuple2[int, double] *](2, new
tuple2[int, double](1, 2.1), new tuple2[int, double]
(2, 4.1)));
flatsie(new dict[tuple2[int, double] *, list[int] *]
(1, new tuple2[tuple2[int, double] *, list[int] *]
(new tuple2[int, double](2, 3.1),new list[int]
(3, 1, 2, 3))));
flatsie(new dict[tuple2[int, double] *, none *](1,
new tuple2[tuple2[int, double] *, none *](new tuple2
[int, double](1, 4.1),0)));
flatsie(new tuple[tuple2[double, int] *](1, new tuple2
[double, int](7.7, 1)));
}

template [class A, class B] list[tuple2[B, A] *] *flatsie
(pyiter[tuple2[A, B] *] *iter) {
static list[tuple2[B, A] *] __1;

return list_comp_0(&__1, iter);
}

Friday, July 29, 2005

Iterative Progress

After messing around for a few days getting CVS write access from my mother's XP box (which I finally gave up) and fixing problems caused by differences in GCC versions across three operating systems, I finally got to hacking on the iterative analysis again. I really needed that, since debugging unit tests for more than a week really drained me.

First, I fixed a problem where object contours were merged that could possibly be split later on. The fix logically results in far fewer iterations on the three largest unit tests, while more contours are now created. Additionally, I haven't observed a single unit test failing afterwards, whereas previously I would see an occasional failure on test 34 or 66 (the analysis is not deterministic.)

Currently, only object contours that contain just monomorphic, atomic instance variables are merged, because these cannot cause the merged contour to be split later on. This means that as tighter bounds are deduced during the analysis, more object contours can be merged. It seems that other object contours referencing only such contours can also be merged, because they too cannot be split later on; in other words, mergeability looks like a transitive property. I will have to investigate this later, as I don't want CPA to blow up in any way.

Second, I commenced work on other classes than list. Supporting singularly-typed tuples was a breeze. It was also relatively straight-forward to support multiple instance variables: we have to iterate over multiple variables per contour now, and be able to split on each of them; merging decisions have to be made over the entire instance variable state of object contours. Splitting dictionaries seems to work well now. I guess the biggest problem was how to transition everything so that all unit tests still run.. making incremental changes and running all unit tests each time worked pretty well, but took some time (having to wait 5 minutes after each small change can get annoying.)

Next up is supporting internally typed tuples again. This was not possible before multiple instance variables were supported. I will have to consider these separately from regular tuples somehow. I still also have to look into internally typed tuples with length greater than two, but this will probably have to wait some more.

A problem I did not give much attention to before, because I was mostly concerned with lists until now, is that in general the instance variables of a class are unknown! Since the first iteration is so imprecise, classes may end up with many instance variables that will vanish again when the analysis becomes more precise. I think there is not much we can do about splitting on such instance variables, except to assume the instance variables of built-in types to be known (as I do now.) Of course, this breaks Python semantics.

It remains an interesting approach to perform profiling in order to get good initial object contours (in other words, perhaps this could keep us closer to Python semantics.) Something I hadn't thought of myself yet, but Plevyak mentioned, is that it's also possible to maintain information between different runs of the iterative analysis. If the analysis is run repeatedly during development, the difficulty of analyzing the program can be spread out over these runs. I don't think I will get this far during the SoC, but it sounds plausible that for much larger test examples, such an approach could make using the compiler much more practical.

Sunday, July 24, 2005

No debugging for a few days!

Even though debugging is a much bigger challenge than writing crap programs, I think I prefer writing crap programs. Luckily I now have all 88 unit tests working, except for an occasional failure of test 34 or 66. Test 66 is not so big, so I should be able to locate that problem at some point, and hopefully that will fix 34 too :P

Emotionally, this concludes the first milestone that my mentors and I had drafted at the start of the SoC. It seems like it took a long time, but in the meantime I have come pretty far with the template generation (milestones three and four), and the second milestone seems much easier now that I have worked on the first. The last milestone, finally (integration with a GC), should not take too much effort. So I am pretty positive at this point about reaching all milestones.

The first milestone required getting 'basic list examples' to work with the iterative analysis, but in the meantime it works (most of the time) for all the unit tests (which contain some list-intensive 100+ line programs). The second milestone (considering classes other than list, with multiple polymorphic instance variables, among other things) now seems to be a straightforward generalization, although I may be overlooking some terrible problem of course.

I will be moving to my mother's place tomorrow for some time, and work on the second milestone there. I will bring some unboxing papers, and see if they contain something I might implement after the SoC.

Monday, July 18, 2005

Cleaning Up

When something just doesn't want to work, it's often a good solution to throw it away. Of course, Python makes this easier than many other languages. A famous quote of Tim Peters springs to mind: 'Python code is easy to throw away.' Additionally, throwing away crappy code can make you feel much better, since your program often becomes cleaner because of it, and you can always get back to the problem later..

Today I commented out the partial unboxing analysis that I had, as well as the analysis that tried to work out upper bounds on strings used in reflection. Because the data flow analysis now does not have to consider the allocation sites of simple types anymore, a lot of cruft could be removed from the compiler. I will probably come back to the unboxing analysis after the SoC. The string bounds analysis was intended to help bootstrap the compiler, but since this will not happen any time soon, I'm not sure if I'll ever revive it.

I intend to spend the following three days or so getting as many unit tests working again as possible. After that, I will start work on the second milestone. I'm looking forward to that, as it will hopefully be more interesting than fixing bugs all day.. :-)

Sunday, July 17, 2005

Unboxing

I haven't posted here for a few days, as I was modifying my old code for unboxing of integers and floats, and I did not add any new features to the iterative analysis. I did fix a number of old unit tests to compile again with the latest code, but other than that there was nothing noteworthy.

It turns out that unboxing is not as simple as I had assumed. Actually, I never thought about it that much, because the other techniques got most of my attention - so much to do, and so little time. I decided to look into unboxing in depth, because of some undesirable interaction with the iterative analysis (e.g. having to use separate contours for lists with boxed and lists with unboxed integers.)

In the end, I concluded that creating good unboxing support is probably going to take more than just a few days, and, as I don't really have the time for this at the moment, I decided that I will disable the partial unboxing analysis that I have for now. That does not mean no unboxing can be performed: all integers and floats will be unboxed, and a warning will be given whenever they are confused with a pointer type. So for now, it will be the programmer's task not to add such confusions. Templates etc. will still work.

What will not work for now, then, are container objects with integers/floats and pointer types in the same slot, and passing integers/floats into arguments with a pointer type default (such as range(a,b=None) - we will have to provide two builtin C++ functions for this case.) As the latter can't be supported efficiently anyway without generating multiple function duplicates with different defaults, and mixing integers and pointer types for example in a list container is often not a very nice programming style, I feel the compiler could still be pretty useful.
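
To sketch that second restriction with a made-up example (hypothetical code, not from the test set): an integer argument meeting a None default would confuse an unboxed int with a pointer type, which is why the two call styles below would need two separate builtin C++ functions:

def span(n, stop=None):        # 'stop' has a pointer-type (None) default
    if stop is None:
        return list(range(n))
    return list(range(n, stop))

print(span(5))                 # fine on its own: stop stays None
print(span(2, 6))              # an int now also flows into 'stop' -> not supported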

It's always a bad feeling to have to add restrictions to the types of programs the compiler can handle, but hopefully this one will be temporary. In any case, it can still be used to compile programs written in a C++-style to C++. Although this might not sound very impressive, being able to use Python syntax in writing highly efficient programs would still be pretty cool :D

Thursday, July 14, 2005

Final Piece of the Puzzle?

Having to use 0CFA - in other words, no duplication at all of either class types or functions - during the first iteration will become terribly slow for larger programs. Worse, it is so imprecise that it makes life hard for IFA, which then has to split huge contours (consider a single contour for list in a larger program!), making it slow and requiring many iterations.

There are multiple interesting ways of combating this problem. We can selectively apply CPA during the first iterations, by limiting the sizes of the cartesian products. Of course, fully applying it would make the first iteration even slower, but selectively applying it would probably not increase precision by much. I think what is really needed is to provide IFA with an initial 'guesstimate' of the set of object contours that it should deduce. This does not work for just CPA, because CPA is not data-adaptive, and it would pollute itself if something 'unexpected' happens.

How to get an initial good set of object contours? Of course, heuristics can be applied. A lot of them, especially ugly ones. For example, a [1, 2] constructor could be grouped under a 'list(int)' contour. This becomes messy for more general, polymorphic constructors - what to do with an [x, y] constructor?

I think a better, and more elegant, solution would be to perform profiling on the program that is to be compiled. Of course, if the program has unit tests, this would be a great match - but that's beside the point. It should be very simple to dump the actual types for each allocation site, possibly even relative to the actual argument types of the current function invocation. This can give us a great starting point. The dumps do not have to be perfect, because IFA will adjust to any imperfections. Best of all, we can enable CPA during all iterations, greatly improving IFA and reducing the number of iterations.
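
Purely as a sketch of what such a profiling dump could look like (hypothetical helper code, not part of Shed Skin): tag each allocation site and record the element types that actually show up at run time, then use the recorded sets to seed the initial object contours:

import collections

# hypothetical: allocation site -> set of element type names seen at run time
SITE_TYPES = collections.defaultdict(set)

def record(site, lst):
    # stand-in for an instrumented list allocation site
    for elem in lst:
        SITE_TYPES[site].add(type(elem).__name__)
    return lst

a = record('prog.py:3', [1, 2, 3])
b = record('prog.py:7', [1.5])

for site in sorted(SITE_TYPES):
    print(site + ' ' + str(sorted(SITE_TYPES[site])))
# prog.py:3 ['int']
# prog.py:7 ['float']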

No, I don't mind that people will have to profile their applications. Especially if a simple technique like this turns out to greatly improve scalability.

BTW profiling has another advantage: it can provide type distributions at dynamic calling sites. In case of polymorphic inline caches, we can move the most used types up the ladder.

Wednesday, July 13, 2005

Bugs

Today was the first bad day I guess, since starting work on the iterative analysis. As with the generics support I worked on before that, it's not something that you can just paste on the compiler and have it work. Because both required rather intrusive modifications, I did not run the unit tests every day while working on them. The result is long debugging sessions, like today.

Spending the whole day fixing 5 or 6 stupid bugs, or 'semantical typos', gave me somewhat of a desperate feeling, partly because I still have 30 unit tests to fix (those that have broken anyway, no idea how many), and there are some larger programs among those. I think it could easily take another week to fix all tests and make some more actual modifications to the current iterative analysis, which I would like to finish before considering the first milestone to be reached.

For example, it seems my unboxing technique is not going to work well with contours that can be merged.. I would also very much like to spend more time on the iterative analysis itself (instead of bugfixing :/), i.e. compacting contours together, better cleaning out of the allocation site table, working with lists that never flow to an assignment, and being able to merge non-singleton upper bounds together if it is clear they cannot be split later on.

Tuesday, July 12, 2005

CVS

I made several not too interesting changes today. I finally got around to fixing the [x,y] issue though. I also fixed some old unit tests again, and added some more, and spent some time hunting obscure bugs. I added some asserts which will help debug the iterative algorithm later on. Most interestingly though, I started working with CVS, and committed all my changes using this.

I had used CVS before, but I had already forgotten that it was just fun to use, and that it actually helps single-developer teams as well. Having a history can be very useful, especially for chaotic programmers like myself. Because it can be lonely though, using CVS by myself, I am going to force one of my mentors to submit at least one patch :-)

Today was sort of a grunt work day. My plan is to work on adding some new features later today or tomorrow (more aggressively combine contours; compact the contour table; rebuild the allocation site table so that it won't accumulate a mess), and then get all unit tests working, as well as add some more that I should've added before. When all this is done (I'm guesstimating about a week), I will consider my first milestone to have been reached, and I will start adding non-list classes to the mix, and classes with multiple class variables, and be frustrated about how many new bugs this will cause :)

Monday, July 11, 2005

Controlled Explosions

Type inference is really a simple problem, with a simple solution. Just consider each allocation site separately, apply the cartesian product algorithm and you're done. You will have a simple implementation and almost perfect precision. Unfortunately, there are 'some' scaling issues of course.. Let's compare it to chess. Chess is also a simple problem, with a simple solution. Just calculate all possibilities over the next 100 moves and Kasparov will look like a beginner..

However, while there exist nice AI techniques to make simple computers play chess very well, as far as I know, the scalability problem of type inference methods has not yet been convincingly solved in the scientific literature. It might be just a matter of time, with promising techniques such as CPA and IFA. The chess problem, of course, has been researched over a much longer period of time, and by many more people, so we can be hopeful.

However, combining CPA and IFA will not be as straightforward as I would have thought (ain't it ever.) By initially using a single contour for non-simple types, CPA doesn't blow up for these types, but because of the single contour many simple types will now become polluted when they interact with such 'overloaded' contours, again blowing up CPA. (Consider getting an int from an int list, then using it as an argument.) A solution might be to limit the size of the cartesian products during the first iterations, or perhaps even to use 0CFA..
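
As a tiny, made-up illustration of that pollution effect (hypothetical code, not from the test set):

def inc(n):            # with precise contours, n is simply an int
    return n + 1

xs = [1, 2, 3]         # under 0CFA, all lists share a single contour
print(inc(xs[0]))      # xs[0] is only as precise as that shared contour,
                       # so an imprecise type now flows into 'n' as well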

In any case, I guess my priority right now is to get everything working precisely and (hopefully) correctly. It has to _work_ first for larger programs, before I can fine-tune stuff and incorporate butt-ugly heuristics, as well as the neighbour's dog for improving scalability. I'm trying not to think about other scalability problems presented by larger programs, caused by inheritance and the proportionally increased usage of dynamic features (e.g. reflection).

In other words, I don't think I will be able to bootstrap the compiler any time soon (picture a snake with his tail in his mouth.. do snakes have tails??). I actually thought it would be possible to do during my Master's project. Then again, maybe I would have chosen something UML'ish if I had known the difficulties. IMO, it would be great, however, to actually _have_ an open source engine that is precise and works for not-too-large Python programs (say up to around a few thousand lines), that are programmed in a C++-like style.

Sunday, July 10, 2005

Small features, bugfixes

Today I worked all over the map. Maybe the most interesting aspects are that nested types and truly polymorphic containers don't crash the compiler anymore.. :-) They actually seem to work. For example, the following compiles and runs fine:
def duplll(x):             # x: [list(A)]
    return [x]             # [list(list(A))]

a = duplll([1])            # [list(list(int))]
b = duplll([1.0])          # [list(list(float))]

def ident(x):              # x: [list(list(A))]
    return x               # [list(list(A))]
def meuk(x):               # x: [list(list(A))]
    return ident(x)        # [list(list(A))]

c = meuk(a)                # [list(list(int))]
d = meuk(b)                # [list(list(float))]

def makel(x):              # x: [list(A)]
    return [x]             # [list(list(A))]

def dupl(x):               # x: [list(list(A))]
    return [makel(x[0])]   # [list(list(list(A)))]

dupl([[1]])                # [list(list(list(int)))]

y = [[('1',)]]             # [list(list(tuple(str)))]
dupl(y)                    # [list(list(list(tuple(str))))]

d = [[1]]                  # [list(list(int))]
d = [[1.0]]                # [list(list(float))]
d                          # [list(list(pyobj))]
It contains a polymorphic call chain ('meuk' and 'ident'), nested (in the first iteration even recursive) types and a truly polymorphic container in the end. It's nice to see all the C++ crap that is generated just for the last assignment, especially considering that in Python often much more involved datastructures are used than a list of lists of floats (I'll show some nice examples later on :-))
d = ((list[list[pyobj *] *] *)(new list[list[double] *](1,
new list[double](1, new double(1.0)))));
The cast is necessary because C++ won't (rightly) accept the assignment otherwise, and the float is allocated on the heap, because that's how I currently represent 'confused' objects (it can't be unboxed, because floats and integers are confused inside the container.) I believe it's a nice example of how static type checking gets in the way of programming. I dare not show the entire generated C++ program :-)

Other than the above, I fixed many small bugs, rediscovered my affinity for unit testing, made some old unit tests work again and improved the splitting at confluence points (by actually looking at which creation sites come in via the data flow edges - for example, the two edges may provide the same sets, in which case splitting is not very useful.)

Still, there is so much more to do that it makes me feel a bit discouraged at the moment. I'll just have to take it one feature and one bug at a time.. sometimes I feel like the guy in 'Touching the Void'. Watch the movie and you'll know what I mean :-)

Saturday, July 09, 2005

Integration Blues

Okay, so it took me most of the day to be able to support '[x]'. And it still doesn't work perfectly; for example, '[x,y]' will require more work. At least I know where the problems lie now, and '[x]' suffices for debugging the iterative analysis. This is because using '[]' several times makes these allocation sites look alike, obfuscating debug output. The problem was that I did not anticipate object contours changing during analysis, so in the end I just had to make it a bit more dynamic..

With debugging now much easier, it actually became fun to debug some more problems I found in the iterative analysis. To make debugging even more fun, I picked up my habit again of adding unit tests for every new piece of code that I wrote, especially those causing the tiniest of problems. At the moment, no known bugs remain, just.. let's call them 'missing features'.

I guess tomorrow I will start messing around with finding some corner cases, and see how the algorithm works for truly polymorphic containers (when an instance variable of a container object actually takes on different types during run-time). Of course throwing ints and floats into a single list doesn't have to be very efficiently supported (there are probably cases where it's useful, but I haven't seen them). However, in case of inheritance hierarchies, it would be nice if lists with objects of classes from the same hierarchy don't blow up the analysis.. :)
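
For example (a made-up case, not in the test set yet), a list holding related classes would ideally end up as a single contour typed at the common base class:

class Shape:
    def area(self):
        return 0.0

class Square(Shape):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

class Circle(Shape):
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.1415 * self.r * self.r

shapes = [Square(2.0), Circle(1.0)]      # ideally one contour: list(Shape)
print(sum([s.area() for s in shapes]))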

Friday, July 08, 2005

Scaling up

I spent most of today debugging silly problems that started to appear as I was feeding the compiler larger examples. In a codebase of about 5000 lines, simple things can become hard to find. It looks like I found the last of them, as I can't find a smallish 'list' example that doesn't work anymore. For example, an extended version of the example in my previous posting, now with 15 allocation sites and several functions (ident, dupl, makel), nicely converges to these 2 object contours:

1: ImmutableSet([])
2: ImmutableSet([class int_])
3: ImmutableSet([class float_])
4: ImmutableSet([])
5: ImmutableSet([])
unused [4, 5, 1]

As can be seen, in total 5 intermediate object contours are created, as the compiler splits and merges contours during each iteration. Although some steps of the iterative analysis as I've implemented it are slow (for example, checking whether a tuple of tuples is in some Set), the theoretical complexity of the whole is greatly reduced, as CPA will create a boatload fewer templates when there are fewer object contours!

My next step will be to clean this stuff up some more, e.g. remove unused contours after each iteration, fix the fact that '[x]' is not supported, and start looking at other classes than list. At some point I will also have to start looking at nested and recursive types, and start thinking about termination issues.. shudder!!

Thursday, July 07, 2005

John Plevyak

Since John Plevyak wrote his PhD thesis almost ten years ago, I decided to ask him how he views the state of the solutions for handling data polymorphism today. Unexpectedly, I received a detailed answer within minutes. It appears he is still (or again) working on his iterative method, and is actually planning to release an open source inference engine at some point in the future. How cool! :D

Administrative Issues

I spent the day working on various administrative issues. Since allocation sites inside function templates are 'thrown away' after each CPA iteration, we need to record previously deduced type information for them, which has to be inserted again if similar templates are created in the next iteration. This becomes quite some administration, because these allocation sites may be located in templates created for argument types whose contours may be split or merged in between iterations.

First, each allocation site is given an (AST node, argument product) identifier. Based on this, a (currently global) table is maintained with the currently deduced type information for each allocation site, and updated as contours change. The following more complicated example is now analyzed correctly. The allocation sites inside the 'dupl' templates depend on the 'y' argument, whose contour is split at some point. It would go too far to explain all this in detail, so here is the code:

def ident(x):              # x: [list(A)]
    return x               # [list(A)]
a = []                     # [list(int)]
a = []                     # [list(int)]
a = []                     # [list(int)]
b = []                     # [list(float)]

ident(a).append(1)         # []
ident(b).append(1.0)       # []

def dupl(y):               # y: [list(A)]
    k = []                 # [list(float)]
    k.append(1.0)          # []
    l = []                 # [list(A)]
    l.append(y[0])         # []
    return l               # [list(A)]

dupl(a)                    # [list(int)]
dupl(b)                    # [list(float)]

All lists in the example contain either integers or floats, so we would expect the analysis to end up with two list contours, one for each type. However, the compiler does not yet merge contours that end up with the same contents. Clearly, it can happen that when a contour becomes more precise because of CPA, another contour with the same precision is already available. Of course, these contours should then be merged, because sharing of contours is what this whole iterative analysis is about..

In the previous situation, six object contours would have been created (one for each syntactic allocation site), potentially blowing up CPA (as it works with the cross-product of actual arguments). Additionally, the types for 'dupl(a)' and 'dupl(b)' would have been imprecise, because a single contour would be used for the different incarnations of 'l' in the different templates for 'dupl'.. It's already nice to see both improved precision and efficiency, although I still have much to do, and probably especially to debug.. :P

Wednesday, July 06, 2005

This just in: first results

After building some further infrastructure for the iterative analysis today, and sanitizing other parts of the compiler to cooperate with it, I have the first simple example compiling ('return [x]' is not supported for now :-)):
def makel(x):
    l = []
    l.append(x)
    return l
print makel(1.0)
print makel(1)

During the first iteration, CPA creates two method templates for 'makel', because it sees that it is called with exactly two different argument products: (float,) and (int,). Since we want to iteratively improve upon imprecise object contours, during this first CPA iteration a single object contour is used for the duplicated allocation sites '[]' inside 'makel' (one for each template.) Consequently, an upper bound of {int, float} is inferred for this object contour.

As I explained earlier, after each iteration, the iterative algorithm tries to resolve such imprecisions by splitting object contours. But splitting is not yet necessary for this example to work. By tracing back dataflow paths from inside the two 'append' templates (conflicting assignments to an instance variable), we can conclude that each '[]' allocation site flows only to one of these. The compiler does just that, and splits the object contour in two. Further, it records the newly gained information about these allocation sites in a global table (still messy atm), so that future CPA iterations can seed these allocation sites with the right object contours as they create their templates.

After one more iteration, the compiler concludes that nothing more can change, and generates the following code (nicely parameterizing the types as well, corresponding to another part of my SoC proposal - btw I renamed template brackets to [ and ] because messing with HTML apparently makes me very sad):
int main() {
    printf("%s\n", CS(makel(1.0)));
    printf("%s\n", CS(makel(1)));
}

template [class A] list[A] *makel(A x) {
    list[A] *l;
    l = new list[A]();
    l->append(x);
    return l;
}

Which compiles and runs just fine. Simple object contour splitting also works already (see two postings earlier):
def ident(x):
    return x
a = []
a = []
b = []
ident(a).append(1)
ident(b).append(1.0)

After the first iteration, the compiler notes the conflicting assignments, and sees that multiple creation points 'flow' to both assignments. It tries to 'resolve' this problem by finding a confluence node (in our case, 'x'), and by splitting the respective object contour along the incoming edges of this node. Here, it nicely splits the 'mother' contour into a contour for the two creation points for 'a' and a contour for the creation point for 'b'. In the next iteration, CPA splits the 'ident' function, because a and b now have different object contours, and the imprecision is gone :-)

Appointments

After a week of becoming increasingly frustrated by all sorts of daily appointments, I am finally free to work on my compiler for the rest of July and August! :D I hope to divert all potential appointments to some later date, except for my motorcycle lessons, which I just didn't want to quit.. This tax thing looks like it will need some attention too.. non-US SoC'ers need to find like 20 different forms, fill them in correctly, and send them to different places, in order to avoid losing 30% of the total grant. Problem is, my brain immediately goes into stand-by mode each time I start to think about it..

In all, I guess there should be enough time to build something nice to handle many types of data polymorphism. Two months is a lot. I am a bit overwhelmed though, by the complexity of both the problem and Plevyak's solution. It's not going to be easy..

BTW I made my first release. It's highly unstable, but it's there I guess ;) I will stabilize it later, when data polymorphism is handled better (or at all?). It can be downloaded via the Sourceforge link. How to use it? Edit test.py and add some Python code, or use the hello world example it contains; run python ss.py - this will create two C++ files, test.?pp, as well as a Makefile - then make and either make run or ./test. The file test.py (and imported module files) will be annotated with the deduced types. (Not always correctly though, as the annotation code is not perfect.) All the code in unit.py has run well at some point (from the same codebase), but at this moment I guess many of the tests will fail. Okay, back to work..

Friday, July 01, 2005

The Problem of Data Polymorphism

Parametric polymorphism refers to the ability of functions to be called with different types of arguments in different contexts. The cartesian product algorithm, or CPA, efficiently separates these contexts from each other by creating templates for each function, based on the cartesian product of the actual argument types at each call site. By connecting each context with certain templates during analysis, these templates do not pollute each other with bogus types. The idea behind CPA is that when different call sites provide similar argument types, these templates can be shared. This makes for a very efficient analysis, and also a very precise one, as it is not limited to a certain call-chain length, unlike earlier analyses such as NCFA.

However, CPA only considers 'simple' types, such as integers and floats. Because integers and floats always have the same 'behaviour', these types may be merged together during dataflow analysis (i.e. when one integer meets another integer). CPA templates can be created eagerly for these types, as everything is known about them at each call site where they pop up - all integers are the same, so a template made for one integer will be valid and precise for all integers that pass by. For polymorphic containers such as 'list' the story becomes more complicated. Data polymorphism refers to the ability of instance variables to contain different types. Different such types may require different templates to be made. For example, the return type of some function may depend on an instance variable of an argument to that function. Unfortunately, instance variable types cannot be fully known at their allocation site! Consider the following simple example:
def ident(x):
    return x
a = []
a = []
b = []
ident(a).append(1)
ident(b).append(1.0)

Do we create one or two templates for the 'ident' function? As the list contents may not yet be known when these lists reach the function calls, we cannot eagerly create multiple templates. Nor can we merge lists during dataflow analysis and then split them later, when we find out more about their contents, as this could leave the dataflow network in an inconsistent state.

A naive solution, called 1CFA, is to create separate 'class templates' for each allocation site, in effect giving each allocation site its own type. This slows down the process, and is still not precise, because allocation sites may be duplicated as part of templates, again giving rise to possibly differently used polymorphic containers. Splitting these duplicated allocation sites would quickly make for an intractable computation.

A better solution adapts itself to the actual data polymorphism used in the program (the way CPA adapts itself to the actual parametric polymorphism), so that as many templates as possible can be shared. One solution, already alluded to by Olin Shivers (of NCFA fame), is to iterate the analysis, starting from a very imprecise situation and trying to improve type information after each iteration. John Plevyak found a practical method to implement this idea. Its first iteration is just 0CFA, which means that all similar types are merged, e.g. all lists. By looking at the dataflow graph in a backward fashion, starting at conflicting assignments to a single instance variable, it is possible to locate so-called 'imprecision points'. In the above example, if we start inside the two append templates (CPA creates one for the integer and one for the float case) and look backwards, we find that types from different allocation sites flow into the formal argument of 'ident'. By splitting the list type in two, one for each incoming dataflow edge at this imprecision point, the respective imprecision can be 'removed', because CPA will now create different templates for 'ident' during the next iteration! By repeatedly splitting types, based on imprecision points, Plevyak's analysis 'adapts' itself to the data polymorphism in the program. Heuristics are probably needed for splitting decisions, because splitting too little means many iterations, and splitting too much means CPA might blow up.

Actually, when Plevyak developed the above method, CPA did not yet exist. His approach also encompassed means of handling parametric polymorphism precisely, but reportedly it was not very efficient. As the developer of CPA alluded to in his PhD thesis, it would be nice to try and combine Plevyak's data-iterative approach with CPA. Over the following weeks, I will try to implement such a combination, although I am a bit wary about potential problems I might encounter. As the first iteration will be just 0CFA, there will be almost no precision - how to select imprecision points that, when split, will actually help? What if assignment types are not singleton sets, or what if there are many different combinations of assignment types? What about inheritance? How do we maintain information about duplicated allocation sites between iterations? How do we terminate efficiently in the face of many polymorphic containers? I guess I will have to find out..

Wednesday, June 29, 2005

Summer of Code

In yet another amazing move, Google has decided to pay 410 students to work on open source software during summer. These students have been selected via 'mentoring organisations', which will guide them through the program, as well as judge their results. Hopefully the program will be a resounding success, and Google will be able to repeat it next year, or even set a new trend!

I was very fortunate to already be working on something interesting to one of these organisations (the Python Software Foundation, or PSF) as part of my Master's Thesis project, and to have just been planning to add several large features this summer. Of course, I would have worked on them anyway, but morale was going down, and receiving a Google t-shirt is excellent motivation! (The money comes in handy too..) I am very glad to have been selected.

My mentors will be Jeremy Hylton and Brett Cannon, both well known in the Python community and working for Google at the moment. I look forward to interacting with them during the program, and hopefully learning a lot from them. On the university side, my advisor has been and will be Koen Langendoen, co-author of the book 'Modern Compiler Design'.

During the program, I will try to post an update on my progress and experiences at least once every few days. Thanks for reading, and I appreciate any comments!