Thursday, December 15, 2005

Shed Skin 0.0.5.9

I have just released Shed Skin 0.0.5.9. It's almost where I want it to be
for 0.0.6. What remains to be coded is some kind of connection to the
standard library (probably a simple one at first: working only for
'opaque handlers'). I also want to improve cases where ints and floats
are mixed, since this is quite common. Some major changes:

-basic exception handling

(support for custom ones, and for some builtins such as ValueError,
KeyError and AssertionError; this will allow for implementation of
iterator objects later on)

-some basic inheritance

(no multiple inheritance yet, or weird stuff; this enabled me to add
the 340-line pygmy raytracer to the test set. it becomes about 40
times faster here..)

-keyword arguments

(this has not been tested very well yet - let me know about any problems; a small sketch of these new features follows after this list)

-many missing minor 'set' features, thanks to reports by bearophile

(it should be practically complete now. the whole set of builtins is
nearing completion :D time to start optimizing stuff..)

-many, many bugfixes again, again mostly thanks to bearophile
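
To make the new language features above concrete, here is a small hypothetical fragment (not one of the actual test programs) of the kind of code this release aims to accept: a builtin exception, simple single inheritance and a keyword argument.

class Animal:
    def __init__(self, name):
        self.name = name
    def speak(self):
        return self.name

class Dog(Animal):                  # simple (single) inheritance
    def speak(self):
        return self.name + ' says woof'

def describe(animals, index=0):     # keyword argument
    if index >= len(animals):
        raise ValueError('index out of range')
    return animals[index].speak()

try:
    print(describe([Dog('rex')], index=1))
except ValueError:
    print('oops')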

I added four new 'big' programs to the test set that (with some minor
modifications) work now: a tic-tac-toe player for arbitrary-size
boards, a simple genetic algorithm, a linear algebra program and the
mentioned raytracer. Finally, the README has been improved again,
though it's still pretty bad. After the release of 0.0.6 I plan to
create a web site with performance comparisons between CPython, Psyco
and Shed Skin (perhaps even PyPy :P), because quite a few interesting
programs compile now, and Shed Skin is usually a lot faster than Psyco,
except when Python builtins are the bottleneck. I guess some more STL
magic is required here..

Keep the small code fragments that fail flowing!

Saturday, December 10, 2005

Shed Skin 0.0.6 Status Update

Just a small note to anyone interested: I am hard at work getting Shed Skin version 0.0.6 ready before the end of the year. It already supports an almost unmodified OO raytracer of about 350 lines (pygmy, with a speedup of about 40 on this sempron 2200+), as well as some other larger new programs. Simple inheritance has been improved a lot. I am currently working on exception handling, which will probably be supported well in 0.0.6. Thanks to bearophile, a huge number of bugs has additionally been fixed. I will probably release a 0.0.5.8 or 9 within the coming week.

Please let me know if you have interesting use cases of inheritance and/or exceptions; I would be glad to look into any problems. In that case, please ask me for the latest version of the compiler first (I have no access to CVS currently.)

Btw, I will probably soon be working on some method of accessing much of the not-too-weird standard library functionality, but I'm not sure whether I will manage to get this working well before 0.0.6.

Wednesday, November 30, 2005

Shed Skin 0.0.5.1 Released

I have just released Shed Skin 0.0.5.1. It contains many small bug
fixes, but more importantly it enables GC again (argh! :P). This should
make some of bearo's tests run a lot faster. As per his suggestion, I
also copy-pasted float hashing from CPython, which should make his
dict_speed test run much faster. I also made some improvements for
programs consisting of multiple modules. See test 139 for a partial
overview of the smaller bug fixes.

As usual, see shedskin.sourceforge.net for downloading SS.

Monday, November 07, 2005

0.0.5: Major Bugfix Release

Hi all,

I have just released Shed Skin 0.0.5. It fixes many bugs and adds many minor features to the Python builtins, most notably, the 'set' class. There have also been some optimizations on the C++ side. Finally, the README now better explains the compiler's limitations, and a TODO file has been added containing open bugs.

I would like to invite anyone to try out this new version, especially if there were problems with 0.0.4. If you encounter a bug or missing feature, please send me a small code fragment that exhibits the problem, so I can fix it or add it to the TODO. If you are a C++ programmer, please consider helping out on the C++ side, by sending in patches to improve the C++ implementation of the Python builtins!

For 0.0.6 (a feature release!), the following things are on my list:

- Basic inheritance support
- Automating the connection to the Python standard library
- Better error messages for things SS doesn't handle (Finally?!)

(This release has taken way too long, because I did not have much time this month. 0.0.6 might also take a while, since I will be living in China for the next seven weeks and will be writing my thesis, among other things. I expect to release it a few days after I get back, which would be just before new year.)

Thursday, October 13, 2005

Two weeks off

There are lots of fixes in the pipeline for Shed Skin 0.0.5. Unfortunately, I somehow managed to give away my PC, and I planned some sort of vacation for the next two weeks. So it's going to be very hard to release a new version anytime soon. I'm still very motivated about the compiler, and there's a whole list of things I'm dying to fix, but it's hard to do without my own computer.. :)

Things planned for the next release:

- lots and lots of bug fixes for the builtin types
- several optimizations in the resulting C++ code (e.g. for indexing)
- error messages for anything the compiler doesn't support
- builtin set type (in addition to sets.Set), plus most of its methods

Sunday, September 18, 2005

Shed Skin 0.0.2: Easy Windows/OSX Installation

Shed Skin 0.0.2 is up on SourceForge. It should install easily on Windows 2000/XP and on OSX. Please give it a try and let me know if there are still some problems.

If you would like to help me improve Shed Skin, please send me small code snippets, preferably extracted from real-life use cases, that the compiler has problems with.

Thanks to everyone who has helped me out, especially Khalid and Luis on python-list, and Denis de Leeuw Duarte right here in the street :-)

Update: 0.0.3 has also been released, with improved support for builtin functions, bug fixes and a Windows package of only 3 MB.

Update: We're on a roll, with 0.0.4. Many improvements again; added several interesting test cases (thanks to bearophile!): convex hull, n-queens problem, pascal triangle and ascii mandelbrot.

Saturday, September 10, 2005

Announcement

First release of Shed Skin, a Python-to-C++ compiler.


After nine months of hard work, I am proud to introduce my baby to the world: an experimental Python-to-C++ compiler. It can convert many Python programs into optimized C++ code, without any user intervention such as adding type declarations. It uses rather advanced static type inference techniques to deduce type information by itself. In addition, it determines whether deduced types may be parameterized, and if so, it generates corresponding C++ generics. Based on deduced type information, it also attempts to convert heap allocation into stack and static preallocation (falling back to libgc in case this
fails.)

The compiler was motivated by the belief that in many cases it should be possible to automatically deduce C++ versions of Python programs, enabling users to enjoy both the productivity of Python and the efficiency of C++. It works best for Python programs written in a relatively static C++-style, in essence enabling users to specify C++ programs at a higher level.
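
As a rough, hypothetical illustration (not taken from the test set), this is the kind of 'static but declaration-free' Python the compiler is aimed at: every variable keeps a single, inferable type, so a C++ version can be deduced automatically.

def dot(a, b):
    # both arguments are inferred as lists of floats
    s = 0.0
    for i in range(len(a)):
        s = s + a[i] * b[i]
    return s

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))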

At the moment the compiler correctly handles 124 unit tests, six of which are serious programs of between 100 and 200 lines:

-an othello player
-two satisfiability solvers
-a japanese puzzle solver
-a sudoku solver
-a neural network simulator

Unfortunately I am just a single person, and much work remains to be done. At the moment, there are several limitations to the type of Python programs that the compiler accepts. Even so, there is enough of Python left to be able to remain highly productive in many cases. However, for most larger programs, there are probably some minor problems that need to be fixed first, and some external dependencies to be implemented/bridged in C++.

With this initial release, I hope to attract other people to help me locate remaining problems, help implement external dependencies, and in the end hopefully even to contribute to the compiler itself. I would be very happy to receive small programs that the compiler does or should be able to handle. If you are a C++ template wizard, and you would be interested in working on the C++ implementation of builtin types, I would also love to get in contact with you. Actually, I'd like to talk to anyone even slightly interested in the compiler, as this would be highly motivating to me.

The source code is available at the following site. Please check the README for simple installation/usage instructions. Let me know if you would like to create ebuild/debian packages.

Sourceforge site: http://shedskin.sourceforge.net
Shed Skin blog: http://shed-skin.blogspot.com

Should you reply to this mail, please also reply to me directly. Thanks!


Credits

Parts of the compiler have been sponsored by Google, via its Summer of Code program. I am very grateful to them for keeping me motivated during a difficult period. I am also grateful to the Python Software Foundation for choosing my project for the Summer of Code. Finally, I would like to thank my university advisor Koen Langendoen for guiding this project.


Details

The following describes in a bit more detail various aspects of the compiler. Before seriously using the compiler, please make sure to understand especially its limitations.

Main Features

-very precise, efficient static type inference (iterative object contour splitting, where each iteration performs the cartesian product algorithm)
-stack and static pre-allocation (libgc is used as a fall-back)
-support for list comprehensions, tuple assignments, anonymous funcs
-generation of arbitrarily complex class and function templates (even member templates, or generic, nested list comprehensions)
-binary tuples are internally analyzed
-some understanding of inheritance (e.g. list(dict/list) becomes list(iter(A)))
-hierarchical project support: generation of corresponding C++ hierarchy, including (nested) Makefiles; C++ namespaces
-annotation of source code with deduced types
-builtin classes, functions (enumerate, sum, min, max, range, zip..)
-polymorphic inline caches or virtual vars/calls (not well tested)
-always unbox scalars (compiler bails out with error if scalars are mixed with pointer types)
-full source code available under the MIT license
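
To give an impression, here is a small made-up fragment touching several of the listed features (list comprehension, tuple assignment, an anonymous function and a few of the supported builtins):

double = lambda x: 2 * x                   # anonymous function
squares = [x * x for x in range(10)]       # list comprehension
lo, hi = min(squares), max(squares)        # tuple assignment, min/max
print(sum([double(s) for s in squares]))   # sum over a list comprehension
for i, s in enumerate(squares):            # enumerate
    print(i * s)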

Main Limitations/TODO's

-Windows support (I don't have Windows, sorry)
-reflection (getattr, hasattr), dynamic inheritance, eval, ..
-mixing scalars with pointer types (e.g. int and None in a single variable)
-mixing unrelated types in single container instance variable other than tuple-2
-holding different types of objects in tuples with length >2; builtin 'zip' can only take 2 arguments.
-exceptions, generators, nested functions, operator overloading
-recursive types (e.g. a = []; a.append(a))
-expect some problems when mixing floats and ints together
-varargs (*x) are not very well supported; keyword args are not supported yet
-arbitrary-size arithmetic
-possible non-termination ('recursive customization', have not encountered it yet)
-profiling will be required for scaling to very large programs
-combining binary-type tuples with single-type tuples (e.g. (1,1.0)+(2,))
-unboxing of small tuples (should form a nice speedup)
-foreign code has to be modeled and implemented/bridged in C++
-some builtins are not implemented yet, e.g. 'reduce' and 'map'
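
As a hedged illustration of two of the entries above (no 'map'/'reduce' yet, and no mixing of scalars with pointer types such as None), a plain loop with an int sentinel is the compilable alternative:

def total_and_best(xs):
    best = -1                  # int sentinel instead of None
    s = 0
    for x in xs:               # instead of reduce(operator.add, xs)
        s = s + x
        if x > best:
            best = x
    return s, best

print(total_and_best([3, 1, 4, 1, 5]))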

Friday, September 02, 2005

SoC: finished!

C++ is fantastic. It was very easy to integrate a garbage collector into the compiler: I only needed to add some simple code to a single header file, and it worked. I've also made grateful use of template specializations in redesigning the tuple implementation. In addition to these larger features, I also implemented a huge number of smaller features over the last two weeks. Apart from several theoretical niceties, such as generation of complex generics, the compiler can now actually handle several relatively large programs as well. In all, I think I have reached all SoC milestones, and much more.

There are now 6 larger programs in the unit test set, written without any static compilation in mind (by friends, or by me before I wanted to build a Python compiler), that the compiler now handles fine. In total they weigh in at 879 lines of Python code, which the compiler transforms into 1749 lines of C++ code (excluding header files, macro expansion, builtin functions, ..) These programs are:

-a simple satisfiability solver (110 lines, complicated list comprehensions)
-a japanese puzzle solver (180 lines, uses some real polymorphism)
-a more advanced satisfiability solver (170 lines)
-a not-too-bad othello player (123 lines, does much with binary tuples)
-a neural network simulator (95 lines, uses classes, badly polluted first iteration)
-a sudoku solver (177 lines)

The compiler generally takes less than 20 seconds to analyze each program on my old athlon XP 1800+. The speedup is in general larger than a factor of 10, often much more.

In fact, I would like to invite any adventurous soul at this point to try out the compiler, and send in anything that doesn't work but should (please, as small as possible), or anything non-trivial that does work (so I can add it to the test set.)

Note that there are currently several important limitations to the type of Python code that is supported:

-Of course, don't use reflection, eval or weird run-time polymorphism that cannot be easily mapped to C++.
-Do not import libraries. Only 'Set' (from the sets module) is currently supported. I am working on a simple system that will allow users to add C++ versions of libraries to the compiler.
-Do not mix None and integer types. The compiler will bail out with an error. Use zero or any other special value instead. Do not mix scalar with pointer types. Try to keep things uniform. This is usually more elegant anyway. However, it is often useful to mix types in binary tuples, and the compiler supports this.
-But do not use tuples with different element types and a length greater than 2. Binary tuples with differently typed elements also cannot be used in addition (concatenation), etc. 'zip' only supports two arguments.
-Declaring things before use may sometimes be necessary at this point. For example, classes. I hope to fix this soon.
-Using varargs and kwargs will probably end in tragedy.
-In general, program elegantly, like you would in C++, but without all the type declarations.
-The compiler has not been tested on programs larger than 250 lines.
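
A small do/don't sketch of the guidelines above (hypothetical code, not from the test set):

# do: binary tuples may mix element types
point = (3, 'x')

# don't: mixed element types in longer tuples, or adding mixed binary tuples
# bad = (1, 'a', 2.0)
# worse = (1, 1.0) + (2,)

# don't: mix None and ints in one variable
# result = None
# do: use a special int value instead
result = -1

# do: keep containers uniform
counts = [0, 0, 0]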

I still need to fix a non-termination problem I encountered, but this won't occur for most programs under 250 lines. I also need to solve many smaller issues. However, the compiler should work pretty well for many programs. If you're interested, by all means, please try it out and let me know how it went. For any larger program, there are probably a few minor issues or missing builtin functions that I would gladly fix, if I have a use case.

The compiler is currently only available via CVS. I hope to release a packaged version in about a month. Please see the README with instructions on how to use it. In any case, it won't work under Windows. Let me know if you would like to debug this.. :-)

Saturday, August 20, 2005

Countdown

Alright! After almost 8 months of more than full-time hacking, the end is in sight. Everything is looking great. I'm hoping to finish the final three features of my SoC todo list today, except for integrating a GC, which I hope to do tomorrow. The three features are: support for nested tuple assignments, e.g. a,(b,c) = expr (this may also occur inside for loops/list comprehensions); bootstrapping some useful builtins (enumerate, zip, min, max); and supporting untyped constructors that do not flow to any instance variable assignment. For example, in 'return []' we may want the empty list to become a list(str), because there is a return ['u'] elsewhere in the same function.
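
A tiny, made-up example of the 'untyped constructor' case just described: the empty list does not flow to any instance variable assignment, so its element type is taken from the other return in the same function, making it a list(str).

def names(include):
    if not include:
        return []          # deduced as list(str), because of the return below
    return ['u']

print(names(0) + names(1))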

The last thing I would like to do is compile a simple neural network simulator that I wrote a while ago, as the final test case before a 0.1 release. It will require writing C++ versions of some external calls (e.g. math.e, random.random).

In all it seems there are no hard problems left as far as my aims go. Of course I could easily work another few years on the problem, supporting more and more flexible code, extending many features (e.g. internally typed ternary tuples..), work on extreme scalability (by profiling/maintaining object contours) etc. etc. For now I only intend to perform several long-overdue cleanups over the following months, as well as (hopefully) adding programs written by other people to the unit test set and fixing any problems they may have (again, as far as my aims go.)

Hmm, I just realized I practically haven't had a single day off over the last 8 months.. Maybe it's time for that vacation now..

Tuesday, August 09, 2005

Getting Close

It took me some time to implement everything that was required by the Othello player. What is nice about it is that it pushes many things a bit further again, relative to the other larger test programs. No large features were required, nor were any major bugs uncovered, but I did have to fix many small problems ('the devil is in the details'). For example, constructs such as 'a = b = c', '[.. for .. for ..]' and '(1,2) in [(1,2)]' were not yet well supported, and conversion of None to 0-pointers did not work well with parametric code. In the end, the resulting C++ player now plays about 25 times faster than the Python version. Pre-allocating some 'constant' lists will bring this to around 50.
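
For reference, a minimal made-up fragment with the three constructs mentioned above:

a = b = c = 0                                          # 'a = b = c'
pairs = [(x, y) for x in range(2) for y in range(2)]   # '[.. for .. for ..]'
print((1, 1) in pairs)                                 # '(1,2) in [(1,2)]'
print(a + b + c)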

As I see it, there are now only 5 relatively major features still to be implemented to reach all milestones for the SoC. These are: iterative splitting of user classes (this might work already), more aggressive merging of object contours containing (object contours with) only simple types, generating method templates (C++ function templates inside class templates), integration with a garbage collector and fixing a problem I discovered in the static preallocation code. I think these should not take me more than 10 days to implement. Other than these, there are still many minor issues of course. Maybe (hopefully) I can spend the last 10 days mopping up some minor problems that will need attention before being able to do a decent release.

Sunday, August 07, 2005

Back On the Job

Okay, after taking a few short days off (alright, spending them writing other Python programs.. :P) and preparing for and meeting with my university advisor, I'm back on the job. Over the last few days, I tried to teach my girl Python, by showing her, step-by-step, how to write a not-too-bad Othello player in under 100 lines. I don't think she could write something like that herself now, but I had a lot of fun :-)

I guess the first thing I will do today is to try and get this Othello player to compile. Programs like this one are perfect test-cases for the compiler: they often require some minor feature additions, which is nice and gives a feeling of progress, and generally uncover a few minor bugs, which are easily traced for such small programs. It will also be a good finger exercise to get going again.. There is still quite some work to do.. !

Saturday, July 30, 2005

C++

C++ makes for a great target language, with its intricate generics and its plethora of other low-level features. We can both succinctly map a high-level language such as Python to its feature set, and leave further optimization to be handled by decades-old, mature C++ compilers. In many ways, C++ is really as flexible as Python. The problem is that this flexibility is often hard to achieve, and often results in unreadable code. It's nicer I guess to have a compiler write the C++ code for you :-) Consider the following example:
def flatsie(iter):
    return [(b,a) for (a,b) in iter]

flatsie([(1,2.1)])
flatsie({(2,3.141): [1,2,3]})
flatsie({(11,4.1): None})
flatsie(((7.7,1),))
There are some interesting things going on here. If we would like to compile this example to C++, the list comprehension has to be compiled to a generic function. But as the list comprehension iterates over both lists/tuples and dictionaries, it is not immediately clear how to write this nicely in C++, because dict is implemented using two template variables, as opposed to one for list and tuple. C++ provides an elegant solution: make iterable classes inherit from a pyiter class with a single template variable, and in case of dict, map the right template variable to it. Finally, we would also like to statically preallocate the result of the list comprehension, but since this is a generic list, we cannot define it globally. C++ again has the solution: use the 'static' keyword inside the function template, to create a parameterizable 'global' variable.

As of today, this example compiles cleanly, without breaking any of the other 95 unit tests. As there are only two types of tuples involved, the iterative analysis nicely ends up with two tuple object contours. These tuples are also modeled internally, or the list comprehensions would cause precision loss. Tuples of length greater than two are not yet internally modeled, but this would be a straightforward generalization. In order to get here, I had to implement generic list comprehensions, and as a result the template milestones now look almost complete. In any case, it's pretty cool to see the compiler output the following working C++ code :D

template <class A, class B> static inline list<tuple2<B, A> *>
*list_comp_0(list<tuple2<B, A> *> *result, pyiter<tuple2<A, B> *> *iter) {
    A a;
    B b;
    tuple2<A, B> *__0;
    result->clear();

    FOR_IN(__0,iter)
        TUPLE_ASSIGN2(a,b,__0,0);
        result->append(new tuple2<B, A>(b, a));
    END_FOR_IN
    return result;
}

int main() {
    flatsie(new list<tuple2<int, double> *>(2,
        new tuple2<int, double>(1, 2.1),
        new tuple2<int, double>(2, 4.1)));
    flatsie(new dict<tuple2<int, double> *, list<int> *>(1,
        new tuple2<tuple2<int, double> *, list<int> *>(
            new tuple2<int, double>(2, 3.1),
            new list<int>(3, 1, 2, 3))));
    flatsie(new dict<tuple2<int, double> *, none *>(1,
        new tuple2<tuple2<int, double> *, none *>(
            new tuple2<int, double>(1, 4.1), 0)));
    flatsie(new tuple<tuple2<double, int> *>(1,
        new tuple2<double, int>(7.7, 1)));
}

template <class A, class B> list<tuple2<B, A> *> *flatsie
(pyiter<tuple2<A, B> *> *iter) {
    static list<tuple2<B, A> *> __1;

    return list_comp_0(&__1, iter);
}

Friday, July 29, 2005

Iterative Progress

After messing around for a few days getting CVS write access from my mother's XP box (which I finally gave up) and fixing problems caused by differences in GCC versions across three operating systems, I finally got to hacking on the iterative analysis again. I really needed that, since debugging unit tests for more than a week really drained me.

First, I fixed a problem where object contours were merged that could possibly be split later on. The fix logically results in far fewer iterations on the three largest unit tests, while more contours are now created. Additionally, I haven't observed a single unit test failing afterwards, where I would previously get an occasional failure on test 34 or 66 (the analysis is not deterministic.)

Currently, only object contours that contain just monomorphic, atomic instance variables are merged, because these cannot cause the merged contour to be split later on. This means that as tighter bounds are deduced during the analysis, more object contours can be merged. It seems that other object contours referencing only such contours can also be merged, because they now also cannot be split later on; in other words, mergeability looks like a transitive property. I will have to investigate this later, as I don't want CPA to blow up in any way.
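
As a rough illustration (made-up code, my own terminology): in the fragment below, the first two lists only ever hold ints, so their contours contain just a monomorphic, atomic instance variable and can safely be merged; the third list also receives a float, so it may still have to be split later on.

a = [1, 2, 3]      # list(int)
b = [4, 5]         # list(int): can share a contour with 'a'
c = [6]            # starts out as list(int)..
c.append(7.5)      # ..but also holds a float, so it needs its own contour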

Second, I commenced work on classes other than list. Supporting singly-typed tuples was a breeze. It was also relatively straightforward to support multiple instance variables: we have to iterate over multiple variables per contour now, and be able to split on each of them; merging decisions have to be made over the entire instance variable state of object contours. Splitting dictionaries seems to work well now. I guess the biggest problem was how to transition everything so that all unit tests still run.. making incremental changes and running all unit tests each time worked pretty well, but took some time (having to wait 5 minutes after each small change can get annoying.)

Next up is supporting internally typed tuples again. This was not possible before multiple instance variables were supported. I will have to consider these separately from regular tuples somehow. I still also have to look into internally typed tuples with length greater than two, but this will probably have to wait some more.

A problem I did not give much attention before, because I was mostly concerned with lists until now, is that in general the instance variables of a class are unknown! Since the first iteration is so imprecise, classes may end up with many instance variables that will vanish again when the analysis becomes more precise. I think there is not much we can do about splitting on such instance variables, except to assume the instance variables of built-in types to be known (as I do now.) Of course, this breaks Python semantics.

It remains an interesting approach to perform profiling in order to get good initial object contours (in other words, perhaps this could keep us closer to Python semantics.) Something I hadn't thought of myself yet, but Plevyak mentioned, is that it's also possible to maintain information between different runs of the iterative analysis. If the analysis is run repeatedly during development, the difficulty of analyzing the program can be spread out over these runs. I don't think I will get this far during the SoC, but it sounds plausible that for much larger test examples, such an approach could make using the compiler much more practical.

Sunday, July 24, 2005

No debugging for a few days!

Even though debugging is a much bigger challenge than writing crap programs, I think I prefer writing crap programs. Luckily I now have all 88 unit tests working, except for an occasional failure of test 34 or 66. Test 66 is not so big, so I should be able to locate that problem some time, and hopefully that will fix 34 too :P

Emotionally, this concludes the first milestone that my mentors and I had drafted at the start of the SoC. It seems like it took a long time, but in the meantime I am pretty far along with the template generation (milestones three and four), and the second milestone seems much easier now that I have worked on the first. The last milestone, finally (integration with a GC), should not take too much effort. So I am pretty positive at this point about reaching all milestones.

The first milestone required getting 'basic list examples' to work with the iterative analysis, but in the meantime it works (most of the time) for all the unit tests (which contain some list-intensive 100+ line programs). The second milestone (considering classes other than list, with multiple polymorphic instance variables, among other things) now seems to be a straightforward generalization, although I may be overlooking some terrible problem of course.

I will be moving to my mother's place tomorrow for some time, and work on the second milestone there. I will bring some unboxing papers, and see if they contain something I might implement after the SoC.

Monday, July 18, 2005

Cleaning Up

When something just doesn't want to work, it's often a good solution to throw it away. Of course, Python makes this easier than many other languages. A famous quote of Tim Peters springs to mind: 'Python code is easy to throw away.' Additionally, throwing away crappy code can make you feel much better, since your program often becomes cleaner because of it, and you can always get back to the problem later..

Today I commented out the partial unboxing analysis that I had, as well as the analysis that tried to work out upper bounds on strings used in reflection. Because the data flow analysis no longer has to consider the allocation sites of simple types, a lot of cruft could be removed from the compiler. I will probably come back to the unboxing analysis after the SoC. The string bounds analysis was intended to help bootstrapping the compiler, but since this will not happen any time soon, I'm not sure if I'll ever revive it.

I intend to spend the following three days or so on getting as many unit tests working again as possible. After that, I will start work on the second milestone. I'm looking forward to that, as it will hopefully be more interesting than fixing bugs all day.. :-)

Sunday, July 17, 2005

Unboxing

I haven't posted here for a few days, as I was modifying my old code for unboxing of integers and floats, and I did not add any new features to the iterative analysis. I did fix a number of old unit tests to compile again with the latest code, but other than that there was nothing noteworthy.

It turns out that unboxing is not as simple as I had assumed. Actually, I never thought about it that much, because the other techniques got most of my attention - so much to do, and so little time. I decided to look into unboxing in depth, because of some undesirable interaction with the iterative analysis (e.g. having to use separate contours for lists with boxed and lists with unboxed integers.)

In the end, I concluded that creating good unboxing support is probably going to take more than just a few days, and, as I don't really have the time for this at the moment, I decided that I will disable the partial unboxing analysis that I have for now. That does not mean no unboxing can be performed: all integers and floats will be unboxed, and a warning will be given whenever they are confused with a pointer type. So for now, it will be the programmer's task not to add such confusions. Templates etc. will still work.

What will not work for now, then, are container objects with integers/floats and pointer types in the same slot, and passing integers/floats into arguments with a pointer type default (such as range(a,b=None) - we will have to provide two builtin C++ functions for this case.) As the latter can't be supported efficiently anyway without generating multiple function duplicates with different defaults, and mixing integers and pointer types for example in a list container is often not a very nice programming style, I feel the compiler could still be pretty useful.
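
A short, hypothetical sketch of what this restriction means in practice:

ok = [1, 2, 3]            # fine: ints stay unboxed in a homogeneous list
# bad = [1, None]         # int mixed with a pointer type in one slot: warning

def span(lo, hi=-1):      # use an int default instead of e.g. hi=None
    if hi == -1:
        lo, hi = 0, lo
    return range(lo, hi)

print(len(span(5)))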

It's always a bad feeling to have to add restrictions to the types of programs the compiler can handle, but hopefully this one will be temporary. In any case, it can still be used to compile programs written in a C++-style to C++. Although this might not sound very impressive, being able to use Python syntax in writing highly efficient programs would still be pretty cool :D

Thursday, July 14, 2005

Final Piece of the Puzzle?

Having to use 0CFA - in other words, no duplication at all of either class types or functions - during the first iteration will become terribly slow for larger programs. Worse, it is so imprecise that it makes life hard for IFA, forcing it to split huge contours (consider a single contour for list in a larger program!), making it slow and requiring many iterations.

There are multiple interesting ways of combating this problem. We can selectively apply CPA during the first iterations, by limiting the sizes of cartesian products. Of course, fully applying it would make the first iteration even slower, while selectively applying it would probably not increase precision by much. I think what is really needed is to provide IFA with an initial 'guestimate' of the set of object contours that it should deduce. This does not work with just CPA, because CPA is not data-adaptive, and it would pollute itself if something 'unexpected' happens.

How to get an initial good set of object contours? Of course, heuristics can be applied. A lot of them, especially ugly ones. For example, a [1, 2] constructor could be grouped under a 'list(int)' contour. This becomes messy for more general and polymorphic constructors - what to do with an [x, y] constructor?
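
A small made-up example of the difference:

obvious = [1, 2]        # safe to group under a 'list(int)' contour

def pair(x, y):
    return [x, y]       # which contour? it depends on every caller

pair(1, 2)              # here it behaves like list(int)..
pair('a', 'b')          # ..and here like list(str)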

I think a better, and more elegant, solution would be to perform profiling on the program that is to be compiled. Of course, if the program has unit tests, this would be a great match - but that's beside the point. It should be very simple to dump the actual types for each allocation site, possibly even relative to the actual argument types of the current function invocation. This can give us a great starting point. The dumps do not have to be perfect, because IFA will adjust to any imperfections. Best of all, we can enable CPA during all iterations, greatly improving IFA and reducing the number of iterations.
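
A very rough sketch of the idea, outside the compiler itself (the 'record' helper and the site labels are made up for illustration): run the program once, note which element types actually show up at each allocation site, and feed that in as the initial guess for the object contours.

seen = {}

def record(site, value):
    # remember the element types observed at this allocation site
    types = seen.setdefault(site, set())
    for v in value:
        types.add(type(v).__name__)
    return value

def parse(line):
    fields = record('parse:fields', line.split())
    return fields

parse('10 20 30')
parse('a b')
print(seen)    # e.g. {'parse:fields': set(['str'])}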

No, I don't mind that people will have to profile their applications. Especially if a simple technique like this turns out to greatly improve scalability.

BTW profiling has another advantage: it can provide type distributions at dynamic calling sites. In case of polymorphic inline caches, we can move the most used types up the ladder.

Wednesday, July 13, 2005

Bugs

Today was the first bad day I guess, since starting work on the iterative analysis. As with the generics support I worked on before that, it's not something that you can just paste on the compiler and have it work. Because both required rather intrusive modifications, I did not run the unit tests every day while working on them. The result is long debugging sessions, like today.

Spending the whole day fixing 5 or 6 stupid bugs, or 'semantical typos', gave me somewhat of a desperate feeling, partly because I still have 30 unit tests to check (no idea how many of those have actually broken), and there are some larger programs among those. I think it could easily take another week to fix all tests, and make some more actual modifications to the current iterative analysis, which I would like to finish before considering the first milestone to be reached.

For example, it seems my unboxing technique is not going to work well with contours that can be merged.. I would also very much like to spend more time on the iterative analysis itself (instead of bugfixing :/), i.e. compacting contours together, better cleaning out of the allocation site table, working with lists that never flow to an assignment, and being able to merge non-singleton upper bounds together if it is clear they cannot be split later on.

Tuesday, July 12, 2005

CVS

I made several not too interesting changes today. I finally got around to fixing the [x,y] issue though. I also fixed some old unit tests again, and added some more, and spent some time hunting obscure bugs. I added some asserts which will help debug the iterative algorithm later on. Most interestingly though, I started working with CVS, and committed all my changes using this.

I had used CVS before, but I had already forgotten that it was just fun to use, and that it actually helps single-developer teams as well. Having a history can be very useful, especially for chaotic programmers like myself. Because it can be lonely though, using CVS by myself, I am going to force one of my mentors to submit at least one patch :-)

Today was sort of a grunt work day. My plan is to work on adding some new features later today or tomorrow (more aggressively combining contours; compacting the contour table; rebuilding the allocation site table so that it won't accumulate a mess), and then to get all unit tests working, as well as add some more that I should've done before. When all this is done (I'm guestimating about a week), I will consider my first milestone to have been reached, and I will start adding non-list classes to the mix, and classes with multiple class variables, and be frustrated about how many new bugs this will cause :)

Monday, July 11, 2005

Controlled Explosions

Type inference is really a simple problem, with a simple solution. Just consider each allocation site separately, apply the cartesian product algorithm and you're done. You will have a simple implementation and almost perfect precision. Unfortunately, there are 'some' scaling issues of course.. Let's compare it to chess. Chess is also a simple problem, with a simple solution. Just calculate all possibilities over the next 100 moves and Kasparov will look like a beginner..

However, while there exist nice AI techniques to make simple computers play chess very well, as far as I know, the scalability problem of type inference methods has not yet been convincingly solved in the scientific literature. It might be just a matter of time, with promising techniques such as CPA and IFA. The chess problem, of course, has been researched over a much longer period of time, and by many more people, so we can be hopeful.

However, combining CPA and IFA will not be as straightforward as I would have thought (ain't it ever.) By initially using a single contour for non-simple types, CPA doesn't blow up for these types, but because of the single contour many simple types will now become polluted when they interact with such 'overloaded' contours, again blowing up CPA. (Consider getting an int from an int list, then using it as an argument.) A solution might be to limit the size of cartesian products during the first iterations, or perhaps even to use 0CFA..
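
A tiny, made-up example of the pollution described in the parenthetical: with one shared contour for all lists, the int read back out of 'xs' is no longer known to be just an int, and every function it is passed to gets analyzed with the polluted type as well.

xs = [1, 2, 3]
ys = ['a', 'b']          # shares the single list contour with 'xs'

def twice(n):
    return n * 2         # now analyzed for the polluted element type, too

print(twice(xs[0]))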

In any case, I guess my priority right now is to get everything working precisely and (hopefully) correctly. It has to _work_ first for larger programs, before I can fine-tune stuff and incorporate butt-ugly heuristics, as well as the neighbour's dog for improving scalability. I'm trying not to think about other scalability problems presented by larger programs, caused by inheritance and the proportionally increased usage of dynamic features (e.g. reflection).

In other words, I don't think I will be able to bootstrap the compiler any time soon (picture a snake with his tail in his mouth.. do snakes have tails??). I actually thought it would be possible to do during my Master's project. Then again, maybe I would have chosen something UML'ish if I had known the difficulties. IMO, it would be great, however, to actually _have_ an open source engine that is precise and works for not-too-large Python programs (say up to around a few thousand lines), that are programmed in a C++-like style.