Iterative Progress

After messing around for a few days getting CVS write access from my mother's XP box (which I finally gave up) and fixing problems caused by differences in GCC versions across three operating systems, I finally got to hacking on the iterative analysis again. I really needed that, since debugging unit tests for more than a week really drained me.

First, I fixed a problem where object contours were merged that could possibly be split later on. The fix logically results in much fewer iterations on the three largest unit tests, while more contours are now created. Additionally, I haven't observed a single unit test failing afterwards, where I would get a seldom failure on test 34 or 66 previously (the analysis is not deterministic.)

Currently, only object contours that contain just monomorphic, atomic instance variables are merged, because these can not cause the merged contour to be split later on. This means that as tighter bounds are deduced during the analysis, more object contours can be merged. It seems that other object contours referencing only such contours can also be merged, because they now also cannot be split later on, i.o.w. mergability looks like a transitive property. I will have to investigate this later, as I don't want CPA to blow up in any way.

Second, I commenced work on other classes than list. Supporting singularly-typed tuples was a breeze. It was also relatively straight-forward to support multiple instance variables: we have to iterate over multiple variables per contour now, and be able to split on each of them; merging decisions have to be made over the entire instance variable state of object contours. Splitting dictionaries seems to work well now. I guess the biggest problem was how to transition everything so that all unit tests still run.. making incremental changes and running all unit tests each time worked pretty well, but took some time (having to wait 5 minutes after each small change can get annoying.)

Next up is supporting internally types tuples again. This was not possible before multiple instance variables were supported. I will have to consider these separately from regular tuples somehow. I still also have to look into internally types tuples with length greater than two, but this will probably have to wait some more.

A problem I did not give much attention before, because I was mostly concerned with lists until now, is that in general the instance variables of a class are unknown! Since the first iteration is so imprecise, classes may end up with many instance variables that will vanish again when the analysis becomes more precise. I think there is not much we can do about splitting on such instance variables, except to assume the instance variables of built-in types to be known (as I do now.) Of course, this break Python semantics.

It remains an interesting approach to perform profiling in order to get good initial object contours (i.o.w., perhaps this could keep us closer to Python semantics.) Something I hadn't thought of myself yet, but Plevyak mentioned, is that it's also possible to maintain information between different runs of the iterative analysis. If the analysis is now run repeatedly during development, the difficulty of analyzing the program can be spread out over these runs. I don't think I will get this far during the SoC, but it sounds plausible that for much larger test examples, such an approach could make using the compiler much more practical.

Shed Skin - A (restricted) Python-to-C++ Compiler

Friday, July 29, 2005

Iterative Progress

No comments: