Saturday, July 30, 2005

C++

C++ makes for a great target language, with its intricate generics and its plethora of other low-level features. We can both succinctly map a high-level language such as Python to its feature set, and leave further optimization to be handled by decades-old, mature C++ compilers. In many ways, C++ is really as flexible as Python. The problem is that this flexibility is often hard to achieve, and frequently results in unreadable code. It's nicer, I guess, to have a compiler write the C++ code for you :-) Consider the following example:
def flatsie(iter):
    return [(b,a) for (a,b) in iter]

flatsie([(1,2.1)])
flatsie({(2,3.141): [1,2,3]})
flatsie({(11,4.1): None})
flatsie(((7.7,1),))
There are some interesting things going on here. If we want to compile this example to C++, the list comprehension has to be compiled to a generic function. But since the list comprehension iterates over lists/tuples as well as dictionaries, it is not immediately clear how to express this nicely in C++, because dict is implemented using two template variables, as opposed to one for list and tuple. C++ provides an elegant solution: make all iterable classes inherit from a pyiter class with a single template variable, and in the case of dict, map the right template variable (the key type) onto it. Finally, we would also like to statically preallocate the result of the list comprehension, but since this is a generic list, we cannot define it globally. C++ again has the solution: use the 'static' keyword inside the function template to create a parameterizable 'global' variable.
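
To give a rough idea, here is a minimal sketch of that setup (hand-written with ordinary angle brackets and made-up member names - not the compiler's actual builtin headers):

#include <map>
#include <vector>

// iterable base class with a single template variable: iterating yields T
template <class T> class pyiter {
public:
    virtual ~pyiter() {}
};

// list and tuple simply pass their element type on to pyiter
template <class T> class list : public pyiter<T> {
public:
    std::vector<T> units;
};

// dict has two template variables, but iterating over a dict yields its
// keys, so the key type K is the one mapped onto pyiter
template <class K, class V> class dict : public pyiter<K> {
public:
    std::map<K, V> units;
};

// a 'static' local inside a function template acts as a parameterizable
// 'global': one preallocated result object per instantiation
template <class T> list<T> *preallocated_result() {
    static list<T> result;
    return &result;
}

A function such as flatsie can then simply take a pyiter argument and accept lists, tuples and dicts alike, which is exactly what happens in the generated code below.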

As of today, this example compiles cleanly, without breaking any of the other 95 unit tests. As there are only two types of tuples involved, the iterative analysis nicely ends up with two tuple object contours. These tuples are also modeled internally, as the list comprehension would otherwise cause precision loss. Tuples of length greater than two are not yet modeled internally, but that would be a straightforward generalization. In order to get here, I had to implement generic list comprehensions, and as a result the template milestones now look almost complete. In any case, it's pretty cool to see the compiler output the following working C++ code :D

template [class A, class B] static inline list[tuple2[B, A] *] *list_comp_0(
        list[tuple2[B, A] *] *result, pyiter[tuple2[A, B] *] *iter) {
    A a;
    B b;
    tuple2[A, B] *__0;
    result->clear();

    FOR_IN(__0,iter)
        TUPLE_ASSIGN2(a,b,__0,0);
        result->append(new tuple2[B, A](b, a));
    END_FOR_IN
    return result;
}

int main() {
    flatsie(new list[tuple2[int, double] *](2,
        new tuple2[int, double](1, 2.1), new tuple2[int, double](2, 4.1)));
    flatsie(new dict[tuple2[int, double] *, list[int] *](1,
        new tuple2[tuple2[int, double] *, list[int] *](
            new tuple2[int, double](2, 3.1), new list[int](3, 1, 2, 3))));
    flatsie(new dict[tuple2[int, double] *, none *](1,
        new tuple2[tuple2[int, double] *, none *](
            new tuple2[int, double](1, 4.1), 0)));
    flatsie(new tuple[tuple2[double, int] *](1, new tuple2[double, int](7.7, 1)));
}

template [class A, class B] list[tuple2[B, A] *] *flatsie(pyiter[tuple2[A, B] *] *iter) {
    static list[tuple2[B, A] *] __1;

    return list_comp_0(&__1, iter);
}

Friday, July 29, 2005

Iterative Progress

After messing around for a few days getting CVS write access from my mother's XP box (which I finally gave up on) and fixing problems caused by differences in GCC versions across three operating systems, I finally got to hacking on the iterative analysis again. I really needed that, since debugging unit tests for more than a week had really drained me.

First, I fixed a problem where object contours were merged that could possibly be split later on. The fix logically results in far fewer iterations on the three largest unit tests, although more contours are now created. Additionally, I haven't observed a single unit test failure since, whereas previously I would get an occasional failure on test 34 or 66 (the analysis is not deterministic).

Currently, only object contours that contain just monomorphic, atomic instance variables are merged, because these cannot cause the merged contour to be split later on. This means that as tighter bounds are deduced during the analysis, more object contours can be merged. It seems that other object contours referencing only such contours can also be merged, because they now also cannot be split later on; in other words, mergeability looks like a transitive property. I will have to investigate this later, as I don't want CPA to blow up in any way.
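
Just to make the criterion concrete, a tiny sketch of the check (in C++ and with invented names - the compiler itself is written in Python, so this is purely illustrative):

#include <map>
#include <set>
#include <string>

// a contour's deduced state: an upper bound (set of possible types)
// for each instance variable
struct Contour {
    std::map<std::string, std::set<std::string>> ivars;
};

static bool is_atomic(const std::string &type) {
    return type == "int" || type == "float" || type == "str";
}

// a contour may be merged if every instance variable is monomorphic
// (exactly one possible type) and atomic, since such a contour can
// never give rise to a split later on
static bool mergeable(const Contour &c) {
    for (const auto &iv : c.ivars) {
        const std::set<std::string> &types = iv.second;
        if (types.size() != 1 || !is_atomic(*types.begin()))
            return false;
    }
    return true;
}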

Second, I commenced work on classes other than list. Supporting singly-typed tuples was a breeze. It was also relatively straightforward to support multiple instance variables: we have to iterate over multiple variables per contour now, and be able to split on each of them; merging decisions have to be made over the entire instance variable state of object contours. Splitting dictionaries seems to work well now. I guess the biggest problem was how to transition everything so that all unit tests still run.. making incremental changes and running all unit tests each time worked pretty well, but took some time (having to wait 5 minutes after each small change can get annoying.)

Next up is supporting internally typed tuples again. This was not possible before multiple instance variables were supported. I will have to consider these separately from regular tuples somehow. I still also have to look into internally typed tuples with length greater than two, but this will probably have to wait some more.

A problem I did not give much attention to before, because I was mostly concerned with lists until now, is that in general the instance variables of a class are unknown! Since the first iteration is so imprecise, classes may end up with many instance variables that will vanish again when the analysis becomes more precise. I think there is not much we can do about splitting on such instance variables, except to assume the instance variables of built-in types to be known (as I do now.) Of course, this breaks Python semantics.

It remains an interesting approach to perform profiling in order to get good initial object contours (in other words, perhaps this could keep us closer to Python semantics.) Something I hadn't thought of myself yet, but Plevyak mentioned, is that it's also possible to maintain information between different runs of the iterative analysis. If the analysis is run repeatedly during development, the difficulty of analyzing the program can be spread out over these runs. I don't think I will get this far during the SoC, but it sounds plausible that for much larger test examples, such an approach could make using the compiler much more practical.

Sunday, July 24, 2005

No debugging for a few days!

Even though debugging is a much bigger challenge than writing crap programs, I think I prefer writing crap programs. Luckily I now have all 88 unit tests working, except for an occasional failure of test 34 or 66. Test 66 is not so big, so I should be able to locate that problem some time, and hopefully that will fix 34 too :P

Emotionally, this concludes the first milestone that my mentors and I had drafted at the start of the SoC. It seems like it took a long time, but in the meantime I am pretty far along with the template generation (milestones three and four), and the second milestone seems much easier now that I have worked on the first. The last milestone, finally (integration with a garbage collector), should not take too much effort. So I am pretty positive at this point about reaching all milestones.

The first milestone required getting 'basic list examples' to work with the iterative analysis, but in the meantime it works (most of the time) for all the unit tests (which contain some list-intensive 100+ line programs). The second milestone (considering classes other than list, with multiple polymorphic instance variables, among other things) now seems to be a straightforward generalization, although I may be overlooking some terrible problem of course.

I will be moving to my mother's place tomorrow for some time, and work on the second milestone there. I will bring some unboxing papers, and see if they contain something I might implement after the SoC.

Monday, July 18, 2005

Cleaning Up

When something just doesn't want to work, it's often a good solution to throw it away. Of course, Python makes this easier than many other languages. A famous quote of Tim Peters springs to mind: 'Python code is easy to throw away.' Additionally, throwing away crappy code can make you feel much better, since your program often becomes cleaner because of it, and you can always get back to the problem later..

Today I commented out the partial unboxing analysis that I had, as well as the analysis that tried to work out upper bounds on strings used in reflection. Because the data flow analysis now does not have to consider the allocation sites of simple types anymore, a lot of cruft could be removed from the compiler. I will probably come back to the unboxing analysis after the SoC. The string bounds analysis was intended to help bootstrapping the compiler, but since this will not happen any time soon, I'm not sure if I'll ever revive it.

I intend to spend the following three days or so on getting as many unit tests working again as possible. After that, I will start work on the second milestone. I'm looking forward to that, as it will hopefully be more interesting than fixing bugs all day.. :-)

Sunday, July 17, 2005

Unboxing

I haven't posted here for a few days, as I was modifying my old code for unboxing of integers and floats, and I did not add any new features to the iterative analysis. I did fix a number of old unit tests to compile again with the latest code, but other than that there was nothing noteworthy.

It turns out that unboxing is not as simple as I had assumed. Actually, I never thought about it that much, because the other techniques got most of my attention - so much to do, and so little time. I decided to look into unboxing in depth, because of some undesirable interaction with the iterative analysis (e.g. having to use separate contours for lists with boxed and lists with unboxed integers.)

In the end, I concluded that creating good unboxing support is probably going to take more than just a few days, and, as I don't really have the time for this at the moment, I decided that I will disable the partial unboxing analysis that I have for now. That does not mean no unboxing can be performed: all integers and floats will be unboxed, and a warning will be given whenever they are confused with a pointer type. So for now, it will be the programmer's task not to add such confusions. Templates etc. will still work.

What will not work for now, then, are container objects with integers/floats and pointer types in the same slot, and passing integers/floats into arguments with a pointer-type default (such as range(a,b=None) - we will have to provide two builtin C++ functions for this case.) As the latter can't be supported efficiently anyway without generating multiple function duplicates with different defaults, and mixing integers and pointer types in, for example, a list container is usually not a very nice programming style, I feel the compiler could still be pretty useful.
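
For range, the two builtins would look roughly like this (a hypothetical sketch using std::vector instead of the compiler's own list type):

#include <vector>

// range(stop): the single-argument form, so no 'None' default is needed
std::vector<int> *range(int stop) {
    std::vector<int> *result = new std::vector<int>();
    for (int i = 0; i < stop; i++)
        result->push_back(i);
    return result;
}

// range(start, stop): a separate overload takes the place of 'b=None',
// so no pointer type ever has to flow into an integer argument
std::vector<int> *range(int start, int stop) {
    std::vector<int> *result = new std::vector<int>();
    for (int i = start; i < stop; i++)
        result->push_back(i);
    return result;
}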

It's always a bad feeling to have to add restrictions to the types of programs the compiler can handle, but hopefully this one will be temporary. In any case, it can still be used to compile programs written in a C++ style to C++. Although this might not sound very impressive, being able to use Python syntax to write highly efficient programs would still be pretty cool :D

Thursday, July 14, 2005

Final Piece of the Puzzle?

Having to use 0CFA - in other words, no duplication at all of either class types or functions - during the first iteration will become terribly slow for larger programs. Worse, being so imprecise, it makes life hard for IFA, forcing it to split huge contours (consider a single contour for list in a larger program!), which makes it slow and requires many iterations.

There are multiple interesting ways of combating this problem. We can selectively apply CPA during the first iterations, by limiting the sizes of the cartesian products. Of course, fully applying it would make the first iteration even slower, but applying it only selectively would probably not increase precision by much. I think what is really needed is to provide IFA with an initial 'guesstimate' of the set of object contours that it should deduce. This would not work for CPA alone, because CPA is not data-adaptive, and it would pollute itself if something 'unexpected' happens.

How to get a good initial set of object contours? Of course, heuristics can be applied. A lot of them, especially ugly ones. For example, a [1, 2] constructor could be grouped under a 'list(int)' contour. This becomes messy for more general, polymorphic constructors - what to do with an [x, y] constructor?

I think a better, and more elegant, solution would be to perform profiling on the program that is to be compiled. Of course, if the program has unit tests, this would be a great match - but that's beside the point. It should be very simple to dump the actual types for each allocation site, possibly even relative to the actual argument types of the current function invocation. This can give us a great starting point. The dumps do not have to be perfect, because IFA will adjust to any imperfections. Best of all, we can enable CPA during all iterations, greatly improving IFA and reducing the number of iterations.

No, I don't mind that people will have to profile their applications. Especially if a simple technique like this turns out to greatly improve scalability.

BTW profiling has another advantage: it can provide type distributions at dynamic call sites. In the case of polymorphic inline caches, we can move the most frequently used types up the ladder.
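
At a dynamic call site that could look roughly as follows (just a sketch with made-up classes, ordered by profiled frequency):

struct pyobj { virtual ~pyobj() {} };
struct Dog   : pyobj { const char *speak() { return "woof"; } };
struct Cat   : pyobj { const char *speak() { return "meow"; } };
struct Snake : pyobj { const char *speak() { return "hiss"; } };

// polymorphic inline cache: the profiled type distribution tells us that
// Dog is by far the most common type here, so it is checked first
const char *speak_dispatch(pyobj *p) {
    if (Dog *d = dynamic_cast<Dog *>(p))
        return d->speak();
    if (Cat *c = dynamic_cast<Cat *>(p))
        return c->speak();
    if (Snake *s = dynamic_cast<Snake *>(p))
        return s->speak();
    return "";
}

A real implementation would probably compare class pointers instead of using dynamic_cast, but the idea is the same: the cheapest checks go to the types that the profile says occur most often.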

Wednesday, July 13, 2005

Bugs

Today was the first bad day, I guess, since starting work on the iterative analysis. As with the generics support I worked on before that, it's not something you can just paste onto the compiler and have it work. Because both required rather intrusive modifications, I did not run the unit tests every day while working on them. The result is long debugging sessions, like today's.

Spending the whole day fixing 5 or 6 stupid bugs, or 'semantical typos', gave me somewhat of a desperate feeling, partly because I still have 30 unit tests to fix (those among them that have broken, anyway - no idea how many), and there are some larger programs among those. I think it could easily take another week to fix all tests and make some more actual modifications to the current iterative analysis that I would like to finish before considering the first milestone to be reached.

For example, it seems my unboxing technique is not going to work well with contours that can be merged.. I would also very much like to spend more time on the iterative analysis itself (instead of bugfixing :/), i.e. compacting contours together, cleaning out the allocation site table better, handling lists that never flow to an assignment, and merging non-singleton upper bounds when it is clear they cannot be split later on.

Tuesday, July 12, 2005

CVS

I made several not too interesting changes today. I finally got around to fixing the [x,y] issue though. I also fixed some old unit tests again, and added some more, and spent some time hunting obscure bugs. I added some asserts which will help debug the iterative algorithm later on. Most interestingly though, I started working with CVS, and committed all my changes using this.

I had used CVS before, but I had already forgotten that it was just fun to use, and that it actually helps single-developer teams as well. Having a history can be very useful, especially for chaotic programmers like myself. Because it can be lonely though, using CVS by myself, I am going to force one of my mentors to submit at least one patch :-)

Today was sort of a grunt work day. My plan is to work on adding some new features later today or tomorrow (more aggressively combining contours; compacting the contour table; rebuilding the allocation site table so that it won't accumulate a mess), and then get all unit tests working, as well as add some more that I should've added before. When all this is done (I'm guesstimating about a week), I will consider my first milestone to have been reached, and I will start adding non-list classes to the mix, as well as classes with multiple class variables, and be frustrated about how many new bugs this will cause :)

Monday, July 11, 2005

Controlled Explosions

Type inference is really a simple problem, with a simple solution. Just consider each allocation site separately, apply the cartesian product algorithm and you're done. You will have a simple implementation and almost perfect precision. Unfortunately, there are 'some' scaling issues of course.. Let's compare it to chess. Chess is also a simple problem, with a simple solution. Just calculate all possibilities over the next 100 moves and Kasparov will look like a beginner..

However, while there exist nice AI techniques to make simple computers play chess very well, as far as I know, the scalability problem of type inference methods has not yet been convincingly solved in the scientific literature. It might be just a matter of time, with promising techniques such as CPA and IFA. The chess problem, of course, has been researched over a much longer period of time, and by many more people, so we can be hopeful.

However, combining CPA and IFA will not be as straightforward as I would have thought (ain't it ever.) By initially using a single contour for non-simple types, CPA doesn't blow up for these types, but because of the single contour many simple types will now become polluted when they interact with such 'overloaded' contours, again blowing up CPA. (Consider getting an int from an int list, then using it as an argument.) A solution might be to limit the size of cartesian products during the first iterations, or perhaps even to use 0CFA..

In any case, I guess my priority right now is to get everything working precisely and (hopefully) correctly. It has to _work_ first for larger programs, before I can fine-tune stuff and incorporate butt-ugly heuristics, as well as the neighbour's dog for improving scalability. I'm trying not to think about other scalability problems presented by larger programs, caused by inheritance and the proportionally increased usage of dynamic features (e.g. reflection).

In other words, I don't think I will be able to bootstrap the compiler any time soon (picture a snake with his tail in his mouth.. do snakes have tails??). I actually thought it would be possible to do during my Master's project. Then again, maybe I would have chosen something UML'ish if I had known the difficulties. IMO, it would be great, however, to actually _have_ an open source engine that is precise and works for not-too-large Python programs (say up to around a few thousand lines), that are programmed in a C++-like style.

Sunday, July 10, 2005

Small features, bugfixes

Today I worked all over the map. Maybe the most interesting aspects are that nested types and truly polymorphic containers don't crash the compiler anymore.. :-) They actually seem to work. The following example compiles and runs fine:
def duplll(x):             # x: [list(A)]
    return [x]             # [list(list(A))]

a = duplll([1])            # [list(list(int))]
b = duplll([1.0])          # [list(list(float))]

def ident(x):              # x: [list(list(A))]
    return x               # [list(list(A))]

def meuk(x):               # x: [list(list(A))]
    return ident(x)        # [list(list(A))]

c = meuk(a)                # [list(list(int))]
d = meuk(b)                # [list(list(float))]

def makel(x):              # x: [list(A)]
    return [x]             # [list(list(A))]

def dupl(x):               # x: [list(list(A))]
    return [makel(x[0])]   # [list(list(list(A)))]

dupl([[1]])                # [list(list(list(int)))]

y = [[('1',)]]             # [list(list(tuple(str)))]
dupl(y)                    # [list(list(list(tuple(str))))]

d = [[1]]                  # [list(list(int))]
d = [[1.0]]                # [list(list(float))]
d                          # [list(list(pyobj))]
It contains a polymorphic call chain ('meuk' and 'ident'), nested (in the first iteration even recursive) types, and a truly polymorphic container at the end. It's nice to see all the C++ crap that is generated just for the last assignment, especially considering that in Python much more involved data structures are often used than a list of lists of floats (I'll show some nice examples later on :-))
d = ((list[list[pyobj *] *] *)(new list[list[double] *](1,
    new list[double](1, new double(1.0)))));
The cast is necessary because C++ (rightly) won't accept the assignment otherwise, and the float is allocated on the heap, because that's how I currently represent 'confused' objects (it can't be unboxed, because floats and integers are confused inside the container.) I believe it's a nice example of how static type checking gets in the way of programming. I dare not show the entire generated C++ program :-)

Other than the above, I fixed many small bugs, rediscovered my affinity for unit testing, made some old unit tests work again, and improved the splitting at confluence points (by actually looking at which creation sites come in via the dataflow edges - for example, the two edges may provide the same sets, in which case splitting is not very useful.)
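
The check behind that last improvement is tiny; a sketch (again with hypothetical names, not the actual compiler code):

#include <set>
#include <vector>

// each incoming dataflow edge at a confluence node carries a set of
// creation (allocation) sites; splitting the contour along these edges
// only helps if at least two edges carry different sets
bool worth_splitting(const std::vector<std::set<int>> &incoming) {
    for (size_t i = 1; i < incoming.size(); i++)
        if (incoming[i] != incoming[0])
            return true;
    return false;
}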

Still, there is so much more to do that it makes me feel a bit discouraged at the moment. I'll just have to take it feature by feature and bug by bug.. sometimes I feel like the guy in 'Touching the Void'. Watch the movie and you'll know what I mean :-)

Saturday, July 09, 2005

Integration Blues

Okay, so it took me most of the day to be able to support '[x]'. And it still doesn't work perfectly; for example, '[x,y]' will require more work. At least I know where the problems lie now, and '[x]' suffices for debugging the iterative analysis. This is because using '[]' several times makes these allocation sites look alike, obfuscating debug output. The problem was that I did not anticipate that object contours would change during the analysis, so in the end I just had to make things a bit more dynamic..

With debugging now much easier, it actually became fun to debug some more problems I found in the iterative analysis. To make debugging even more fun, I picked up my habit again of adding unit tests for every new piece of code that I wrote, especially those causing the tiniest of problems. At the moment, no known bugs remain, just.. let's call them 'missing features'.

I guess tomorrow I will start messing around with finding some corner cases, and see how the algorithm works for truly polymorphic containers (when an instance variable of a container object actually takes on different types during run-time). Of course throwing ints and floats into a single list doesn't have to be very efficiently supported (there are probably cases where it's useful, but I haven't seen them). However, in case of inheritance hierarchies, it would be nice if lists with objects of classes from the same hierarchy don't blow up the analysis.. :)

Friday, July 08, 2005

Scaling up

I spent most of today debugging silly problems that started to appear as I fed the compiler larger examples. In a codebase of about 5000 lines, simple things can become hard to find. It looks like I have found the last of them, as I can't find a smallish 'list' example that doesn't work anymore. For example, an extended version of the example in my previous posting, now with 15 allocation sites and several functions (ident, dupl, makel), nicely converges to these 2 object contours:

1: ImmutableSet([])
2: ImmutableSet([class int_])
3: ImmutableSet([class float_])
4: ImmutableSet([])
5: ImmutableSet([])
unused [4, 5, 1]

As can be seen, 5 intermediate object contours are created in total, as the compiler splits and merges contours during each iteration. Although some steps of the iterative analysis as I've implemented it are slow (for example, checking whether a tuple of tuples is in some Set), the theoretical complexity of the whole is greatly reduced, as CPA will create a boatload fewer templates when there are fewer object contours!

My next step will be to clean this stuff up some more, e.g. remove unused contours after each iteration, fix the fact that '[x]' is not supported, and start looking at classes other than list. At some point I will also have to start looking at nested and recursive types, and start thinking about termination issues.. shudder!!

Thursday, July 07, 2005

John Plevyak

Since John Plevyak wrote his PhD thesis almost ten years ago, I decided to ask him how he views the state of the solutions for handling data polymorphism today. Unexpectedly, I received a detailed answer within minutes. It appears he is still (or again) working on his iterative method, and is actually planning to release an open source inference engine at some point in the future. How cool! :D

Administrative Issues

I spent the day working on various administrative issues. Since allocation sites inside function templates are 'thrown away' after each CPA iteration, we need to record the previously deduced type information for them, which has to be inserted again if similar templates are created in the next iteration. This becomes quite some administration, because these allocation sites may be located in templates created for argument types whose contours may be split or merged in between iterations.

First, each allocation site is given an (AST node, argument product) identifier. Based on this, a (currently global) table is maintained with the currently deduced type information for each allocation site, and updated as contours change. The following more complicated example code is now analyzed correctly. The allocation sites inside the 'dupl' templates depend on the 'y' argument, whose contour is split at some point. It would go too far to explain all this in detail, so here is the code:

def ident(x):              # x: [list(A)]
    return x               # [list(A)]

a = []                     # [list(int)]
a = []                     # [list(int)]
a = []                     # [list(int)]
b = []                     # [list(float)]

ident(a).append(1)         # []
ident(b).append(1.0)       # []

def dupl(y):               # y: [list(A)]
    k = []                 # [list(float)]
    k.append(1.0)          # []
    l = []                 # [list(A)]
    l.append(y[0])         # []
    return l               # [list(A)]

dupl(a)                    # [list(int)]
dupl(b)                    # [list(float)]

All lists in the example contain either integers or floats, so we would expect the analysis to end up with two list contours, one for each type. However, the compiler does not yet merge contours that end up with the same contents. Clearly, it can happen that when a contour becomes more precise because of CPA, there is already another contour available with the same precision. Of course these contours should then be merged, because sharing contours is what this whole iterative analysis is about..

In the previous situation, six object contours would have been created (one for each syntactic allocation site), potentially blowing up CPA (as it works with the cross-product of actual arguments). Additionally, the types for 'dupl(a)' and 'dupl(b)' would have been imprecise, because a single contour would be used for the different incarnations of 'l' in the different templates for 'dupl'.. It's already nice to see both improved precision and efficiency, although I still have much to do, and probably especially to debug.. :P

Wednesday, July 06, 2005

This just in: first results

After building some further infrastructure for the iterative analysis today, and sanitizing other parts of the compiler to cooperate with it, I have the first simple example compiling ('return [x]' is not supported for now :-)):
def makel(x):
    l = []
    l.append(x)
    return l

print makel(1.0)
print makel(1)

During the first iteration, CPA creates two method templates for 'makel', because it sees that it is called with exactly two different argument products: (float,) and (int,). Since we want to iteratively improve upon imprecise object contours, during this first CPA iteration a single object contour is used for the duplicated allocation sites '[]' inside 'makel' (one for each template.) Consequently, an upper bound of {int, float} is inferred for this object contour.

As I explained earlier, after each iteration, the iterative algorithm tries to resolve such imprecisions by splitting object contours. But splitting is not yet necessary for this example to work. By tracing back dataflow paths from inside the two 'append' templates (conflicting assignments to an instance variable), we can conclude that each '[]' allocation site flows only to one of these. The compiler does just that, and splits the object contour in two. Further, it records the newly gained information about these allocation sites in a global table (still messy atm), so that future CPA iterations can seed these allocation sites with the right object contours as they create their templates.

After one more iteration, the compiler concludes that nothing more can change, and generates the following code (nicely parameterizing the types as well, corresponding to another part of my SoC proposal - btw I renamed template brackets to [ and ] because messing with HTML apparently makes me very sad):
int main() {
    printf("%s\n", CS(makel(1.0)));
    printf("%s\n", CS(makel(1)));
}

template [class A] list[A] *makel(A x) {
    list[A] *l;
    l = new list[A]();
    l->append(x);
    return l;
}

This compiles and runs just fine. Simple object contour splitting also works already (see two postings earlier):
def ident(x):
    return x
a = []
a = []
b = []
ident(a).append(1)
ident(b).append(1.0)

After the first iteration, the compiler notes the conflicting assignments, and sees that multiple creation points 'flow' to both assignments. It tries to 'resolve' this problem by finding a confluence node (in our case, 'x'), and by splitting the respective object contour along the incoming edges of this node. Here, it nicely splits the 'mother' contour into a contour for the two creation points for 'a' and a contour for the creation point for 'b'. In the next iteration, CPA splits the 'ident' function, because a and b now have different object contours, and the imprecision is gone :-)

Appointments

After a week of becoming increasingly frustrated by all sorts of daily appointments, I am finally free to work on my compiler for the rest of July and August! :D I hope to divert all potential appointments to some later date, except for my motorcycle lessons, which I just didn't want to quit.. This tax thing looks like it will be needing some attention too.. non-US SoC'ers need to find like 20 different forms, fill them in correctly, and send them to different places, in order to avoid losing 30% of the total grant. Problem is, my brain immediately goes into stand-by mode each time I start to think about it..

In all, I guess there should be enough time to build something nice to handle many types of data polymorphism. Two months is a lot. I am a bit overwhelmed though, by the complexity of both the problem and Plevyak's solution. It's not going to be easy..

BTW I made my first release. It's highly unstable, but it's there I guess ;) I will stabilize it later, when data polymorphism is handled better (at all?). It can be downloaded via the Sourceforge link. How to use it? Edit test.py and add some Python code, or use the hello world example it contains; run 'python ss.py' - this will create two C++ files (test.?pp) as well as a Makefile - then 'make', and either 'make run' or './test'. The file test.py (and imported module files) will be annotated with the deduced types. (Not always correctly though, as the annotation code is not perfect.) All the code in unit.py has run well at some point (from the same codebase), but at this moment I guess many of the tests will fail. Okay, back to work..

Friday, July 01, 2005

The Problem of Data Polymorphism

Parametric polymorphism refers to the ability of functions to be called with different types of arguments in different contexts. The cartesian product algorithm, or CPA, efficiently separates these contexts from each other by creating templates for each function, based on the cartesian product of the actual argument types provided at each call site. Because each call site is connected to specific templates during the analysis, the templates do not pollute each other with bogus types. The idea behind CPA is that when different call sites provide similar argument types, these templates can be shared. This makes for a very efficient analysis, and also a very precise one, as it is not limited to a certain call-chain length like earlier analyses such as NCFA.
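
The heart of CPA is really just that cartesian product; a minimal sketch (hypothetical, with types represented as strings):

#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> Product;   // one concrete argument combination

// compute the cartesian product of the type sets of the actual arguments
// at a call site; CPA creates (or reuses) one function template per product
std::vector<Product> cartesian_product(const std::vector<std::set<std::string>> &args) {
    std::vector<Product> result(1);          // start with a single empty product
    for (const auto &types : args) {
        std::vector<Product> next;
        for (const auto &partial : result)
            for (const auto &t : types) {
                Product p = partial;
                p.push_back(t);
                next.push_back(p);
            }
        result = next;
    }
    return result;
}

So a call site f(a, b), where a may be an int or a float and b is always a str, yields the products (int, str) and (float, str), and hence at most two templates for f.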

However, CPA only considers 'simple' types, such as integers and floats. Because integers and floats always have the same 'behaviour', these types may be merged together during dataflow analysis (i.e. when one integer meets another integer). CPA templates can be created eagerly for these types, as everything is known about them at each call site where they pop up - all integers are the same, so a template made for one integer will be valid and precise for all integers that pass by. For polymorphic containers such as 'list' the story becomes more complicated. Data polymorphism refers to the ability of instance variables to contain different types. Different such types may require different templates to be made. For example, the return type of some function may depend on an instance variable of an argument to that function. Unfortunately, instance variable types cannot be fully known at their allocation site! Consider the following simple example:
def ident(x):
    return x
a = []
a = []
b = []
ident(a).append(1)
ident(b).append(1.0)

Do we create one or two templates for the 'ident' function? As the list contents may not yet be known by the time these lists reach the function calls, we cannot eagerly create multiple templates. Nor can we merge lists during dataflow analysis and split them later, when we find out more about their contents, as this could leave the dataflow network in an inconsistent state.

A naive solution, called 1CFA, is to create separate 'class templates' for each allocation site, in effect giving each allocation site its own type. This slows down the process, and is still not precise, because allocation sites may be duplicated as part of templates, again giving rise to possibly differently used polymorphic containers. Splitting these duplicated allocation sites would quickly make for an intractable computation.

A better solution adapts itself to the actual data polymorphism used in the program (the way CPA adapts itself to the actual parametric polymorphism), so that as many templates as possible can be shared. One solution, already alluded to by Olin Shivers (of NCFA fame), is to iterate the analysis, starting from a very imprecise situation and trying to improve the type information after each iteration. John Plevyak found a practical method to implement this idea. Its first iteration is just 0CFA, which means that all similar types are merged, e.g. all lists. By looking at the dataflow graph in a backward fashion, starting at conflicting assignments to a single instance variable, it is possible to locate so-called 'imprecision points'. In the above example, if we start inside the two append templates (CPA creates one for the integer and one for the float case) and look backwards, we find that types from different allocation sites flow into the formal argument of 'ident'. By splitting the list type in two, one for each incoming dataflow edge at this imprecision point, the respective imprecision can be 'removed', because CPA will now create different templates for 'ident' during the next iteration! By repeatedly splitting types, based on imprecision points, Plevyak's analysis 'adapts' itself to the data polymorphism in the program. Heuristics are probably needed for splitting decisions, because splitting too little means many iterations, and splitting too much means CPA might blow up.

Actually, when Plevyak developed the above method, CPA did not yet exist. His approach also encompassed means of handling parametric polymorphism precisely, but reportedly it was not very efficient. As the developer of CPA alluded to in his PhD thesis, it would be nice to try and combine Plevyak's data-iterative approach with CPA. Over the following weeks, I will try to implement such a combination, although I am a bit wary about potential problems I might encounter. As the first iteration will be just 0CFA, there will be almost no precision - how to select imprecision points that, when split, will actually help? What if assignment types are not singleton sets, or what if there are many different combinations of assignment types? What about inheritance? How do we maintain information about duplicated allocation sites between iterations? How do we terminate efficiently in the face of many polymorphic containers? I guess I will have to find out..