Saturday, October 16, 2010

Shed Skin 0.6

I have just released version 0.6 of Shedskin, an experimental (restricted-)Python-to-C++ compiler.

Most importantly, this release finally comes with a substantial scalability improvement. Instead of analyzing a whole program at once, Shedskin now repeatedly analyzes an ever-growing part of the program, starting from essentially nothing, until the whole program is analyzed. It turns out this is much easier than analyzing programs as a whole. The result is that Shedskin now easily scales to several test programs of over 1,000 lines (sloccount), and needs much less memory to analyze them.

I'd really like to receive programs that Shedskin 0.6 is unable to analyze within a reasonable amount of time, to get an idea of the actual extent of the improvement, and to be motivated to further improve things. It's hard to be motivated to improve things if all programs you know of work well!

Interestingly, because of the incremental analysis, it also became possible to add a progress bar to Shedskin. It will slow down as we go to 100 percent, and be somewhat inaccurate if parts of a program turn out to be unused, but I for one couldn't be happier with it.
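To give a rough feel for the idea of growing the analyzed part of a program step by step, here is a toy sketch in Python. It is not Shedskin's actual algorithm: the `PROGRAM` table, the "return type" sets, and the function names are all made up for illustration. Each step adds one reachable function, re-runs a simple fixed-point propagation over the functions included so far, and records a progress percentage.

```python
# Toy model (hypothetical): each function lists the functions it calls and
# the type names it returns directly.
PROGRAM = {
    "main": {"calls": ["parse", "run"], "returns": set()},
    "parse": {"calls": [], "returns": {"str"}},
    "run": {"calls": ["step"], "returns": set()},
    "step": {"calls": [], "returns": {"int"}},
    "unused": {"calls": [], "returns": {"float"}},
}


def propagate(program, included, types):
    """Propagate return types from callees to callers until nothing changes."""
    changed = True
    while changed:
        changed = False
        for name in included:
            for callee in program[name]["calls"]:
                if callee in types:
                    new = types[callee] - types[name]
                    if new:
                        types[name] |= new
                        changed = True


def analyze_incrementally(program, entry="main"):
    """Analyze a growing part of the program, one function at a time."""
    included = []      # functions analyzed so far, in order of inclusion
    types = {}         # function name -> set of inferred return type names
    pending = [entry]  # functions reachable from what we have seen so far
    progress = []      # percentage reported after each step
    while pending:
        name = pending.pop(0)
        if name in types:
            continue
        included.append(name)
        types[name] = set(program[name]["returns"])
        propagate(program, included, types)
        pending.extend(program[name]["calls"])
        progress.append(100 * len(included) // len(program))
    return types, progress
```

In this toy model the percentage is measured against all functions in the program, so it never reaches 100 when a function such as `unused` is never called, which mirrors the kind of inaccuracy mentioned above.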

Another area that has seen substantial improvements is the support for generating extension modules. For example, basic (un)pickling of compiled classes should now work, and inheritance and overloading should now be better exposed to the outside world.

The most important bugfix is probably to correct the module/linenumber information for warnings and errors. It turns out the '#{' feature messed up line numbers, and with code spanning several files, Shedskin would often display the wrong module as well. This should all be fixed now.

More details about the release can be found in the release notes.

I would like to thank the following people for contributing to this release: Hakan Ardo, Thomas Spura, Eric Uhrhane, Saravanan Shanmugham, HartsAntler and Douglas McNeil, as well as Plonjers, for sending in the nice test case that triggered the scalability improvement.

update: Just after the release, Danny Milosavljevic sent in a great 50th example program, a Commodore 64 emulator! The emulator itself is a work in progress, but it boots and you can start typing commands. After modifying it a bit so it compiles out-of-the-box with 0.6, I added it to the 0.6 example tarball.

41 comments:

Witek Baryluk said...

Cool stuff. Keep going.

srepmub said...

thanks! 0.7 is coming right up, with lots of minor fixes and some performance improvements. also a nice new 1,050 sloc example, a multiprocessing raytracer:

http://groups.google.com/group/shedskin-discuss/browse_thread/thread/5484bc5e3aea5e44

srepmub said...

proper link

Enzo said...

Very nice your program... I am really interested in it, and I have many Python code to port to C++...
What do you recommend?

srepmub said...

thanks! it's probably best to start by looking through the tutorial..

Enzo said...

But, my system is Windows. I think I will get some problems to put it to work here. Do you have some plan to support Windows?

srepmub said...

it probably works under windows with MinGW (I supported this until a few releases ago), but nobody wants to test and package it at this point, so.. why not try ubuntu linux? :) http://ubuntu.com

Enzo said...

Thanks,

but I dont have any linux distribution, and all my works are done to work on windows, so, I have nothing to do with linux, sorry...

I am going to try make it run in my windows.

Enzo said...

Also,

If you give me necessary info, I could take support about windows things... because I have really interest about this to run on Windows... I have many codes to port to c++, mainly, they are all related to string matching, what is really slow in python, by the way, in python it is easy to deal with strings... but, performance is really important as the codes needs to deal with thousands or even billions of strings...

srepmub said...

the last version that was supported is actually still from this year, so you could possibly just download that and then insert the 0.6 linux or latest GIT version in it (for example, as c:\shedskin-0.6\shedskin).

in the discussion group you may also find some people that use it under windows..

if you'd like to package the upcoming 0.7 for windows, please start a thread for this in the discussion group.

but please do note string-intensive code is actually quite efficient in CPython (because most time is spent in hand-optimized C code), so compilation may not help much, if at all.

Enzo said...

Well,

I am aware, but having the code in C++ give me the possibility to use many C++ tools, that indeed will help me to get more performance... one of them, is the AMD C++ profiler, it will help me to find bottlenecks in a very easy way. There are also many other tools, but the list is too long to be mentioned.

srepmub said...

I always use gprof2dot.py for profiling python and C++ programs, and can highly recommend it. um, but I'm not sure that works under windows..

Enzo said...

Thanks for the reply, but I think AMD CodeAnalyst fit for my purpose very well, just check it, it is an amazing tool, you can learn very much how to optimize c++ codes... for scalar, vectorial, multi-threaded, and things like this...

http://developer.amd.com/cpu/codeanalyst/Pages/default.aspx

srepmub said...

ah, and it works under linux.. ^^ thanks for the link!

in the meantime, I've decided to package 0.7 for windows again myself.. it should be out in about a week.

Enzo said...

"ah, and it works under linux.. ^^ thanks for the link!"

You are welcome, I think you will be amazed with CodeAnalyst... It have a lot of features... and indeed can help you to make the "perfect code" both in quality and performance.. :-)

"in the meantime, I've decided to package 0.7 for windows again myself.. it should be out in about a week."

Thank you! I will need it for a long time... and it is supposed that your software will help me a lot....

I promise, if I get profit with my softwares using your tool, I will donate a good amount to you keep the development of your software...

Thank you again.

Greg Copeland said...

Excellent work!

I saw a twitter post about a forth VM written in a variety of languages (including c, ruby, lua, and python) and thought optimizing the python implementation would be fun while maintaining readability. It was. As a final step, I tried Shed Skin. Wow!

Here are the original benchmarks. Please note, the blog is, as I understand it, CGI via a forth VM implemented in python, running on cpython. http://rx-core.org/dev/corpse/article/36

Here are the latest benchmarks, which include cpython, pypy, and of course shed skin. http://j.mp/igcKxp

As a side note, I was somewhat surprised that a chain of if/elif wasn't optimized into a switch statement. Regardless, the performance is really awesome for shedskin.

Please keep up the good work. Look forward to seeing your improvements down the road.

As a side note, struct module support would be awesome. Lots of networking and interoperability code depends on it.

srepmub said...

@enzo: I just had a go at packaging shedskin 0.7 for windows, using the latest version of mingw (which includes GCC 4.5! :D), and after a few minor tweaks everything still seems to work. I will test things a bit better, but it now looks like it should be possible to release this within a few days.. :-)

Enzo said...

@Greg:
I also wish it had struct module support :-)
But I think I cant demands too much if I am not paying for this...

@srepmub
Thank you very much for the effort!

Enzo said...

@Greg

Hi greg, I saw that you got contact with the guy of rx-core.org, in the benchmarks you pointed here, I saw that C# got faster than C in one case?!!?? How it is possible if C# is not native language? Take a look in "Factorial"

C 0:00.49 real 0.48 user 0.00 sys
C# 0:00.46 real 0.43 user 0.00 sys

I dont think it is possible... can you ask there to see if it is right? And why in this case C# got faster than C?

Thanks

Greg Copeland said...

@enzo as for the c and c# bench, I can't be certain but I assume it's because the c code was compiled without optimizations. Also doesn't c# have a jit? If so, it's not terribly uncommon for a jit to beat unoptimized c code.

Please keep in mind my focus was on python and beating ruby/lua vm implementations. I did not investigate the other vm implementations or associated benchmarks. As such, I'm simply offering opinion and guesses. ;)

Enzo said...

@Greg

Yes, I understand you, but I just like to analyse the root of the things...

Also, as you seems like me, I believe that you want performance at all costs, but keeping python productivity, is not?

So, I would believe that you need to find from where performance comes. To me, C and C++ language is very OLD compared to C#, how it would be possible to a well developed language be beaten by a new language that is not even native? It dont sounds weird to you? A factorial problem is very common in programming field, so, it is not likely to dont be optimized by any C/c++ compiler... even with none optimizations selected I think it would need to be faster than C#...

Also, If I had the contact of rx-core.org guy, I would ask him for details on how the codes was made and executed... to check why this happened


About this "vm implementations", I dont understood yet, it is VM - Virtual machines? What it is? If it is Virtual machines, I cant believe that exist virtual machines done in python, or even in Ruby?!?!?!

Greg Copeland said...

@enzo I checked, c# sits on top of CLR which has a rather nice jit/gc implementation. Because of how the forth (retro) vm works, it would be rather easy to believe the code path had been jitted. Jits, for many corner cases, are able to perform on par and even faster than some hand written c (remember, jits can apply some optimizations and even specializations which are impossible for static compilations). It's really all about optimizations. In this case, we are comparing non-optimized machine code (c) against highly optimized machine code (c#). So really, if you want to be shocked, be shocked the c# implementation wasn't a lot faster than it was. Again...assuming I'm right about optimizations explaining the delta here.

As for the vm's, yes, they are. Forth is a stack based language. As such, the vm is stack based; with lots of push'n and pop'n go'n on. The comparisons are of running forth applications (retro vm byte code) in a vm (virtual machine) in each of the given implementation languages. So for the python implementation, we have a forth vm running on top of a python vm, implemented via cpython. So simply saying factorial is a common problem is waaay over simplifying what actually happens at runtime.

Btw, all code is available at the site. So you can peek at it all. You don't have to guess.
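For readers unfamiliar with stack-based VMs, the following is a minimal sketch of the general shape, with an if/elif dispatch chain like the one discussed earlier in the thread. The opcode names and encoding are invented for this example and are not the retro VM's actual instruction set.

```python
# Hypothetical opcodes for a tiny stack machine; byte code is a flat list of ints.
PUSH, ADD, MUL, DUP, DROP = range(5)


def run(code):
    """Execute the byte code and return the final stack."""
    stack = []
    ip = 0  # instruction pointer
    while ip < len(code):
        op = code[ip]
        ip += 1
        if op == PUSH:
            # The operand follows the opcode in the code stream.
            stack.append(code[ip])
            ip += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == DUP:
            stack.append(stack[-1])
        elif op == DROP:
            stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % op)
    return stack


# 5! written out as straight-line byte code: 5 * 4 * 3 * 2
FACTORIAL_5 = [PUSH, 5, PUSH, 4, MUL, PUSH, 3, MUL, PUSH, 2, MUL]
```

Calling `run(FACTORIAL_5)` leaves `[120]` on the stack. Even this small calculation takes eleven trips through the dispatch loop, which is why the host language's interpreter overhead dominates such benchmarks.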

Enzo said...

@Greg
Thanks, Now I understood...

srepmub said...

@greg: thanks for the comparison! note that 'shedskin -b' should give a bit of extra performance.

I didn't get around to looking into generating switch statements yet. while it would probably help for non-integer cases, GCC appears to generate quite efficient code for this program.. manually changing the output to use a switch statement barely makes a difference in any case.

there are some problems with supporting struct:

-it won't work with non-constant format strings, because type inference cannot work otherwise. not a big problem and can be worked around in most cases I guess.

-shedskin doesn't support tuples with mixed types of elements and length > N (2, currently), which would be necessary to generally support struct.unpack. I guess we could demand tuples are always unpacked right away in this case (a,b,c = struct.unpack('ics', s)).

-because types depend on format strings, we'd need a few minor hacks in the type inference engine to look at these (constant) strings, but nothing complicated I think.

I may have a look at this for 0.8..
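As an illustration of the usage pattern described above (a constant format string, with the result unpacked right away), here is a small sketch in plain CPython. It only shows the pattern; it makes no claim about what Shedskin itself accepts, and the `roundtrip` helper is a name made up for this example.

```python
import struct


def roundtrip(number, char, text):
    """Pack three values and unpack them again using a constant format."""
    # 'ics' = one native int, one single char, one 1-byte string.
    data = struct.pack('ics', number, char, text)
    # The format is a literal and the tuple is unpacked immediately, so
    # each target variable has a type that follows from the format string.
    a, b, c = struct.unpack('ics', data)
    return a, b, c
```

For example, `roundtrip(42, b'x', b'y')` gives back `(42, b'x', b'y')`. Because the mixed-type tuple never escapes into a variable, the three results could in principle each be given a single static type.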

Greg Copeland said...

@srepmub Thanks. To be clear, I was just making a wish. You likely have other, higher priority items to look at.

It might be possible for type inference based on format strings, since it is qualifying type, but the tuple issues may be a show stopper.

Might you have a peek to see what is required to support Pyro on shedskin? IMHO, Pyro uses the struct module in a fairly typical use case. As such, if Pyro can be supported, again IMO, you're supporting a wide range of struct usage patterns.

Pyro also requires support for pickle, but that's a different issue. Just the same, the pure python pickle implementation might be of value for shedskin. But, this may quickly turn into a rabbit hole...I dunno.

Regardless, you may want to hunt for apps and libs which make use of struct to see how and why people are using it. That alone may give some insight to see if implementation may be worth it and/or feasible; assuming a subset may be required.

srepmub said...

I think we could support most usage patterns, with at most a little bit of rewriting. in any case, it wouldn't be a lot of work to add support for struct this way I think, and it's perhaps the most requested module.. :-)

pickle won't work at all I think, because of type inference issues, but fortunately as of shedskin 0.6, you can (un)pickle 'compiled' objects from CPython (after generating an extension module with shedskin -e).

srepmub said...

alright, I have just uploaded the files for 0.7, including a new windows package.. it's a bit of a large download for now. I can probably trim that down for 0.8.

Enzo said...

Thanks very much!!!

I really dont care about the size, I have downloaded packages of different types of tools with more than 200Mb.

srepmub said...

it's my pleasure. please let me know if/how it works for you, preferably via the googlecode site (issue tracker or discussion group).

the size can probably go down to around 5 MB, if I decide to strip out some unused parts..

Enzo said...

This new version of shedskin is supposed to work on what python version?

srepmub said...

it should run with 2.4 up to 2.7, though it behaves like 2.6 (so code to be compiled should run with 2.6).

Enzo said...

Ok, Right, I installed the python you pointed me...
First, I tried to run from shedskin.bat and it cant find python.exe... so, I changed it to point the right directory.

Tried again, and it cant find __init__.py, so I had to point the full directory....

I tested with test.py, and it seems to have gone OK, without any special option used. I only dont know what "builtin.hpp" is.

Well, I will have a try with my own codes to see what happens...

srepmub said...

after running init.bat, you should be able to run 'shedskin test.py' right away (assuming python.exe is in your path or in c:\python26 or similar).

Enzo said...

Yes, I tried init.bat, and it seems to have no effect in my Windows, I am using Win7... I think, it would need to be slightly different to work here...

srepmub said...

hm, I've never tested under win7.. I will see if I can find a win7 PC tomorrow to have a look.. :-)

Enzo said...

Thank you very much... but I solved this myself... it is quite simple... the only one downside is that I am not using relative paths...

Also, do you know if the codes compiles well in MSVC2010?

srepmub said...

there were some patches recently to improve compatibility with MSVC (for example, shedskin -v generates a microsoft-style makefile), though I have no idea what the current status is. there's probably some work left to fully support especially the 'os' module..

Enzo said...

Well,

I tested shedskin in my current python codes with no success, there are some problems:

1- My program are done by hundreds and hundreds of python files. How can I process in batch mode all the files at once? One suggestion of mine is, if possible, it would be interesting to specify a folder that have the files, so, shedskin could scan and process all.

2- My program uses some external python libs, and shedskin asked for it, but I dont know how to solve. I think it would be interesting if I can point one path that have all the libs, and shedskin could scan and use them when necessary... but, I think it implies in compiling the libs to C++, and finally it implies in the need of batch mode processing.

3- Yet about the external libs, I think it would be interesting if shedskin could find all missing dependencies and tell me all things I need to do to solve.

Maybe there are some other things I am missing... maybe the naming conventions... but I will leave it to the next time...

Thanks.

srepmub said...

hm, I don't think a batch mode belongs in shedskin. why couldn't you write a simple batch process yourself..? (like, 'for file in directory shedskin file')

I'm also not sure what the dependencies are you mention. shedskin only supports a subset of the standard library, in any case. if you need more, you may be able to use shedskin -e to compile only part of your code.
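The 'for file in directory shedskin file' idea could be sketched roughly as below. This assumes a `shedskin` command is on the path and that each `.py` file in the directory is meant to be compiled on its own; the helper names are made up for this example, and files that import each other would need a different approach.

```python
import os
import subprocess


def find_sources(directory):
    """Return the .py files directly inside the directory, sorted by name."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith('.py')
    )


def build_commands(directory, compiler='shedskin'):
    """Build one command line per source file, e.g. ['shedskin', 'dir/a.py']."""
    return [[compiler, path] for path in find_sources(directory)]


def run_batch(directory):
    """Run the compiler on each file in turn; return the files that failed."""
    failed = []
    for command in build_commands(directory):
        # subprocess.call returns the exit status of the command.
        if subprocess.call(command) != 0:
            failed.append(command[-1])
    return failed
```

Keeping the command construction separate from the execution makes it easy to print the commands first and check them before actually running anything.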

Enzo said...

These libs I mention some I done myself, and some I got on the net...

To be exactly, the whole source code of my app have more or less 3Mb of sources...

Well, I will wait until shedskin have some more features, to be possible to use in production environment...

Thanks.

Enzo said...

"shedskin -e to compile only part of your code"

I forgot to say, I will try this, but, the way my code is done, with to many files and dependencies to external libs, makes it hard to get a C++ code in the actual shedskin...

I am analyzing the possibility of breaking my codes in something more modular to be possible to get c++ more easy... but I think this task will take me 2 or 3 months....

Maybe it will more easy if shedskin had more support.... so, I will wait for a new shedskin version...

Thanks for your patience and support!