Saturday, November 24, 2007

Shed Skin: Call for participation

Okay, so I have built this pretty cool (restricted) Python-to-C++ compiler that actually works for many not-too-large programs. It currently builds (simple) extension modules and shows massive speedups for many programs, often outperforming Psyco by a wide margin. As I recently showed on my blog, it also integrates nicely with Parallel Python. I would have thought the latter would give rise to at least one comment...

Am I the only one who sees the potential of an implicitly statically typed, Python-like language that runs at practically the same speed as C++? Is being ahead of your time a sure way to be completely misunderstood, even by pretty smart hackers? I remember developing something quite like Wikipedia many years ago and being unable to convince anyone that it would be a good idea. Of course that doesn't mean my compiler is a good idea (I wish), but I'm sure not going to give up as easily this time.

So, once again, I'd like to ask for more participation. Many open source users don't realize, I think, how open source projects, especially new ones, thrive on user feedback. Programming is debugging, and there's nothing more satisfying than fixing particular problems users encounter. Sure, I could find most bugs myself, but I don't want to end up in a nut house, and at some point there has to be some kind of community process. Patches are very useful too, even simple ones, as they often trigger more patches by me. Extension module support, for example, started out as a simple proof-of-concept patch sent in by a user.

I'd very much like to take Shed Skin further forward, but I need your help to do so! Please visit the homepage and send in bug reports or join the mailing list and start some discussions. I can also be hired to work on specific features of course :) So hurry to the homepage:

http://mark.dufour.googlepages.com

30 comments:

Nicholas said...

I've been following this project for quite a while, and think it's great -- really exciting work. Right now I'm working on what is basically a model checker for LLVM code, written in Python, as part of my PhD work. I think this sort of code is an ideal candidate for optimisation, either with Psyco or Shed Skin, and when it's in a semi-usable state I promise I will try Shed Skin on it and fix any bugs that crop up. :)

Unfortunately, that's a little while away yet (at least a month or two), but I thought I'd let you know anyway. Actually, having Shed Skin target LLVM bytecode rather than C++ would be pretty cool. Slightly bigger project, though!

Paul said...

As you know, Mark, I've been following ShedSkin for a while, although I've selfishly been busy working on somewhat related projects instead (only one having made it to the Package Index, and which I don't work on any more). I guess I'd like to help out somehow, squeezing some work in between all the other things I seem to be involved with, and perhaps there are some relatively easy things for people like me to start off with, perhaps even fairly mundane "donkey work" that would make the code more interesting to other people.

I think many people are interested in the possibilities of ShedSkin, and I could imagine that the problem is one of "positioning" more than anything else: ShedSkin seems to play best in the same field as Pyrex, and perhaps that's the target audience you need to captivate.

Vivian De Smedt said...

Mark, I think I already sent you a mail about ShedSkin, but I'm glad to have the opportunity to tell you again how impressed I am by ShedSkin and how I think it could be a very important piece of Python's future.

I think most of us like Python for its expressiveness and its ease of use, which make us avoid C extensions, Pyrex and even ShedSkin as much as we can. But once we can't anymore, we are more than happy to have tools like ShedSkin that let us keep coding in Python when faced with a performance problem.
So I agree with Paul: if you could market ShedSkin as a way to speed up parts of Python scripts, you would convince people.

Maybe improving deployment, OS support, or adding some tutorials and marketing materials could help you attract the audience you deserve.

srepmub said...

hi guys,

thanks for your support! yeah, I admit I absolutely suck at marketing. any help in setting up a better website, adding a tutorial and such would be greatly appreciated.

but I don't need a _lot_ of feedback. in its current stage of development, the compiler is probably not ready for hordes of users anyway. one bug report every few weeks or so is enough for me to happily continue hacking..

@paul:
it would be great if you could help out. of course just trying out things and reporting or fixing simple breakage is very useful. it will probably also lead you to other low-hanging fruit, such as missing functions in sys and os.. you'll also notice some things that can easily be optimized further in the builtins or in generated code, etc.

about pyrex: yes, shed skin should support custom classes. and probably the recursive copying of arguments can be done a bit smarter. if anyone would like to look into this, that would be great..

@vivian:

sorry I never looked deeply at your visual c++ modifications! I'm still quite interested in any performance gains over g++ though.. :P and thanks for your encouragement!

Alec Thomas said...

It's a bit of a chicken and egg situation for me. I've come across situations where I'd like to use shedskin, but it doesn't have the modules I need ported yet. I'd like to contribute by helping port modules, but I just plain don't have the time :(

There are some things you could do to improve the accessibility of shedskin though:

* Break up ss.py into a more modular form. A 7K Python file isn't that inviting :\
* Use distutils to build and install ss.
* Court Python game developers. Performance is critical for them, and shedskin's ability to gracefully fall back to standard Python is nice.

Finally, don't lose heart, shedskin is a very promising system. Good luck.

VanL said...

I have been watching Shedskin with interest for a while. Given the similarities between your python-simplifying restrictions and the restrictions in PyPy's rpython, I have been wondering if Shedskin could work as a C++ backend for PyPy. IIRC, the main difference is exceptions, which are supported by PyPy but not by Shedskin.

Ian Bicking said...

Packaging, packaging, packaging!

I thought I'd give it a try, and met a bunch of problems:

* SF downloads are annoying. You also don't link directly to the download page.

* It's not a real setup.py file. It doesn't say what it actually does either; at least a print in there would help.

* There's no good example that lets me run it quickly.

* I grabbed the example programs (why not just include them in the main download?), and wasn't sure how to run it.

* I didn't install libgc-dev to start, and I'm not sure if I should have gotten an error or what. I didn't get any errors. Now I think I didn't actually get around to compiling anything.

* Lazy me wants ss to run make on its own.

* I just generally hate SF (and CVS). Google Code instead?

These packaging and general polish issues are *really* important for uptake. Lazy people like myself who are just looking around for ways to make things faster stop when they find the first thing that works. Those lazy people actually do become useful contributors sometimes. So getting them someplace successful very quickly is very important.

I'd also suggest some really quick documentation on how to write an extension module, that takes the user through all the steps from no knowledge at all. An extension module seems like the most useful way to use ss, since you can easily work around its limitations with a Python wrapper around your fast ss code.
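A minimal sketch of that wrapper pattern, assuming a hypothetical module `fastpart` compiled by ss from `fastpart.py` and exposing a function `checksum` (neither name comes from Shed Skin itself). The wrapper uses the compiled version when it can be imported, otherwise falls back to an equivalent pure-Python definition, and normalises its input so the compiled code only ever sees a plain list of ints:

```python
# fastpart is a hypothetical extension module built by ss from fastpart.py.
try:
    from fastpart import checksum as _checksum  # compiled version, if built
except ImportError:
    def _checksum(data):
        """Pure-Python fallback with the same contract as the compiled code."""
        total = 0
        for value in data:
            total = (total * 31 + value) % 1000003
        return total


def checksum(values):
    """Public wrapper: accepts any iterable of int-like values and hands the
    fast code a plain list of ints, the kind of statically typed input a
    restricted-Python compiler can handle."""
    return _checksum([int(v) for v in values])
```

Keeping the conversion in the Python layer lets the fast part stay within the restricted subset, while callers still get an ordinary, forgiving Python API.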

If I really wanted to use ss seriously, I think I'd also want a clear way to distribute and install my work. This probably involves distutils hooks. Cython would probably be a good thing to look at for that. Probably I'd want to distribute the .cpp files, and have the python setup.py build command regenerate them if possible, but use them either way (so people without ss installed can install from the translated code).

srepmub said...

@alec:

thanks for your comments.

yeah ss.py is a bit big, and I probably need to split it up, but it's actually quite modular, in a way. most code is spread over two major classes (one for generating a constraint network, and one for generating code). then there are two major functions that do the inference work in between.
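To give a feel for what "generating a constraint network" and then "doing the inference work" means, here is a generic toy sketch, not Shed Skin's actual algorithm: nodes stand for variables and expressions, an edge says that types flowing into one node also flow into another, and solving propagates type sets along the edges until nothing changes. The class and method names are made up for illustration; real inference also has to handle function calls, classes and containers, which this toy ignores.

```python
from collections import defaultdict


class ConstraintNetwork:
    """Toy constraint network for type inference (illustrative only)."""

    def __init__(self):
        self.types = defaultdict(set)  # node -> set of type names
        self.edges = defaultdict(set)  # node -> nodes its types flow into

    def seed(self, node, typename):
        # A literal such as 1 directly seeds the type 'int'.
        self.types[node].add(typename)

    def flow(self, src, dst):
        # An assignment dst = src makes src's types flow into dst.
        self.edges[src].add(dst)

    def solve(self):
        """Propagate type sets along edges until a fixed point is reached."""
        worklist = list(self.types)
        while worklist:
            node = worklist.pop()
            for succ in self.edges[node]:
                before = len(self.types[succ])
                self.types[succ] |= self.types[node]
                if len(self.types[succ]) != before:
                    worklist.append(succ)  # succ changed: revisit its successors
        return self.types


# Model of: x = 1; y = x; y = 'a'; z = y
net = ConstraintNetwork()
net.seed('1', 'int')
net.seed("'a'", 'str')
net.flow('1', 'x')
net.flow('x', 'y')
net.flow("'a'", 'y')
net.flow('y', 'z')
result = net.solve()
```

Here `z` ends up with the type set `{'int', 'str'}`, exactly the kind of mixed typing that a statically typed target like C++ cannot express directly.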

which modules would you most like to see (better) support for?

@vanl:

I'm sure pypy can be made to generate code that uses the shed skin python builtins, but I'm not sure what the implications of that would be..

shed skin supports exceptions, but they are not always thrown. e.g. out of bounds exceptions have to be enabled via a command-line option, and there are still a few missing (e.g. division by zero)

srepmub said...

@ian:

thanks for taking the time to write down these issues.

regarding most of them, I guess you could have just read the README and homepage. these explain how installation works, how to run the unit tests and examples, how to build an extension module, integrate with parallel python etc.. but I get your point.

philhassey said...

Hey,

I'm still following it too... I've got to agree with Ian though, packaging is HUGE. I'd say that'd be the #1 thing to work on right now :)

Phil

srepmub said...

okay, okay, I'll see what I can do for the next release. alec sent me a distutils patch, and I'll have a good look at that. I've also been wanting to move to google code. and I've heard some other good ideas here that I'm willing to try. thanks for all the comments!

Luis said...

Hi Mark,
As you know, I'm very enthusiastic about Shedskin. But if I didn't visit your blog regularly, I wouldn't have known about your new post.
IMHO, the problem comes down to a few factors:

1)Marketing
2)Marketing
3)Marketing

As for the nature of the project, I think it would be better if you concentrated on making it a specialized tool for writing extensions, instead of a full-fledged Python-to-C++ compiler.

And finally, don't be shy!
Don't hesitate to announce your progress in comp.lang.python and other places.

Another detail to consider is the competition from PyPy. Perhaps you should stress the fact that Shedskin is already quite usable, and that the effort of just one heroic coder has delivered more than a multitude of programmers with millions to spend.

Don't give up!!
Luis

Paul said...

I've just sent a mail to Mark about Debian/Ubuntu packaging. Interestingly, there's already a /sbin/ss program (provided by the iproute package) which conflicts with the usual program name, but I suppose /usr/bin/shedskin is a reasonable alternative (instead of /usr/bin/ss) and provides a more obvious program name.

If anyone knows the special dance required to get packages in the Debian/Ubuntu package queues, perhaps they can let us know. ShedSkin could get a bit more exposure that way.

srepmub said...

hello luis,

thanks again for your support! :)

I am completely focused on enabling people to write extension modules in Python, as opposed to trying to support as many Python features as possible. More features can always be added later, and at the moment I think enough are supported to be quite useful. The main reason I'm not going to 0.1 is that extension module support is not complete. We really should have custom class support and get rid of most recursive copying. If anyone is interested in looking into this.. I'm guessing most necessary tricks can be learned from libraries such as SWIG.

About PyPy and RPython, you say I've achieved more. It depends on how you look at it. Shed Skin's type inference is probably more advanced, but also less scalable atm. The C code generated by PyPy is also typically a bit faster (there is some overhead for C++, esp. with wrapper classes for the Python builtins, as I use them). But I don't think the difference between, say, a 100x and a 130x speedup matters much. And RPython probably supports more features, and much of the standard library. But I've never really looked at it closely. One thing is for sure: Shed Skin is really small and specialized compared to PyPy. It's also more elegant in several ways, e.g. it produces relatively readable code and it reuses many C++ abstractions (OO, exceptions, templates..). Of course elegance is worth little if it's not also practical.

In the marketing department, several users are helping me with packaging things:
-alec sent me a distutils patch. this will probably make it in for the next release.
-paul sent me a debian package patch. this is really exciting, as it makes it really easy for most users to install shed skin (windows and debian have a large market share)
I will also move to google code hosting, also before the next release, which among other things should make the discussion group more prominent.

thanks again luis!

Dan O'H said...

I only heard about this project 20 minutes ago, when it got mentioned on a planet python post. Doubt I have the time or skillset to contribute anything, but it certainly sounds cool. I'd probably spend some time investigating it if I had a project where I needed python to run fast.

Dan O'H said...

I've now looked at it, and it seems pretty decent. But Ian's right - it's all about the packaging. e.g. at present, you need to build it in the directory that's going to be its eventual home. So when I do what I usually do - untar the package in /tmp and build it from there - I'm going to end up with something that stops working as soon as I reboot.

Yes, I realise it's not that hard to work out how to make it install properly - but you have to assume all your users are stupid. And if they hit a bug during installation, they'll likely give up there and then, because they don't yet have any evidence that your code is worth their time.

Unknown said...

I really think the first thing you need to do is split the ss.py into more classes and functions. It might be modular enough for you but not for everyone else.

People start reading code piece by piece, not by searching through 2 huge classes with endless methods in them.

The way to split things up is to make independent things into different classes, functions. If some method does not depend on other part of a class, they should not belong to that class.

It need not be one class per file but please make classes smaller.

Yamimoto said...

Hi, I will try to help. My knowledge of C++ is very poor though. Where should I start?
I can code in Python and I understand your "RPython" limitations.

I was impressed by the pystone gains: with g++ on Linux I get around 40x!! And the pygmy test went from 2m45s to just 4s (40x too!), and the generated C++ code is really readable!!
Nice work!

srepmub said...

the best way to help out is probably to just try out different smallish programs and see what breaks, is still missing, or could be further optimized. then if you can't fix the problem yourself, to report it to me or to the mailing list.

it will really help if you know C++, of course. most things I can think of right now are on that side. the thing holding back 0.1, as far as I'm concerned, is that the extension module support is still a bit naive and incomplete (custom classes are not supported atm!)

Unknown said...

My own experience with SS was disappointing.

I have a very simple Python script that scans a directory for .hpp and .cpp files to add license headers to them.

Sure, SS understood the "import os" line, but the actual functions of the os module are not implemented in C++.

For me, being able to drop the Python interpreter when distributing (small) programs is clearly THE feature, not really the speed bump.

If you want speed for a project, you start in C/C++ and not with a VM.

So, sorry to criticize, but SS will get interest once the basic modules (like os, sys, etc.) are fully implemented.

srepmub said...

which os/sys methods would you most like to see support for? did you encounter any bugs?

note that shed skin is really still quite experimental (version number 0.0.25!), so typically arbitrary programs don't work out-of-the-box.

srepmub said...

of course improving support for these modules is an excellent starting point for helping out.. :-)

Paul said...

I wonder if there isn't some merit in "unbundling" Python's extension modules and built-in types: strip out the CPython boilerplate, fix up some of the internal mechanisms, and put the boilerplate in some kind of "skeleton" file. In fact, it would be interesting to drop the boilerplate completely and see if it were possible to generate it using a wrapper generator. Perhaps this is too ambitious or unreasonable, but it could be an interesting experiment.

Unknown said...

I had some trouble interfacing external libraries. Somehow, having to edit 3 files (the .py for type inference, the .cpp for the wrapper code, and the .hpp for the declaration) was too much to get going.

Chances are I was doing something wrong. Like, say, I have a .so with a function blah(int, int).. shouldn't it be enough to only have a .py file for that, and not manually edit a .cpp and .hpp with a wrapper, just to call it from my SS program?

And for what it's worth, I'd also agree that a file with over 6000 lines is quite scary to look at, and the packaging (especially how to deploy ss in my own project with project-specific library wrappers) needs work - I ended up copying the whole lib folder and the FLAGS file.

srepmub said...

it's often possible to generate a skeleton for the ?pp files using SS. create a .py file with a function 'blah', convert it, and move the resulting ?pp files to the lib/ dir. that way, you have something working at all times.
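A minimal sketch of such a starting file, using the hypothetical name `mylib.py` for the `blah(int, int)` example above. It assumes, as described here, that types are worked out from how the function is called, so the body only needs to return something of the right type and the file needs at least one call with concrete arguments:

```python
# mylib.py (hypothetical name): a type model for a C function blah(int, int) -> int.
# The body is a placeholder; it only has to return a value of the right type
# (here an int) so that type inference can assign blah a signature.
def blah(a, b):
    return a + b


# At least one call with concrete int arguments, so the inferencer can see
# which argument types blah is used with.
print(blah(1, 2))
```

Running ss on this file should yield mylib.cpp and mylib.hpp, which can then be moved to the lib/ dir and edited so that `blah` calls into the real .so instead of the placeholder body.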

about packaging, yes this could be much better. but note shed skin's version number :-) just doing type inference, code generation, implementing builtins and basic libs is already a huge amount of work. packaging patches are very much welcome ;)

srepmub said...

@elias & paul:

yes, there is clearly more that can be automated :) it should be possible to only need type inference .py files to interface with external Python code (e.g. pygame), and to need nothing for external C/C++ libs (all type information is in the .h(pp) files).

I'm repeating myself, but there is only so much I can do as a single developer.. if I don't get any significant help, my goals for 0.1 are to get extension module support in better shape, to have better packaging, and to have more or less complete support for these basic modules:

random (done)
os (this one probably needs most work)
os.path (working on it)
sys
socket (a google ghop student is looking into this)
string (done)
time
math (done)
getopt
copy
bisect
(also fnmatch, stat and cStringIO)

if anyone wants to work on re or datetime, please be my guest. if there are any other requests, please let me know!

srepmub said...

@xryl:

in the meantime, I fixed a problem with os.listdir and implemented most of os.path (the posix version, not yet the windows one!), by running Shed Skin over the Python implementation. is that enough for your program to run? what else would you need? yes, full support for os, os.path and sys is on my list, but I cannot add everything at once.
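For reference, a script like the one xryl described (adding license headers to .hpp and .cpp files) needs only os.listdir plus a handful of os.path functions. A minimal sketch, using plain functions and explicit open/close calls to stay within a conservative subset; whether every call here compiles with a given Shed Skin release is an assumption, and `HEADER` is a placeholder:

```python
import os

# Placeholder license text (hypothetical); substitute the project's real header.
HEADER = "// Copyright (c) Example Project. All rights reserved.\n"


def add_headers(directory, header=HEADER):
    """Prepend `header` to every .hpp/.cpp file directly inside `directory`
    that does not already start with it. Returns the names of changed files."""
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        ext = os.path.splitext(name)[1]
        if ext not in ('.hpp', '.cpp') or not os.path.isfile(path):
            continue
        f = open(path)
        text = f.read()
        f.close()
        if text.startswith(header):
            continue  # header already present, leave the file alone
        f = open(path, 'w')
        f.write(header + text)
        f.close()
        changed.append(name)
    return changed
```

Running it a second time changes nothing, since every matching file already starts with the header.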

Mike Klaas said...

Mark,

We corresponded a bit in the summer, and my thoughts at the time echoed what many people here are saying. A sparsely-commented, 6500-line .py file is so overwhelming as to be almost useless. Further, I found the process of implementing stdlib modules to be unintuitive and cumbersome. I _really_ wanted a system in place that would do several parts of the job automatically, perhaps generated from a repository of ss-compiled .py files.

Also needed are an online manual and/or mini-tutorials that walk you through these basic ways to contribute.

I think the concept of the project is really cool, but the feeling I got when examining the project was mostly _discouragement_ due to the organization of the project. You say that the Python code is modular, "in a way", but things like a 6500-line file look sloppy to a professional programmer. And, to be frank, it is. I'll admit to feeling somewhat reluctant to contribute to something that is so poorly organized.

These issues are things that are hard to get people to help with. Build systems, automation, documentation, and refactoring are boring to get right. Like Paul, I don't have much time to contribute, and like Ian, I'm lazy. Unfortunately, those two qualities are true of many good developers. Build it and they will come.

Good luck!

srepmub said...

hi klaas,

thanks for the constructive comments. I agree ss.py should be split up. so far not splitting it up has been mostly useful for me while editing it. I will split it into at least 4 or 5 files for 0.0.27. for 0.0.26, better documentation and a debian package (thanks to paul) are in the works.

note that ss.py, regardless of splitting, will still be a complicated beast. doing global type inference on python is just nasty business.. ;) and of course ss.py is only one part of the project, and the most difficult to contribute significantly to.

helping out with standard library module support, extension module support, autogenerating bindings, improving the documentation, sending in bug reports etc. is easier and arguably more useful atm.

zahari said...

Hey Mark,

I am really intrigued by your project. I had been looking for such a thing for a long time and had found nothing useful. Then I bumped into your project and tried it on a random simple program of mine, and it compiled OK. I didn't even know it generated a Makefile :) but the program crashed with a segmentation fault :(, maybe because of the dynamic cast warnings it gave me during translation. But you really have a great achievement here!

I want to know how I can join the project. I am currently a Python developer and I like C/C++ coding too. I am from Bulgaria, by the way :).

Looking forward to your answer!!!

Cheers!