Thursday, July 09, 2009

Disco: an elegant Python Go player

There's this shiny new technique to implement computer Go players, called Monte Carlo UCT. It shocked the Go world by powering the first computer player (Mogo) to beat (quite unexpectedly I think) a professional player in an official match (with only a 7 stone handicap). Ever since hearing about this rather simple new technique, my hands have been itching to implement it in Python and compile the result with Shedskin.

The name of the resulting Go player is Disco (think "staying alive"), and its source code of only 350 lines can be found here. Given enough time, it can play some pretty interesting games on a 9x9 board. And of course you can connect it to a graphical Go board, although I only tested it with Gogui. See the README for installation and configuration details.

It has become quite a nice test case/example for Shedskin. Not only does it become about 5 times faster on my computer after compilation, and shows that you can write a pretty fast Go engine using it, it also shows how easy it is to generate an extension module with Shedskin and effectively use this in a larger program. In this case, there is a Parallel Python wrapper that fires up an abitrary number of processes, each importing and using the extension module to do part of the thinking.

Sometimes everything can come so beautifully together: an elegant Python program of only 350 lines, compiled to an extension module, imported in multiple Python processes, connected to a graphical Go board and playing according to some brilliant but simple new mathematical formula that can somehow make computers play Go! Now if only I would be better at Go myself, so I could really test it..

BTW, Shedskin 0.1.2 should be right around the corner. Obviously I got a bit distracted by the above.. In all, it contains many optimizations and bug fixes and adds 5 other new example programs.

6 comments:

apt1002 said...

I'd like to ask you about this code:

if node.bestchild and random.random() < 0.5:
pos = node.bestchild.pos
else:
if not node.parent [...snip...]
else:
pos = board.random_move()

It looks like this plays a random move with probability 0.5, which is quite unusual. However, it seems to play quite well. Where did you get this idea?

srepmub said...

sorry for the late reply, I'm on vacation at the moment.

I'm not sure if this is the correct approach either. it's what I understood from the theory described at the link I put in my posting (ie, the explanation of uct + monte carlo at sensei's library), without looking much further.

I did try some other approaches and messed with some of the constants, but I couldn't get it to play better and/or scale better (in the number of random playouts set with GAMES), at least as far as I can tell. but admittedly I didn't really try that hard. I'd be interested if you or anyone knows of a better approach, especially if it scales better (currently setting GAMES to 500,000 seems to make it play worse!)

so about your question :) note that it doesn't actually play random moves in any way. the 'select' method is used to play lots of playouts each turn (determined by GAMES, default 10,000), during which it equally balances 'exploration' (random move, unexplored moves first) with 'exploitation' (best found move so far). the actual move that is played is the one with the highest percentage of successful playouts.

apt1002 said...

My own effort is described here: http://senseis.xmp.net/?AlistairTurnbull . I can supply source code if you like.

I can see that your algorithm only plays random moves in the simulation phase. What surprises me is that it plays them with a fixed probability of 50%.

The "classic" algorithm is always to play a move you've never played before, if there is one, otherwise always play (what you call) bestchild. The whole point of UCT is that the choice of bestchild automatically balances exploration and exploitation, without you having to do anything else.

I think your algorithm perhaps plays better? We should do a proper test.

I always have trouble at the end of the simulation phase choosing the best move to play for real. UCT balances the search effort in such a way that there's a high probability that the move currently believed to be best is just a fluke due to insufficient exploration. Your over-exploration (relatively speaking) of random moves seems to fix this problem.

Anyway. I would be happy to continue by email, if there's some way I can pass you my email address without publishing it.

srepmub said...

that might explain it, but I don't have much time at the moment to think about this or experiment. I just copied the idea from the sensei page.. ;) might be good idea to ask this in the computer go group?

you can send me your email address as follows, and I will contact you after I get back from my vacation:

''.join([chr(ord(c)-1) for c in address])

srepmub said...

or if that counts as publishing, here is my address:

l`qj-ctentq?fl`hk-bnl

apt1002 said...

My address is: `os0//1?ltorxbg-nqf

I checked again: I can't find your idea on the Sensei's Library page "UCT", or on any page linked from it. Can you give me a pointer?