Friday, February 20, 2015

Serving GTK3 Applications on the Web

....Scientific Python on the Web: not there yet!

It started with a simple goal: put my interactive matplotlib desktop application on the web so that others could interact with it easily. I succeeded in the end, but it was a lot of work that pushed the boundaries of my technology knowledge and programming skill. Since it was a lot of work, I'll organize the story of my journey into steps:


0. Try out different (existing) ways of putting matplotlib on the web.

I tried IPython (er, Jupyter) with different matplotlib HTML5 backends. While certainly cool, none of them implemented GUI functionality. That is, I would get an error because they did not respond to mouse clicks. I'm not starting with zero to be nerdy: had I gotten IPython to work with mouse position clicks, I would not have bothered developing the solution explained in the following steps. That said, IPython is a poor fit anyway since it's a document, while my program can be considered an 'app'.


1. Use a matplotlib backend that supports a GUI that can be displayed in a browser: GTK3

After some research, I settled on GTK3, which comes with the Broadway display server, which can serve an application (individually) in an HTML5 page. After playing with GTK3 on Ubuntu Linux, I managed to get my program to display in the browser, as well as any other GTK3 program such as gedit. It required a small change in matplotlib.
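For readers who have not used Broadway, here is a minimal sketch of what launching one application on one Broadway display involves. It is illustrative only and not taken from my code: the function name `broadway_launch_plan` is hypothetical, and the sketch assumes the stock `broadwayd` daemon, which by default listens on port 8080 plus the display number, and the `GDK_BACKEND` and `BROADWAY_DISPLAY` environment variables that GTK3 reads. Nothing is started here; the function only builds the command lines and environment.

```python
import os

# Assumption: broadwayd's default port for display :0 is 8080,
# and display :N uses 8080 + N.
BROADWAY_BASE_PORT = 8080


def broadway_launch_plan(display, app_argv):
    """Return (server_argv, app_argv, app_env) for one Broadway display.

    Nothing is started here; a caller would pass these to subprocess.Popen.
    """
    port = BROADWAY_BASE_PORT + display
    # Command line for the display server itself.
    server_argv = ["broadwayd", "--port", str(port), ":%d" % display]
    # Environment that tells a GTK3 program to draw on that display.
    app_env = dict(os.environ)
    app_env["GDK_BACKEND"] = "broadway"
    app_env["BROADWAY_DISPLAY"] = ":%d" % display
    return server_argv, list(app_argv), app_env
```

For example, `broadway_launch_plan(5, ["gedit"])` yields a server command for display `:5` on port 8085, so the application would show up in a browser pointed at port 8085 of the host.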


2. Serve the same program to any user who requests it

This was the most difficult part. The same application needed to be served on demand multiple times, perhaps simultaneously. Just pointing users to the display server is not sufficient. The (normal) overall process needs to be: 1. the user requests the application, 2. a display server with the application is started, 3. the connection is monitored so that, after the user exits, the display is disconnected and cleaned up.

I broke down the problem into 'modules' that represent each of these three activities though I worked on them in the order below:

  • 2) Display manager: Working on the display manager came naturally after being able to just display GTK3 applications (step 1). It manages the starting up and shutting down of the display servers and the applications that run on them. This code is pretty independent of the other modules, although I put in functionality keeping in mind what the other modules need to do.
  • 3) Connection manager: I used websockets to periodically ask the user's browser to confirm that the user is still active. This works by running Python on the server and JavaScript on the client at the same time. On disconnecting, the display and its associated application are stopped.
  • 1) Request handler: It kicks off the process of starting a display when the user requests it.
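The way the three modules share state can be sketched as a small registry. This is a simplified illustration, not the code in my repository: the class name `SessionRegistry`, its methods, and the 30-second timeout are all made up for the example, and it only does the bookkeeping (no processes or websockets are involved).

```python
import itertools
import time


class SessionRegistry:
    """Tracks one display per user session and when each was last seen alive."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen = {}  # display number -> time of last heartbeat

    def open_session(self):
        """Request handler: reserve the lowest free display number."""
        display = next(n for n in itertools.count(1)
                       if n not in self._last_seen)
        self._last_seen[display] = self._clock()
        return display

    def heartbeat(self, display):
        """Connection manager: the browser confirmed it is still there."""
        if display in self._last_seen:
            self._last_seen[display] = self._clock()

    def reap_expired(self):
        """Display manager: forget sessions that stopped answering.

        Returns the display numbers whose processes should now be stopped.
        """
        now = self._clock()
        expired = [d for d, seen in self._last_seen.items()
                   if now - seen > self._timeout]
        for display in expired:
            del self._last_seen[display]
        return sorted(expired)
```

Passing the clock in as a parameter is a design choice for the sketch: it lets the timeout logic be exercised without actually waiting.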


Of course, that's easier said than done. The challenge in coding the display manager was managing the Linux processes and cleanly killing them. The connection manager had me go into event-driven programming and websockets. The request handler was rather straightforward to implement, but I had to extract the JavaScript from the Broadway server to integrate it with my process. So the webpage is (actually) served from the request handler and not the Broadway server. I used the Tornado web framework to program the request handler and the connection manager, while the display manager is associated with the request handler (in the same program). Making sure events were coordinated was difficult!
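To give a flavor of the process-management problem, here is a minimal sketch of one common way to stop a program together with any children it spawned on Linux: start it in its own process group, ask the group to terminate, and force it only if it does not exit in time. It is an illustration under those assumptions, not the code I ended up with; `start_in_own_group`, `stop_group`, and the 5-second grace period are hypothetical names and values.

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(argv, env=None):
    """Start argv as the leader of a new session (and process group)."""
    return subprocess.Popen(argv, env=env, start_new_session=True)


def stop_group(proc, grace_seconds=5.0):
    """Ask the whole process group to exit, then force it if needed."""
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to kill
    try:
        # The child is a session leader, so its group id equals its pid.
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The polite request was ignored: kill the group outright.
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a display server: a child that would sleep for a minute.
    child = start_in_own_group(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", stop_group(child))
```

Signalling the group rather than the single process matters when the thing being stopped has started helpers of its own, which would otherwise be left running.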


3. Deploy

Having heard about how awesome Docker is, I decided to recreate my development environment in a Docker container. It was a bit tricky to get the networking working, but I'm impressed with what is possible with Docker. Currently, my container is living in tutum.co while actually being served from AWS. Please volunteer to serve my application!


Reflections: Scientific Python on the Web

I have to say that I'm pretty satisfied with the result even though it's not ideal. As soon as I got the task accomplished, I stopped working on it, even though I could sink a lot more time into it to improve quality, usability, and flexibility. Personally, I learned A LOT working on this project.

But I was disappointed that over in the R world you can create apps with GUIs on the web much more easily with Shiny. They recently added mouse position clicks. I say it's disappointing because Python is associated with multiple application domains, including GUIs. Now, in my research for solutions, I found ways to integrate matplotlib into GUIs, but that would require some GUI expertise. There is no obvious answer for people used to the scientific Python stack as to which GUI framework to use if web publishing is a concern. The people used to the scientific stack are neither GUI experts nor web developers.

Having said that, what I like about my solution is that there is a path from matplotlib to a full GTK3 GUI application. So you can start with (simple) matplotlib elements and then, if you decide you need the functionality of a real GUI, you can integrate your work into the GTK3 framework. I've tried it. So you could have an app that runs on the desktop as well as the web. That is superior to Shiny.

Some people have commented on the state of scientific python on the web. As part of the solution, I think somehow documents (html, IPython) and applications (GUIs) need to merge. The web has become a medium to deliver experiences.

Unfortunately, for delivering interactive Python on the web, there is still a lot of work to be done. But just by myself, I was able to deliver a product, albeit hacky, that serves Python applications on the web using open source: Docker, Tornado, scipy/numpy, matplotlib, GTK3, Ubuntu, Linux, etc. Imagine what would happen if the open source community came together to work on this problem. Some components exist, but now they have to come together. Hopefully, an open-source solution can be superior to a proprietary one.


---
Introduced at DC Python meetup.

Thursday, February 19, 2015

Bayesian Optimization Demo Game: Can you beat the 'computer'?

...an interactive matplotlib (GTK3) app served on the web.
app (not working properly on Chrome): [http://bayeisan-optimization.25800bfd.svc.dockerapp.io:8000/bo]
[https://github.com/majidaldo/GTK3webserver]


I recently made a notebook about Gaussian processes (GP). Now, Gaussian processes are used by Bayesian optimization. In Bayesian optimization, the goal is to optimize a function with as few (probably expensive) function evaluations as possible. Here's a good tutorial.

To see how it works, I implemented a basic optimizer following the mathematics introduced in the referenced tutorial (which wasn't as difficult as the math behind the GP!). I began from the code for the GP, to see how a Bayesian optimizer (BO) would optimize a toy 1-D problem.

But then, I wondered how the performance of a human would compare to the BO. So I made a game! In this game, the goal is, of course, to find the maximum of the function in as few tries as possible. Both players in each trial will attempt to find the maximum. A running average of the number of tries is kept.

After some experimentation playing the game, it seems there is a lower bound for the average number of tries needed. For the way I have my game set up (50 possible choices), the BO needed 10 point something +/- point something tries. This figure is based on thousands of trials (central limit theorem at work). Furthermore, trying different so-called acquisition functions for the BO did not have much of an effect. And playing as the creator of my game, it was difficult for me to keep my average below 10 tries. Of course, I can't play thousands of times.
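To make the term concrete: an acquisition function scores each candidate point from the GP's predicted mean and standard deviation, and the optimizer tries the highest-scoring point next. A minimal sketch of one common choice, expected improvement, follows. It is illustrative only and is not taken from the game's code; the function names are hypothetical, and the means and standard deviations are assumed to come from an already-fitted GP.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement of one candidate when maximizing."""
    gain = mean - best_so_far
    if std <= 0.0:
        # No uncertainty left: the improvement is known exactly.
        return max(gain, 0.0)
    z = gain / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # normal CDF
    return gain * cdf + std * pdf


def next_choice(means, stds, best_so_far):
    """Index of the candidate with the largest expected improvement."""
    scores = [expected_improvement(m, s, best_so_far)
              for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With 50 possible choices, `means` and `stds` would each hold 50 values. The score rewards both a high predicted value and high uncertainty, which is how the optimizer balances exploiting known good regions against exploring unknown ones.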

These results imply that a human would not be able to optimize a function in three dimensions or more in fewer tries than a BO. There are simply too many variables to keep track of, and such a function is very difficult if not impossible to visualize, which is part of the point of using BOs.


Some features of the program:

  • For aesthetic reasons, the y-axis has to be fixed based on the range of values of the function. But then the visual clue of having a point close to the top edge of the plot gives the human player an unfair advantage. The solution is to add some random margin between the maximum and the edge. So, if you happen to get a point that is close to the top edge, choose its neighbors!
  • To generate a 'random' smooth function, I get some normally distributed points and sprinkle them on the domain. Then I use splines to connect them. It works surprisingly well!
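Both features can be sketched in a few lines. This is a rough illustration rather than the game's code: it swaps in a Catmull-Rom spline written in plain Python as a stand-in for whatever spline routine is actually used, and the function names, the number of knots, and the `max_margin_fraction` parameter are assumptions made for the example.

```python
import random


def random_smooth_function(n_knots=8, n_points=50, rng=random):
    """Sample a smooth curve at n_points evenly spaced positions.

    Knot heights are normally distributed; a Catmull-Rom spline passes
    through every knot. Returns (knots, sampled y values).
    """
    knots = [rng.gauss(0.0, 1.0) for _ in range(n_knots)]
    # Repeat the end knots so the first and last segments have neighbors.
    padded = [knots[0]] + knots + [knots[-1]]
    ys = []
    for i in range(n_points):
        pos = i / (n_points - 1) * (n_knots - 1)  # position in knot units
        seg = min(int(pos), n_knots - 2)          # which segment we are in
        t = pos - seg                             # 0..1 within the segment
        p0, p1, p2, p3 = padded[seg:seg + 4]
        ys.append(0.5 * (2.0 * p1
                         + (p2 - p0) * t
                         + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t ** 2
                         + (3.0 * p1 - p0 - 3.0 * p2 + p3) * t ** 3))
    return knots, ys


def y_limits_with_random_margin(ys, max_margin_fraction=0.5, rng=random):
    """Fixed y-axis limits whose top edge sits a random distance above the max."""
    low, high = min(ys), max(ys)
    span = (high - low) or 1.0  # avoid a zero-height axis for a flat curve
    margin = rng.uniform(0.0, max_margin_fraction) * span
    return low, high + margin
```

The limits returned by `y_limits_with_random_margin` would be handed to matplotlib's `set_ylim`, so that the distance between a point and the top edge no longer gives away how close it is to the maximum.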

Implementation notes:

The programming started in a functional style which is what you'd expect out of mathematical code and it's what I'm used to. However, once I got into matplotlib's GUI stuff, things started to get a little messy as a GUI requires reactive event-driven programming. Both styles of programming exist in my program and they intersect at while loops. You can run the game script with python on your desktop from the command line.

Now, the work involved in putting the game on the web, so that you, the reader, can easily engage in it, deserves its own post or two! Once I had the 'desktop' application running, I swam through oceans of technology until I got it to the form presented here. I should mention here that I came across a program similar to mine on Wolfram|Alpha but I can't find it anymore.

So here is the game served on the web. It's a Docker container managed for free, for now, by tutum.co but hosted almost for free on AWS for a few more months as of the date of this post. If you can spare 1GB of HD space and 100MB of memory to host my program, let me know!