Monday, January 19, 2015

Learn About Gaussian Processes


Gaussian processes (GP's) are awesome! Not only can you approximate functions with data, but you also get confidence intervals! (Gaussian processes are an example of why I think some statistics knowledge is fundamental to data science.)

I was motivated to learn about GP's because I wanted to learn about Bayesian optimization because I wanted to optimize neural network hyperparameters because I wanted to reduce training time and find the best network configuration for my data.

However, despite being mostly trained in science and engineering, it took me a while to get an understanding of GP's partly because of my not so rigorous statistics training. Add to that that I'm used to analytical formulas being used to describe functions...not some statistical distribution. So the main point of this post is to share what I did to learn, at least the basics, of GP's in the hope it helps others.

I went through the following materials to learn the basics of GP.

Main material:
You might think that diving right into (textual) documents introducing GP's would help you understand GP's more efficiently. But, I found that starting with video lectures was a more productive use of my time (in the beginning). "mathematicalmonk"'s and Nando de Freita's videos are top video search results. While they are great videos to learn from, I found that David MacKay's video is better because he makes effective use of computers to learn the math and gives better insights!

Support material:
- linear algebra operations and numerical considerations
- statistics: REALLY make sure you understand the difference between marginal, conditional, and joint distributions in a more abstract sense. Then you should be able to follow a derivation of the conditional distribution of a multivariate Gaussian distribution.

Exercise: IPython errr Jupiter Notebook:
I took a script that calculates GP's from Nando de Freita's machine learning course and made it into a document attempting to explain details of the code. Please comment regarding my shortcomings in fully understanding the code.

Finally, a remark about NumPy (as well as other matrix-oriented code) after going through this code: Now I love NumPy but trying to decipher NumPy code that is anything but straightforward matrix operations can be stressful! Writing a perhaps more verbose but simpler code makes such code easier to follow. And when speed is an issue for this simpler code, you might want to reach for Julia or Numba...but that's a separate discussion!