Thursday, September 12, 2013

Simple plots can reveal complex patterns

Visualization is a big topic in its own right, and plots can get quite sophisticated. Even so, a simple plot can reveal complex information.

I took a shot at visualizing power generation data from the Kaggle competition. My goal was just to make a "heat map" of the power generation data: for every <week, hour of week>, plot the power generated. I had to rearrange the data a bit, but the result was not only pretty but, more importantly, revealing and compact. The plot summarizes ~25k data points and shows cycles over days and months across several years.

Enjoy. Courtesy of pandas for data munging and matplotlib for the plot.
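
For anyone who wants to try the same layout, here is a minimal sketch of the <week, hour of week> rearrangement. It is not the original code: the data is synthetic (150 weeks of hourly values, roughly 25k points), and the helper name week_hour_grid and the column names are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the hourly power series (assumption: the real
# competition data is not reproduced here). 150 weeks is roughly 25k points.
n_weeks = 150
hours = np.arange(n_weeks * 168)
rng = np.random.default_rng(0)
values = (
    50
    + 20 * np.sin(2 * np.pi * hours / 24)          # daily cycle
    + 15 * np.sin(2 * np.pi * hours / (24 * 365))  # yearly cycle
    + rng.normal(0, 3, hours.size)                 # noise
)
# 2010-01-04 is a Monday, so every week in the series is complete.
power = pd.Series(values, index=pd.Timestamp("2010-01-04") + pd.to_timedelta(hours, unit="h"))


def week_hour_grid(series):
    """Rearrange an hourly series into a table: one row per week, one column per hour of week."""
    idx = series.index
    # Monday (midnight) of the week containing the first sample.
    first_monday = idx[0].normalize() - pd.Timedelta(days=idx[0].dayofweek)
    frame = pd.DataFrame({
        "week": np.asarray((idx.normalize() - first_monday).days // 7),
        "hour_of_week": np.asarray(idx.dayofweek * 24 + idx.hour),  # 0..167
        "power": series.to_numpy(),
    })
    return frame.pivot_table(index="week", columns="hour_of_week", values="power", aggfunc="mean")


grid = week_hour_grid(power)

# One colored cell per <week, hour of week>.
fig, ax = plt.subplots(figsize=(10, 6))
image = ax.imshow(grid.to_numpy(), aspect="auto", origin="lower")
ax.set_xlabel("hour of week")
ax.set_ylabel("week")
fig.colorbar(image, ax=ax, label="power generated")
```

Call plt.show() or fig.savefig(...) to see the result. With real data, gaps in the series show up as NaN cells in the grid, which imshow leaves blank.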


  1. Dear Majid
    This sort of plot looks great! I recently began working with Python (I used Matlab before) and tried to create a plot similar to this one. But I ran into problems (array size) when I tried to group my data by day and hour of the day with a lambda function.
    So my question is how you managed to arrange your data, and especially which data type you used. I used a DataFrame that I fetched from a MySQL server. Could you give me some advice or even share your code as an IPython Notebook?
    Thanks in advance

  2. I could post my code if you insist, but I'm not sure it would be that useful to you (my input data was not arranged as nicely with a timestamp!). Your main difficulty is in arranging the data, and for that the solution is in pandas timeseries indexing.
    In my case, once I had my hourly data, I was able to iterate by week.

    As for the plot, I've used matplotlib to plot a million points without much trouble.
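
    A rough sketch of the timeseries-indexing idea, for a frame that already has a timestamp column. This is an illustration, not the code behind the plot; the frame and its column names ("timestamp", "value") are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for rows fetched from a database:
# two weeks of readings every 15 minutes, starting on a Monday.
minutes = np.arange(0, 14 * 24 * 60, 15)
rng = np.random.default_rng(1)
raw = pd.DataFrame({
    "timestamp": pd.Timestamp("2013-09-02") + pd.to_timedelta(minutes, unit="min"),
    "value": rng.random(minutes.size),
})

# Index by time, then let pandas do the bucketing into hours.
series = raw.set_index("timestamp")["value"]
hourly = series.resample("h").mean()

# Group by day and hour of day without a lambda: rows are dates, columns are hours 0..23.
by_day_hour = hourly.groupby([hourly.index.date, hourly.index.hour]).mean().unstack()

# Iterate by week: each chunk is one week of hourly values (weeks end on Sunday).
weeks = [chunk.to_numpy() for _, chunk in hourly.resample("W")]
```

    Each entry of weeks can then become one row of the heat map.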