Thursday, September 12, 2013

Simple plots can reveal complex patterns


Visualization is a big topic in its own right, and plots can get quite sophisticated. Even so, a simple plot can reveal complex information.

I took a shot at visualizing power generation data from the Kaggle competition. My goal was just to make a "heat map" of the data: for every <week, hour of week> pair, plot the power generated. I had to rearrange the data a bit, but the result was not only pretty but, more importantly, very revealing and efficient. The plot summarizes ~25k data points and reveals cycles over days and months across several years.

Enjoy. Courtesy of pandas for data munging and matplotlib for the plot.
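
For anyone who wants to try the same kind of plot, here is a minimal sketch of the rearrangement described above. It is not the code behind the plot in this post. It assumes an hourly series with a timestamp index, and it uses synthetic data (about 25k points over 150 weeks) in place of the Kaggle file. The names series, frame and grid are made up for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic hourly data standing in for the real power series (an assumption):
# 150 weeks starting on a Monday, with a daily cycle, a yearly cycle and noise.
idx = pd.date_range("2010-01-04", periods=24 * 7 * 150, freq=pd.Timedelta(hours=1))
hours = np.arange(len(idx))
rng = np.random.default_rng(0)
power = (
    10
    + 3 * np.sin(2 * np.pi * hours / 24)
    + 2 * np.sin(2 * np.pi * hours / (24 * 365))
    + rng.normal(0, 0.5, len(idx))
)
series = pd.Series(power, index=idx, name="power")

# Hour of week: 0 is Monday 00:00, 167 is Sunday 23:00.
hour_of_week = series.index.dayofweek * 24 + series.index.hour

# Week counter: whole weeks elapsed since the Monday of the first week.
first_monday = idx[0].normalize() - pd.Timedelta(days=idx[0].dayofweek)
week = (series.index - first_monday).days // 7

# Rearrange into a grid: one row per hour of week, one column per week.
frame = pd.DataFrame(
    {"week": week, "hour_of_week": hour_of_week, "power": series.values}
)
grid = frame.pivot_table(
    index="hour_of_week", columns="week", values="power", aggfunc="mean"
)

# Draw the grid as a heat map.
fig, ax = plt.subplots(figsize=(10, 5))
image = ax.imshow(grid.values, aspect="auto", origin="lower", interpolation="nearest")
ax.set_xlabel("week")
ax.set_ylabel("hour of week (0 = Monday 00:00)")
colorbar = fig.colorbar(image, ax=ax)
colorbar.set_label("power generated")
# Call plt.show() or fig.savefig("heatmap.png") to view the result.
```

With real data, the only part that should need to change is how series is built. The mean in pivot_table only matters if a <week, hour of week> cell has more than one reading.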

2 comments:

  1. Dear Majid
    This sort of plot looks great! I recently began working with Python (I used Matlab before) and tried to create a plot similar to this one. But I ran into problems (array size) when I tried to group my data by day and hour of the day using a lambda function.
    So my question is how you managed to arrange your data, and especially which type of data you used. I used a DataFrame that I fetched from a MySQL server. Could you give me some advice or even share your code as an IPython Notebook?
    Thanks in advance
    JDB

  2. I could post my code if you insist, but I'm not sure it would be that useful to you (my input data was not arranged as nicely with a timestamp!). Your main difficulty is in arranging the data, and for that the solution is pandas time series indexing: http://pandas.pydata.org/pandas-docs/stable/timeseries.html
    In my case, once I had my hourly data, I was able to iterate by week.


    As for the plot, I've used matplotlib to plot a million points without much trouble.
