Sunday, June 11, 2017

Holoviews Allows for Rapid Data Exploration by Structuring Data

Problem:
Data analysis and visualization are related. You have to set up a new visualization that makes sense every time you want to explore a set of variables. Furthermore, you have to deal with different data formats and plotting libraries.
Solution:
Use HoloViews. It forces you to organize your data in such a way that you can automatically visualize it.

Context:

Making a choice about which plotting package to use in Python used to be simple. There was just matplotlib pretty much. Nowadays, there are a plethora of visualization packages for Python. Jake VanderPlas recently gave an excellent talk at PyCon 2017 that highlights this which is summarized in the below diagram from the talk.


While choice is good, oftentimes I don't need so much choice when it comes to plotting itself as much as I need to be fancy with the process of setting up the plot in such a way that what I'm looking for is revealed in the plot.

This is where tools like HoloViews and Vega/Altair come in. Vega is a plot+data description language. But HoloViews goes a step further in abstraction: you specify relationships in your data and it does the hard work of presenting it in the best way with great flexibility; It's data-oriented, not plot-oriented.

I find explaining Holoviews difficult because most everyone is used to an imperative style of visualization whereas HoloViews can be considered declarative. Once you get past that, you have to distiguish HoloViews from Grammar-of-Graphics-inspired plotting packages like ggplot2 which is declarative as well. The difference is that ggplot declares plots while HoloViews declares data. In fact, to 'get' how HoloViews addresses the data analysis/visualization problem, I had to read the proceeding for it. One unique aspect of HoloViews that can help understand it is that it enforces a separation of data, plot rendering, plot type (given by an appropriate 'view' of the data), and plot style.

This approach to the problem translates into the following advantages:

  • rapid exploration of data (no, pandas doesn't cut it)
  • export of the HoloViews object as a self-contained html file with interactive plots
  • map to data in its original form like numpy, pandas, lists, as well as Blaze which itself accesses a variety of data sources including databases
  • choose your rendering backend: matplotlib, bokeh, plotly
  • memory-constrained analysis using DynamicMap
  • instantly switch between using interactive controls like sliders to explore variables and more static displays like an array of plots for each variable value

Show me one tool that does all that! Python vs R jab: It's been said for years that Python lags behind R in visualization. I would say parity has been achieved once Python got some ggplot-like tools. But with holoviews, I think the better language for visualization has now tipped in favor of Python!

No comments:

Post a Comment