This is just an impression so it's very subjective and I'm biased due to my enthusiasm for Python and my scientific computing background which Python has a firm grip on. But try comparing PyData to Strata which is a higher-level view of the data science world with R being the main analysis tool. The broader data science world is a collision of computer science and statistics both in theory and the tools used. While the narrower PyData world has its roots in the more harmonious scientific computing world where math meets computing albeit the math is generally less statistical.
Until data science curricula are developed, I believe the scientific computing background is a better foundation for data science than statistics alone or computer science on its own. Computational, data, and domain expertise are present in skilled scientific programmers, some of whom attended the conference. The caliber of talent at the conference was astounding. Attendees could, for example, talk about specific CPU and GPU architectures, database systems, compiled languages, distributed systems, GUIs; as well as talk about monte carlo techniques, machine learning, and optimization. Such broad knowledge of all of these areas is important for the implementation of a speedy scientific workflow which happens to be necessary for data science as well.
I'm also going to claim there are more tech-savvy scientists than there are tech-savvy statisticians. This isn't to diminish the importance of statisticians in data science but the computational pride of statisticians is in the comprehensive but slow and inelegant R language. Meanwhile scientific programmers know about, and have built eco-systems around, C/C++, Fortran, and Python and all the CS-subjects associated with these languages including parallel programming, compilation, and data structures. This is just the natural result of statisticians traditionally working with "small data" while scientific programmers often work with "big computation".
The mastery of these issues within the same community is what allows for innovative products such as scidb, bokeh, numba, and ipython and all the various small-scale hackery presented at the conference.
Perhaps I should hold off on making such claims until I go to the next Strata conference but this is a
No comments:
Post a Comment