Huge fraction of attendees at #pydata code for a living but not have CS degrees.
— Matt Davis (@jiffyclub) May 4, 2014
Computer science people this post is not for you. Mathematical people that haven't developed programming skill, this post is for you. I'm aiming this post at people who want to get involved in some kind of mathematical computing: scientific computing, statistical computing, or data science where a crucial skill of the job is programming.
I was spurred to write this post by this tweet from Matt Davis backed by personal experience. I don't have formal training in computer science as many software engineering professionals do. Yet, I managed to be at least be functional in programming and conversant with software engineering practice. So I'd like to share my story in the hope that it can benefit others.
When I first started programming, all I cared about was the results that would come out of some mathematical formulation that I wanted to implement (and that's how I was assessed as well). These exercises have pretty much always followed the workflow diagrammed below:
problem/question -> math -> code -> execute -> output -> analyze -> communicate results (feedback loops not shown)
In my case, the pursuit of efficiency, flexibility, and just the 'cool' factor led me down the path of actually becoming a better programmer instead of just someone who wrote alot of little programs (using Python had much to do with increasing my skill but that's another story). I attribute the reasons for the increase of my skill to interest in the following:
Improving program logic:
- Generalization which leads to abstraction of code. How do I make my code apply to more general cases?
- Code Expressiveness. How do I best represent the solution to my problem? What programming paradigm should I use: object-oriented? functional? This is related to generalization and abstraction; and is closely related to code readability and maintainability.
- Robustness. How does it handle failure? How does it handle corner cases? (eg. what happens if some function is passed an empty list?)
- Portability. So I got this code working on my Windows machine. Will it work on a mac? linux? a compute cluster? 64-bit operating system?
- Automation. Eliminate the need for human involvement in the process as much as possible.
- Modularity and separation of concerns. As your program gets bigger, you should notice that some units of logic have nothing to do with others. Put in the effort to separate your code into modules. This aspect is also related to code maintainability.
- High-performance. Can I make my code run faster? As a major concern for scientific computing and big data processing, you must understand some details of computer hardware, compilers, interpreters, parallel programming, operating systems, data structures, databases, and the differences between higher and lower-level programming languages. This understanding will be reflected in the code that you write.
- Do not repeat yourself (DRY). Sometimes, a shortcut to deliver a program is to duplicate some piece of information (because you didn't structure your program in the best way). Resist this temptation and have a central location for the value of some variable so that a change in this variable propagates appropriately throughout your system.
- Testing. As your program gets larger and more complex, you want to make sure its (modular!) components work as you develop your code. Test in an automated(!) fashion as well.
- Documentation. Expect that you'll come back to your code later to modify it. Save yourself, and others(!), some trouble down the road and document what all those functions do.
- Source Control. You need to be able to track versions of your code to help in debugging and accountability in teams. A keyword here is 'git'.
- Debugging. Stop using print statements. It may be ok for quick diagnosis but just "byte" the bullet and learn to use a debugger. You'll save yourself time in the long-run. Nonetheless, you can minimize your use of a debugger by writing modular and robust code from the start.
- Coding Environment. Integrated development environment vs text editor. VI vs Emacs vs Nano vs Notepad++ vs ...etc. Eclipse vs Visual Studio vs Spyder vs ...etc. Read up about these issues.
- Concentration. Don't underestimate the importance of sleep and uninterrupted blocks of time. I find that crafting quality code can be mentally taxing thereby requiring my focus. Also, having a healthy lifestyle in general is also relevant. I like to code while listening to chillout music with a moderate intake of a caffeinated drink.
At first I thought this entry was going to be a joke but on second thought it's really not, even though it's not a technical aspect of the work. See, I didn't boldface this point.
So I haven't revealed anything new here but I hope putting this list together has some value. Also, efforts like Software Carpentry can put you on the fast-track towards improving your skill.
But as with every profession, you must (have the discipline to) practice.