Wednesday, March 20, 2013

C vs. Python in the Context of Reading a Simple Text Data File

Problem: Using C, read a Matrix Market file generically and versatilely.
Generically: using the same code (at run-time), interpret a line of data in various ways

I'm trying to do this in C. Coming from Python, C makes me feel CRIPPLED. Laundry list of offenses:

  • Primitive string operations...oh actually they are just character arrays.
    example: char teststring[10]="asdf"; (OK initialization) teststring="qwer"; (FAIL what I expect is a reassignment fails..turns out you must call string copy and make sure the destination has enough space.).
  • No negative indexing
  • Can't get length of an array
  • Can't handle types as objects. You can't say: if this object is of this type, then do this. Types are 'hardcoded'. You can't return a type.
  • Case labels must be known at compile-time. The C compiler can't understand that if I use a function output with a known input at compile time to make a label that it could be known at compile-time. As a mathematically-inclined programmer, all I care about is functionality. I don't like to think about the preprocessor as a separate process.
  • Your program can compile and work but in the realm of "undefined behavior". Good luck finding a bug caused by "undefined behavior".
Which brings me to some C items that have overlapping functionality: enum, arrays, and X-Macros. They can all be viewed as lists, but macros live in the preprocessor; you cannot iterate over enums and they are essentially only a list of integers; and if you want to know the size of an array, you can only do it at compile-time with a macro hack.  So if you need functionality on the same list that had to be implemented in the preprocessor and at run-time, you have to violate DRY.

While I do think lower-level functions are essential for some purposes, this exercise makes me appreciate Python more (yes particularly Python and not other interpreted languages). In Python, I can read the lines in as a matrix of strings, then I could vectorize type conversion operations in a few lines of code.

So, I've resorted to code generation to create functions for all cases which solves half the problem. I won't bother trying to make things more generic. If it could be done, I expect that it's going to look messy.

4/17 Update: I've finished my reader. The "main()" file is here. I think I organized my procedure well but it took alot of code. It's ridiculous. I could accomplish the same with acceptable speed in about two dozen lines of Python code. I know this because I've written code to read data generically in Python several times.