Tuesday, May 1, 2012

Checkpointing MEEP

MEEP doesn't support restarting a simulation, which is a problem for my workflow. I run on a cluster (it was enough of a pain to get MEEP compiled on a non-Debian Linux), which requires me to specify a wall-time limit for each job in a PBS script. The MEEP developers suggest using checkpointing software. My research into this has led me through many underdeveloped or abandoned packages. DMTCP seems to be the most developed of the options that don't require recompiling the code. It seems simple to use; however, I have found it difficult to set up on a PBS system, where the assigned compute nodes can change between runs.
So far I've found it such a bother that I may instead write a script that gets the estimated runtime from a short trial MEEP simulation and uses that estimate to write the PBS script.
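Here is a rough sketch of what that script might look like. It is only an outline under some assumptions: the trial run's wall-clock time is measured by hand and passed in, the runtime is assumed to scale linearly with the simulated time, and the safety factor, job name, node counts, and the `sim.ctl` file name are all placeholders I made up for illustration. The `#PBS` directives are the common Torque-style ones and may need adjusting for a particular cluster.

```python
import math


def format_walltime(seconds):
    """Format a duration in seconds as HH:MM:SS for a PBS walltime request."""
    total = int(math.ceil(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def estimate_walltime(trial_seconds, trial_time_units, full_time_units,
                      safety_factor=1.5):
    """Scale a trial run's wall-clock time up to the full simulation.

    Assumes runtime grows linearly with simulated time; the safety
    factor pads the estimate so the job is not killed just short of
    finishing.
    """
    seconds_per_unit = trial_seconds / trial_time_units
    return seconds_per_unit * full_time_units * safety_factor


def make_pbs_script(walltime, command, job_name="meep_run", nodes=1, ppn=1):
    """Return the text of a PBS job script (placeholder resource values)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        "cd $PBS_O_WORKDIR",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical numbers: the trial simulated 10 time units in 120 s,
    # and the full run needs 200 time units.
    needed = estimate_walltime(120, 10, 200)
    script = make_pbs_script(format_walltime(needed), "meep sim.ctl")
    print(script)
```

The printed text can be redirected to a file and submitted with qsub. If the trial run is too short, startup costs will dominate and the linear scaling will be off, so the safety factor probably needs to be generous.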