Monday, August 24, 2015

Run CUDA applications on CoreOS


Use this Dockerfile to install NVIDIA drivers and CUDA on more recent versions of CoreOS. It works by installing the NVIDIA Linux kernel module using plain Linux kernel source (containers see the kernel of the host OS, not the kernel of the container OS).
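To make the "containers see the kernel of the host" point concrete, here is a minimal Python sketch (not taken from the Dockerfile itself) that maps the running kernel's release string to a matching plain kernel source tarball. The function name `kernel_source_url` is hypothetical, and the URL layout is an assumption based on how kernel.org organizes its 3.x and later releases.

```python
import platform
import re


def kernel_source_url(release: str) -> str:
    """Map a kernel release string (as printed by `uname -r`) to a tarball URL.

    Assumes the kernel.org directory layout used for 3.x and later kernels.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # kernel.org names x.y.0 releases "linux-x.y", without the trailing ".0".
    if patch in (None, "0"):
        version = f"{major}.{minor}"
    else:
        version = f"{major}.{minor}.{patch}"
    return (
        f"https://www.kernel.org/pub/linux/kernel/v{major}.x/"
        f"linux-{version}.tar.xz"
    )


if __name__ == "__main__":
    # Inside a container this reports the host's kernel release,
    # not anything belonging to the container image's distribution.
    print(kernel_source_url(platform.release()))
```

Running this inside any container on a CoreOS host should print a URL for the host's kernel version, which is why the module has to be built against that source rather than against whatever the container image ships.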

There are other Dockerfiles that manage this, but they ask that you juggle two installations of the driver: one on the host and the other in the container. With the Dockerfile that I've developed, you only have one driver installation to worry about.


I find having to do this a bit hacky and against the containerization philosophy: the kernel module is loaded from a Dockerfile and, as a consequence, you can't have multiple driver versions on the host. But maybe I'm asking too much from Docker's virtualization technique, as I don't think it was meant to virtualize such low-level functions of the operating system.

Still, it's not that bad. Being able to use other CUDA-enabled Dockerfiles with only slight modification is great. I can also load and unload the kernel module at will. You just can't have two versions of the module running at the same time, which isn't much of an issue with GPU computing, as a single job will probably not leave enough resources for other GPU processes on the same host anyway.
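Since only one version of the module can be loaded at a time, it can help to check what is currently loaded before starting a CUDA container. The following is a rough Python sketch, not part of the Dockerfile: the helper names are hypothetical, and it assumes the driver's module is named `nvidia` and exposes its version under `/sys/module/nvidia/version`.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Return the module names listed in the contents of /proc/modules."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def nvidia_driver_version(sys_module_dir: str = "/sys/module/nvidia"):
    """Return the loaded driver's version string, or None if it is not loaded.

    Assumes the module publishes its version in <sys_module_dir>/version.
    """
    version_file = Path(sys_module_dir) / "version"
    if not version_file.exists():
        return None
    return version_file.read_text().strip()


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    names = loaded_modules(proc_modules.read_text()) if proc_modules.exists() else set()
    print("nvidia loaded:", "nvidia" in names)
    print("driver version:", nvidia_driver_version())
```

Because the container shares the host's kernel, the same check gives the same answer on the host and inside a container, provided `/proc` and `/sys` are visible there.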


Monday, August 10, 2015

"Personal Compute Cloud" Infrastructure Code


Problem: Automate computing infrastructure setup
Solution: Docker hosts on CoreOS machines provisioned with Ansible.

I've recently finished coding up a solution to tackle 'personal' distributed computing. I was bothered by the (apparent) lack of a framework to handle the coordination of setting up multiple machines. And shell scripts are messy. Once I learned Ansible, I was no longer bothered! (It will be the only systems automation tool I use for the foreseeable future! Yeah, Ansible is AWESOME!)

Catering to the Scientific Computing Workflow: However, mere automation was not my only concern. I wanted a seamless transition from what I'm working on locally to being able to bring in more computing power from remote machines. Unlike in (pure) software engineering, there isn't a separate 'development' environment and 'production' environment. There are a handful of codes out there that can help you provision CoreOS clusters, but they do not fit well with the scientific computing workflow.

Status: Most of the functionality that I had planned has been implemented. However, like all codes, it's a work-in-progress. I'll be adding functionality as needed by my priorities.

Try it out.