Monday, September 12, 2016

'Query' meta-data on your data sets

[https://github.com/majidaldo/yaml_query]

Problem: Suppose you have meta-data on some data sets and you want to select data sets by certain attributes. That sounds a lot like a job for SQL. But the attributes are not strictly in a table format where something is filled in for every attribute. You probably have not even decided (beforehand) what attributes every data set should have.

Solution: 'Convert' attribute (meta-)data into tables that SQL can query.

Note: The YAML part is just a convenience since it's expected that the meta-data is persistently stored. The meta-data abstraction is just one level of nested dictionaries. I also hear a YAML reader can read JSON.
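To make the idea concrete, here is a minimal sketch of turning one level of nested dictionaries into a table that SQL can query. It is not the yaml_query code or its interface. The data set names, the attributes, the `build_table` helper, and the choice of SQLite are all assumptions made for illustration. Attributes that a data set does not define simply come out as NULL.

```python
import sqlite3

# Hypothetical meta-data: {data set name: {attribute: value}}.
# Not every data set defines every attribute.
metadata = {
    "run_a": {"sensor": "ecg", "rate_hz": 250},
    "run_b": {"sensor": "ecg", "subject": "s02"},
    "run_c": {"sensor": "emg", "rate_hz": 1000, "subject": "s02"},
}


def build_table(meta, table="datasets"):
    """Load the meta-data into an in-memory SQLite table.

    Columns are the union of all attributes; a missing attribute is NULL.
    Assumes no attribute is itself called "name".
    """
    columns = sorted({attr for attrs in meta.values() for attr in attrs})
    # Identifiers cannot be bound as parameters, so quote them explicitly.
    quoted = ['"{}"'.format(c.replace('"', '""')) for c in columns]
    col_defs = ", ".join(["name TEXT PRIMARY KEY"] + quoted)
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "{}" ({})'.format(table, col_defs))
    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    insert = 'INSERT INTO "{}" VALUES ({})'.format(table, placeholders)
    for name, attrs in meta.items():
        conn.execute(insert, [name] + [attrs.get(c) for c in columns])
    return conn


conn = build_table(metadata)
query = (
    "SELECT name FROM datasets "
    "WHERE sensor = 'ecg' AND rate_hz IS NOT NULL ORDER BY name"
)
selected = [row[0] for row in conn.execute(query)]
print(selected)
```

The query picks out ECG data sets that have a sampling rate recorded, without the meta-data ever having been designed as a table.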

Wednesday, September 30, 2015

Recurrent Neural Networks Can Detect Anomalies in Time Series


A recurrent neural network is trained on the blue line (which is some kind of physiologic signal). It has some kind of pattern to it except at t=~300 where it shows 'anomalous' behavior. The green line (not same scale) represents the error between the (original) signal and a reconstructed version of it from the neural network. At ~300, the network could not reconstruct the signal, so the error there becomes significantly higher.

Why is this cool??

  • unsupervised: I did not care about data with anomalies vs data without anomalies
  • trained with anomaly in the data: as long as most of the data is normal, the algorithm seemed robust enough to have learned the pattern of the data with the anomaly in it.
  • no domain knowledge applied: no expert in this kind of time series provided input on how to analyze this data

More details for the more technical people:
- training algo: RMSprop
- input noise added
- the network is an LSTM autoencoder
- it's a fairly small network
- code: theanets 
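
The reconstruction-error idea can be shown without any neural network. The sketch below is not the thesis code and does not use theanets or an LSTM. It swaps in a much simpler stand-in model (the average cycle of a periodic signal) and a synthetic sine wave, both of which are assumptions made for illustration. As in the graph, the model is fit on data that includes the anomaly, and the error still peaks where the pattern breaks.

```python
import math

PERIOD = 50                # assumed length of one cycle of the normal pattern
N = 600                    # assumed signal length
ANOMALY = range(300, 325)  # where the pattern breaks, as at t=~300 in the graph

# Synthetic stand-in for the signal: a sine wave that goes flat in ANOMALY.
signal = [
    0.0 if t in ANOMALY else math.sin(2 * math.pi * t / PERIOD)
    for t in range(N)
]

# Stand-in "model": the average cycle, learned from ALL the data,
# anomaly included. (The post uses an LSTM autoencoder instead.)
cycles = N // PERIOD
template = [
    sum(signal[p + PERIOD * k] for k in range(cycles)) / cycles
    for p in range(PERIOD)
]

# Reconstruct the signal from the learned pattern and measure the error.
reconstruction = [template[t % PERIOD] for t in range(N)]
error = [(x - r) ** 2 for x, r in zip(signal, reconstruction)]

peak = max(range(N), key=error.__getitem__)
print("largest reconstruction error at t =", peak)
```

Because only one cycle in twelve is anomalous, it barely shifts the learned pattern, so the error stays small everywhere except inside the anomalous stretch.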

And that's my master's thesis in one graph!

Monday, August 24, 2015

Run CUDA applications on CoreOS

[https://github.com/majidaldo/coreos-nvidia]

Use this Dockerfile to install NVIDIA drivers and CUDA on more recent versions of CoreOS. It works by installing the NVIDIA Linux kernel module using plain Linux kernel source (containers see the kernel of the host OS, not the kernel of the container OS).

There are other Dockerfiles that manage this, but they ask that you juggle two installations of the driver: one on the host and the other in the container. With the Dockerfile that I've developed, you only have one driver installation to worry about.

Commentary:

I find having to do this a bit hacky and against the containerization philosophy: the kernel module is loaded from a Dockerfile and, as a consequence, the host cannot have multiple driver versions. But maybe I'm asking too much from Docker's virtualization technique, as I don't think it was meant to virtualize such low-level functions of the operating system.

Still, it's not that bad. Being able to use other CUDA-enabled Dockerfiles with only slight modification is great. I can also load and unload the kernel module at will. You just can't have two versions of the module running at the same time, which isn't much of an issue with GPU computing since you probably won't leave enough resources for other GPU processes on the (same) host anyway.


Credits:
https://github.com/coreos/coreos-overlay/issues/924
http://tleyden.github.io/blog/2014/11/04/coreos-with-nvidia-cuda-gpu-drivers/

Monday, August 10, 2015

"Personal Compute Cloud" Infrastructure Code

[https://github.com/majidaldo/personal-compute-cloud]

Problem: Automate computing infrastructure setup
Solution: Docker hosts on CoreOS machines provisioned with Ansible.

I've recently finished coding up a solution to tackle 'personal' distributed computing. I was bothered by the (apparent) lack of a framework to handle the coordination of setting up multiple machines. And shell scripts are messy. Once I learned Ansible, I was not bothered! (It will be the only systems automation tool I will be using in the foreseeable future! yah..Ansible is AWESOME!)


Catering to the Scientific Computing Workflow: However, mere automation was not my only concern. I wanted a seamless transition from what I'm working on locally to being able to bring in more computing power from remote machines. Unlike (pure) software engineering, there isn't a 'development' environment and a 'production' environment. Now there are a handful of codes out there that can help you provision CoreOS clusters, but they do not fit well with the scientific computing workflow.

Status: Most of the functionality that I had planned has been implemented. However, like all codes, it's a work-in-progress. I'll be adding functionality as needed by my priorities.

Try it out.

Tuesday, May 26, 2015

Use Vagrant FROM Ansible to Automate Hybrid Cloud Infrastructure

[https://github.com/majidaldo/ansible-vagrant]


The Intro

This is NOT about having Vagrant provision with Ansible. This is about having Ansible treat Vagrant as a provider of hosts.

Building on my previous experience with the 'cloud', I still felt like I needed another tool to script and glue the process of getting my infrastructure up. I started out with shell scripts but they quickly got messy as the complexity increased. I knew about all the devops tools out there but I avoided them because I thought they would be too complex themselves for what I wanted to do, which is relatively simple. But I bit the bullet and went full-on devops with Ansible.

Ansible is GREAT! I found it suitable for (technically-minded) beginners. However, it still took me a few days to get the hang of it. I had to get a little bit under the hood since it did not do what I wanted it to do out of the box.

I want to set up something like a hybrid cloud where I run some services locally and just bring up high-performance compute nodes on demand and have them talk with my local services. I use Vagrant to set up local virtual machines. Vagrant is great for development environments but when I want to manage and orchestrate several VMs locally (let alone on the cloud), things can get messy.

So, I (further) developed ansible-vagrant to interface with Vagrant from Ansible (solving cygwin problems along the way).


The Cream

You can, from Ansible:

  • Set state=(up|halt) for some VM
  • Get a Vagrant host inventory
  • Get an SSH config for a host
  • Destroy VMs
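
As a rough idea of what treating Vagrant as a provider of hosts involves, here is a minimal sketch that turns text in the format printed by `vagrant ssh-config` into the JSON structure Ansible expects from a dynamic inventory. It is not the ansible-vagrant module. The sample host names, ports, key paths, the "vagrant" group name, and the helper functions are all hypothetical.

```python
import json

# Sample text in the format printed by `vagrant ssh-config`.
# Host names, ports, and key paths are made up for illustration.
SAMPLE = """\
Host core-01
  HostName 127.0.0.1
  User core
  Port 2222
  IdentityFile /home/me/.vagrant.d/insecure_private_key

Host core-02
  HostName 127.0.0.1
  User core
  Port 2200
  IdentityFile /home/me/.vagrant.d/insecure_private_key
"""

# ssh_config keyword -> Ansible connection variable
KEYMAP = {
    "HostName": "ansible_host",
    "User": "ansible_user",
    "Port": "ansible_port",
    "IdentityFile": "ansible_ssh_private_key_file",
}


def parse_ssh_config(text):
    """Return {host: {keyword: value}} from ssh_config-style text."""
    hosts = {}
    current = None
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        key, value = parts
        if key == "Host":
            current = hosts.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return hosts


def to_inventory(hosts, group="vagrant"):
    """Build an Ansible dynamic-inventory dictionary from parsed hosts."""
    hostvars = {}
    for name, options in hosts.items():
        host = {KEYMAP[k]: v.strip('"') for k, v in options.items() if k in KEYMAP}
        if "ansible_port" in host:
            host["ansible_port"] = int(host["ansible_port"])
        hostvars[name] = host
    return {group: {"hosts": sorted(hosts)}, "_meta": {"hostvars": hostvars}}


inventory = to_inventory(parse_ssh_config(SAMPLE))
print(json.dumps(inventory, indent=2))
```

In real use the text would come from running `vagrant ssh-config` in the directory that holds the Vagrantfile, for example through `subprocess.check_output`, instead of from a hard-coded sample.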

Friday, May 1, 2015

CoreOS cloud-config Generator

[https://github.com/majidaldo/cloudconfig-writer]

After moving my workflow over to Docker I realized that it's not a complete solution. There is still the process that comes before Docker that must be addressed: namely, provisioning and managing machines on a (ugghh) "cloud" provider such as Amazon EC2, Digital Ocean, and even Vagrant. Also, CoreOS has become an important player in the Docker scene, providing just the minimum operating system needed to run Docker in a cluster environment. CoreOS also provides some commonality in the process of provisioning machines on different providers by making use of a cloud-config file; the same (or almost the same) cloud-config file can be used on different providers.

That's good news but it introduces problems like:
- I want to keep this piece of the cloud-config file out of source control
- I don't want to copy/paste between cloud-config files (DRY issues):
    - this cloud-config file is just like this one but with a different password/user/hostname/IP address
    - this cloud-config file is just like this one but with an added section

So, I made a program to address these issues! Find out more by reading the README in the repository.
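
To illustrate the two problems above, here is a minimal sketch of one way to handle them: a shared base, a small override for the "added section" case, and placeholders filled from values that live outside source control. It is not the cloudconfig-writer code or its interface. The `deep_merge` and `fill` helpers, the `$placeholder` convention, and all values are assumptions; the keys only loosely follow the cloud-config layout.

```python
import copy
from string import Template


def deep_merge(base, override):
    """Return a copy of base with override merged in, recursing into dicts.

    Non-dict values (lists included) in override replace those in base.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def fill(node, values):
    """Substitute $placeholders in every string value of a nested structure."""
    if isinstance(node, dict):
        return {k: fill(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [fill(v, values) for v in node]
    if isinstance(node, str):
        # substitute() raises KeyError if a placeholder has no value.
        return Template(node).substitute(values)
    return node


# Shared piece, safe to commit.
base = {
    "hostname": "$hostname",
    "users": [{"name": "core", "passwd": "$passwd_hash"}],
    "coreos": {"units": [{"name": "docker.service", "command": "start"}]},
}

# "Just like this one but with an added section."
extra = {"coreos": {"update": {"reboot-strategy": "off"}}}

# Kept out of source control (hypothetical values).
secrets = {"hostname": "node1", "passwd_hash": "not-a-real-hash"}

config = fill(deep_merge(base, extra), secrets)
print(config)
```

Writing the result out as an actual cloud-config file would still need a YAML library and the `#cloud-config` first line, which this sketch leaves out.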

Comments are welcome.