Majid's Research: tsad


Friday, May 20, 2016

Thesis: Unsupervised Anomaly Detection in Sequences Using Long Short Term Memory Recurrent Neural Networks

http://hdl.handle.net/1920/10250

As a formalization of my previous post, I'm happy to publish my (second) Master's thesis.

Wednesday, September 30, 2015

Recurrent Neural Networks Can Detect Anomalies in Time Series

A recurrent neural network is trained on the blue line, which is some kind of physiological signal. The signal follows a regular pattern except around t ≈ 300, where it shows 'anomalous' behavior. The green line (not on the same scale) represents the error between the original signal and the network's reconstruction of it. Around t ≈ 300 the network could not reconstruct the signal, so the error there becomes significantly higher.
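To make the green line concrete, here is a minimal sketch of how a reconstruction-error score could be computed and thresholded. It is not the code behind the graph: the toy signal, the moving-average smoothing, the `mean + 3 * std` rule, and the names `anomaly_scores` and `flag_anomalies` are all assumptions for illustration, and the "reconstruction" is a stand-in that simply reproduces the periodic pattern rather than the output of a trained network.

```python
import numpy as np

def anomaly_scores(signal, reconstruction, window=5):
    """Squared reconstruction error, smoothed with a moving average."""
    signal = np.asarray(signal, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    err = (signal - reconstruction) ** 2
    kernel = np.ones(window) / window
    return np.convolve(err, kernel, mode="same")

def flag_anomalies(scores, n_sigmas=3.0):
    """Indices whose score exceeds mean + n_sigmas * std (an assumed rule)."""
    threshold = scores.mean() + n_sigmas * scores.std()
    return np.flatnonzero(scores > threshold)

# Toy data: a periodic signal with a distorted stretch around t = 300.
t = np.arange(600)
clean = np.sin(2 * np.pi * t / 50)
signal = clean.copy()
signal[295:305] += 1.5

# Stand-in for the network output: assume it reproduces only the pattern.
reconstruction = clean

scores = anomaly_scores(signal, reconstruction)
flagged = flag_anomalies(scores)
print("flagged time steps:", flagged.min(), "to", flagged.max())
```

The threshold is where domain judgment would normally creep back in; a percentile of the scores would work just as well as a multiple of the standard deviation.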

Why is this cool?

- unsupervised: I did not have to label which data contained anomalies and which did not
- trained with the anomaly in the data: as long as most of the data is normal, the algorithm seemed robust enough to learn the normal pattern even with the anomaly present in the training data
- no domain knowledge applied: no expert in this kind of time series provided input on how to analyze this data

More details for the more technical people:

- training algo: RMSprop

- input noise added

- the network is an LSTM autoencoder

- it's a fairly small network

- code: theanets
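As an illustration of the "input noise added" item, the sketch below prepares training pairs for an autoencoder by slicing a series into overlapping windows and adding Gaussian noise to the inputs only. This is an assumption about one common setup (a denoising-style autoencoder), not a description of the actual thesis code: the window length, the noise level, and the helper name `make_noisy_windows` are hypothetical, and the network itself is left out.

```python
import numpy as np

def make_noisy_windows(series, window, noise_std, seed=0):
    """Return (noisy_inputs, clean_targets), one overlapping window per row."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    # Each row is a length-`window` slice of the series, shifted by one step.
    targets = np.lib.stride_tricks.sliding_window_view(series, window).copy()
    # Noise is added to the inputs only; the targets stay clean.
    inputs = targets + rng.normal(0.0, noise_std, size=targets.shape)
    return inputs, targets

series = np.sin(2 * np.pi * np.arange(100) / 25)
inputs, targets = make_noisy_windows(series, window=10, noise_std=0.1)
print(inputs.shape, targets.shape)
```

Training the network to map the noisy rows back to the clean ones discourages it from simply copying its input, which is one reason noise is often added when training autoencoders.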

And that's my master's thesis in one graph!

---
update 12/2018:
This post has been getting a lot of attention, which I appreciate. However, I feel obligated to point readers to the latest research, which obviates RNNs. A great introduction to that research is The Fall of RNN LSTM.

Friday, April 11, 2014

PCA and ICA 'Learn' Representations of Time-Series

PCA is usually demonstrated in a low-dimensional context. Time series, however, are high-dimensional, and ICA might be a more natural technique for reducing their dimensionality.

For my current project, I'm not interested in dimensionality reduction per se; rather, I'm interested in how well the algorithm can reproduce a new input, given a reduced representation learned from some base input time series. If the new input cannot be recreated well, then it is a candidate for being considered an anomaly.

I've set up an experiment where I generated a bunch of sine waves with an even number of cycles in the domain, plus a constant function, as (base/training) input to the algorithms. Then I try to reconstruct a slightly different even sine wave, an odd sine wave, and a constant.

The result is that the even sine wave and constant are somewhat reconstructed while the odd sine wave is not. You can see this in the following graphs where the blue line is a 'target' signal and the green line is the reconstructed signal. I get similar results using PCA.

[Figures: reconstruction of a sine with 4 waves; a sine with 5 waves (FAIL); a constant = 3]
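The experiment can be approximated in a few lines of NumPy. This is a sketch under stated assumptions, not the notebook's code: it uses PCA (via SVD) rather than ICA, and the frequencies, amplitudes, domain, number of components, and helper names (`fit_pca`, `reconstruct`, `rms_error`) are all chosen here for illustration.

```python
import numpy as np

def fit_pca(train, n_components):
    """Return the mean series and the top principal directions (as rows)."""
    mean = train.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruct(x, mean, components):
    """Project x onto the learned subspace and map it back."""
    codes = (x - mean) @ components.T
    return mean + codes @ components

def rms_error(x, x_hat):
    """Root-mean-square difference between a signal and its reconstruction."""
    return float(np.sqrt(np.mean((x - x_hat) ** 2)))

t = np.linspace(0.0, 1.0, 200, endpoint=False)

# Training set: sines with an even number of cycles, plus constant functions.
train = np.array(
    [a * np.sin(2 * np.pi * k * t) for k in (2, 4, 6, 8) for a in (0.5, 1.0, 1.5)]
    + [np.full_like(t, c) for c in (1.0, 2.0)]
)
mean, components = fit_pca(train, n_components=5)

# New inputs: a slightly different even sine, an odd sine, and a constant.
tests = {
    "4 waves": 1.2 * np.sin(2 * np.pi * 4 * t),
    "5 waves": np.sin(2 * np.pi * 5 * t),
    "constant = 3": np.full_like(t, 3.0),
}
errors = {
    name: rms_error(x, reconstruct(x, mean, components))
    for name, x in tests.items()
}
for name, err in errors.items():
    print(f"{name}: RMS reconstruction error = {err:.4f}")
```

In this idealized setup the even sine and the constant lie in the subspace spanned by the training data, so they come back almost exactly, while the 5-wave sine is orthogonal to that subspace and is barely reconstructed at all. Real data and ICA would give the softer "somewhat reconstructed" behavior described above.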

There is plenty of mathematical rigor and there are many algorithmic parameters that I haven't talked about, but this post is meant to require minimal time and technical knowledge to go through. You can figure out the details if you examine the ipython notebook.
