Anomaly Detection with Twitter in R
Twitter has open-sourced its anomaly detection package for R. It aims to detect anomalies in seasonal time series that also have an underlying trend.
Does it really detect anomalies?
YES! It actually works very well. At least when you use it for what it was created for…
What anomaly can be detected?
It was designed to detect global and local anomalies.
Global anomaly:
This is the kind of anomaly we are most familiar with: a value that goes outside the usual interval. It isn’t always the best way, but the 95th percentile technique can detect this kind of anomaly.
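To make the percentile idea concrete, here is a minimal sketch in plain Python (not the Twitter package, and not R). The function name `global_anomalies`, the `pct` parameter and the sample data are all made up for illustration; it simply flags every value above the chosen percentile.

```python
import statistics

def global_anomalies(values, pct=95):
    """Return (index, value) pairs that lie above the pct-th percentile."""
    # quantiles(n=100) yields the 99 cut points for the 1st..99th percentiles.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    threshold = cuts[pct - 1]
    return [(i, v) for i, v in enumerate(values) if v > threshold]

# Hypothetical metric: steady at 10 with a single spike.
metric = [10.0] * 100
metric[40] = 100.0
print(global_anomalies(metric))
```

On a noisy series a fixed percentile will always flag roughly the top 5% of points, whether or not anything unusual happened, which is one reason it isn’t always the best way.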
Local anomaly:
Very often we can see a recurring pattern in our data. It usually looks like a “wave”: low activity in the morning, high during the day, low at night. Local anomalies occur within this context. For example, high activity at night is an anomaly, even if the same value would be normal during the day.
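One way to picture a local anomaly is to compare each point only with the other points in the same slot of the cycle (for example, the same hour on other days). The sketch below is my own illustration in plain Python, not what the Twitter package does internally; `local_anomalies`, `period`, `threshold` and `min_scale` are hypothetical names, and the hourly data is invented.

```python
import statistics

def local_anomalies(values, period=24, threshold=3.0, min_scale=1.0):
    """Return indices of points that are unusual for their slot in the cycle."""
    flagged = []
    for i, v in enumerate(values):
        peers = values[i % period::period]  # same slot in every cycle
        med = statistics.median(peers)
        mad = statistics.median([abs(p - med) for p in peers])
        # Scale the MAD so it is comparable to a standard deviation, and
        # floor it so perfectly flat slots do not cause a division by zero.
        scale = max(1.4826 * mad, min_scale)
        if abs(v - med) / scale > threshold:
            flagged.append(i)
    return flagged

# Hypothetical hourly metric over 7 days: quiet nights (5), busy days (50).
day = [5.0] * 6 + [50.0] * 16 + [5.0] * 2
series = day * 7
series[3 * 24 + 2] = 50.0  # daytime-level activity at 2 a.m. on the 4th day
print(local_anomalies(series))
```

The injected value (50) is perfectly ordinary for daytime, so a global threshold would not notice it; it only stands out when compared with other nights.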
First, it aims to detect global and local anomalies (see above).
It is also supposed to understand “underlying trends” such as organic growth in the metrics.
Twitter calls this algorithm Seasonal Hybrid ESD (S-H-ESD).
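As I understand it, S-H-ESD removes the seasonal component and the median from the series, then runs a generalized ESD test on what is left, using the median and MAD instead of the mean and standard deviation. The sketch below is a simplified Python illustration of that idea under loud assumptions: the seasonal component is a crude per-slot median rather than an STL decomposition, the Student-t critical value is replaced by a normal approximation, and the function and parameter names are my own. It is not the package’s code.

```python
import math
import statistics
from statistics import NormalDist

def seasonal_residuals(values, period):
    """Remove a crude seasonal component and the overall median.

    Assumes len(values) >= period. The per-slot median is a stand-in
    for a proper seasonal decomposition (assumption for this sketch).
    """
    overall = statistics.median(values)
    seasonal = [statistics.median(values[p::period]) - overall
                for p in range(period)]
    return [v - seasonal[i % period] - overall for i, v in enumerate(values)]

def esd_anomalies(residuals, max_anoms=0.05, alpha=0.05):
    """Generalized ESD loop with median/MAD; returns anomalous indices."""
    n = len(residuals)
    if n < 3:
        return []
    k = min(max(1, int(n * max_anoms)), n - 2)  # upper bound on anomalies
    remaining = list(enumerate(residuals))
    removed, n_anoms = [], 0
    for i in range(1, k + 1):
        vals = [r for _, r in remaining]
        med = statistics.median(vals)
        mad = 1.4826 * statistics.median([abs(r - med) for r in vals])
        if mad == 0:
            break  # no spread left to measure against
        # Most extreme remaining point and its robust test statistic.
        idx, worst = max(remaining, key=lambda t: abs(t[1] - med))
        stat = abs(worst - med) / mad
        remaining.remove((idx, worst))
        removed.append(idx)
        # Critical value; the normal quantile approximates the t quantile.
        m = n - i + 1  # sample size before this removal
        t = NormalDist().inv_cdf(1 - alpha / (2 * m))
        lam = (m - 1) * t / math.sqrt((m - 2 + t * t) * m)
        if stat > lam:
            n_anoms = i  # keep the largest i that passes the test
    return sorted(removed[:n_anoms])
```

A possible use is `esd_anomalies(seasonal_residuals(series, 24))` on an hourly series with a daily cycle. Capping the number of anomalies up front (`max_anoms`) is what lets the test look for several outliers without one large outlier hiding the others.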
I was very impressed by Twitter’s anomaly detection. It spotted many different kinds of anomalies.
Of course it didn’t detect everything, only what it was built for.
[Anomaly detected] Growth too early in seasonal metrics
[Anomaly detected] Some unusual noise
[Anomaly detected] More noise than usual
[Anomaly detected] Breakdown
[Anomaly detected] Sudden growth
[Anomaly detected] Sudden growth
[Anomaly detected] Peak
[Anomaly detected] Activity when usually none
[Anomaly not detected] Linear growth
[Anomaly not detected] Linear seasonal growth
What can’t be detected?
Twitter’s anomaly detection is impressive, but it isn’t the only way to detect anomalies.
It is built to detect certain kinds of anomalies, not all of them!
[Anomaly not detected] Flat signal
[Anomaly not detected] No noise
[Anomaly not detected] Exponential growth
[Anomaly not detected] Negative seasonal anomaly
[Anomaly not detected] Negative seasonal anomaly
Conclusion
Twitter made a big breakthrough in anomaly detection.
It detects a wide variety of anomalies.
Only two criticisms:
To my eyes, it only failed to detect one kind of anomaly: the “negative seasonal anomaly” (last graph above).
R is awesome, but it is not suitable for real-time anomaly detection.
Overall it is an incredible piece of software… Congrats Twitter, outstanding job!