Anomaly detection
Problem motivation
Gaussian distribution
Gaussian distribution: Say $x \in \mathbb{R}$. If $x$ is Gaussian-distributed with mean $\mu$ and variance $\sigma^2$, we write $x \sim \mathcal{N}(\mu, \sigma^2)$.
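Parameter estimation:
$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \mu\right)^2$$
Whether you use $\frac{1}{m}$ or $\frac{1}{m-1}$ makes very little difference.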
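A minimal sketch of these estimates, assuming NumPy and a synthetic sample (the array `x` and its true mean and variance are made up for illustration). In `numpy.var`, `ddof=0` corresponds to the $\frac{1}{m}$ estimate and `ddof=1` to the $\frac{1}{m-1}$ estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: m = 10,000 draws from a Gaussian with mean 5 and variance 4.
x = rng.normal(loc=5.0, scale=2.0, size=10_000)

mu = x.mean()                    # (1/m) * sum of x^(i)
sigma2_ml = x.var(ddof=0)        # (1/m) * sum of (x^(i) - mu)^2
sigma2_unbiased = x.var(ddof=1)  # same sum divided by (m - 1)

# Relative gap between the two variance estimates is exactly 1 / (m - 1).
rel_diff = abs(sigma2_unbiased - sigma2_ml) / sigma2_ml
```

For a sample of this size the two variance estimates differ by about one part in ten thousand, which is why the choice rarely matters.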
Algorithm
Density estimation
Anomaly detection algorithm
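- Choose features $x_j$ that you think might be indicative of anomalous examples.
- Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$.
- Given new example $x$, compute $p(x)$:
$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$
Anomaly if $p(x) < \varepsilon$.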
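The steps above can be sketched as follows. This is an illustrative version, not a reference implementation: it assumes NumPy, a per-feature Gaussian model, synthetic training data, and an arbitrary threshold `epsilon`. The function names are hypothetical.

```python
import numpy as np

def fit_gaussian_params(X):
    """Per-feature mean and variance (1/m version) for X of shape (m, n)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) as a product of n univariate Gaussian densities, one value per row."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return np.prod(coef * expo, axis=1)

rng = np.random.default_rng(0)
# Hypothetical "normal" training examples with two features.
X_train = rng.normal(loc=[0.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))
mu, sigma2 = fit_gaussian_params(X_train)

epsilon = 1e-4  # hypothetical threshold; in practice tuned on labeled data
X_new = np.array([[0.1, 10.5],   # close to the bulk of the training data
                  [6.0, 25.0]])  # far from it in both features
p = density(X_new, mu, sigma2)
is_anomaly = p < epsilon
```

For many features the product of densities can underflow; summing log-densities and comparing against $\log \varepsilon$ is a common alternative.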
Developing and evaluating an anomaly detection system
When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm.
Assume we have some labeled data of anomalous and non-anomalous examples.
- Training set (normal examples)
- Cross-validation set (labeled examples)
- Test set (labeled examples)
Can also use the cross-validation set to choose the parameter $\varepsilon$.
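One way to pick the threshold on the cross-validation set is to scan candidate values and keep the one with the best score. The sketch below assumes the F1 score as the metric (the notes do not fix one here) and uses made-up density values and labels. `select_epsilon` and `f1_score` are hypothetical helper names.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 for binary labels where 1 marks an anomaly."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, num_steps=1000):
    """Scan thresholds between min and max density; keep the best F1."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), num_steps):
        y_pred = (p_cv < eps).astype(int)  # flag low-density examples
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Hypothetical densities p(x) on the cross-validation set and their labels.
p_cv = np.array([0.30, 0.25, 0.28, 0.22, 0.001, 0.27, 0.002, 0.31])
y_cv = np.array([0, 0, 0, 0, 1, 0, 1, 0])
eps, f1 = select_epsilon(p_cv, y_cv)
```

Because anomalies are rare, plain classification accuracy would look good even for a detector that never flags anything, which is the reason for scoring with something like F1 instead.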
Anomaly detection vs. supervised learning
Anomaly detection | Supervised learning |
---|---|
Very small number of positive examples; Large number of negative examples | Large number of positive examples and negative examples |
Hard for any algorithm to learn from positive examples what the anomalies look like; future anomalies may look nothing like any of the anomalous examples we've seen so far. | Enough positive examples for algorithm to get a sense of what positive examples are like, future positive examples likely to be similar to ones in training set. |
Choosing what features to use
Non-Gaussian features: transform them so the data looks more Gaussian, for example by replacing $x$ with $\log(x)$ or $\sqrt{x}$.
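As an illustration of such a transform, the sketch below assumes a right-skewed synthetic feature and uses a log transform; other transforms may suit other data. Sample skewness (near 0 for a Gaussian) serves here as a rough check, and `skewness` is a hypothetical helper.

```python
import numpy as np

def skewness(v):
    """Sample skewness: mean of the cubed standardized values."""
    z = (v - v.mean()) / v.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(0)
# Hypothetical strictly positive, right-skewed feature.
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)
x_log = np.log(x)  # candidate replacement feature

skew_before = skewness(x)
skew_after = skewness(x_log)
```

Plotting a histogram before and after the transform is the usual visual check; the skewness numbers are only a quick numeric stand-in for that.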
Error analysis for anomaly detection
- Most common problem: $p(x)$ is comparable (say, both large) for normal and anomalous examples.
- Create some new features.
- Choose features that might take on unusually large or small values in the event of an anomaly.
Multivariate Gaussian distribution
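- $x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2), \dots$ etc. separately.
- Model $p(x)$ all in one go.
- Parameters: $\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ (covariance matrix)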
The video shows several plots of what the multivariate Gaussian looks like.
Anomaly detection using the multivariate Gaussian distribution
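A minimal sketch of this variant, assuming NumPy, the standard multivariate Gaussian density, and synthetic correlated data. The function names and the two test points are hypothetical; the covariance uses the $\frac{1}{m}$ estimate.

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Mean vector and covariance matrix (1/m version) for X of shape (m, n)."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / m
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """Multivariate Gaussian density evaluated at each row of X."""
    n = mu.shape[0]
    Xc = X - mu
    # Solve Sigma z = (x - mu) rather than forming the inverse explicitly.
    z = np.linalg.solve(Sigma, Xc.T).T
    quad = np.sum(Xc * z, axis=1)  # (x - mu)^T Sigma^{-1} (x - mu)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

rng = np.random.default_rng(0)
# Hypothetical training data with strongly correlated features.
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X_train = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=2000)
mu, Sigma = fit_multivariate_gaussian(X_train)

X_new = np.array([[1.5, 1.5],    # follows the correlation
                  [1.5, -1.5]])  # each value is typical alone, the pair is not
p = multivariate_density(X_new, mu, Sigma)
```

The second point gets a far smaller density than the first even though each coordinate on its own is unremarkable, which is the kind of anomaly a per-feature model tends to miss.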
Original model vs. Multivariate Gaussian
original model:
- manually create features to capture anomalies where $x_1, x_2$ take unusual combinations of values.
- computationally cheaper
- OK even if $m$ (training set size) is small
multivariate Gaussian:
- automatically captures correlations between features
- computationally more expensive
- must have $m > n$, or else $\Sigma$ is non-invertible
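The invertibility condition can be checked numerically. The sketch below assumes NumPy and random synthetic data, and compares the rank of the estimated covariance matrix when there are fewer examples than features and when there are many more. `covariance` is a hypothetical helper.

```python
import numpy as np

def covariance(X):
    """Covariance estimate (1/m version) for X of shape (m, n)."""
    Xc = X - X.mean(axis=0)
    return (Xc.T @ Xc) / X.shape[0]

rng = np.random.default_rng(0)
n = 5  # number of features

Sigma_small = covariance(rng.normal(size=(3, n)))    # m = 3 < n
Sigma_large = covariance(rng.normal(size=(500, n)))  # m = 500 > n

# After centering, the rank is at most m - 1, so the first matrix is singular.
rank_small = np.linalg.matrix_rank(Sigma_small)
rank_large = np.linalg.matrix_rank(Sigma_large)
```

A covariance matrix whose rank is below $n$ has zero determinant, so the density formula cannot be evaluated with it.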