Problem Formulation
- Given a set of training data
- We want to find a function that detects whether an input x is similar to the training data or not.
- Other names: outlier detection, novelty detection, exception detection
- Different approaches determine similarity in different ways.
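
As a minimal sketch of the formulation above, the detector can be written as a similarity score plus a threshold. The choice of similarity here (Euclidean distance to the nearest training point) and the names `nearest_distance`, `is_anomaly`, and `threshold` are illustrative assumptions, not something these notes prescribe; other approaches would swap in a different score.

```python
import math

def nearest_distance(x, training_data):
    """Distance from x to its closest training point (assumed similarity measure)."""
    return min(math.dist(x, t) for t in training_data)

def is_anomaly(x, training_data, threshold):
    """Flag x as an anomaly when it is farther than `threshold` from all training data.

    `threshold` is a hypothetical hyperparameter that would need tuning.
    """
    return nearest_distance(x, training_data) > threshold

# Usage: three training points near the origin.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(is_anomaly((0.1, 0.1), train, threshold=1.0))
print(is_anomaly((5.0, 5.0), train, threshold=1.0))
```

The first query lies close to the training data and the second lies far from it, so only the second is flagged.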
Applications
- Fraud Detection
- Network Intrusion Detection
- Cancer Detection
Binary Classification?
- Anomalies cannot be treated as a single class: anything unlike the training data counts as an anomaly.
- Even worse, in some cases it is difficult to find anomaly examples.
Categories
- With labels: the classifier can output "unknown", even though none of the training data is labelled "unknown". This is called open-set recognition.
- Without labels:
  - Clean: all of the training data is normal.
  - Polluted: a small fraction of the training data is anomalous.
With Labels
- The classifier outputs a class y and a confidence score c.
- Confidence: the maximum score of the softmax layer's output, or the negative entropy of that output.
- Comment: a GAN may generate normal samples.
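
The two confidence scores mentioned above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the `threshold` value are assumptions, and in practice the logits would come from a trained classifier.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_softmax_confidence(probs):
    """Confidence as the largest softmax probability."""
    return max(probs)

def negative_entropy_confidence(probs):
    """Confidence as negative entropy: sum of p * log(p); closer to 0 means more confident."""
    return sum(p * math.log(p) for p in probs if p > 0.0)

def classify_with_confidence(logits, threshold=0.5):
    """Return (label, c); the label is "unknown" when c falls below the threshold.

    `threshold` is a hypothetical value that would be tuned on held-out data.
    """
    probs = softmax(logits)
    y = max(range(len(probs)), key=probs.__getitem__)
    c = max_softmax_confidence(probs)
    return (y if c >= threshold else "unknown"), c

# Usage: a peaked output versus a flat one.
print(classify_with_confidence([10.0, 0.0, 0.0]))
print(classify_with_confidence([0.0, 0.0, 0.0]))
```

A peaked softmax output gives a high maximum score and a negative entropy near zero, while a flat output gives a low maximum score and a strongly negative value, so either score can be thresholded to decide when to answer "unknown".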