Chapter 12. Outlier Analysis

Outlier and Outlier Analysis

Outlier Detection Methods

Statistical Approaches

Proximity-Based Approaches

Clustering-Based Approaches

Classification Approaches

Mining Contextual and Collective Outliers

Outlier Detection in High Dimensional Data

Summary

What Are Outliers?

Outlier: A data object that deviates significantly from the normal
objects, as if it were generated by a different mechanism
Ex.: Unusual credit card purchases; in sports: Michael Jordan, Wayne
Gretzky, ...
Outliers are different from noise
Noise is random error or variance in a measured variable
Noise should be removed before outlier detection
Outliers are interesting: they violate the mechanism that generates the
normal data
Outlier detection vs. novelty detection: at an early stage a novel pattern
is treated as an outlier, but later it is merged into the model
Applications:
Credit card fraud detection
Telecom fraud detection
Customer segmentation
Medical analysis

Types of Outliers (I)

Three kinds: global, contextual and collective outliers


Global outlier (or point anomaly)
Object is O_g if it significantly deviates from the rest of the data set
Ex. Intrusion detection in computer networks
Issue: Find an appropriate measurement of deviation
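
One possible measurement of deviation, given here only as an illustrative sketch and not as the method these slides prescribe, is the modified z-score built from the median and the median absolute deviation (MAD). The function name, the 3.5 cutoff, and the sample values are assumptions chosen for the example.

```python
from statistics import median

def global_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    Assumption: deviation is measured against the median and the MAD,
    which a single extreme value distorts less than mean and variance.
    """
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the values equal the median; this sketch
        # cannot scale deviations in that case and flags nothing.
        return []
    # 0.6745 rescales the MAD so it is comparable to a standard
    # deviation when the data are roughly normal.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical purchase amounts; one is far from the rest.
amounts = [10, 12, 11, 13, 12, 11, 10, 95]
print(global_outliers(amounts))
```

A different deviation measure (distance to nearest neighbors, density, model likelihood) could replace the scoring line without changing the overall shape of the check.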
Contextual outlier (or conditional outlier)
Object is O_c if it deviates significantly based on a selected context
Ex. 80°F in Urbana: outlier? (depends on whether it is summer or winter)


Attributes of data objects should be divided into two groups
Contextual attributes: define the context, e.g., time & location
Behavioral attributes: characteristics of the object, used in outlier
evaluation, e.g., temperature
Can be viewed as a generalization of local outliers, whose density
significantly deviates from that of their local area
Issue: How to define or formulate meaningful context?
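
The split into contextual and behavioral attributes can be sketched as follows: group the objects by the contextual attribute, then evaluate the behavioral attribute only against objects that share the same context. This is a minimal illustration under assumptions: the season labels, the temperature readings, the z-score test, and the 2.0 cutoff are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def contextual_outliers(records, threshold=2.0):
    """records: iterable of (context, value) pairs.

    Returns the (context, value) pairs whose value lies more than
    `threshold` standard deviations from the mean of its own context.
    """
    groups = defaultdict(list)
    for context, value in records:
        groups[context].append(value)

    flagged = []
    for context, values in groups.items():
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # all values identical in this context; nothing deviates
        flagged.extend(
            (context, v) for v in values if abs(v - mu) / sigma > threshold
        )
    return flagged

# Hypothetical temperature readings in degrees Fahrenheit.
readings = (
    [("winter", t) for t in (28, 31, 25, 30, 27, 33, 29, 80)]
    + [("summer", t) for t in (78, 82, 80, 85, 79, 83, 81, 80)]
)
print(contextual_outliers(readings))
```

With these readings, 80°F is flagged only in the winter context. Pooling every reading into a single context leaves it unflagged, which is the reason the choice of contextual attributes matters.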

Types of Outliers (II)

Collective Outliers

A subset of data objects collectively deviates significantly from the
whole data set, even if the individual data objects may not be outliers

Applications: E.g., intrusion detection: when a number of computers keep
sending denial-of-service packets to each other

Detection of collective outliers

Consider not only behavior of individual objects, but also that of
groups of objects
Need to have background knowledge on the relationship among data
objects, such as a distance or similarity measure on objects
A data set may have multiple types of outliers
One object may belong to more than one type of outlier
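
The idea of judging groups instead of individual objects can be sketched as follows. This is only an illustration under assumptions: the relationship among objects is taken to be adjacency in a sequence, groups are non-overlapping windows, and a window is scored by the modified z-score of its sum. The function names, window size, cutoff, and packet counts are hypothetical.

```python
from statistics import median

def modified_z(values):
    """Modified z-score (median/MAD based) for each value in a list."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0] * len(values)  # no spread to scale by; flag nothing
    return [0.6745 * abs(v - med) / mad for v in values]

def collective_outliers(series, window=4, threshold=3.5):
    """Return start indices of non-overlapping windows whose total
    deviates from the totals of the other windows."""
    starts = range(0, len(series) - window + 1, window)
    sums = [sum(series[s:s + window]) for s in starts]
    return [s for s, z in zip(starts, modified_z(sums)) if z > threshold]

# Hypothetical per-minute packet counts. Values of 8 or 9 are common on
# their own, but four high counts in a row (indices 8 to 11) are not.
counts = [3, 8, 2, 9, 4, 7, 3, 8, 9, 9, 8, 9,
          2, 7, 4, 8, 3, 9, 2, 8, 4, 7, 3, 9]

print(collective_outliers(counts))                 # windows that deviate as a group
print([z > 3.5 for z in modified_z(counts)])       # per-object check for comparison
```

No single count is flagged by the per-object check, while the window starting at index 8 is flagged as a group. A real detector would need a grouping that reflects the actual relationship among the objects, for example a graph of which computers talk to each other.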
