Unit-3
Classification and Prediction
What Is Classification? What Is Prediction?
Classification and prediction are two forms of data analysis that can be used to extract models
describing important data classes or to predict future data trends. Such analysis can help provide
us with a better understanding of the data at large. Whereas classification predicts categorical
(discrete, unordered) labels, prediction models continuous-valued functions.
Example:-
A bank loans officer needs analysis of her data in order to learn which loan applicants are “safe”
and which are “risky” for the bank. Here the data analysis task is classification, where a model or
classifier is constructed to predict categorical labels, such as “safe” or “risky” for the loan
application data.
Suppose that the marketing manager would like to predict how much a given customer will
spend during a sale at AllElectronics. This data analysis task is an example of numeric
prediction, where the model constructed predicts a continuous-valued function, or ordered value.
Each tuple/sample is assumed to belong to a predefined class, as determined by the class label
attribute.
The set of tuples used for model construction is called the training set.
The model is represented as classification rules, decision trees, or mathematical formulae.
(a) Learning: Training data are analyzed by a classification algorithm. Here, each training tuple
has a known class label, and the learned model or classifier is represented in the form of
classification rules.
(b) Classification: Test data are used to estimate the accuracy of the classification rules. If the
accuracy is considered acceptable, the rules can be applied to the classification of new data
tuples.
The known label of each test sample is compared with the classified result from the model.
The accuracy rate is the percentage of test set samples that are correctly classified by the model.
The test set is independent of the training set; otherwise over-fitting will occur.
If the accuracy is acceptable, use the model to classify data tuples whose class labels are not
known.
Classifiers and predictors can be compared and evaluated according to criteria such as the following:
Speed: This refers to the computational costs involved in generating and using the given classifier
or predictor.
Robustness: This is the ability of the classifier or predictor to make correct predictions given
noisy data or data with missing values.
Scalability: This refers to the ability to construct the classifier or predictor efficiently given large
amounts of data.
Interpretability: This refers to the level of understanding and insight that is provided by the
classifier or predictor. Interpretability is subjective and therefore more difficult to assess.
Information Gain (ID3)
In the computation of Info_age(D) for the 14-sample training set, the term (5/14) x I(2,3)
means that the branch “age <= 30” covers 5 of the 14 samples, with 2 yes’s and 3 no’s.
The attribute with the maximum information gain is selected as the splitting attribute and made
the root, which is age in this case.
Gain Ratio for Attribute Selection (C4.5)
The information gain measure is biased towards attributes with a large number of values.
C4.5 (a successor of ID3) uses gain ratio to overcome this problem (a normalization of
information gain). The split information is defined as
SplitInfo_A(D) = - SUM_j (|Dj|/|D|) x log2(|Dj|/|D|)
where D is split into partitions D1, ..., Dv on the values of attribute A, and
GainRatio(A) = Gain(A) / SplitInfo_A(D)
The attribute with the maximum gain ratio is selected as the splitting attribute
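A minimal Python sketch of this attribute-selection computation, assuming categorical attributes and a small made-up data set (the attribute names and values are illustrative, not from the text):

import math
from collections import Counter

def entropy(labels):
    # Info(D): expected information needed to classify a tuple in D
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(rows, labels, attr_index):
    # Partition D on the values of the attribute at attr_index
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    total = len(labels)
    # Info_A(D): weighted entropy of the partitions after the split
    info_a = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - info_a                  # Gain(A)
    # SplitInfo_A(D): information generated by the split itself
    split_info = -sum((len(p) / total) * math.log2(len(p) / total)
                      for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# Toy data: each row is (age_group, student); labels are buys_computer
rows = [("<=30", "no"), ("<=30", "yes"), ("31..40", "no"), (">40", "yes")]
labels = ["no", "yes", "yes", "yes"]
print(gain_ratio(rows, labels, 0))   # gain ratio of the age attribute

The attribute with the highest value returned by gain_ratio would be chosen as the splitting attribute.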
2) Rule-Based Classification
Using IF-THEN Rules for Classification
Rules are a good way of representing information or bits of knowledge. A rule-based classifier
uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form
IF condition THEN conclusion
where the “IF” part is the rule antecedent and the “THEN” part is the rule consequent. Let
n_covers be the number of tuples covered by a rule R, n_correct the number of tuples it
classifies correctly, and |D| the number of tuples in the data set D. Then
coverage(R) = n_covers / |D|
accuracy(R) = n_correct / n_covers
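For example, if a hypothetical rule R covers 6 of the 14 tuples in D and correctly classifies 5 of
those 6, then coverage(R) = 6/14 ≈ 42.9% and accuracy(R) = 5/6 ≈ 83.3%.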
Rule Extraction from a Decision Tree
To extract rules from a decision tree, one rule is created for each path from the root to a leaf
node. Each splitting criterion along a given path is logically ANDed to form the rule antecedent
(“IF” part). The leaf node holds the class prediction, forming the rule consequent (“THEN” part).
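As a sketch of this procedure, assume a tree is stored as nested dicts (splitting attribute ->
branch value -> subtree, with class labels at the leaves); rule extraction is then a walk over
every root-to-leaf path. The tree below is a hypothetical one for the buys_computer example:

# Hypothetical decision tree for illustration
tree = {"age": {
    "youth":       {"student": {"no": "buys_computer = no",
                                "yes": "buys_computer = yes"}},
    "middle_aged": "buys_computer = yes",
    "senior":      {"credit_rating": {"excellent": "buys_computer = no",
                                      "fair": "buys_computer = yes"}}}}

def extract_rules(node, conditions=()):
    if not isinstance(node, dict):       # leaf: one rule per root-to-leaf path
        yield "IF " + " AND ".join(conditions) + " THEN " + node
        return
    (attr, branches), = node.items()     # internal node: one splitting attribute
    for value, child in branches.items():
        yield from extract_rules(child, conditions + (f"{attr} = {value}",))

for rule in extract_rules(tree):
    print(rule)   # e.g., IF age = youth AND student = yes THEN buys_computer = yes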
3. Initialize the weights. The weights in the network are initialized to small random
numbers (e.g., ranging from -1.0 to 1.0, or -0.5 to 0.5).
4. Propagate the inputs forward
In this step, the net input and output of each unit in the hidden and output layers are computed.
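Concretely, for a unit j in a hidden or output layer with inputs from units i of the previous
layer, the net input is a weighted sum plus a bias, and the output is obtained by applying the
logistic (sigmoid) activation function:
I_j = SUM_i (w_ij * O_i) + theta_j
O_j = 1 / (1 + e^(-I_j))
where w_ij is the weight of the connection from unit i to unit j, O_i is the output of unit i,
and theta_j is the bias of unit j.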
As a simple example, suppose that samples in a given training set are described by two Boolean
attributes, A1 and A2, and that there are two classes, C1 and C2. The rule “IF A1 AND NOT
A2 THEN C2” can be encoded as the bit string “100,” where the two leftmost bits represent
attributes A1 and A2, respectively, and the rightmost bit represents the class. Similarly, the rule
“IF NOT A1 AND NOT A2 THEN C1” can be encoded as “001.”
Based on the notion of survival of the fittest, a new population is formed to consist of
the fittest rules in the current population, as well as offspring of these rules.
The fitness of a rule is assessed by its classification accuracy on a set of training samples.
Offspring are created by applying genetic operators such as crossover and mutation. In
crossover, substrings from pairs of rules are swapped to form new pairs of rules. In mutation,
randomly selected bits in a rule’s string are inverted.
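A minimal Python sketch of these two operators on the bit-string encoding above (the crossover
point and mutation rate are illustrative choices):

import random

def crossover(rule_a, rule_b):
    # Single-point crossover: swap the substrings after a random cut point
    point = random.randrange(1, len(rule_a))
    return (rule_a[:point] + rule_b[point:],
            rule_b[:point] + rule_a[point:])

def mutate(rule, rate=0.1):
    # Invert each bit independently with probability `rate`
    flip = {"0": "1", "1": "0"}
    return "".join(flip[bit] if random.random() < rate else bit for bit in rule)

parent_a, parent_b = "100", "001"        # the two example rules above
child_a, child_b = crossover(parent_a, parent_b)
print(child_a, child_b, mutate(child_a))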
PREDICTION
What Is Prediction?
Prediction is different from classification because:-
1) Classification refers to predicting a categorical class label (e.g., yes/no).
2) Prediction models continuous-valued functions (e.g., salary).
Major method for prediction: Regression
It models the relationship between one or more independent or predictor variables and
a dependent or response variable.
Linear Regression
Straight-line regression analysis involves a response variable, y, and a single predictor
variable, x. It is the simplest form of regression, and models y as a linear function of x.
That is, y = b + wx, where the variance of y is assumed to be constant, and b and w are regression
coefficients specifying the Y-intercept and slope of the line, respectively.
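The regression coefficients can be solved for by the method of least squares:
w = SUM_i (x_i - x_bar)(y_i - y_bar) / SUM_i (x_i - x_bar)^2
b = y_bar - w * x_bar
where x_bar and y_bar are the means of the x and y values. A minimal Python sketch (the
sample data are made up for illustration):

# Least-squares estimates of slope w and intercept b
xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]        # e.g., years of experience
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]  # e.g., salary in $1000s
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
w = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
b = y_bar - w * x_bar
print(f"y = {b:.2f} + {w:.2f}x")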
2) The confusion matrix is a useful tool for analyzing how well your classifier can recognize tuples
of different classes. Given two classes, we can talk in terms of positive tuples (tuples of the main
class of interest, e.g., buys computer = yes) versus negative tuples (e.g., buys computer = no).
3) True positives refer to the positive tuples that were correctly labeled by the classifier, while true
negatives are the negative tuples that were correctly labeled by the classifier.
4) False positives are the negative tuples that were incorrectly labeled as positive (e.g., tuples of
class buys computer = no for which the classifier predicted buys computer = yes).
5) False negatives are the positive tuples that were incorrectly labeled as negative.
These terms are summarized in the following confusion matrix, where the rows are the actual
classes and the columns are the predicted classes:

              Predicted C1        Predicted C2
Actual C1     true positives      false negatives
Actual C2     false positives     true negatives
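From these counts, the accuracy defined earlier can be computed as
Accuracy = (TP + TN) / (P + N)
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and
false negatives, and P and N are the total numbers of positive and negative tuples; the error rate
is (FP + FN) / (P + N).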
2. Cross-validation
In k-fold cross-validation, the initial data are randomly partitioned into k mutually exclusive
subsets or “folds,” D1, D2, ..., Dk, each of approximately equal size. Training and testing is
performed k times. In iteration i, partition Di is reserved as the test set, and the remaining
partitions are collectively used to train the model. That is, in the first iteration, subsets D2, ...,
Dk collectively serve as the training set in order to obtain a first model, which is tested on D1;
the second iteration is trained on subsets D1, D3, ..., Dk and tested on D2; and so on.
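A minimal Python sketch of the k-fold partitioning (the data here are placeholder tuples; in
practice each element would carry its class label):

import random

def k_fold_splits(data, k=10, seed=42):
    # Randomly partition the data into k folds of approximately equal size,
    # then yield (training set, test set) pairs, one per iteration
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, test

data = list(range(20))                   # placeholder tuples
for train, test in k_fold_splits(data, k=5):
    print(len(train), len(test))         # 16 4, five times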
3. Bootstrap
Unlike the accuracy estimation methods mentioned above, the bootstrap method samples the
given training tuples uniformly with replacement. That is, each time a tuple is selected, it is
equally likely to be selected again and re-added to the training set. For instance, imagine a
machine that randomly selects tuples for our training set. In sampling with replacement, the
machine is allowed to select the same tuple more than once.
There are several bootstrap methods. A commonly used one is the .632 bootstrap, which
works as follows. Suppose we are given a data set of d tuples. The data set
is sampled d times, with replacement, resulting in a bootstrap sample or training set of d samples.
It is very likely that some of the original data tuples will occur more than once in this sample.
The data tuples that did not make it into the training set end up forming the test set. On
average, 63.2% of the original data tuples will end up in the bootstrap sample, and the
remaining 36.8% will form the test set (hence the name, .632 bootstrap).
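A minimal Python sketch of forming one bootstrap training/test split (placeholder data):

import random

def bootstrap_sample(data, seed=42):
    # Sample d times with replacement to form the training set;
    # tuples never selected end up in the test set
    rng = random.Random(seed)
    train = [rng.choice(data) for _ in range(len(data))]
    chosen = set(train)
    test = [t for t in data if t not in chosen]
    return train, test

data = list(range(1000))
train, test = bootstrap_sample(data)
print(len(test) / len(data))   # close to 0.368 for large data sets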