Find S Algorithm
Find S Algorithm
Find S Algorithm
Find S algorithm
Introduction :
The find-S algorithm is a basic concept learning algorithm in machine learning. The
find-S algorithm finds the most specific hypothesis that fits all the positive examples.
We have to note here that the algorithm considers only those positive training
example. The find-S algorithm starts with the most specific hypothesis and
generalizes this hypothesis each time it fails to classify an observed positive training
data. Hence, the Find-S algorithm moves from the most specific hypothesis to the
most general hypothesis.
Important Representation :
1. ? indicates that any value is acceptable for the attribute.
2. specify a single required value ( e.g., Cold ) for the attribute.
3. ϕindicates that no value is acceptable.
4. The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
5. The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
Steps Involved In Find-S :
1. Start with the most specific hypothesis.
2. h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
3. Take the next example and if it is negative, then no changes occur to the hypothesis.
4. If the example is positive and we find that our initial hypothesis is too specific then
we update our current hypothesis to a general condition.
5. Keep repeating the above steps till all the training examples are complete.
6. After we have completed all the training examples we will have the final hypothesis
when can use to classify the new examples.
Example :
Consider the following data set having the data about which particular seeds are
poisonous.
1
[Type the document title]
Consider example 2 :
Here we see that this example has a negative outcome. Hence we neglect this example
and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }
Consider example 3 :
Here we see that this example has a negative outcome. Hence we neglect this example
and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }
Consider example 4 :
2
[Type the document title]
The data present in example 4 is { ORANGE, HARD, NO, WRINKLED }. We compare every
single attribute with the initial data and if any mismatch is found we replace that
particular attribute with a general case ( ” ? ” ). After doing the process the hypothesis
becomes :
h = { ?, HARD, NO, WRINKLED }
Consider example 5 :
The data present in example 5 is { GREEN, SOFT, YES, SMOOTH }. We compare every
single attribute with the initial data and if any mismatch is found we replace that
particular attribute with a general case ( ” ? ” ). After doing the process the hypothesis
becomes :
h = { ?, ?, ?, ? }
Since we have reached a point where all the attributes in our hypothesis have the
general condition, example 6 and example 7 would result in the same hypothesizes with
all general attributes.
h = { ?, ?, ?, ? }
3
[Type the document title]
The candidate elimination algorithm incrementally builds the version space given
a hypothesis space H and a set E of examples. The examples are added one by
one; each example possibly shrinks the version space by removing the hypotheses
that are inconsistent with the example. The candidate elimination algorithm does
this by updating the general and specific boundary for each new example.
You can consider this as an extended form of the Find-S algorithm.
Consider both positive and negative examples.
Actually, positive examples are used here as the Find-S algorithm (Basically
they are generalizing from the specification).
While the negative example is specified in the generalizing form.
Terms Used:
Concept learning: Concept learning is basically the learning task of the
machine (Learn by Train data)
General Hypothesis: Not Specifying features to learn the machine.
G = {‘?’, ‘?’,’?’,’?’…}: Number of attributes
Specific Hypothesis: Specifying features to learn machine (Specific feature)
S= {‘pi’,’pi’,’pi’…}: The number of pi depends on a number of attributes.
Version Space: It is an intermediate of general hypothesis and Specific
hypothesis. It not only just writes one hypothesis but a set of all possible
hypotheses based on training data-set.
Algorithm:
Step1: Load Data set
Step2: Initialize General Hypothesis and Specific Hypothesis.
Step3: For each training example
Step4: If example is positive example
if attribute_value == hypothesis_value:
Do nothing
else:
replace attribute value with '?' (Basically generalizing it)
Step5: If example is Negative example
4
[Type the document title]
Algorithmic steps:
Initially : G = [[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?,
?, ?],
[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?,
?, ?]]
S = [Null, Null, Null, Null, Null, Null]
5
[Type the document title]
Output :
G = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?]]
S = ['sunny','warm',?,'strong', ?, ?]
The Candidate Elimination Algorithm (CEA) is an improvement over the Find-S
algorithm for classification tasks. While CEA shares some similarities with Find-S,
it also has some essential differences that offer advantages and disadvantages.
Here are some advantages and disadvantages of CEA in comparison with Find-S:
Advantages of CEA over Find-S:
Improved accuracy: CEA considers both positive and negative examples to
generate the hypothesis, which can result in higher accuracy when dealing with
noisy or incomplete data.
Flexibility: CEA can handle more complex classification tasks, such as those with
multiple classes or non-linear decision boundaries.
More efficient: CEA reduces the number of hypotheses by generating a set of
general hypotheses and then eliminating them one by one. This can result in
faster processing and improved efficiency.
Better handling of continuous attributes: CEA can handle continuous attributes
by creating boundaries for each attribute, which makes it more suitable for a
wider range of datasets.
Disadvantages of CEA in comparison with Find-S:
More complex: CEA is a more complex algorithm than Find-S, which may make it
more difficult for beginners or those without a strong background in machine
learning to use and understand.
Higher memory requirements: CEA requires more memory to store the set of
hypotheses and boundaries, which may make it less suitable for memory-
constrained environments.
Slower processing for large datasets: CEA may become slower for larger datasets
due to the increased number of hypotheses generated.
Higher potential for overfitting: The increased complexity of CEA may make it
more prone to overfitting on the training data, especially if the dataset is small
or has a high degree of noise.
6
[Type the document title]