Problem 1: Cse352 AI Homework 3 Solutions
Homework 3
SOLUTIONS
PROBLEM 1
Use the Lecture Notes to WRITE short ANSWERS, 1-2 paragraphs long,
to the following questions.
Remember: on the TEST your answers must match the Lectures, not
RANDOM pieces copied from Google, as many students submitted. Copied
answers receive zero points.
1) The first step is the selection of data. The selected data is a subset
of all the available data and must address the problem at hand.
2) The next step is to clean the selected data so that the impact of
missing, incomplete, or noisy data is minimized.
3) Next, the cleaned data is preprocessed further so that learning
algorithms can be applied to it.
4) Now a learning algorithm can be applied to the preprocessed data to
obtain patterns about the data.
5) In the final steps, the patterns are tested for their accuracy and
interpreted so that the user can decide if they still need refining or
can be presented.
7. Define a CLASSIFIER
For the data set given below, build a classifier following all the steps
needed in its construction:
preprocessing, training and testing
Describe and motivate your choice of algorithms and methods used at
each step.
CLASSIFICATION DATA:
Age  Income     Student  Credit Rating  Buys Computer
21   60,000     Yes      3              No
30   70,000     No       5              No
38   38666.667  No       2              Yes
45   45,000     Yes      3              Yes
46   25,000     No       2              Yes
47   30,000     Yes      6              No
39   28,000     Yes      5              No
29   48,000     Yes      3              No
50   75,000     Yes      2              No
48   55125      Yes      3              No
30   38666.667  Yes      6              Yes
51   46,000     No       4              Yes
32   80,000     Yes      2              No
45   50,000     No       4              No
PART 1: Preprocessing
Attributes: Age, Income, Credit Rating
1. Preprocessing Calculations
THIS is a Solution Submitted by a Student
There are many other solutions!
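1. Missing Values: explain the method you used
Each missing Income value is replaced by the mean Income of the records
in the same class (Buys Computer = No or Yes).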
No Class
(60000+70000+30000+28000+48000+75000+80000+50000)/8=
441000/8=55125 Mean
Yes Class
(45000+25000+46000)/3=
116000/3=38666.667 Mean
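The two class means above can be reproduced with a few lines of Python. This is only a sketch: the name `known_income` and the helper `class_mean_fill` are illustrative and not part of the submitted solution; the income lists are copied from the two sums above.

```python
from statistics import mean

# Known Income values per class, copied from the two sums above.
known_income = {
    "No": [60000, 70000, 30000, 28000, 48000, 75000, 80000, 50000],
    "Yes": [45000, 25000, 46000],
}


def class_mean_fill(known):
    """Return, for each class, the mean used to replace a missing value."""
    return {label: mean(values) for label, values in known.items()}


fill_values = class_mean_fill(known_income)
for label, value in fill_values.items():
    print(f"{label} class mean: {value:.3f}")
```

A record with a missing Income would then receive `fill_values[label]` for its own class label.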
2. Binning: AGE
N intervals
N=3
Depth=5
Bin 1 21 29 30 30 32
Bin 2 38 39 45 45 46
Bin 3 47 48 50 51
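The Age bins can be checked with a short equal-depth binning sketch. It assumes, as the bins above suggest, that the values are sorted and cut into consecutive groups of at most Depth = 5 items; the function name `equal_depth_bins` is illustrative.

```python
def equal_depth_bins(values, depth):
    """Sort the values and cut them into consecutive bins of at most `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


# Age column of the classification data, in table order.
ages = [21, 30, 38, 45, 46, 47, 39, 29, 50, 48, 30, 51, 32, 45]

age_bins = equal_depth_bins(ages, depth=5)
for number, members in enumerate(age_bins, start=1):
    print(f"Bin {number}: {members}")
```

With 14 records and a depth of 5, the last bin holds the remaining 4 values.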
3. Binning: INCOME
N intervals
N=3
Depth=5
Bin 1: >=25000&<=38666.667
Bin 2: >38666.667&<=55125
Bin 3: >55125&<=80000
4. Binning : CREDIT RATING
N intervals
N=2
Highest Value=6
Lowest Value=2
(6+2)/2=8/2=4
Bin 1: >=0&<4
Bin 2: >=4&<=8
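As a sketch of the Credit Rating step, the cut point can be computed as the midpoint of the highest and lowest values and then used to assign each rating to a bin. The function name `rating_bin` is illustrative; the ratings are the Credit Rating column in table order.

```python
# Credit Rating column of the classification data, in table order.
ratings = [3, 5, 2, 3, 2, 6, 5, 3, 2, 3, 6, 4, 2, 4]

# Midpoint of the highest and lowest value: (6 + 2) / 2 = 4.
cut = (max(ratings) + min(ratings)) / 2


def rating_bin(rating, cut_point=cut):
    """Bin 1 holds ratings below the cut point, Bin 2 the rest."""
    return 1 if rating < cut_point else 2


bins = [rating_bin(r) for r in ratings]
print("cut point:", cut)
print("Bin 1 size:", bins.count(1), "Bin 2 size:", bins.count(2))
```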
TRAINING - TESTING
ID3 with non-heuristic attribute selection
Set 1:
21 60,000 yes 3 No
30 70,000 No 5 No
38 38,666.667 No 2 Yes
45 45,000 yes 3 Yes
46 25,000 No 2 Yes
Set 2:
47 30,000 Yes 6 No
39 28,000 Yes 5 No
29 48,000 Yes 3 No
51 46,000 No 4 Yes
32 80,000 Yes 2 No
Set 3:
50 75,000 Yes 2 No
48 55125 Yes 3 No
30 38666.667 Yes 6 Yes
45 50,000 No 4 No
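The rotation of training and testing sets used in the rest of this solution can be sketched as follows. Record numbers 1 to 14 refer to the row order of the classification data table, and the pairing of each training round with its test set is read off the sections that follow; the names `sets`, `rounds` and `split` are illustrative.

```python
# Row numbers (1-based, in table order) of the records in each set.
sets = {
    "Set 1": [1, 2, 3, 4, 5],
    "Set 2": [6, 7, 8, 12, 13],
    "Set 3": [9, 10, 11, 14],
}

# Each round trains on two sets and tests on the remaining one.
rounds = [("Training 1", "Set 3"), ("Training 2", "Set 1"), ("Training 3", "Set 2")]


def split(test_name):
    """Return (training rows, testing rows) when `test_name` is held out."""
    test = sets[test_name]
    train = [row for name, rows in sets.items() if name != test_name for row in rows]
    return train, test


for round_name, test_name in rounds:
    train, test = split(test_name)
    print(f"{round_name}: train on {len(train)} records, test on {test_name} ({len(test)} records)")
```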
Training 1
[Figure: decision tree built step by step for Training 1. Node labels recoverable from the figure: Age; Credit Rating (branches >=0&<4 and >=4&<=8); Student. Leaves: Class = Yes, Class = No.]
Testing Set 3
50 75,000 Yes 2 No
48 55125 Yes 3 No
30 38666.667 Yes 6 Yes
45 50,000 No 4 No
Training 2
47 30,000 Yes 6 No
39 28,000 Yes 5 No
29 48,000 Yes 3 No
51 46,000 No 4 Yes
32 80,000 Yes 2 No
50 75,000 Yes 2 No
48 55125 Yes 3 No
30 38666.667 Yes 6 Yes
45 50,000 No 4 No
Split on Credit Rating:

Credit Rating >=0&<4
Age  Income     Student  Buys Computer
29   48,000     Yes      No
32   80,000     Yes      No
50   75,000     Yes      No
48   55125      Yes      No

Credit Rating >=4&<=8
Age  Income     Student  Buys Computer
47   30,000     Yes      No
39   28,000     Yes      No
51   46,000     No       Yes
30   38666.667  Yes      Yes
45   50,000     No       No

Credit Rating >=0&<4: Class = No
Credit Rating >=4&<=8: split on Age

Age >=21&<=32
Age  Student  Buys Computer
30   Yes      Yes

Age >32&<=46
Income  Student  Buys Computer
28,000  Yes      No
50,000  No       No

Age >46&<=51
Income  Student  Buys Computer
30,000  Yes      No
46,000  No       Yes

Age >=21&<=32: Class = Yes
Age >32&<=46: Class = No
Age >46&<=51: split on Student

Final tree:
Credit Rating
  >=0&<4: Class = No
  >=4&<=8: Age
    >=21&<=32: Class = Yes
    >32&<=46: Class = No
    >46&<=51: Student
      No: Class = Yes
      Yes: Class = No
Rules
Testing Set 1
21 60,000 Yes 3 No
30 70,000 No 5 No
38 38,666.667 No 2 Yes
45 45,000 Yes 3 Yes
46 25,000 No 2 Yes
1/5*100%=20%=predictive accuracy
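The 20% figure can be checked by writing the Training 2 tree as a function and applying it to the five test records. This is a sketch under one assumption: the tree is read as Credit Rating below 4 gives No; otherwise Age up to 32 gives Yes, Age 33 to 46 gives No, and Age above 46 is decided by Student (No gives Yes, Yes gives No). The function name `training2_tree` is illustrative.

```python
def training2_tree(age, student, credit_rating):
    """Classify one record with the Training 2 tree as read from the figures."""
    if credit_rating < 4:
        return "No"
    if age <= 32:
        return "Yes"
    if age <= 46:
        return "No"
    # Age above 46: the Student attribute decides.
    return "Yes" if student == "No" else "No"


# Test records: (Age, Student, Credit Rating, actual Buys Computer).
set1 = [
    (21, "Yes", 3, "No"),
    (30, "No", 5, "No"),
    (38, "No", 2, "Yes"),
    (45, "Yes", 3, "Yes"),
    (46, "No", 2, "Yes"),
]

correct = sum(training2_tree(a, s, c) == label for a, s, c, label in set1)
accuracy = 100 * correct / len(set1)
print(f"{correct}/{len(set1)} correct = {accuracy:.0f}% predictive accuracy")
```

Only the first record is classified correctly; the other four reach a leaf whose class differs from their actual class.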
Training 3
21 60,000 Yes 3 No
30 70,000 No 5 No
38 38,666.667 No 2 Yes
45 45,000 Yes 3 Yes
46 25,000 No 2 Yes
50 75,000 Yes 2 No
48 55125 Yes 3 No
30 38666.667 Yes 6 Yes
45 50,000 No 4 No
Split on Income:

Income >=25000&<=38666.667
Age  Student  Credit Rating  Buys Computer
38   No       2              Yes
46   No       2              Yes
30   Yes      6              Yes

Income >38666.667&<=55125
Age  Student  Credit Rating  Buys Computer
45   Yes      3              Yes
48   Yes      3              No
45   No       4              No

Income >55125&<=80000
Age  Student  Credit Rating  Buys Computer
21   Yes      3              No
30   No       5              No
50   Yes      2              No
Income >=25000&<=38666.667: Class = Yes
Income >55125&<=80000: Class = No
Income >38666.667&<=55125: split on Student

Student = No
Credit Rating  Buys Computer
4              No

Student = Yes
Credit Rating  Buys Computer
3              Yes

Final tree:
Income
  >=25000&<=38666.667: Class = Yes
  >38666.667&<=55125: Student
    No: Class = No
    Yes: Class = Yes
  >55125&<=80000: Class = No
Testing Set 2
47 30,000 Yes 6 No
39 28,000 Yes 5 No
29 48,000 Yes 3 No
51 46,000 No 4 Yes
32 80,000 Yes 2 No
Record 1 is misclassified
Record 2 is misclassified
Record 3 is misclassified
Record 4 is misclassified
Record 5 is correctly classified (Income 80,000 falls in >55125&<=80000, Class = No)
1/5*100%=20%=predictive accuracy
4. MY CLASSIFIER
Predictive Accuracy = (Training set one predictive accuracy + Training set two
predictive accuracy + Training set three predictive accuracy)/3
(75%+20%+20%)/3=115%/3=38.333% accuracy
a1 a2 a3 Class
0.5 0 0.2 1
0 0.3 0 1
FIRST EPOCH:
First row of DATA:
a1 a2 a3 w14 w15 w24 w25 w34 w35 w46 w56 θ4 θ5 θ6
0.5 0 0.2 0.1 0.2 0.3 0.4 0.1 0.1 0.5 0.2 0.3 0.2 0.1
Error measurement:
Unit j   Error j
6 0.625(1-0.625)(1-0.625)=0.0878
We assume T6=1
5 0.579(1-0.579)(0.0878)(0.2)=0.0043
4 0.591(1-0.591)(0.0878)(0.5)=0.0153
New adjusted values:
Weight New Values
w46 0.5 + .7(0.0878)(.591) =0.536
θ4 0.3 + .7(0.0153)=0.311
θ5 0.2+ .7(0.0043)=0.203
θ6 0.1 + .7(0.0878)=0.161
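The forward pass and the output-unit update for this first row can be re-computed with a short sketch. It rests on several assumptions that are not stated explicitly in the text: sigmoid units, a learning rate of 0.7 (the ".7" factor in the updates), a target T6 = 1, and the weight order w14 w15 w24 w25 w34 w35 w46 w56 followed by the biases θ4 θ5 θ6. Only unit 6 and its incoming weights are updated here.

```python
import math


def sigmoid(x):
    """Logistic activation 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# First data row and initial weights (assumed order, see the lead-in).
a = [0.5, 0.0, 0.2]
w_to_4 = [0.1, 0.3, 0.1]  # w14, w24, w34
w_to_5 = [0.2, 0.4, 0.1]  # w15, w25, w35
w46, w56 = 0.5, 0.2
theta4, theta5, theta6 = 0.3, 0.2, 0.1
rate, target = 0.7, 1.0

# Forward pass: hidden units 4 and 5, then output unit 6.
o4 = sigmoid(sum(x * w for x, w in zip(a, w_to_4)) + theta4)
o5 = sigmoid(sum(x * w for x, w in zip(a, w_to_5)) + theta5)
o6 = sigmoid(o4 * w46 + o5 * w56 + theta6)

# Error of the output unit and the resulting updates.
err6 = o6 * (1 - o6) * (target - o6)
new_w46 = w46 + rate * err6 * o4
new_w56 = w56 + rate * err6 * o5
new_theta6 = theta6 + rate * err6

print(f"O4={o4:.3f} O5={o5:.3f} O6={o6:.3f} Err6={err6:.4f}")
print(f"w46={new_w46:.3f} w56={new_w56:.3f} theta6={new_theta6:.3f}")
```

The same pattern, applied to units 4 and 5 with their own errors, gives the remaining weight and bias updates.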
Second row of DATA:
0 0.3 0 0.105 0.202 0.3 0.4 0.102 0.102 0.536 0.236 0.311 0.203 0.161
Error measurement:
Unit j   Error j
6 0.65(1-0.65)(1-0.65)=0.0796
We assume T6=1
5 0.58(1-0.58)(0.0796)(0.236)=0.0046
4 0.598(1-0.598)(0.0796)(0.536)=0.0103
New adjusted values:
Weight New Values
w46 0.536 + .7(0.0796)(.598) =0.569
θ4 0.311 + .7(0.0103)=0.318
θ5 0.203 + .7(0.0046)=0.206
θ6 0.161 + .7(0.0796)=0.217
SECOND EPOCH:
First row of DATA:
0.5 0 0.2 0.105 0.202 0.302 0.40 0.102 0.102 0.569 0.268 0.318 0.206 0.217
4 0.597(1-0.597)(0.079)(0.569)=0.0108
θ4 0.318 + .7(0.0108)=0.326
θ5 0.206 + .7(0.0052)=0.210
θ6 0.217 + .7(0.079)=0.272
Second row of DATA:
0 0.3 0 0.019 0.204 0.302 0.401 0.104 0.103 0.602 0.3 0.326 0.210 0.262
Unit j   Net input Ij   Output Oj
4   (0)+(0.3)(0.302)+(0)+0.326 = 0.417   1/(1+e^(-0.417)) = 0.603
Error measurement:
Unit j   Error j
6 0.69(1-0.69)(1-0.69)=0.066
We assume T6=1
5 0.582(1-0.582)(0.066)(0.3)=0.0048
4 0.603(1-0.603)(0.066)(0.602)=0.0095
θ4 0.326 + .7(0.0095)=0.333
θ5 0.210 + .7(0.0048)=0.213
θ6 0.262 + .7(0.066)=0.308
PROBLEM 4: Classification by Association
C1 = L1
Item  Count
I1    2
I2    2
I3    2
I4    4
I5    2
I6    4
I7    2
C2
Itemset   Count
{I1, I2}  0
{I1, I3}  0
{I1, I4}  1
{I1, I5}  1
{I1, I6}  1
{I1, I7}  1
{I2, I3}  0
{I2, I4}  2
{I2, I5}  0
{I2, I6}  1
{I2, I7}  1
{I3, I4}  1
{I3, I5}  1
{I3, I6}  2
{I3, I7}  0
{I4, I5}  0
{I4, I6}  2
{I4, I7}  2
{I5, I6}  2
{I5, I7}  0
{I6, I7}  0
L2
Itemset   Count
{I2, I4}  2
{I3, I6}  2
{I4, I6}  2
{I4, I7}  2
{I5, I6}  2
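The step from C2 to L2 can be sketched as a filter on the support counts. The minimum support count of 2 is an assumption inferred from which itemsets survive; it is not stated in the text. The counts are copied from the C2 table, and the names `c2_counts` and `min_support` are illustrative.

```python
# Support counts of all candidate 2-itemsets, copied from the C2 table.
c2_counts = {
    ("I1", "I2"): 0, ("I1", "I3"): 0, ("I1", "I4"): 1, ("I1", "I5"): 1,
    ("I1", "I6"): 1, ("I1", "I7"): 1, ("I2", "I3"): 0, ("I2", "I4"): 2,
    ("I2", "I5"): 0, ("I2", "I6"): 1, ("I2", "I7"): 1, ("I3", "I4"): 1,
    ("I3", "I5"): 1, ("I3", "I6"): 2, ("I3", "I7"): 0, ("I4", "I5"): 0,
    ("I4", "I6"): 2, ("I4", "I7"): 2, ("I5", "I6"): 2, ("I5", "I7"): 0,
    ("I6", "I7"): 0,
}

min_support = 2  # assumed threshold, inferred from the surviving itemsets

# Keep only the frequent 2-itemsets.
l2 = [itemset for itemset, count in c2_counts.items() if count >= min_support]
for itemset in l2:
    print(itemset, c2_counts[itemset])
```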
FOR Classification by Association we only consider I = {IX, IY},
where IY represents the CLASS ATTRIBUTES:
Rating = Fair or Rating = Excellent
Itemset   Count
{I3, I6}  2
{I4, I6}  2
{I4, I7}  2
{I5, I6}  2
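Selecting the itemsets that contain a class item can be sketched as below. It assumes that I6 and I7 are the two class items (Rating = Fair and Rating = Excellent, in an unspecified order); this is inferred from the itemsets that are kept and is not stated in the text. The names `class_items` and `rules` are illustrative.

```python
# Frequent 2-itemsets (L2) from the previous step.
l2 = [("I2", "I4"), ("I3", "I6"), ("I4", "I6"), ("I4", "I7"), ("I5", "I6")]

# Assumption: I6 and I7 encode the class attribute values.
class_items = {"I6", "I7"}

# Keep itemsets with exactly one class item and split them into
# (non-class item, class item), i.e. a candidate rule IX -> IY.
rules = []
for itemset in l2:
    classes = [item for item in itemset if item in class_items]
    others = [item for item in itemset if item not in class_items]
    if len(classes) == 1 and others:
        rules.append((others[0], classes[0]))

for antecedent, consequent in rules:
    print(f"{antecedent} -> {consequent}")
```

The itemset {I2, I4} is dropped because it contains no class item.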
TEST data