Ans in Day 4 Slides


Ans in Ex 4.

See Excel file bootstrap5ans.xlsx for one possible answer to Q1 to Q3. You
are encouraged to create your own answers. Due to randomisation, your
answer will be different.

3. P(Case i is not selected into a bootstrap sample) = (1 - 1/n)^n, where n is the number of cases. This approaches 1/e ≈ 0.368 as n grows.

4. No. The cases not selected into the bootstrap sample can serve as the
testset when the bootstrap sample is used as the trainset.
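As a rough illustration of answers 3 and 4, the Python sketch below evaluates (1 - 1/n)^n and draws one bootstrap sample, treating the cases that were never drawn as the testset. It is not part of the slides or the Excel file. The function names prob_not_selected and bootstrap_split are made up for this example, and it assumes a bootstrap sample of size n drawn with replacement from n cases.

```python
import math
import random


def prob_not_selected(n: int) -> float:
    """Probability that one given case is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n


def bootstrap_split(n: int, rng: random.Random):
    """Draw one bootstrap sample of size n from cases 0..n-1.

    Returns (in_bag, oob): the drawn indices (with repeats) and the sorted
    indices that were never drawn.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


rng = random.Random(42)  # fixed seed only so that the run is repeatable
in_bag, oob = bootstrap_split(10, rng)

print("P(not selected), n = 10 :", prob_not_selected(10))
print("P(not selected), n = 297:", prob_not_selected(297))
print("limit 1/e               :", math.exp(-1))
print("trainset (in-bag cases) :", in_bag)
print("testset (OOB cases)     :", oob)
```

Changing the seed changes which cases land in the testset, which is why answers to Q1 to Q3 differ from one run to the next.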

Results of Random Forest on Heart data

• B = 500, RSF size = 3; OOB overall error = (24 + 30)/297 = 18.18%.

• Q: How are the confusion matrix results determined?
• Ans: From the OOB data and the majority rule: each case is predicted by majority vote among only the trees for which that case is OOB.

• Q: Why does the confusion matrix contain only 297 cases when the dataset
has 303 cases?
• Ans: 6 cases have missing values and were omitted because na.action = na.omit.
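To make the "OOB data and majority rule" idea concrete, here is a minimal Python sketch on hypothetical toy data (four made-up cases, not the Heart data). The helper name oob_prediction is an assumption for illustration. Ties are not handled specially here. The last line only reproduces the slide's arithmetic for the overall OOB error.

```python
from collections import Counter


def oob_prediction(oob_votes):
    """Majority class among the votes of the trees for which the case is OOB."""
    return Counter(oob_votes).most_common(1)[0][0]


# Hypothetical toy data: (true class, votes from the trees where the case is OOB).
cases = [
    ("No", ["No", "No", "Yes"]),
    ("Yes", ["Yes", "Yes", "No", "Yes"]),
    ("Yes", ["No", "No", "Yes"]),
    ("No", ["No", "No"]),
]

# Confusion counts keyed by (true class, OOB-predicted class).
confusion = Counter((truth, oob_prediction(votes)) for truth, votes in cases)

# Overall OOB error = misclassified cases / all cases.
errors = sum(count for (truth, pred), count in confusion.items() if truth != pred)
toy_error = errors / len(cases)

# The slide's figure: 24 + 30 misclassified cases out of 297.
heart_oob_error = (24 + 30) / 297

print(dict(confusion))
print("toy OOB error  :", toy_error)
print("Heart OOB error:", round(100 * heart_oob_error, 2), "%")
```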

Source: Chew C.H. (2024) A.I., Analytics & Data Science, Vol. 2.
To check how many times each case is OOB
among the 500 trees, inspect m.RF.1$oob.times.
P(case i is OOB) = (1 - 1/n)^n = (1 - 1/297)^297 ≈ 0.367

Checking:
Case 1: P(OOB) = 177/500 = 0.354
Case 2: P(OOB) = 185/500 = 0.37
Case 3: P(OOB) = 193/500 = 0.386

Interpretation:
Case 1 was not in the bootstrap sample (i.e. it was OOB) for 177 of the 500
trees in the forest, …
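The check above can be imitated with a small simulation. The Python sketch below is not the randomForest code. It assumes each tree is grown on a bootstrap sample of size n drawn with replacement, and the function name simulate_oob_times is made up for this example. It counts how often each of 297 cases is OOB across 500 simulated bootstrap samples, which plays the role of m.RF.1$oob.times.

```python
import random


def simulate_oob_times(n_cases: int, n_trees: int, seed: int = 0):
    """Count, for every case, the number of bootstrap samples that miss it."""
    rng = random.Random(seed)
    oob_times = [0] * n_cases
    for _ in range(n_trees):
        # One bootstrap sample: n_cases draws with replacement.
        in_bag = {rng.randrange(n_cases) for _ in range(n_cases)}
        for i in range(n_cases):
            if i not in in_bag:
                oob_times[i] += 1
    return oob_times


n_cases, n_trees = 297, 500
oob_times = simulate_oob_times(n_cases, n_trees)

expected = (1 - 1 / n_cases) ** n_cases            # theoretical P(OOB)
mean_fraction = sum(oob_times) / (n_cases * n_trees)

print("theoretical P(OOB)       :", round(expected, 3))
print("simulated mean P(OOB)    :", round(mean_fraction, 3))
print("first three cases, P(OOB):", [t / n_trees for t in oob_times[:3]])
```

Individual cases scatter around 0.367 x 500 ≈ 184 OOB trees, so counts such as 177, 185 and 193 are the kind of variation one would expect.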

View the RF votes for each case in the dataset via m.RF.1$votes
Q: Consider case 1. Does this mean 56% of
the 500 trees voted AHD = No and 44% of
the 500 trees voted AHD = Yes?

Ans: No. The proportions are not taken over all 500 trees, only over those trees
(approx. 1/3 of the 500) for which case 1 is OOB.
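A minimal Python sketch of this point, on hypothetical data for a single case and six trees: the vote fractions are taken over the OOB trees only, so the denominator is the number of OOB trees, not the forest size. The function name oob_vote_fractions and the toy votes are assumptions for illustration.

```python
def oob_vote_fractions(tree_votes, is_oob):
    """Vote fractions for one case, using only the trees for which it is OOB.

    tree_votes[t] is the class predicted by tree t for this case;
    is_oob[t] is True when the case was not in tree t's bootstrap sample.
    """
    oob_votes = [vote for vote, oob in zip(tree_votes, is_oob) if oob]
    if not oob_votes:
        return {}  # the case was never OOB, so it has no OOB votes
    total = len(oob_votes)
    return {cls: oob_votes.count(cls) / total for cls in sorted(set(oob_votes))}


# Hypothetical forest of 6 trees; the case is OOB for trees 0, 2 and 5 only.
tree_votes = ["No", "Yes", "Yes", "No", "Yes", "No"]
is_oob = [True, False, True, False, False, True]

fractions = oob_vote_fractions(tree_votes, is_oob)
print(fractions)  # fractions are over 3 OOB trees, not over all 6 trees
```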

Source: Chew C.H. (2024) A.I., Analytics & Data Science, Vol. 2 & randomForest Documentation.
Caution: m.RF.1$err.rate is not the error rate at each tree

Q: Consider row 4. What is the meaning of OOB = 0.27? Does this
mean the 4th tree OOB error is 27%?

Ans: No. It is the cumulative OOB error using the first 4 trees together, which is 27%.
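The difference between a cumulative and a per-tree OOB error can be seen in a small Python sketch on hypothetical data (four cases, four trees). It is not the randomForest implementation. Two choices here are assumptions of this sketch: cases that have not yet been OOB for any tree are left out of the error at that step, and a tied vote goes to the class seen first. The function names are made up for this example.

```python
from collections import Counter


def cumulative_oob_error(truth, votes_by_tree):
    """Entry k is the OOB error when the first k + 1 trees vote together."""
    pooled = [Counter() for _ in truth]  # running OOB votes per case
    errors = []
    for tree_votes in votes_by_tree:
        for i, vote in enumerate(tree_votes):
            if vote is not None:  # None means the case is in-bag for this tree
                pooled[i][vote] += 1
        # Score only the cases that have at least one OOB vote so far.
        scored = [(t, c.most_common(1)[0][0]) for t, c in zip(truth, pooled) if c]
        wrong = sum(1 for t, pred in scored if t != pred)
        errors.append(wrong / len(scored) if scored else float("nan"))
    return errors


def per_tree_oob_error(truth, votes_by_tree):
    """Entry k is the OOB error of tree k + 1 on its own."""
    errors = []
    for tree_votes in votes_by_tree:
        scored = [(t, v) for t, v in zip(truth, tree_votes) if v is not None]
        wrong = sum(1 for t, v in scored if t != v)
        errors.append(wrong / len(scored) if scored else float("nan"))
    return errors


truth = ["No", "Yes", "Yes", "No"]
votes_by_tree = [  # one row per tree, one column per case
    ["Yes", None, None, None],
    [None, "Yes", "Yes", None],
    ["Yes", None, "Yes", "No"],
    [None, "Yes", None, "No"],
]

cumulative = cumulative_oob_error(truth, votes_by_tree)
single = per_tree_oob_error(truth, votes_by_tree)
print("cumulative:", cumulative)
print("per tree  :", single)
```

In this toy run the two sequences differ from the second tree onwards, which is the distinction the caution above is drawing for row 4 of m.RF.1$err.rate.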

