Ans in Day 4 Slides
See Excel file bootstrap5ans.xlsx for one possible answer to Q1 to Q3. You
are encouraged to create your own answers. Due to randomisation, your
answer will be different.
4. No. The cases not selected into the bootstrap sample can serve as the
testset when the bootstrap sample is used as the trainset.
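The idea in answer 4 can be sketched in base R. This is a minimal illustration, assuming a hypothetical dataset of 10 cases; the names `case_ids`, `boot_ids` and `oob_ids` and the seed are made up for this example.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 10                          # hypothetical number of cases
case_ids <- seq_len(n)

# Bootstrap sample: draw n cases with replacement (some repeat, some are missed).
boot_ids <- sample(case_ids, size = n, replace = TRUE)

# Cases never drawn are out-of-bag (OOB) and can act as the testset.
oob_ids <- setdiff(case_ids, boot_ids)

cat("Trainset (bootstrap sample):", sort(boot_ids), "\n")
cat("Testset (OOB cases):        ", oob_ids, "\n")
```

The trainset and testset never share a case, so no separate train-test split is needed in this sketch.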
Results of Random Forest on Heart data
• Q: Why did the confusion matrix contain only 297 cases when the dataset
has 303 cases?
• Ans: 6 cases have missing values and were omitted because na.action = na.omit
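The effect of na.omit can be seen on a small made-up data frame. This is only a sketch: `toy` and its columns are hypothetical and are not the Heart data.

```r
# Hypothetical toy data; the column names are illustrative only.
toy <- data.frame(
  x1 = c(1, 2, NA, 4, 5),
  x2 = c("a", NA, "c", "d", "e"),
  y  = c(0, 1, 0, 1, 1)
)

n_total    <- nrow(toy)                  # all cases
n_complete <- sum(complete.cases(toy))   # cases with no missing value
toy_clean  <- na.omit(toy)               # what na.action = na.omit keeps

cat("Total cases:", n_total, " Complete cases:", n_complete, "\n")
```

Applying the same two counts to the Heart data would be one way to confirm the 303 versus 297 difference.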
Source: Chew C.H. (2024) A.I., Analytics & Data Science, Vol. 2.
To check how many times each case is OOB
among the 500 trees, use m.RF.1$oob.times
P(case i is OOB) = (1 – 1/n)^n = (1 – 1/297)^297 ≈ 0.367
Checking:
Case 1: P(OOB) = 177/500 = 0.354
Case 2: P(OOB) = 185/500 = 0.37
Case 3: P(OOB) = 193/500 = 0.386
…
Interpretation:
Case 1 was left out of the bootstrap sample (i.e. was OOB) for 177 of the
500 trees in the forest, …
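The theoretical OOB probability can be checked by simulation in base R. This sketch only mimics the bootstrap sampling step (n cases drawn with replacement for each tree) and does not fit any trees; `oob_times` here is a simulated counterpart, not m.RF.1$oob.times, and the seed is arbitrary.

```r
set.seed(2024)        # arbitrary seed, only for reproducibility
n     <- 297          # cases used by the forest
ntree <- 500          # trees in the forest

# Theoretical probability that a given case is OOB for one tree.
p_oob <- (1 - 1/n)^n

# Simulate one bootstrap sample per tree and count how often each case is OOB.
oob_times <- integer(n)
for (b in seq_len(ntree)) {
  in_bag <- unique(sample.int(n, size = n, replace = TRUE))
  oob_times[-in_bag] <- oob_times[-in_bag] + 1L
}

cat("Theoretical P(OOB):    ", round(p_oob, 3), "\n")
cat("Simulated mean P(OOB): ", round(mean(oob_times) / ntree, 3), "\n")
cat("First 3 cases:         ", oob_times[1:3] / ntree, "\n")
```

Individual cases scatter around the theoretical value, in the same way as the 0.354, 0.37 and 0.386 observed above.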
View RF vote for each case in the dataset via
m.RF.1$votes
Q: Consider case 1. Does this mean 56% of
the 500 trees voted AHD = No and 44% of
the 500 trees voted AHD = Yes?
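A hint for this question: the randomForest documentation describes votes as the fraction or number of OOB votes, so the denominator for a case would be its oob.times rather than all 500 trees. The sketch below checks this on the built-in iris data, which is a stand-in because the Heart data is not reproduced here; `rf_demo` is a hypothetical name, not m.RF.1.

```r
library(randomForest)  # assumed to be installed, as in the slides

set.seed(1)  # arbitrary seed
# iris is a stand-in dataset; rf_demo is a hypothetical model name.
rf_demo <- randomForest(Species ~ ., data = iris, ntree = 500)

# votes holds fractions by default (norm.votes = TRUE).
# Multiplying each row by that case's oob.times should give whole-number counts.
vote_counts <- unclass(rf_demo$votes) * rf_demo$oob.times

head(cbind(oob.times = rf_demo$oob.times, round(vote_counts)))
```

If the counts in each row sum to oob.times, the vote fractions are over the trees for which the case was OOB.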
Source: Chew C.H. (2024) A.I., Analytics & Data Science, Vol. 2 & randomForest Documentation.
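Caution: m.RF.1$err.rate is not the error rate of each individual tree.
Per the randomForest documentation, its i-th row is the OOB error rate
using all trees up to the i-th.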
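One way to inspect err.rate is sketched below on the built-in iris data, used as a stand-in because the Heart data is not reproduced here; `rf_demo` is a hypothetical name, not m.RF.1.

```r
library(randomForest)  # assumed to be installed, as in the slides

set.seed(1)  # arbitrary seed
# iris is a stand-in dataset; rf_demo is a hypothetical model name.
rf_demo <- randomForest(Species ~ ., data = iris, ntree = 500)

# One row per tree; first column is "OOB", followed by one column per class.
dim(rf_demo$err.rate)
colnames(rf_demo$err.rate)

# OOB error of the growing forest after the first few trees and after all trees.
head(rf_demo$err.rate[, "OOB"])
tail(rf_demo$err.rate[, "OOB"], 1)
```

Plotting the "OOB" column against the row index shows how the forest's OOB error changes as trees are added.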