Formula Sheet - CSE 381


1. Normal Distribution: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Mean: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$   Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$   Standard deviation: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$
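A minimal NumPy sketch of the density and the three estimators above (the data and the name `normal_pdf` are illustrative; note the sheet's 1/n population convention):

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    """Normal density f(x) with mean mu and variance sigma2."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.array([2.0, 4.0, 4.0, 5.0, 5.0, 7.0])   # made-up sample
mu = x.mean()                                  # (1/n) * sum x_i
sigma2 = ((x - mu) ** 2).mean()                # variance with the 1/n convention
sigma = np.sqrt(sigma2)                        # standard deviation
print(mu, sigma2, sigma, normal_pdf(4.5, mu, sigma2))
```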
2. Bivariate Random Variable: probability density function $f(x \mid \mu, \Sigma) = \frac{1}{2\pi\sqrt{|\Sigma|}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$

2D mean: $\mu = (\mu_1, \mu_2)^T$   Covariance: $\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}$

$\mu_i = \frac{1}{n}\sum_{k=1}^{n} x_{ki}$   $\sigma_{ij} = \frac{1}{n}\sum_{k=1}^{n}(x_{ki} - \mu_i)(x_{kj} - \mu_j)$   $\sigma_i^2 = \sigma_{ii}$
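A sketch of the bivariate estimates and density under the same 1/n convention (the function name and data are illustrative):

```python
import numpy as np

def bivariate_pdf(x, mu, Sigma):
    """f(x | mu, Sigma) for a 2D Gaussian."""
    d = x - mu
    quad = d @ np.linalg.inv(Sigma) @ d        # (x - mu)^T Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0]])  # rows = samples
mu = X.mean(axis=0)                            # mu_i = (1/n) * sum_k x_ki
Sigma = (X - mu).T @ (X - mu) / len(X)         # sigma_ij with the 1/n convention
print(mu, Sigma, bivariate_pdf(np.array([2.0, 2.0]), mu, Sigma))
```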
Simple Linear Regression

Model: $\hat{y} = \theta^T X$

Least squares parameters: $\beta_0 = \bar{y} - \beta_1 \bar{x}$,  $\beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{S_{xy}}{S_x^2}$

Cost-function gradient descent: $\theta_j := \theta_j - \eta \frac{\partial}{\partial \theta_j} J(\theta)$,  in matrix form $\theta := \theta - \frac{2\eta}{m} X^T (X\theta - y)$

Sample mean variances: $S_x^2 = S_{xx} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$,  $S_y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$

Covariance: $S_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$

Errors: $RSE(\varepsilon) = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$,  $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$,  $RSS(d) = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
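A sketch tying these together: the closed-form slope and intercept, the same fit via the matrix gradient step, and the three error measures (the data and learning rate are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Closed-form least squares: beta1 = S_xy / S_x^2, beta0 = ybar - beta1 * xbar
S_xy = ((x - x.mean()) * (y - y.mean())).mean()
S_xx = ((x - x.mean()) ** 2).mean()
beta1 = S_xy / S_xx
beta0 = y.mean() - beta1 * x.mean()

# Same fit by gradient descent: theta := theta - (2*eta/m) * X^T (X theta - y)
X = np.column_stack([np.ones(n), x])      # design matrix with an intercept column
theta = np.zeros(2)
eta = 0.01
for _ in range(5000):
    theta -= (2 * eta / n) * X.T @ (X @ theta - y)

y_hat = X @ theta
RSS = ((y - y_hat) ** 2).sum()
MSE = RSS / n
RSE = np.sqrt(RSS / (n - 2))
print(beta0, beta1, theta, RSS, MSE, RSE)
```

Both routes should agree: the closed form is exact, while the gradient step needs a small enough $\eta$ to converge.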
Bayes Optimality

Squared error loss: $L(y, t) = \frac{(y - t)^2}{2}$

Expected loss: $E[(y - t)^2] = (y_* - E[y])^2 + \mathrm{Var}(y) + \mathrm{Var}(t) = \text{bias} + \text{variance} + \text{Bayes error}$

Feature normalization

$f_{new} = \frac{f_{old} - f_{old}^{\min}}{f_{old}^{\max} - f_{old}^{\min}}$  (range $[0, 1]$)

$f_{new} = \frac{f_{old} - \mu}{\sigma}$  (mean $= 0$, variance $= 1$)

Feature weighting

$D(a, b) = \sqrt{\sum_k w_k (a_k - b_k)^2}$

One standard error

$SE[p] = \frac{\sigma[p]}{\sqrt{k - 1}}$
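Both rescalings and the weighted distance as a sketch (`f`, `a`, `b`, `w` are illustrative names and values):

```python
import numpy as np

f = np.array([10.0, 20.0, 30.0, 50.0])            # one feature column

f_minmax = (f - f.min()) / (f.max() - f.min())    # rescaled into [0, 1]
f_zscore = (f - f.mean()) / f.std()               # mean 0, variance 1

def weighted_distance(a, b, w):
    """D(a, b) = sqrt(sum_k w_k (a_k - b_k)^2)."""
    return np.sqrt(np.sum(w * (a - b) ** 2))

print(f_minmax, f_zscore,
      weighted_distance(np.array([1.0, 2.0]), np.array([3.0, 1.0]), np.array([0.5, 2.0])))
```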
Bayes Decision

Probability of error (case of 2 categories): $P(E) = \min(P(\omega_1), P(\omega_2))$

$p(x) = \sum_{j=1}^{2} p(x \mid \omega_j)\, p(\omega_j)$

$p(\omega_j, x) = p(\omega_j \mid x)\, p(x) = p(x \mid \omega_j)\, p(\omega_j)$
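A two-class sketch of these identities (the priors and likelihood values below are made-up numbers):

```python
# Assumed priors p(w1), p(w2) and likelihoods p(x | w1), p(x | w2) at one observed x
p_w = [0.6, 0.4]
p_x_given_w = [0.2, 0.5]

# Evidence by total probability: p(x) = sum_j p(x | w_j) p(w_j)
p_x = sum(lik * prior for lik, prior in zip(p_x_given_w, p_w))

# Posteriors from the joint: p(w_j | x) = p(x | w_j) p(w_j) / p(x)
posteriors = [lik * prior / p_x for lik, prior in zip(p_x_given_w, p_w)]
decision = 1 if posteriors[0] >= posteriors[1] else 2   # pick the larger posterior
print(p_x, posteriors, decision)
```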

$\textbf{Precision} = \frac{TP}{TP + FP}$

$\textbf{Recall} = \frac{TP}{TP + FN}$

$\textbf{F1} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$

$\text{False Positive Rate} = \frac{FP}{FP + TN}$

Type 1 error: False Positives
Type 2 error: False Negatives
Sensitivity: True Positive Rate $= P(\hat{Y} = 1 \mid Y = 1)$
Specificity: True Negative Rate $= P(\hat{Y} = 0 \mid Y = 0)$

Sigmoid / Logistic function: $\hat{p} = g(\theta^T X) = \frac{1}{1 + e^{-\theta^T x}}$

Naïve Bayes Gaussian Model

$p(x_1) = \frac{1}{Z_1} \exp\left\{-\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2\right\}$   $p(x_2) = \frac{1}{Z_2} \exp\left\{-\frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2\right\}$

$p(x_1)\, p(x_2) = \frac{1}{Z_1 Z_2} \exp\left\{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right\}$ with $\mu = [\mu_1\ \mu_2]$, $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2)$
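A sketch computing the metrics above from raw labels, plus the sigmoid (the labels are made up):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

TP = np.sum((y_pred == 1) & (y_true == 1))
FP = np.sum((y_pred == 1) & (y_true == 0))   # Type 1 errors
FN = np.sum((y_pred == 0) & (y_true == 1))   # Type 2 errors
TN = np.sum((y_pred == 0) & (y_true == 0))

precision = TP / (TP + FP)
recall = TP / (TP + FN)                      # sensitivity = TPR
f1 = 2 * precision * recall / (precision + recall)
fpr = FP / (FP + TN)
specificity = TN / (TN + FP)                 # TNR

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(precision, recall, f1, fpr, specificity, sigmoid(0.0))
```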
Correlation Coefficient: $d(x, y) = \frac{\sum_i (x^i - \bar{x})(y^i - \bar{y})}{\sqrt{\sum_i (x^i - \bar{x})^2 \sum_i (y^i - \bar{y})^2}}$

Clustering cost (k-means): $J = \arg\min_{C_j} \sum_{j=1}^{k} \sum_{i=1}^{m} \left\| x^i - C_j \right\|^2$
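A sketch of both formulas; the k-means cost assumes cluster assignments are already given (all names and data are illustrative):

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation coefficient from the sheet's formula."""
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

def kmeans_cost(X, centers, labels):
    """J = sum of squared distances from each point to its assigned center."""
    return np.sum((X - centers[labels]) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.8])
X = np.array([[0.0], [1.0], [9.0], [10.0]])
centers = np.array([[0.5], [9.5]])
labels = np.array([0, 0, 1, 1])
print(correlation(x, y), kmeans_cost(X, centers, labels))
```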
Similarity Function: $\varphi_\gamma(x, \ell) = \exp(-\gamma \|x - \ell\|^2)$

Cost Function (log-loss) / Log likelihood:

$\ell(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
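Both as code; `h` stands for the model's predicted probabilities, and the clip guarding log(0) is an added safeguard, not from the sheet (a sketch, names illustrative):

```python
import numpy as np

def log_likelihood(y, h, eps=1e-12):
    """l(theta) = sum [y log h + (1 - y) log(1 - h)]; log-loss is its negative mean."""
    h = np.clip(h, eps, 1 - eps)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def rbf_similarity(x, landmark, gamma):
    """phi_gamma(x, l) = exp(-gamma * ||x - l||^2)."""
    return np.exp(-gamma * np.sum((x - landmark) ** 2))

y = np.array([1, 0, 1])
h = np.array([0.9, 0.2, 0.7])
print(log_likelihood(y, h),
      rbf_similarity(np.array([1.0, 2.0]), np.array([0.0, 1.0]), 0.5))
```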

Gradient Descent: $\theta_j := \theta_j - \eta \frac{\partial}{\partial \theta_j} J(\theta)$

Hamming Distance: $D_{Hamming}(a, b) = \sum_{i=1}^{n} |a_i - b_i|$

Multinomial Logistic Regression (probability for each class):

$h_\theta(x) = \frac{1}{\sum_{j=1}^{k} \exp(\theta^{(j)T} x)} \begin{bmatrix} \exp(\theta^{(1)T} x) \\ \vdots \\ \exp(\theta^{(k)T} x) \end{bmatrix}$
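A sketch of both: Hamming distance over binary vectors and the per-class softmax probabilities (the max-subtraction is a standard numerical-stability trick, not from the sheet; all data is made up):

```python
import numpy as np

def hamming(a, b):
    """D(a, b) = sum |a_i - b_i| for binary vectors."""
    return int(np.sum(np.abs(np.asarray(a) - np.asarray(b))))

def class_probabilities(Theta, x):
    """exp(theta_j^T x) normalized by sum_l exp(theta_l^T x)."""
    z = Theta @ x          # one score per class
    z = z - z.max()        # subtract the max before exp for stability
    e = np.exp(z)
    return e / e.sum()

Theta = np.array([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]])   # 3 classes, 2 features
x = np.array([1.0, 2.0])
print(hamming([1, 0, 1, 1], [1, 1, 0, 1]), class_probabilities(Theta, x))
```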
Hinge Loss for Data: $L(y, \hat{y}) = \max(0,\, 1 - y\,\hat{y})$

Vector norm: $\|u\| = \sqrt{u_1^2 + u_2^2} \in \mathbb{R}$


Information Gain: $IG(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v)$

Entropy and Gini: $H = -\sum_i p_i \log_2 p_i$,  $G = 1 - \sum_i p_i^2$

Choose $\omega_1$ if $p(\omega_1 \mid x) > p(\omega_2 \mid x)$

Expected Loss: $R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid \omega_j)\, p(\omega_j \mid x)$
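A sketch of the impurity measures and the gain computation, matching the standard definitions filled in above since the sheet's own rendering was lost (weights and class proportions are made up):

```python
import numpy as np

def entropy(p):
    """H = -sum p_i log2 p_i, with 0 log 0 treated as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(p):
    """G = 1 - sum p_i^2."""
    return 1.0 - np.sum(np.asarray(p, dtype=float) ** 2)

def information_gain(parent, children):
    """IG = H(parent) - weighted child entropies; children = [(weight, proportions)]."""
    return entropy(parent) - sum(w * entropy(p) for w, p in children)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))
print(information_gain([0.5, 0.5], [(0.5, [0.9, 0.1]), (0.5, [0.1, 0.9])]))
```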
