DADM - Cheat Sheet: Hypothesis Testing Two Way Anova


DADM – Cheat Sheet

Two Way Anova


Two way ANOVA.
Variable: One
2 Factors: Categorical Variables (treatments)
Eg: Variable: Marks.
Factor 1 (Sections): A, B, C, D, E
Factor 2 (States): MH, TN, TG, ...
Between-Column Variance and Within-Column Variance.
Sum of Squares, DF, Mean Sum of Squares.
F-Distribution: F = MSTR/MSE
Interaction Effect.

Hypothesis Testing
Hypothesis: Assumption about the population Parameter (Mean, Median, Mode, Variance, SD).
Ho & Ha – Null and Alternate
Ho: No Effect (Status Quo)
Ha: Effect (Prove/Disprove)
Three Approaches: Critical Value, p-Value and Conf. Interval.
Hypothesis Testing Process
DCD approach – Define, Calculate, Decide.
PLAN ®.
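The three approaches (Critical Value, p-Value, Confidence Interval) lead to the same decision for a two-tailed test. A minimal sketch, assuming a one-sample Z-test with known population SD; the function name and the numbers (Ho: mean marks = 70, n = 36, sample mean 73, SD 9) are hypothetical and only illustrate the Define, Calculate, Decide steps.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample Z-test (population SD assumed known)."""
    std_normal = NormalDist()                    # standard normal: mean 0, SD 1
    se = sigma / sqrt(n)                         # standard error of the mean
    z = (sample_mean - mu0) / se                 # Calculate: test statistic
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value for alpha
    p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-tailed p-value
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    return {
        "z": z,
        "p_value": p_value,
        "conf_interval": ci,
        # Decide: each approach gives its own reject / fail-to-reject answer.
        "reject_by_critical_value": abs(z) > z_crit,
        "reject_by_p_value": p_value < alpha,
        "reject_by_conf_interval": not (ci[0] <= mu0 <= ci[1]),
    }


# Define: Ho: mean = 70, Ha: mean != 70 (hypothetical data).
result = z_test_two_tailed(sample_mean=73, mu0=70, sigma=9, n=36)
print(result)
```

With these made-up numbers z = 2.0, so all three approaches reject Ho at the 5% level.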
Type 1 & Type 2 errors
Type 1: Reject Ho when it is TRUE.
HT → Type 1 Error → Significance Test.
Type 2: Fail to reject Ho when it is FALSE.
Wording: say "Fail to Reject Ho" rather than "Accept Ho".

Chi-Square Test – Non parametric
Data Type: Nominal/Count/Frequency.
Most widely used test.
Goodness of Fit.
Ho: The data follows a particular distribution.
Observed Data (Sample) Vs Expected data.
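The Goodness of Fit comparison of observed vs expected counts can be sketched as below. This is an illustration only: the function name, the four categories and the counts are hypothetical, and Ho assumes all four categories are equally likely.

```python
def chi_square_gof(observed, expected_probs):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]   # expected counts under Ho
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                       # categories minus one
    return stat, df, expected


# Hypothetical counts for 40 responses over 4 equally likely categories.
observed = [6, 14, 9, 11]
stat, df, expected = chi_square_gof(observed, [0.25, 0.25, 0.25, 0.25])
print(stat, df, expected)
```

The statistic is then compared with the chi-square table value for the same DF. For DF = 3 at the 5% level the table value is 7.815, so these made-up counts would fail to reject Ho.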
Test of Independence.
Data Type: Nominal/Count/Frequency.
No. of Variables: TWO categorical
Ho: Two variables are Independent.
Observed Data (Sample) Vs Expected data.

ONE SAMPLE TEST
Population Parameter: Mean
VARIABLE: ONE
Data: Numeric (Interval/Ratio)
Z-TEST → Popln SD is known
T-test → Popln SD is Unknown.
Right Tail, Left Tail Test and Two Tail Test.

Spearman Rank Correlation – Ordinal
Pearson Correlation – Numeric
Value: [-1 to +1].
Positive: 0.5 and below → Low
Positive: between 0.5 and 0.7 → Moderate
Positive: above 0.7/0.9 → High
ZERO → NO Correlation
Negative: magnitude 0.5 and below → Low
Negative: magnitude between 0.5 and 0.7 → Moderate
Negative: magnitude above 0.7/0.9 → High

Two Sample Test
Population Parameter: Means
VARIABLE: ONE
Data: Numeric (Interval/Ratio)
Population SD.
Z-TEST → Popln SD is known
T-test → Popln SD is Unknown.
T-test: Independent sample and Paired t-test.
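For the Independent sample and Paired t-tests named in the Two Sample Test notes, the test statistics can be sketched as follows. The independent version here assumes equal population variances (the pooled-variance form), which is an assumption not stated in the notes; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Pooled-variance t statistic for two independent samples.

    Assumes equal population variances; sample sizes may differ.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2                        # statistic and DF


def paired_t(before, after):
    """Paired t statistic on the differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal size")
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1                              # statistic and DF


# Hypothetical marks: two different groups, then one group before/after.
t_ind, df_ind = independent_t([2, 4, 6], [1, 2, 3, 4, 5])
t_pair, df_pair = paired_t([10, 12, 14, 16], [12, 13, 17, 18])
print(t_ind, df_ind, t_pair, df_pair)
```

Each statistic is compared with the t table value for its DF, or converted to a p-value.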
Independent Sample: sample sizes can be different; two different populations.
Paired/Dependent/Matched Sample: sample sizes are equal (Before and After).

More than 2 samples – 1 way ANOVA
Variable: One
1 Factor: Categorical Variable (treatments)
Eg: Variable: Marks. Sections: A, B, C, D, E
Hypothesis – Ho & Ha
Ho: All means are equal.
Ha: Not all means are equal.
Between-Column Variance and Within-Column Variance.
Sum of Squares, DF, Mean Sum of Squares.
F-Distribution: F = MSTR/MSE

Simple Linear Regression
Dependent and Independent Variable (DV & IV).
Simple – 1 DV and 1 IV
Linear relationship → DV and IV
Regression → Past data
Objective: To find the best fit line (Linear). y-hat = b0 + b1.x1
Best fit Line? → estimates (b0, b1) → BLUE.
Criteria: Minimizing the error sum of Squares → OLS method.
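The OLS criterion for Simple Linear Regression (choosing b0 and b1 to minimize the error sum of squares) has a closed-form solution. A minimal sketch, with a hypothetical function name and made-up data, that also reports R-Square as the share of variance explained:

```python
from statistics import mean


def fit_slr(x, y):
    """Ordinary least squares for y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                       # slope
    b0 = y_bar - b1 * x_bar              # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # error SS
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total SS
    r_square = 1 - sse / sst             # variance explained
    return b0, b1, r_square


# Hypothetical past data: x = hours studied, y = marks (scaled).
b0, b1, r_square = fit_slr([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1, r_square)
```

For this toy data the fitted line is y-hat = 2.2 + 0.6.x1 with R-Square of about 0.6.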
Model Diagnostics – SLR
Variance Explained → R-Square.
Overall Model Fit → F-Test (ANOVA)
Individual variable Significance: T-test.
Error Term: Residual Analysis – Assumption about the Error Term
Error terms are independent, normally distributed, mean zero and constant variance.
Visual Representation.
Histogram, P-P plot – Normal Distribution
Homoscedasticity – Horizontal Band.

Logistic Regression:
Dependent Variable: Categorical
Categorical → 2 levels/Class → Binary LR
Categorical → > 2 levels/Class → Multinomial LR.
We cannot use a linear line → more misclassification, because of the Categorical nature of DV.
CRITERIA: MLE (Maximum Likelihood Estimation).
Sigmoid Function
• [0,1] → categorical variable (0,1)
• Value of Z – [-inf to +inf]
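The Sigmoid Function takes any Z from -inf to +inf and returns a value between 0 and 1. A small sketch (the branch on the sign of z is only there to avoid overflow in exp for large negative z):

```python
from math import exp


def sigmoid(z):
    """Map any real z to a value between 0 and 1."""
    if z >= 0:
        return 1 / (1 + exp(-z))
    ez = exp(z)                 # safe: z < 0, so exp(z) cannot overflow
    return ez / (1 + ez)


for z in (-5, -1, 0, 1, 5):
    print(z, round(sigmoid(z), 4))
```

At z = 0 the output is exactly 0.5; large positive z approaches 1 and large negative z approaches 0.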
Output of LR → Probability of Success Class.
Cut-off (researcher) → Group Membership.
LOGIT Function: LINK Function
• Interpreting the relationship between IV and DV.
• Logit form of the Sigmoid Function: LHS = Ln(Odds)
• RHS = Linear Function.
• Ln(odds) = logit(p) = b0 + b1.x1 + b2.x2.
• Odds ratio for x1 = Exp(b1)
• Exp(b1) > 1 → probability increases for that case towards success class.
• Exp(b1) < 1 → probability decreases for that case towards success class.
• Exp(b1) = 1 → probability remains unchanged.

Multiple Linear Regression
Dependent and Independent Variable (DV & IV).
Multiple – 1 DV and More than One IV
Linear relationship → DV and IV
Regression → Past data
Objective: To find the best fit line (Linear). Y-hat = b0 + b1.x1 + b2.x2
Best fit Line? → estimates (b0, b1, b2) → BLUE.
Criteria: Minimizing the error sum of Squares → OLS method.
Variance Explained → R-Square.
Adjusted R-Square: penalizing for independent variables (p) → Model Parsimony → Model with optimum no. of variables → Maximum Variance.
Overall Model Fit → F-Test (ANOVA)
Individual variable Significance: T-test.
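The penalty that Adjusted R-Square applies for the number of independent variables (p) can be seen with the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). The function name and the values of R-Square, n and p below are hypothetical.

```python
def adjusted_r_square(r_square, n, p):
    """Adjusted R-Square for n observations and p independent variables."""
    if n <= p + 1:
        raise ValueError("need more observations than p + 1")
    return 1 - (1 - r_square) * (n - 1) / (n - p - 1)


# Same R-Square of 0.80 on 30 observations, with 3 vs 10 independent variables.
adj_small = adjusted_r_square(0.80, n=30, p=3)
adj_large = adjusted_r_square(0.80, n=30, p=10)
print(round(adj_small, 4), round(adj_large, 4))
```

With the same R-Square, the model using more independent variables gets the lower adjusted value, which is the parsimony idea in the notes.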
Error Term: Residual Analysis – Assumption about the Error Term.
Error terms are independent, normally distributed, mean zero and constant variance.
Visual Representation.
Histogram, P-P plot – Normal Distribution
Homoscedasticity – Horizontal Band.
Multicollinearity: When independent variables have correlation among one another.
Find: Variance Inflation Factor (VIF) – X1 and X2.
Criteria: VIF > 4 → Presence of Multicollinearity.
Auto Correlation: Time series data
Find: Durbin Watson Test.

Logistic Regression - Model Diagnostics
Variance Explained: Pseudo R-Square
Cox-Snell (Theoretical Value of 1 – Cannot Reach)
Nagelkerke: Theoretically it can reach 1.
Overall Model Fit (Similar to F-Test in MLR).
Omnibus Test
Hosmer and Lemeshow
Individual Variables Significance
Wald Statistic (Similar to t-test in MLR)
Classification/Confusion Matrix: Accuracy Calculation.
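For the Classification/Confusion Matrix accuracy calculation listed under Logistic Regression model diagnostics, a minimal sketch is given below. The predicted probabilities, the actual classes and the 0.5 cut-off are hypothetical; in practice the cut-off is chosen by the researcher.

```python
def classify(probabilities, cutoff=0.5):
    """Turn predicted probabilities into group membership (1 = success class)."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def confusion_matrix(actual, predicted):
    """Return counts (tp, tn, fp, fn) for a binary outcome coded 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


# Hypothetical model output for six cases.
actual = [1, 1, 1, 0, 0, 0]
probabilities = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
predicted = classify(probabilities, cutoff=0.5)
tp, tn, fp, fn = confusion_matrix(actual, predicted)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)
```

Here four of the six cases are classified correctly, so accuracy is 4/6.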
