DADM - Cheat Sheet: Hypothesis Testing Two Way Anova

Hypothesis Testing
Hypothesis: Assumption about the population parameter (Mean, Median, Mode, Variance, SD).
Ho & Ha: Null and Alternate
Ho: No Effect (Status Quo)
Ha: Effect (Prove/Disprove)
Three Approaches: Critical Value, p-Value and Conf. Interval.
Hypothesis Testing Process
DCD approach: Define, Calculate, Decide.
PLAN ®.

Two Way ANOVA
Variable: One
2 Factors: Categorical Variables (treatments)
Eg: Variable: Marks.
Factor 1 (Sections): A, B, C, D, E
Factor 2 (States): MH, TN, TG,
Between-Column Variance and Within-Column Variance.
Sum of Squares, DF, Mean Sum of Squares.
F statistic = MSTR/MSE (F-Distribution)
Interaction Effect.
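
The two-way ANOVA quantities listed above (sum of squares, DF, mean sum of squares, F = MSTR/MSE, interaction effect) can be sketched in plain Python. This is a minimal illustration, not the sheet's own procedure: it assumes a balanced design with the same number of observations (at least 2) in every cell. The function name two_way_anova and the marks data are hypothetical.

```python
from statistics import mean

def two_way_anova(cells):
    """F ratios for a balanced two-way layout with replication.

    cells[i][j] is the list of observations for level i of factor 1
    and level j of factor 2; every cell must have the same size n >= 2.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    # Sums of squares: factor 1, factor 2, interaction, error (within cell).
    ss_f1 = b * n * sum((m - grand) ** 2 for m in row_means)
    ss_f2 = a * n * sum((m - grand) ** 2 for m in col_means)
    ss_int = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    # Mean sum of squares = sum of squares / degrees of freedom.
    mse = ss_err / (a * b * (n - 1))
    return {
        "F_factor1": (ss_f1 / (a - 1)) / mse,
        "F_factor2": (ss_f2 / (b - 1)) / mse,
        "F_interaction": (ss_int / ((a - 1) * (b - 1))) / mse,
        "sums_of_squares": (ss_f1, ss_f2, ss_int, ss_err),
    }

# Hypothetical marks: rows = sections (A, B), columns = states (MH, TN),
# two students per cell.
marks = [[[60, 62], [70, 72]],
         [[65, 67], [85, 87]]]
result = two_way_anova(marks)
print(result)
```

Each F ratio is then compared with the critical value of the F-distribution for its degrees of freedom, or converted to a p-value. That step is left out here because it needs an F-distribution table or a statistics library.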
Type 1 & Type 2 errors
Type 1: Reject Ho when it is TRUE.
HT → Type 1 Error → Significance Test.
Type 2: Fail to reject Ho when it is FALSE.
Wording: say "Fail to Reject Ho", not "Accept Ho".

ONE SAMPLE TEST
Population Parameter: Mean
Variable: ONE
Data: Numeric (Interval/Ratio)
Z-test → Popln SD is known.
T-test → Popln SD is unknown.
Right Tail, Left Tail and Two Tail Tests.

Two Sample Test
Population Parameter: Means
Variable: ONE
Data: Numeric (Interval/Ratio)
Population SD:
Z-test → Popln SD is known.
T-test → Popln SD is unknown.
T-test: Independent Sample and Paired t-test.
Independent Sample: two different populations; sample sizes can be different.
Paired/Dependent/Matched: sample sizes are equal (Before and After).

More than 2 samples: 1 Way ANOVA
Variable: One
1 Factor: Categorical Variable (treatments)
Eg: Variable: Marks. Sections: A, B, C, D, E
Hypothesis: Ho & Ha
Ho: All means are equal.
Ha: Not all means are equal.
Between-Column Variance and Within-Column Variance.
Sum of Squares, DF, Mean Sum of Squares.
F statistic = MSTR/MSE (F-Distribution)

Chi-Square Test: Non-parametric
Data Type: Nominal/Count/Frequency.
Most widely used test.
Goodness of Fit.
Ho: The data follows a particular distribution.
Observed Data (Sample) vs Expected Data.
Test of Independence.
Data Type: Nominal/Count/Frequency.
No. of Variables: TWO categorical
Ho: Two variables are independent.
Observed Data (Sample) vs Expected Data.

Spearman Rank Correlation: Ordinal
Pearson Correlation: Numeric
Value: [-1 to +1].
Positive: 0.5 and below → Low
Positive: between 0.5 and 0.7 → Moderate
Positive: above 0.7 / 0.9 → High
ZERO → NO Correlation
Negative: 0.5 and below (in magnitude) → Low
Negative: between 0.5 and 0.7 (in magnitude) → Moderate
Negative: above 0.7 / 0.9 (in magnitude) → High

Simple Linear Regression
Dependent and Independent Variable (DV & IV).
Simple: 1 DV and 1 IV
Linear relationship → DV and IV
Regression → Past data
Objective: To find the best fit line (linear): y-hat = b0 + b1.x1
Best fit line? → estimates (b0, b1) → BLUE (Best Linear Unbiased Estimator).
Criteria: Minimizing the error sum of squares → OLS method.
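
The OLS criterion for simple linear regression noted above (choose b0 and b1 so that the error sum of squares is as small as possible) has a closed-form solution. The sketch below applies it in plain Python. The function names and the hours/marks data are hypothetical, chosen only to illustrate y-hat = b0 + b1.x1.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the error sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = co-variation of x and y divided by variation of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # The fitted line passes through the point of means (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1

def error_sum_of_squares(x, y, b0, b1):
    """Sum of squared residuals (actual y minus y-hat)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical past data: study hours (IV) and marks (DV).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

b0, b1 = fit_simple_ols(hours, marks)
sse = error_sum_of_squares(hours, marks, b0, b1)
print(f"y-hat = {b0:.2f} + {b1:.2f} * x1, SSE = {sse:.2f}")
```

Any other line through the same data, for example one with the intercept shifted by 1, gives a larger error sum of squares. That is the sense in which the OLS line is the best fit.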
Model Diagnostics: SLR
Variance Explained → R-Square.
Overall Model Fit → F-Test (ANOVA)
Individual Variable Significance: T-test.
Error Term: Residual Analysis: Assumptions about the error term.
Error terms are independent, normally distributed, with mean zero and constant variance.
Visual Representation.
Histogram, P-P plot: Normal Distribution
Homoscedasticity: Horizontal Band.

Logistic Regression:
Dependent Variable: Categorical
Categorical → 2 levels/classes → Binary LR
Categorical → > 2 levels/classes → Multinomial LR.
We cannot use a linear line → more misclassification → categorical nature of DV.
CRITERIA: MLE (Maximum Likelihood Estimation).
Sigmoid Function
• [0,1] → categorical variable (0,1)
• Value of Z: [-inf to +inf]
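
The sigmoid behaviour noted above (any Z from -inf to +inf is squeezed into the range 0 to 1) can be checked with a few lines of Python. The sheet does not write the formula out; this sketch assumes the standard logistic form 1 / (1 + e^(-z)). The function name sigmoid is a choice made for illustration.

```python
import math

def sigmoid(z):
    """Standard logistic function: maps a real z into the range 0 to 1."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Large negative Z gives a value near 0, Z = 0 gives 0.5,
# large positive Z gives a value near 1.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```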
Output of LR → Probability of Success Class.
Cut-off (researcher) → Group Membership.
LOGIT Function: LINK Function
• Interpreting the relationship between IV and DV.
• Logit Function: LHS = Ln(Odds)
• RHS = Linear Function.
• Ln(odds) = logit(p) = b0 + b1.x1 + b2.x2.
• Odds ratio for x1 = Exp(b1)
• Odds ratio > 1 → probability increases for that case towards success class.
• Odds ratio < 1 → probability decreases for that case towards success class.
• Odds ratio = 1 → probability remains unchanged.

Multiple Linear Regression
Dependent and Independent Variable (DV & IV).
Multiple: 1 DV and more than one IV
Linear relationship → DV and IV
Regression → Past data
Objective: To find the best fit line (linear): y-hat = b0 + b1.x1 + b2.x2
Best fit line? → estimates (b0, b1, b2) → BLUE (Best Linear Unbiased Estimator).
Criteria: Minimizing the error sum of squares → OLS method.
Variance Explained → R-Square.
Adjusted R-Square: penalizes for the number of independent variables (p) → Model Parsimony → model with the optimum no. of variables that explains maximum variance.
Overall Model Fit → F-Test (ANOVA)
Individual Variable Significance: T-test.
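
The penalty that Adjusted R-Square applies for the number of independent variables (p), as described above, can be seen in a short calculation. The sheet does not give the formula; this sketch assumes the usual one, 1 - (1 - R-Square)(n - 1)/(n - p - 1), where n is the number of observations. The function name and the numbers are hypothetical.

```python
def adjusted_r_square(r_square, n, p):
    """Adjusted R-Square for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than p + 1")
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - p - 1)

# Same R-Square and sample size, but more independent variables:
# the adjusted value drops, which favours the more parsimonious model.
print(round(adjusted_r_square(0.80, n=30, p=3), 4))
print(round(adjusted_r_square(0.80, n=30, p=6), 4))
```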
Error Term: Residual Analysis: Assumptions about the error term.
Error terms are independent, normally distributed, with mean zero and constant variance.
Visual Representation.
Histogram, P-P plot: Normal Distribution
Homoscedasticity: Horizontal Band.
Multicollinearity: When independent variables are correlated with one another.
Find: Variance Inflation Factor (VIF): X1 and X2.
Criteria: VIF > 4 → presence of multicollinearity.
Autocorrelation: Time series data
Find: Durbin-Watson Test.

Logistic Regression: Model Diagnostics
Variance Explained: Pseudo R-Square
Cox-Snell: cannot reach the theoretical value of 1.
Nagelkerke: theoretically it can reach 1.
Overall Model Fit (similar to F-Test in MLR).
Omnibus Test
Hosmer and Lemeshow
Individual Variable Significance
Wald Statistic (similar to t-test in MLR)
Classification/Confusion Matrix:
Accuracy Calculation.
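
The accuracy calculation from a classification/confusion matrix, together with the researcher-chosen cut-off that turns a predicted probability into group membership, can be sketched as follows. The function names, the 0.5 cut-off and the sample data are assumptions made for illustration; the sheet does not fix a cut-off value.

```python
def classify(probabilities, cutoff=0.5):
    """Assign class 1 (success) when the probability reaches the cut-off."""
    return [1 if p >= cutoff else 0 for p in probabilities]

def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for a binary outcome."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),
    }

def accuracy(cm):
    """Share of cases classified correctly."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())

# Hypothetical actual classes and model-predicted probabilities.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

predicted = classify(probs, cutoff=0.5)
cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))
```

Changing the cut-off changes the group memberships, so the same model can produce a different matrix and a different accuracy.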
