Hypothesis Testing
Hypothesis: an assumption about a population parameter (mean, median, mode, variance, SD).
Ho & Ha: null and alternate hypotheses.
Ho: no effect (status quo).
Ha: effect (prove/disprove).
Three approaches: critical value, p-value and confidence interval.

Hypothesis Testing Process
DCD approach: Define, Calculate, Decide. PLAN ®.

Type 1 & Type 2 Errors
Type 1: reject Ho when it is TRUE.
HT: the probability of a Type 1 error is the significance level of the test (significance test).
Type 2: fail to reject Ho when it is FALSE.
Wording: say "fail to reject Ho", not "accept Ho".

ONE SAMPLE TEST
Population parameter: mean.
VARIABLE: ONE.
Data: numeric (interval/ratio).
Z-test: population SD is known.
T-test: population SD is unknown.
Right-tail, left-tail and two-tail tests.

Two Sample Test
Population parameter: means.
VARIABLE: ONE.
Data: numeric (interval/ratio).
Population SD:
Z-test: population SD is known.
T-test: population SD is unknown.
T-test: independent-sample and paired t-test.
Independent samples: sample sizes can be different; two different populations.
Paired/dependent/matched samples: sample sizes are equal (before and after).

More than 2 samples: 1-way ANOVA
Variable: one.
1 factor: categorical variable (treatments).
E.g. variable: marks. Sections: A, B, C, D, E.
Hypothesis: Ho & Ha.
Ho: all means are equal.
Ha: not all means are equal.
Between-column variance and within-column variance.
Sum of squares, DF, mean sum of squares.
F statistic = MSTR/MSE (F distribution).

Two-way ANOVA
Variable: one.
2 factors: categorical variables (treatments).
E.g. variable: marks.
Factor 1 (sections): A, B, C, D, E.
Factor 2 (states): MH, TN, TG.
Between-column variance and within-column variance.
Sum of squares, DF, mean sum of squares.
F statistic = MSTR/MSE (F distribution).
Interaction effect.

Chi-Square Test: non-parametric
Data type: nominal/count/frequency.
Most widely used test.

Goodness of Fit
Ho: the data follows a particular distribution.
Observed data (sample) vs expected data.

Test of Independence
Data type: nominal/count/frequency.
No. of variables: TWO categorical.
Ho: the two variables are independent.
Observed data (sample) vs expected data.

Correlation
Spearman rank correlation: ordinal data.
Pearson correlation: numeric data.
Value: [-1 to +1].
Positive: 0.5 and below: low.
Positive: 0.5 to 0.7: moderate.
Positive: above 0.7 / 0.9: high.
Zero: no correlation.
Negative: 0.5 and below: low.
Negative: 0.5 to 0.7: moderate.
Negative: above 0.7 / 0.9: high.

Simple Linear Regression
Dependent and independent variable (DV & IV).
Simple: 1 DV and 1 IV.
Linear relationship between DV and IV.
Regression uses past data.
Objective: to find the best-fit line (linear). y-hat = b0 + b1.x1
Best-fit line? The estimates (b0, b1) are BLUE (Best Linear Unbiased Estimators).
Criterion: minimizing the error sum of squares (OLS method).

Model Diagnostics: SLR
Variance explained: R-square.
Overall model fit: F-test (ANOVA).
Individual variable significance: t-test.
Error term: residual analysis (assumptions about the error term).
Error terms are independent, normally distributed, with mean zero and constant variance.
Visual representation:
Histogram, P-P plot: normal distribution.
Homoscedasticity: horizontal band.

Multiple Linear Regression
Dependent and independent variable (DV & IV).
Multiple: 1 DV and more than one IV.
Linear relationship between DV and IVs.
Regression uses past data.
Objective: to find the best-fit line (linear). y-hat = b0 + b1.x1 + b2.x2
Best-fit line? The estimates (b0, b1, b2) are BLUE.
Criterion: minimizing the error sum of squares (OLS method).
Variance explained: R-square.
Adjusted R-square: penalizes for the number of independent variables (p).
Model parsimony: the model with the optimum number of variables that explains maximum variance.
Overall model fit: F-test (ANOVA).
Individual variable significance: t-test.
Error term: residual analysis (assumptions about the error term).
Error terms are independent, normally distributed, with mean zero and constant variance.
Visual representation:
Histogram, P-P plot: normal distribution.
Homoscedasticity: horizontal band.
Multicollinearity: when the independent variables are correlated with one another.
Find: Variance Inflation Factor (VIF) for X1 and X2. Criterion: VIF > 4 indicates the presence of multicollinearity.
Autocorrelation: time series data. Find: Durbin-Watson test.

Logistic Regression
Dependent variable: categorical.
Categorical with 2 levels/classes: binary LR.
Categorical with > 2 levels/classes: multinomial LR.
We cannot use a straight line: it gives more misclassification because of the categorical nature of the DV.
CRITERION: MLE (Maximum Likelihood Estimation).
Sigmoid function:
• Output in [0, 1]; the categorical variable is coded (0, 1).
• Value of Z: [-inf to +inf].
Output of LR: probability of the success class.
Cut-off (chosen by the researcher): group membership.
LOGIT function: LINK function.
• Interpreting the relationship between IV and DV.
• Sigmoid function: LHS = Ln(odds).
• RHS = linear function.
• Ln(odds) = logit(p) = b0 + b1.x1 + b2.x2.
• Odds ratio for x1 = Exp(b1).
• Odds ratio > 1: probability increases for that case towards the success class.
• Odds ratio < 1: probability decreases for that case towards the success class.
• Odds ratio = 1: probability remains unchanged.

Logistic Regression: Model Diagnostics
Variance explained: pseudo R-square.
Cox-Snell (theoretical value of 1: cannot reach).
Nagelkerke: theoretically it can reach 1.
Overall model fit (similar to the F-test in MLR):
Omnibus test.
Hosmer and Lemeshow.
Individual variable significance:
Wald statistic (similar to the t-test in MLR).
Classification/confusion matrix:
Accuracy calculation.
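Worked example, one-sample Z-test (p-value approach). A minimal sketch of the Define, Calculate, Decide steps when the population SD is known. The sample marks, mu0 = 70, sigma = 3 and alpha = 0.05 are made-up values for illustration, and the helper name `one_sample_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, tail="two"):
    """Return (z, p_value) for Ho: population mean == mu0, population SD known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # standard normal P(Z <= z)
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    else:  # two-tail test
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Define: Ho: mean = 70, Ha: mean != 70 (two-tail), alpha = 0.05
marks = [72, 68, 75, 71, 69, 74, 73, 70, 76, 72]  # hypothetical sample
alpha = 0.05

# Calculate
z, p_value = one_sample_z_test(marks, mu0=70, sigma=3)

# Decide: note the wording "fail to reject", never "accept"
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"z = {z:.3f}, p-value = {p_value:.4f}: {decision}")
```

When the population SD is unknown, the same steps apply with the sample SD and the t distribution in place of the normal distribution.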
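Worked example, one-way ANOVA. A small sketch of how the sum of squares, DF and mean sum of squares combine into F = MSTR/MSE. The marks for the three sections are invented, and `one_way_anova` is a hypothetical helper, not a library function.

```python
from statistics import mean


def one_way_anova(groups):
    """Compute the one-way ANOVA table entries for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of treatments (levels of the factor)
    n = len(all_values)   # total number of observations

    # Between-column variation (treatments)
    sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-column variation (error)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    return {
        "SSTR": sstr, "SSE": sse,
        "df_between": k - 1, "df_within": n - k,
        "MSTR": mstr, "MSE": mse,
        "F": mstr / mse,
    }


# Hypothetical marks for sections A, B and C
sections = [[60, 62, 64], [70, 72, 74], [80, 82, 84]]
table = one_way_anova(sections)
print(table)
```

The F value is then compared with the critical value of the F distribution for (df_between, df_within) degrees of freedom, taken from a table or a statistics package.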
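Worked example, chi-square test of independence. A sketch of the observed vs expected comparison for two categorical variables. The 2x2 table of counts is invented, and `chi_square_independence` is a hypothetical helper. The critical value 3.841 is the standard chi-square table value for 1 degree of freedom at alpha = 0.05.

```python
def chi_square_independence(observed):
    """Return (chi2, df, expected) for a contingency table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    # Expected count under Ho (independence): row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical counts: rows = two groups, columns = two outcomes
observed = [[30, 10],
            [20, 40]]
chi2, df, expected = chi_square_independence(observed)

critical_value = 3.841  # chi-square table, df = 1, alpha = 0.05
decision = "reject Ho" if chi2 > critical_value else "fail to reject Ho"
print(f"chi2 = {chi2:.3f}, df = {df}: {decision}")
```

The goodness-of-fit test uses the same statistic, with the expected counts coming from the hypothesised distribution instead of the row and column totals.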
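Worked example, simple linear regression by OLS. A minimal sketch of estimating b0 and b1 by minimizing the error sum of squares, and of the R-square diagnostic. The data are invented, and `fit_simple_ols` is a hypothetical helper.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (b0, b1, r_square) for the line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # Closed-form OLS estimates that minimize the error sum of squares
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    r_square = 1 - sse / sst                                # variance explained
    return b0, b1, r_square


# Hypothetical past data: x = IV, y = DV
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, r_square = fit_simple_ols(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} * x, R-square = {r_square:.2f}")
```

With more than one IV the same criterion applies, but the estimates are usually obtained with matrix methods in a statistics package.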
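Worked example, logistic regression output. A sketch of how the sigmoid, the logit link, the cut-off and the confusion matrix fit together. The coefficients b0 and b1 are assumed values, not estimated by MLE here, and the data are invented for illustration.

```python
from math import exp, log


def sigmoid(z):
    """Map Z in (-inf, +inf) to a probability in (0, 1)."""
    return 1 / (1 + exp(-z))


def logit(p):
    """Link function: Ln(odds) = Ln(p / (1 - p))."""
    return log(p / (1 - p))


# Assumed (not fitted) coefficients of Ln(odds) = b0 + b1 * x1
b0, b1 = -4.0, 0.8
odds_ratio = exp(b1)  # > 1: a unit increase in x1 moves the case towards the success class

# Hypothetical cases: x1 values and actual classes
x1 = [1, 3, 4, 6, 7, 9]
actual = [0, 0, 1, 1, 1, 1]

# Output of LR: probability of the success class
probabilities = [sigmoid(b0 + b1 * v) for v in x1]

# Cut-off chosen by the researcher decides group membership
cutoff = 0.5
predicted = [1 if p >= cutoff else 0 for p in probabilities]

# Classification / confusion matrix counts
pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)
accuracy = (tp + tn) / len(pairs)

print(f"odds ratio = {odds_ratio:.3f}")
print(f"TP={tp} TN={tn} FP={fp} FN={fn} accuracy={accuracy:.3f}")
```

Applying `logit` to a probability returned by `sigmoid` recovers the linear function b0 + b1.x1, which is why the relationship between IV and DV is interpreted on the Ln(odds) scale.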
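Worked example, multicollinearity and autocorrelation checks. A sketch assuming exactly two independent variables, in which case VIF = 1 / (1 - r^2) with r the Pearson correlation between X1 and X2. The Durbin-Watson statistic is computed from residuals: it lies between 0 and 4, and values near 2 suggest no first-order autocorrelation. The data and residuals are invented, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient, a value in [-1, +1]."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two IVs: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical independent variables
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
print(f"r = {pearson_r(x1, x2):.2f}, VIF = {vif:.2f}, multicollinearity: {vif > 4}")

# Hypothetical residuals in time order
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
dw = durbin_watson(residuals)
print(f"Durbin-Watson = {dw:.3f}")
```

With more than two IVs, each VIF comes from regressing one IV on all the others, which is normally left to a statistics package.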