
Correlation Computations

These presentation slides describe the different correlation computations: Pearson product moment correlation, rank correlation, biserial correlation, point biserial correlation, tetrachoric correlation and the phi coefficient of correlation.

Correlation
K. THIYAGU, Assistant Professor, Department of Education, Central University of Kerala, Kasaragod

Pearson Product Moment Correlation
• Also called PPMCC, PCC or Pearson's r.
• It is a measure of the strength of the linear association between two variables X and Y, and is denoted by r.
• It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. Early work on the distribution of the sample correlation coefficient was carried out by Anil Kumar Gain and R. A. Fisher from the University of Cambridge.

Karl Pearson
(Image: Pearson with Sir Francis Galton)
Born: 27 March 1857, Islington, London, England
Died: 27 April 1936 (aged 79), Surrey, England
Residence: England
Nationality: British
Known for: Principal component analysis, Pearson distribution, Pearson's r, Pearson's chi-squared test, Phi coefficient
Academic advisors: Francis Galton
"Statistics is the grammar of science." (Karl Pearson)

Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. In raw-score form, with N pairs of scores:

$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} $$

Interpretation Table: Pearson Product Moment Correlation
Interpretation          Correlation
Perfect positive        +1.0
Very high positive      +0.90 to +0.99
High positive           +0.70 to +0.90
Moderate positive       +0.50 to +0.70
Low positive            +0.30 to +0.50
Very low positive       +0.10 to +0.30
Negligible positive     +0.01 to +0.10
No correlation          0.0
Negligible negative     -0.01 to -0.10
Very low negative       -0.10 to -0.30
Low negative            -0.30 to -0.50
Moderate negative       -0.50 to -0.70
High negative           -0.70 to -0.90
Very high negative      -0.90 to -0.99
Perfect negative        -1.0

Uses of PPMCC
• Correlation is used to describe the degree of relationship between two variables.
• The reliability of a test is calculated in terms of Pearson's r.
• Validity is estimated by the coefficient of correlation (r).
• Item discrimination power is calculated using Pearson's r.
• Multiple correlation is based on Pearson's r.
• Partial correlation employs the coefficient of correlation (r).
• The factor-analysis technique is an extension of Pearson's r.
• It predicts the dependent variable on the basis of the independent variable.
• Most personality theories have also been developed using this correlation.

Disadvantages of PPMCC
• It measures only linear correlation. When the relationship between the two variables is linear, it yields an accurate coefficient of correlation; when the two variables are curvilinearly related, the coefficient is not dependable. This assumption must be taken into consideration when using this technique.
• The scores of the two variables should be normally distributed. If the distributions are skewed, it will not yield a dependable correlation. This assumption is often neglected in practice.

Spearman's Rank-Difference Correlation
Spearman's rank correlation coefficient is a non-parametric statistical measure of the strength of association between two ranked variables. It is applied to ordinal data, i.e. values that can be arranged in order, one after the other, so that each can be given a rank. It was developed by Charles Spearman, hence the name Spearman rank correlation. The coefficient is denoted by the Greek letter ρ (rho). It assesses how well the relationship between two variables can be described using a monotonic function.

Charles Edward Spearman
Born: 10 September 1863, London, United Kingdom
Died: 17 September 1945 (aged 82), London, United Kingdom
Known for: g factor, Spearman's rank correlation coefficient, factor analysis
Notable students: Raymond Cattell, John C. Raven, David Wechsler
Influences: Francis Galton, Wilhelm Wundt

Monotonic Relationships
While Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). A monotonic relationship is one in which either (1) as the value of one variable increases, so does the value of the other; or (2) as the value of one variable increases, the value of the other decreases. Monotonic relationships are where:
• One variable increases and the other increases, or
• One variable increases and the other decreases.
In a monotonic relationship the variables keep changing in a consistent direction, but not always at the same rate; in a linear relationship they change at the same (constant) rate.
If an increase in the independent variable is accompanied by a decrease in the dependent variable, this is called a monotonic inverse relationship. An inverse relationship is the same thing as a negative correlation. A monotonic direct relationship is one where an increase in the independent variable is accompanied by an increase in the dependent variable; in other words, there is a positive correlation between the data.

Formula for Spearman's Rank Correlation Coefficient

$$ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

where D = difference of ranks and N = number of observations.
When ranks are repeated (tied), the formula is

$$ \rho = 1 - \frac{6\left[\sum D^2 + \frac{m_1^3 - m_1}{12} + \frac{m_2^3 - m_2}{12} + \cdots\right]}{N(N^2 - 1)} $$

where m1, m2, ... are the numbers of repetitions of each tied rank.

Example
English (mark)   Maths (mark)   Rank (English)   Rank (Maths)   d   d²
56               66             9                4              5   25
75               70             3                2              1   1
45               40             10               10             0   0
71               60             4                7              3   9
62               65             6                5              1   1
64               56             5                9              4   16
58               59             8                8              0   0
80               77             1                1              0   0
76               67             2                3              1   1
61               63             7                6              1   1
                                                                    Σd² = 54

With N = 10:

$$ \rho = 1 - \frac{6 \times 54}{10(10^2 - 1)} = 1 - \frac{324}{990} \approx 0.67 $$

Biserial Correlation (r_bis)
Estimate of the relationship between a continuous variable and a dichotomous variable.
The term 'dichotomous' means cut into two parts, i.e. divided into two categories. For biserial correlation, one variable is continuous and the other is an artificial dichotomy.

Examples of artificial dichotomy:
• Socially adjusted / Socially maladjusted
• Athletic / Non-athletic
• Radical / Conservative
• Poor / Not poor
• Social minded / Mechanical minded
• Drop-outs / Stay-ins
• Successful / Unsuccessful
• Moral / Immoral

Examples of natural dichotomy:
• Right / Wrong
• Male / Female
• Living / Dead
• Owning a home / Not owning a home
• Being a farmer / Not being a farmer
• Being a Ph.D / Not being a Ph.D
• Living in Delhi / Not living in Delhi

Formula for biserial correlation:

$$ r_{bis} = \frac{M_p - M_q}{\sigma} \times \frac{pq}{y} $$

where
r_bis = biserial r
M_p, M_q = mean test scores of those who pass and those who fail the item, respectively
p, q = proportions who pass and fail the item
y = height of the ordinate of the normal curve at the point of division between the p and q proportions of cases
σ = SD of the entire group

Point Biserial Correlation (r_p.bis)
Estimates the relationship between two variables when one variable is continuous and the other is a genuine (natural) dichotomy.

Formula for point biserial correlation:

$$ r_{p.bis} = \frac{M_p - M_q}{\sigma} \times \sqrt{pq} $$

where
r_p.bis = point biserial correlation
M_p = mean of the 1st group
M_q = mean of the 2nd group
p = proportion of the 1st group
q = proportion of the 2nd group
σ = standard deviation of the total group

Tetrachoric Correlation (r_t)
Estimates the relationship between two variables when both variables are dichotomous. Tetrachoric correlation is suitable for situations in which neither of the two variables can be measured in terms of scores, but both variables can be separated into two categories.
Example: to study the relationship between intelligence and emotional maturity, the first variable, 'intelligence', may be dichotomised as above average and below average, and the other variable, 'emotional maturity', as emotionally mature and emotionally immature.
Example: to study the relationship between 'adjustment' and 'success' in a job, we can dichotomise the variables as adjusted/maladjusted and success/failure.
• Tetrachoric correlation estimates what the correlation between two binary variables would be if the 'ratings' were made on a continuous scale.
• It is used when both variables are artificial (nominal) dichotomies.

The formula for tetrachoric correlation uses the cell frequencies of a 2 × 2 table:

              Pass   Fail
Trained       (A)    (B)
Untrained     (C)    (D)

If AD is greater than BC, the correlation is positive. If BC is greater than AD, the correlation is negative.

Phi Coefficient (φ)
Computes the correlation between two variables that are both genuinely dichotomous attributes. Phi coefficient correlation is suitable for situations in which neither of the two variables can be measured in terms of scores, but both variables can be separated into two categories. When both variables are genuine dichotomies of the same kind of attribute, we can use the phi coefficient.

Formula for the phi coefficient:

$$ \phi = \frac{AD - BC}{\sqrt{(A + B)(C + D)(A + C)(B + D)}} $$

where A, B, C and D are the cell frequencies of the 2 × 2 table, for example:

          Fail   Pass
Pass      (B)    (A)
Fail      (D)    (C)

          Pass   Fail
Pass      (A)    (B)
Fail      (C)    (D)

                 Favourable   Unfavourable
Favourable       (A)          (B)
Unfavourable     (C)          (D)

If AD is greater than BC, the correlation is positive. If BC is greater than AD, the correlation is negative.

Types of Correlation Coefficients
Correlation coefficient     Types of scales
Pearson product-moment      Both scales interval (or ratio)
Spearman rank-order         Both scales ordinal
Phi                         Both scales naturally dichotomous (nominal)
Tetrachoric                 Both scales artificially dichotomous (nominal)
Point-biserial              One scale naturally dichotomous (nominal), one scale interval (or ratio)
Biserial                    One scale artificially dichotomous (nominal), one scale interval (or ratio)
Gamma                       One scale nominal, one scale ordinal
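The formulas in these slides can be cross-checked with a short Python sketch. It uses only the standard library and needs Python 3.8+ for `statistics.NormalDist` and `statistics.fmean`. All function names are hypothetical, and the sketch makes three assumptions:

• σ is the population SD of the whole group (dividing by N).
• Ranks are assigned without ties (rank 1 = highest mark, as in the English/Maths example).
• The tetrachoric value uses the common cosine-pi approximation, r_t ≈ cos(180° / (1 + √(AD/BC))). This may not be the exact formula intended on the tetrachoric slide.

```python
import math
from statistics import NormalDist, fmean, pstdev


def pearson_r(x, y):
    """Raw-score Pearson r: [NΣXY - ΣXΣY] / sqrt([NΣX² - (ΣX)²][NΣY² - (ΣY)²])."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def ranks_descending(values):
    """Rank 1 = highest score. Assumes no tied scores (no tie correction)."""
    if len(set(values)) != len(values):
        raise ValueError("tied scores need average ranks and the tie-corrected formula")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Rank-difference formula: rho = 1 - 6ΣD² / (N(N² - 1))."""
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def _split_groups(scores, in_first_group):
    """Split scores into the first group (proportion p) and the second (proportion q)."""
    first = [s for s, g in zip(scores, in_first_group) if g]
    second = [s for s, g in zip(scores, in_first_group) if not g]
    p = len(first) / len(scores)
    return first, second, p, 1 - p


def point_biserial_r(scores, in_first_group):
    """r_pbis = (Mp - Mq) / sigma * sqrt(pq), sigma = SD of the total group."""
    first, second, p, q = _split_groups(scores, in_first_group)
    sigma = pstdev(scores)  # population SD (divides by N), an assumption
    return (fmean(first) - fmean(second)) / sigma * math.sqrt(p * q)


def biserial_r(scores, passed):
    """r_bis = (Mp - Mq) / sigma * pq / y, y = normal ordinate at the p/q split."""
    first, second, p, q = _split_groups(scores, passed)
    sigma = pstdev(scores)
    std_normal = NormalDist()
    # z below which a proportion q (fail) lies; y is the curve's height there
    y = std_normal.pdf(std_normal.inv_cdf(q))
    return (fmean(first) - fmean(second)) / sigma * (p * q / y)


def tetrachoric_cos_pi(a, b, c, d):
    """ASSUMED cosine-pi approximation: r_t ≈ cos(pi / (1 + sqrt(AD / BC)))."""
    if b * c == 0:
        if a * d == 0:
            raise ValueError("degenerate table: AD and BC are both zero")
        return 1.0  # limiting value as BC approaches 0
    return math.cos(math.pi / (1 + math.sqrt(a * d / (b * c))))


def phi_coefficient(a, b, c, d):
    """phi = (AD - BC) / sqrt((A + B)(C + D)(A + C)(B + D))."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # English and Maths marks from the Spearman worked example
    english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
    maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]
    print(round(spearman_rho(english, maths), 3))  # 0.673
```

Running it prints 0.673 for the English/Maths marks, which matches 1 − 324/990. Coding a natural dichotomy as 1/0 and passing it to `pearson_r` gives the same value as `point_biserial_r`. The same holds for `phi_coefficient` with two 1/0 variables. This is one way to see that point-biserial and phi are special cases of Pearson's r.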