Correlation analysis for binary variables in R

Question

dat <- as.data.frame(replicate(100,sample(c(0,1),100,replace=TRUE)))

I want to create a 100 by 100 matrix with the correlation coefficients between these binary variables as entries.

If the variables were continuous, then I would have used cor() to create the matrix. I am not sure if cor() with Pearson as the method is reasonable. If not, say I could find a function fn() to calculate the correlation between a pair of binary vectors. What is an efficient way to construct the 100 by 100 matrix?

What are the binary variables? ie could they represent some underlying normally distributed latent variable? — user20650, Commented Jul 22, 2016 at 13:10

shayaa · Accepted Answer · 2016-07-22 06:21:39Z


This may be more of a statistics question than a programming one. What you are asking for is the correlation between binary vectors. This is called the phi coefficient, which was introduced by Pearson.

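For two binary variables it is identical to the Pearson correlation, not merely an approximation of it. You might try

sqrt(chisq.test(table(dat[,1],dat[,2]), correct=FALSE)$statistic/length(dat[,1]))

and notice that it gives the same value (0.08006408 in my run; yours will differ because dat is random) as

cor(dat[,1], dat[,2])

The two agree because the uncorrected chi-squared statistic of a 2x2 table equals n times phi squared. The square root only recovers the magnitude, so when the association is negative the chi-squared route returns the absolute value of what cor() reports.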
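For two 0/1 vectors, the phi coefficient and the Pearson correlation are algebraically the same quantity, so the agreement can be checked directly. The sketch below is only an illustration: the seed and the names phi_table, phi_chisq and pearson are assumptions added here, not part of the original post.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible
dat <- as.data.frame(replicate(100, sample(c(0, 1), 100, replace = TRUE)))

x <- dat[, 1]
y <- dat[, 2]

# Signed phi from the 2x2 table:
# (n11 * n00 - n10 * n01) / sqrt(product of the four marginal totals)
tab <- table(x, y)
phi_table <- (tab[2, 2] * tab[1, 1] - tab[2, 1] * tab[1, 2]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

# Magnitude of phi from the uncorrected chi-squared statistic
phi_chisq <- sqrt(unname(chisq.test(tab, correct = FALSE)$statistic) / length(x))

# Ordinary Pearson correlation on the 0/1 codes
pearson <- cor(x, y)

all.equal(phi_table, pearson)       # signed phi matches Pearson
all.equal(phi_chisq, abs(pearson))  # chi-squared route gives |phi|
```

The chi-squared version loses the sign, which is one more reason to prefer cor() when the direction of the association matters.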

So, I would advocate saving yourself some time and just using cor(dat) as the solution.
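As a rough sketch of both routes from the question: cor(dat) builds the full matrix in one call, and a loop over the lower triangle works for any pairwise function. Here fn is a hypothetical stand-in (it simply wraps cor() so the two results can be compared), and the seed is an assumption for reproducibility.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible
dat <- as.data.frame(replicate(100, sample(c(0, 1), 100, replace = TRUE)))

# Route 1: Pearson on 0/1 columns is the phi coefficient, so one call suffices
phi_mat <- cor(dat)
dim(phi_mat)

# Route 2: a generic pairwise function; fn is a hypothetical placeholder
fn <- function(x, y) cor(x, y)

p <- ncol(dat)
pair_mat <- matrix(NA_real_, p, p, dimnames = list(names(dat), names(dat)))
for (i in seq_len(p)) {
  for (j in seq_len(i)) {
    # fill both triangles at once, assuming fn is symmetric in its arguments
    pair_mat[i, j] <- pair_mat[j, i] <- fn(dat[[i]], dat[[j]])
  }
}

all.equal(pair_mat, phi_mat)  # both routes give the same matrix
```

Filling only the lower triangle halves the number of fn() calls, which matters when fn is slower than cor().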


Is phi just a special case of Pearson?
– Maths12
Commented Feb 4, 2021 at 18:40
Yes, just in the case of a 2x2 contingency table. Otherwise, they are not in the same range.
– shayaa
Commented Feb 4, 2021 at 19:31
