
I have 8k samples, each with a pair of score vectors:

sample 1:

Score A : [0.419, 0.348, 0.271, 0.12, 0.25, 0.145, 0.375, 0.172, 0.082]
Score B : [0.997, 0.802, 0.62, 0.67, 0.72, 0.64, 0.91, 0.65, 0.55]

Each sample can have a different length (up to 12 elements). I would like to check how strongly score A is correlated with score B in terms of order/rank (i.e. the indexes with a high score A also have a high score B, and vice versa). I transformed the scores with an argument sort (min to max, using numpy.argsort(scoreA)):

Score A order : [9 4 6 8 5 3 2 7 1]
Score B order: [9 3 6 8 4 5 2 7 1]

I then calculated the Spearman rank correlation coefficient:

SpearmanrResult(correlation=0.9500000000000001, pvalue=8.762523965086177e-05)
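
One thing worth checking here, as a hedged sketch rather than a definitive answer: `numpy.argsort` returns the indices that would sort the array, which is not the same as the rank of each element, and `scipy.stats.spearmanr` ranks its inputs internally, so the raw scores can be passed in directly. The snippet below uses the sample 1 values from above; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

score_a = np.array([0.419, 0.348, 0.271, 0.12, 0.25, 0.145, 0.375, 0.172, 0.082])
score_b = np.array([0.997, 0.802, 0.62, 0.67, 0.72, 0.64, 0.91, 0.65, 0.55])

# argsort: position k holds the index of the k-th smallest element.
# The +1 only converts to the 1-based listing used in the post.
order_a = np.argsort(score_a) + 1
order_b = np.argsort(score_b) + 1

# rankdata: position i holds the rank of element i (1 = smallest).
rank_a = rankdata(score_a)
rank_b = rankdata(score_b)

# Spearman computed on the argsort output (what the post did).
rho_from_order, _ = spearmanr(order_a, order_b)

# Spearman computed on the raw scores (spearmanr ranks them itself).
rho_from_scores, _ = spearmanr(score_a, score_b)

# Spearman computed on explicit ranks; matches the raw-score result.
rho_from_ranks, _ = spearmanr(rank_a, rank_b)

print(f"from argsort output: {rho_from_order:.3f}")
print(f"from raw scores:     {rho_from_scores:.3f}")
print(f"from explicit ranks: {rho_from_ranks:.3f}")
```

For this sample the two approaches give different values (about 0.95 from the argsort output and about 0.78 from the raw scores), so it matters which one is used.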

Now, as mentioned, I have 8k samples. Should I just run 8k tests like that and average the coefficients to get the overall correlation?
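
For concreteness, the per-sample approach described here could be sketched as below. This only shows the mechanics of computing one coefficient per sample and summarizing them; it does not settle whether the average is a meaningful statistic. The helper name `per_sample_spearman` and the two-element `samples` list are hypothetical stand-ins for the real 8k samples.

```python
import numpy as np
from scipy.stats import spearmanr


def per_sample_spearman(samples):
    """Return one Spearman rho per (score_a, score_b) pair.

    Samples may have different lengths; each pair is handled on its own.
    """
    rhos = []
    for score_a, score_b in samples:
        rho, _ = spearmanr(score_a, score_b)
        rhos.append(rho)
    return np.array(rhos)


# Hypothetical stand-in for the full data set: the two samples quoted in this post.
samples = [
    (
        [0.419, 0.348, 0.271, 0.12, 0.25, 0.145, 0.375, 0.172, 0.082],
        [0.997, 0.802, 0.62, 0.67, 0.72, 0.64, 0.91, 0.65, 0.55],
    ),
    (
        [0.419, 0.420, 0.271, 0.1],
        [0.997, 0.996, 0.62, 0.37],
    ),
]

rhos = per_sample_spearman(samples)

# nan-aware summaries: spearmanr yields nan when one vector is constant.
summary = {
    "mean": float(np.nanmean(rhos)),
    "median": float(np.nanmedian(rhos)),
    "min": float(np.nanmin(rhos)),
    "max": float(np.nanmax(rhos)),
}
print(summary)
```

Looking at the whole distribution of `rhos` (for example a histogram) rather than only the mean is probably more informative with samples this short.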

Also, is there another method that is more sensitive to the numbers themselves? For example, in a case like:

sample X:

Score A : [0.419, 0.420, 0.271, 0.1]
Score B : [0.997, 0.996, 0.62, 0.37]

Here the difference between the first two elements is very small, but their order differs between A and B. In that case, I would like to get a high correlation.
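
To make the contrast concrete, here is a small sketch comparing three standard coefficients on sample X. Pearson correlation works on the values themselves, so a near-tie that swaps order costs very little, while Spearman and Kendall only see the ranks. This is an illustration on one sample, not a recommendation of Pearson for the whole data set; the variable names are illustrative.

```python
from scipy.stats import kendalltau, pearsonr, spearmanr

score_a = [0.419, 0.420, 0.271, 0.1]
score_b = [0.997, 0.996, 0.62, 0.37]

# Value-based: barely affected by the tiny swap in the first two elements.
pearson_r, _ = pearsonr(score_a, score_b)

# Rank-based: the swap counts as a full rank inversion.
spearman_rho, _ = spearmanr(score_a, score_b)
kendall_tau, _ = kendalltau(score_a, score_b)

print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
print(f"Kendall tau:  {kendall_tau:.3f}")
```

With only four points, all three numbers are very unstable, so differences like this are easier to interpret on larger or pooled data.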

  • 2
    A sample size $\le 12$ is very small: I wouldn't trust correlations from such samples much, even to average. But what do you know about the samples, other than that they are different samples? If nothing, then moving directly to looking at the correlation between all your data would seem indicated. That should mean at a minimum looking at a graph and seeing what kind of relationship you have.
    – Nick Cox
    Commented Feb 28, 2021 at 9:17
  • 1
    (ctd) The average correlation is a thing you can calculate but it doesn't have a strong statistical meaning. It is even quite common to find situations where an overall correlation is like none of the group correlations: this is an example of what is now usually known as Simpson's paradox, although it was familiar long before Simpson wrote.
    – Nick Cox
    Commented Feb 28, 2021 at 9:17
  • 1
    Perhaps you should read up more on different kinds of correlation: Pearson correlation, Spearman correlation, Kendall correlation. Getting a high number isn't a goal here: perhaps the overall relationship is just not strong and a high correlation a side-effect of outliers.
    – Nick Cox
    Commented Feb 28, 2021 at 9:21
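
The point in the comments about an overall correlation looking like none of the group correlations can be illustrated with a small synthetic sketch. The data below are invented purely for illustration (five groups of three points, built so that each group trends downward while the group centers trend upward) and say nothing about the asker's actual scores.

```python
import numpy as np

# Synthetic groups: within each group y falls as x rises,
# but groups with larger x also have larger y overall.
groups = []
for g in range(5):
    x = 10 * g + np.array([0.0, 1.0, 2.0])
    y = 10 * g + np.array([2.0, 1.0, 0.0])
    groups.append((x, y))

# Pearson correlation inside each group (each is -1 by construction).
within = np.array([np.corrcoef(x, y)[0, 1] for x, y in groups])
mean_within = float(within.mean())

# Pearson correlation after pooling all points together.
pooled_x = np.concatenate([x for x, _ in groups])
pooled_y = np.concatenate([y for _, y in groups])
pooled = float(np.corrcoef(pooled_x, pooled_y)[0, 1])

print(f"mean within-group correlation: {mean_within:.3f}")
print(f"pooled correlation:            {pooled:.3f}")
```

Here the averaged and pooled correlations have opposite signs, which is why plotting the data before choosing either summary is a sensible first step.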

