$\begingroup$

I forget where I got this data (I think from About.com College), but here are some statistics regarding University of California, Berkeley admissions: the 25th percentile SAT Reasoning Test score was 1870, and the 75th percentile was 2230. Let's say that my performance is a standard distribution with mean 2200 and standard deviation 50. Based on test performance alone, what's the probability that I'll be accepted? Note that I'm a high school student with actually no statistical background besides Wikipedia, so bear with me here.

Here's one way that I thought of. Let's assume the scores of admitted Berkeley students follow a standard distribution, so the probability of my score, $x$, being near the majority is $P_b(x)=c_b + e^{-a_bx^2}$, where $c_b$ is the mean and $a_b$ ensures that $\int^{2400}_0P_b(x)\,dx = 1$. With the two quartiles and the unknowns $a_b$ and $c_b$ we have two equations in two unknowns, so we will pretty much know $P_b(x)$. Then I will find $P_s(x)$, which is the probability that I will get a certain score $x$ between 0 and 2400. Afterwards, the probability of me being accepted $$\approx \int^{2400}_{0}P_b(x)P_s(x)\,dx.$$

Again, I have no real statistical knowledge (I'll have some next year), so is my reasoning sound?

$\endgroup$
9
  • $\begingroup$ What do you mean by "a standard distribution"? $\endgroup$
    – Glen_b
    Commented Aug 9, 2014 at 1:23
  • $\begingroup$ @Glen_b I honestly have no idea. basically "more likely than not close to the mean." $\endgroup$ Commented Aug 9, 2014 at 17:40
  • $\begingroup$ Hmm. Are you trying to get at something like unimodal and 'not too far from symmetric', or are you trying to express something else there? $\endgroup$
    – Glen_b
    Commented Aug 9, 2014 at 20:57
  • $\begingroup$ @Glen_b yeah ... is that correct? $\endgroup$ Commented Aug 10, 2014 at 5:09
  • 2
    $\begingroup$ A quick comment, without getting into the more subtle issues a more complete answer would: it is clear you cannot calculate the relevant probability from the information given. To have any chance of solving this problem, one needs to know/estimate the SAT scores of those who were rejected (or equivalently, know/estimate the SAT scores of the population of students who applied). $\endgroup$
    – guy
    Commented Aug 10, 2014 at 18:24

1 Answer

$\begingroup$

Not quite.

Your $P_b(x)$ (putting aside questions of functional form) is $P(SAT | Admitted)$, so your integrand is $P(SAT | Admitted)P(MySAT)$, which doesn't give you what you want.
What you are looking for is $P(Admitted | SAT)P(MySAT)$.

Getting $P(Admitted | SAT)$ is an application of Bayes: $$ P(B|A) = {P(A|B)P(B) \over P(A)} = {P(A|B)P(B) \over P(A|B)P(B) + P(A|!B)P(!B)} $$ Translating: $$ P(Admitted | SAT) = {P(SAT | Admitted)P(Admitted) \over P(SAT)}$$ $$ = {P(SAT | Admitted)P(Admitted) \over P(SAT | Admitted)P(Admitted) + P(SAT | !Admitted)P(!Admitted)}$$

You have

  • $P(SAT | Admitted)$ - provided in your question as the 25th/75th percentiles; you need to assume a functional form for it
  • $P(Admitted)$ - Google says this is 0.18

You do not have:

  • $P(SAT)$

Note that while you have assumed a $P(My SAT)$ you actually need the distribution of the population of all UCB applicants, not just yourself. Specifically you do not have $P(SAT| !Admitted)$.

If you are able to obtain that, then you can calculate $P(Admitted | SAT)$ and from there $P(Admitted | SAT)P(MySAT)$

There may be some abuse of notation above.


To answer your follow-up, yes, that is referencing the un-admitted population. You cannot assume the complement. A simplified version of the problem may help. Let's take a look at SAT $\ge$ 2230.

Since the 75th percentile of admitted students is 2230, 25% of admitted students have an SAT $\ge$ 2230, and thus
$P(SAT \ge 2230 | Admitted) = 0.25$.
You can easily see the incongruity in taking the complement and saying that 75% of non-admitted students have an SAT score of at least 2230:
$P(SAT \ge 2230 | !Admitted) = 0.75$
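
To put a number on that incongruity, here is a minimal Python sketch that plugs the complement assumption into Bayes' rule for the event "SAT at least 2230", using the admission rate of 0.18 quoted above. The variable names are mine, and the 0.75 is the mistaken complement assumption, not data.

```python
# Bayes' rule for the event "SAT >= 2230" under the mistaken complement assumption.
p_admitted = 0.18          # overall admission rate quoted above
p_high_given_adm = 0.25    # 25% of admitted students are at or above the 75th percentile
p_high_given_rej = 0.75    # WRONG: the "complement" applied to the non-admitted group

# Total probability of a high score among all applicants.
p_high = p_high_given_adm * p_admitted + p_high_given_rej * (1 - p_admitted)

# Posterior probability of admission given a high score.
p_adm_given_high = p_high_given_adm * p_admitted / p_high

print(round(p_adm_given_high, 3))  # 0.068
```

Under that assumption, scoring in the top quarter of admitted students would cut your chance of admission from 18% to about 7%, which cannot be right.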

And yes, by functional form, I mean your assumed normal. So to clarify (and clean up my notation from above).

  • $f_{SAT}(SAT=x | Admitted)$ is an assumed probability density function with 25th/75th percentiles of 1870/2230.
    If you assume this is a normal (ignoring that SATs are capped at 2400), you can set the cumulative distribution equal to 0.25 and 0.75 (or integrate the pdf)$^1$ and solve for the mean and standard deviation to arrive at approximately $N(2050, 267)$, i.e. mean 2050 and standard deviation 267
  • $P(Admitted) = 0.18$
  • You will need to acquire or assume $f_{SAT}(SAT=x)$ or $f_{SAT}(SAT=x | !Admitted)$. For a start, you can consider the overall SAT distribution of all US students (though this is unlikely to be the distribution of UCB applicants)

This results in a function $$ P(Admitted | SAT=x) = {{f_{SAT}(SAT=x | Admitted)P(Admitted)} \over f_{SAT}(SAT=x) }$$

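And then you can calculate your overall chance by weighting this function by the density you assumed for your own score, $f_{MySAT}$, and integrating over $x$: $$ \int P(Admitted | SAT=x)\,f_{MySAT}(x)\,dx $$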
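
As a rough sketch of that last step in Python: the code below evaluates $P(Admitted | SAT=x)$ with the mixture form of the denominator from Bayes' rule and averages it over the asker's own score distribution, $N(2200, 50)$. The distribution for non-admitted applicants, `NormalDist(1800, 300)`, is a purely hypothetical placeholder (no such figure appears in this thread), so the number it prints is only as good as that guess. The function names and the integration range are also my own choices, and the 2400 cap is ignored as above.

```python
from statistics import NormalDist

P_ADMITTED = 0.18
admitted = NormalDist(2050, 267)  # fitted from the 25th/75th percentiles above
rejected = NormalDist(1800, 300)  # HYPOTHETICAL placeholder, not real data
my_score = NormalDist(2200, 50)   # the asker's assumed own-score distribution


def p_admitted_given_sat(x):
    """Bayes' rule: P(Admitted | SAT = x), with the denominator written as a mixture."""
    num = admitted.pdf(x) * P_ADMITTED
    den = num + rejected.pdf(x) * (1 - P_ADMITTED)
    return num / den


def p_admitted_overall(lo=1900.0, hi=2500.0, n=6000):
    """Midpoint-rule integral of P(Admitted | SAT = x) * f_my(x) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += p_admitted_given_sat(x) * my_score.pdf(x) * h
    return total


print(p_admitted_overall())
```

Writing the denominator as a mixture of the admitted and non-admitted densities keeps the result between 0 and 1. Dividing by an unrelated density such as the national one can give "probabilities" above 1.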

$^1$ Solve the system of equations
$$F_{SAT}(1870 | Admitted) = \int_{-\infty}^{1870}f_{SAT}(SAT=x | Admitted)\,dx = 0.25$$
$$F_{SAT}(2230 | Admitted) = \int_{-\infty}^{2230}f_{SAT}(SAT=x | Admitted)\,dx = 0.75$$
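
For a normal distribution this system has a closed-form solution, because the quartiles sit symmetrically at the mean plus or minus about 0.6745 standard deviations. A minimal Python sketch (variable names are mine) using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

q25, q75 = 1870.0, 2230.0

# z-score of the 75th percentile of a standard normal, about 0.6745.
z75 = NormalDist().inv_cdf(0.75)

mu = (q25 + q75) / 2             # symmetry puts the mean midway between the quartiles
sigma = (q75 - q25) / (2 * z75)  # the interquartile range spans 2 * z75 standard deviations

fitted = NormalDist(mu, sigma)
print(round(mu), round(sigma))                               # 2050 267
print(round(fitted.cdf(q25), 2), round(fitted.cdf(q75), 2))  # 0.25 0.75
```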

$\endgroup$
4
  • $\begingroup$ What is $P(\mathrm{SAT}|!\mathrm{Admitted})$? Is it the distribution of SAT scores among non-admitted applicants? Excuse my naïveté, but is there a way we could get around not knowing this? My question assumes that admission is based on test score alone, so could we assume that $P(\mathrm{SAT}|!\mathrm{Admitted})$ is the complement of $P(\mathrm{SAT}|\mathrm{Admitted})$? $\endgroup$ Commented Aug 10, 2014 at 20:54
  • $\begingroup$ By "assume a functional form," do you mean like my idealistic normal distribution? $\endgroup$ Commented Aug 10, 2014 at 20:54
  • $\begingroup$ Okay. I think we're almost at the bottom of this. Can you try to figure out $f_\mathrm{SAT}(\mathrm{SAT}=x)$ from this table? Otherwise, could you assume that UC Berkeley is skimming the top 18% off the national distribution (last page)? I think that would give a practically safe estimate. $\endgroup$ Commented Aug 11, 2014 at 6:45
  • $\begingroup$ Hi there! I am still kind of hungry for an answer. Could you complete the final calculation simply assuming the national distribution, just to get any sort of number (I tried it myself and wasn't sure how to compute it)? Also, thanks for teaching quite a bit of statistics. $\endgroup$ Commented Sep 6, 2014 at 18:00
