We are given 20 percentile values, say equally spaced percentiles from 0 to 100, along with the sample size. One basic thing we could do is check whether the data are normal using a QQ plot. What else can we do with this data? Can we generate a random sample? If it is not normal, can we estimate the distribution? If we approximate it as normal, how can we get the mean and standard deviation?
-
This is a rather open-ended post for this forum. Do you have a specific question you wish to ask? – whuber ♦ Commented Jul 4, 2018 at 0:26
-
If you know any two percentiles of a normal population, you can deduce the population mean $\mu$ and the population standard deviation $\sigma.$ Accuracy can be a little better if the percentiles are at least moderately far apart. In your case, the lower and upper quartiles (25th and 75th percentiles) would work well. – BruceET Commented Jul 4, 2018 at 2:39
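To make the two-percentile idea concrete: if $x_1$ and $x_2$ are the values at cumulative probabilities $p_1$ and $p_2$, and $z_1, z_2$ are the matching standard normal quantiles, then $\sigma = (x_2 - x_1)/(z_2 - z_1)$ and $\mu = x_1 - \sigma z_1$. Here is a minimal sketch of that calculation; the function name `normal_from_two_percentiles` is only illustrative, and with sample percentiles the result is an estimate rather than an exact value.

```python
from statistics import NormalDist


def normal_from_two_percentiles(p1, x1, p2, x2):
    """Solve x = mu + sigma * z for (mu, sigma) from two percentiles.

    p1, p2 are cumulative probabilities strictly between 0 and 1
    (e.g. 0.25 and 0.75); x1, x2 are the values at those percentiles.
    Assumes the underlying distribution is normal.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z1 = std_normal.inv_cdf(p1)
    z2 = std_normal.inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - sigma * z1
    return mu, sigma
```

For example, `normal_from_two_percentiles(0.25, q25, 0.75, q75)` uses the two quartiles, where `q25` and `q75` stand for whatever quartile values you were given.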
1 Answer
Knowledge of some of the percentiles of the data (plus the sample size) is equivalent to partial knowledge of the empirical quantile function for the sample. From this we can estimate the true quantile function of the data, which is equivalent to estimating its true distribution. This can be done via non-parametric estimation, or it can be done via parametric estimation methods, assuming a particular distributional family. In the latter case, the parameters of the distribution can be estimated via standard techniques (e.g., maximum likelihood estimation), either treating the data as censored values falling within the percentile intervals, or approximating them as imputed values within those intervals.
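As a rough illustration of the censored-data approach under a normal assumption: each pair of adjacent percentiles defines an interval containing about $n(p_{i+1} - p_i)$ observations, and the log-likelihood is $\sum_i c_i \log\{\Phi((b_i - \mu)/\sigma) - \Phi((a_i - \mu)/\sigma)\}$ for interval $(a_i, b_i]$ with count $c_i$. The sketch below maximizes this with a simple hand-written compass search so that it needs only the standard library; the function name, starting values, and step sizes are my own assumptions, and in practice a general-purpose optimizer would be the more usual choice.

```python
import math
from statistics import NormalDist


def fit_normal_from_percentiles(probs, values, n):
    """Interval-censored normal MLE from percentiles (illustrative sketch).

    probs  : increasing cumulative probabilities in [0, 1]
    values : the matching percentile values (assumed not all equal)
    n      : sample size (scales the counts; it does not move the maximizer)
    """
    # Interval edges, with open tails when the percentiles stop short of 0 or 1.
    edges = [-math.inf] + list(values) + [math.inf]
    cum = [0.0] + list(probs) + [1.0]
    counts = [n * (cum[i + 1] - cum[i]) for i in range(len(cum) - 1)]

    def loglik(mu, log_sigma):
        dist = NormalDist(mu, math.exp(log_sigma))
        total = 0.0
        for lo, hi, c in zip(edges, edges[1:], counts):
            if c <= 0:
                continue  # empty interval, e.g. below the sample minimum
            p_hi = 1.0 if hi == math.inf else dist.cdf(hi)
            p_lo = 0.0 if lo == -math.inf else dist.cdf(lo)
            total += c * math.log(max(p_hi - p_lo, 1e-300))
        return total

    # Crude starting values (an assumption, not part of the method itself).
    s0 = (max(values) - min(values)) / 4
    params = [sum(values) / len(values), math.log(s0)]
    steps = [s0, 0.5]
    best = loglik(*params)

    # Compass search: try +/- a step on each parameter, halve steps when stuck.
    while max(steps) > 1e-7:
        improved = False
        for i in range(2):
            for sign in (1, -1):
                trial = list(params)
                trial[i] += sign * steps[i]
                val = loglik(*trial)
                if val > best:
                    params, best, improved = trial, val, True
        if not improved:
            steps = [s / 2 for s in steps]

    return params[0], math.exp(params[1])
```

If the supplied percentiles include the 0th and 100th (the sample minimum and maximum), the two open tail intervals simply receive a count of zero and drop out of the likelihood.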
A QQ plot is a way to graphically compare the empirical quantiles to the theoretical quantiles under an assumed distribution (e.g., the normal family), but we can also use formal estimation methods to fit the data to an assumed distributional form. Once the distribution is estimated, we can generate a random sample from this estimated distribution. If you would like more specific information on how to do maximum likelihood estimation for data from a normal distribution that is censored into intervals, please pose this as a new (specific) question.
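For a numerical companion to the QQ plot, one option is to regress the given percentile values on the corresponding standard normal quantiles: under normality the points lie near the line $x = \mu + \sigma z$, so the intercept and slope give rough estimates of $\mu$ and $\sigma$, and a correlation close to 1 indicates a nearly straight plot. The sketch below is only one way to do this; the name `qq_line` is hypothetical, and the 0th and 100th percentiles are dropped because their normal quantiles are infinite.

```python
import math
from statistics import NormalDist, fmean


def qq_line(probs, values):
    """Least-squares line through the normal QQ points (z_i, x_i).

    Returns (intercept, slope, r): rough estimates of mu and sigma under
    normality, plus the correlation between the two sets of quantiles.
    """
    std_normal = NormalDist()
    # Keep only interior percentiles; z is infinite at p = 0 and p = 1.
    points = [(std_normal.inv_cdf(p), x)
              for p, x in zip(probs, values) if 0 < p < 1]
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    z_bar, x_bar = fmean(zs), fmean(xs)

    szz = sum((z - z_bar) ** 2 for z in zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szx = sum((z - z_bar) * (x - x_bar) for z, x in points)

    slope = szx / szz
    intercept = x_bar - slope * z_bar
    r = szx / math.sqrt(szz * sxx)
    return intercept, slope, r
```

Unlike the likelihood approach, this ignores the sample size and weights all percentiles equally, so treat it as a quick diagnostic rather than an efficient estimator.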
-
I want to draw the PDF from just the percentiles of the CDF. I generated uniform random variables between 0 and 1, then obtained the required random variable by interpolating the provided CDF percentiles. I then plotted the distribution of these draws, which gave me the approximate PDF I was looking for. Could you elaborate on, or provide sources for, parametric and non-parametric methods to estimate the PDF? – bicepjai Commented Jul 5, 2018 at 17:49
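The procedure described in this comment is inverse-transform sampling from a linearly interpolated quantile function. A minimal sketch follows, assuming the percentiles run from probability 0 to 1 with strictly increasing probabilities; the function name and the `seed` parameter are illustrative. Note that linear interpolation implies a piecewise-uniform (histogram-like) density between adjacent percentiles.

```python
import bisect
import random


def sample_from_percentiles(probs, values, size, seed=None):
    """Draw `size` values by inverse-transform sampling.

    probs  : strictly increasing cumulative probabilities, from 0.0 to 1.0
    values : the matching percentile values (non-decreasing)
    The quantile function is linearly interpolated between the given points.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        u = rng.random()  # uniform on [0, 1)
        # Find i such that probs[i] <= u < probs[i + 1].
        i = bisect.bisect_right(probs, u) - 1
        i = min(max(i, 0), len(probs) - 2)
        frac = (u - probs[i]) / (probs[i + 1] - probs[i])
        draws.append(values[i] + frac * (values[i + 1] - values[i]))
    return draws
```

A histogram of the returned draws then approximates the PDF, with resolution limited by the spacing of the supplied percentiles.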
-
That is too broad a topic - there are literally libraries full of books on non-parametric and parametric statistical methods. I suggest starting with some searches of the web and your local academic library. – Ben Commented Jun 22, 2019 at 23:10