Questions tagged [descriptive-statistics]
Descriptive statistics summarize features of a sample, such as the mean and standard deviation, the median and quartiles, and the maximum and minimum. With multiple variables, they may include correlations and crosstabs. They can also include visual displays such as boxplots, histograms, and scatterplots.
1,831 questions
1
vote
1
answer
39
views
Data categorization
I have categorized my education dataset for the analysis below. However, I have one occurrence of a respondent who attended a Missionary school whose level I do not know, and I am unsure where to ...
0
votes
0
answers
35
views
Employment status categories that include pensioners, learners, students and non schooling
I have collected a dataset on employment status. I created the following categories: Pensioners, Formally employed, Informally employed, Self-employed, and Unemployed. I also have Learners or Students ...
0
votes
0
answers
16
views
Estimate SD[Z=X*Y]: which one is correct?
Suppose we have a sample of two variables X and Y with sample size n. I want to calculate the standard deviation of Z = X*Y. I don't know which of the two options below is correct.
Option 1:
Simply ...
0
votes
0
answers
30
views
How to analyze differences in growth data with non-normal distributions?
I have data on fungal growth (growth rate and latency period) from different strains. I want to examine whether these values differ based on origin (2 types), growth medium (2 types), and incubation ...
0
votes
1
answer
15
views
How can I filter outliers in data that is manually recorded?
Different people have to write down values for a certain type of parameter in order to fill out a table, and people obviously tend to write them down wrong. Sometimes, by a factor of 1000. This creates a lot of ...
6
votes
3
answers
536
views
Median Absolute Deviation of Zero
I am using MAD as a measure of the spread of different distributions of numeric data, and some of these distributions have MAD of 0. I am curious, how is this possible? If I understand correctly, MAD ...
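A minimal sketch of how a zero MAD can arise, assuming the unscaled definition (the median of absolute deviations from the median). The helper name `mad` and both sample lists are made up for illustration. Whenever more than half of the observations share one value, that value is the median, more than half of the absolute deviations are zero, and so the MAD is zero.

```python
from statistics import median

def mad(values):
    """Unscaled median absolute deviation from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)

# Hypothetical data: five of the seven values are tied at 5.
tied = [1, 5, 5, 5, 5, 5, 9]
# Hypothetical data with no ties, for comparison.
spread = [1, 2, 3, 4, 5, 6, 7]

print(mad(tied))    # 0: most deviations from the median are exactly zero
print(mad(spread))  # 2
```

In that situation a zero MAD says that the bulk of the data is tied, not that the data has no spread at all, so a second measure such as the IQR or the mean absolute deviation may be worth reporting alongside it.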
2
votes
4
answers
337
views
What descriptive statistics to report for a paired sample when using a nonparametric test?
I have two variables, A and B, (paired sample, before-after) and I need to conduct a paired sample nonparametric test because the distribution of the difference between A and B is not normal (...
0
votes
0
answers
17
views
How can I devise a new metric to evaluate monthly loss prevention performance based on historical data?
I'm reporting on loss numbers for my company every month, and I’m looking to create a metric that can clearly indicate whether we’re doing better or worse in preventing losses. The goal is to reflect ...
8
votes
1
answer
413
views
The Info measure of Hmisc::describe
The documentation for Hmisc::describe (page 77 of the PDF) says:
For numeric variables, describe adds an item called Info which is a relative information measure
using the relative efficiency of a ...
1
vote
1
answer
56
views
How to assess the absolute strength of a correlation between two multifactorial variables when some of their factors are negatively correlated
I have two multifactorial constructs, one of which consists of 5 subfactors, and the other 4. I am looking to evaluate the 'absolute' strength of the association between the two constructs.
The issue ...
1
vote
1
answer
26
views
Is there a way to compute Index values (base 100) from Year-over-Year % change (YoY) of the variable?
Let's assume I have a time series like this :
Time period | YoY Change (%)
Y2024 _ Q1 | 7.00
Y2024 _ Q2 | 4.85
Y2024 _ Q3 | 5.77
Y2024 _ Q4 | 5.66
Y2025 _ Q1 | 6.54
Y2025 _ Q2 | 6.48
Y2025 _ Q3 | 6.36
Y2025 ...
2
votes
1
answer
66
views
Is anomaly forecasting in time series analysis possible?
I am currently working on univariate time series data and I wanted to know if anomaly forecasting is possible in time series.
I previously worked on anomaly detection which detects the anomaly when ...
0
votes
0
answers
8
views
Comparing retrospective and prospective cohorts
I am a bit confused on comparing retrospective and prospective cohorts …
Say I have two cohorts, one retrospective (used as the control) and one prospective cohort (with a program/intervention applied)...
1
vote
0
answers
14
views
How to characterize a dataset
I know we can compute a correlation matrix for continuous variables in a dataset. However, to summarize the degree of correlations between variables, is it possible to do a mean or something else ...
3
votes
1
answer
273
views
When trying to find the quartiles for discrete data, do we round to the nearest whole number?
I have 2 cases:
First, I want to find the first quartile of the data set: 1,2,3,4,5.
Normally we calculate the quartile as:
Now $Q_1 = \frac{2+1}{2} = 1.5$.
But they never state if it's continuous (i.e....
5
votes
3
answers
218
views
Triangular correlations?
As used in my answer at https://stats.stackexchange.com/a/652022/11887, triangular correlation seems to be a useful concept/terminology that could see more use. But searching I cannot see much use, ...
0
votes
1
answer
37
views
How to adjust a variable by age, sex and BMI
I'm working on a database with almost 300 patients, of which I have their age, sex, BMI and their level of diabetes (which, for the purpose of the study, is stratified into mild, intermediate or ...
2
votes
1
answer
59
views
Looking for statistical function to represent data
Say I have a molecule that contains 9 atoms. For each atom, I have calculated the properties as prop_1, prop_2... prop_5 using a certain methodology. To check if the methodology was stable, I ran the ...
0
votes
1
answer
34
views
Can I do weighting on continuous variables?
I have two datasets collected in 2018 and 2023. I was going to check if there's any difference between the two datasets, but the ratio of sex and family size was different from each other (...
3
votes
0
answers
20
views
How can you calculate Q1 and Q3 for even numbers? [duplicate]
I've searched for this question everywhere, but I've found different answers, and none of them gives the same result NumPy gave me.
I have the following data: [0, 1, 2, 3, 4, 4, 5, 5, 6, 8]
When using ...
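A short sketch of why different sources disagree on this data: quartiles depend on the interpolation convention. It uses only Python's standard-library `statistics.quantiles`, which offers an "inclusive" and an "exclusive" method. As far as I know, the inclusive method corresponds to the default linear interpolation in NumPy's percentile functions, but since the excerpt does not say which NumPy call was used, treat that mapping as an assumption.

```python
from statistics import quantiles

data = [0, 1, 2, 3, 4, 4, 5, 5, 6, 8]

# "inclusive" treats the data as containing the population minimum and maximum.
q_inclusive = quantiles(data, n=4, method="inclusive")

# "exclusive" (the standard-library default) assumes more extreme values may exist.
q_exclusive = quantiles(data, n=4, method="exclusive")

print(q_inclusive)  # [Q1, median, Q3] under the inclusive convention
print(q_exclusive)  # [Q1, median, Q3] under the exclusive convention
```

Both conventions give the same median here but different values for Q1 and Q3, so a report should state which convention was used.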
1
vote
1
answer
54
views
Appropriate significance test for small sample size, unclear (non-normal) distribution
Given data:
30 datapoints in the form of usability scores (System Usability Scale)
2 groups within data (Group 1: 17, group 2: 13, independent, unequal sizes)
Objective:
See if there is any ...
1
vote
1
answer
67
views
What analysis do I need for 2 IVs and 2 DVs?
I think I need a MANOVA, but I may need an ANCOVA and then also do an ANOVA within just the experimental group.
I think it's a 2x2 factorial independent measures ANOVA within the intervention group. I ...
1
vote
1
answer
21
views
Hausman Test Results Report
I've been working on a research project using a multi-level regression model, and I'm currently figuring out how to present the Hausman test results. I've seen some papers where authors mention doing ...
3
votes
3
answers
79
views
Is considering a specific distribution necessary before computing an average?
Many times, I compute averages of variables without considering distributions at all, and I use those computed averages to represent measures of variables in my data without mentioning specific ...
0
votes
0
answers
45
views
calculating percentage normalisation
Please don't laugh or close this post. I am confused as to how I can calculate the percentage normalization occurring post-treatment.
So, here is the background of the problem: I have readings from ...
3
votes
1
answer
76
views
Reference for Directional Statistics of Plane Orientation
I've got a project I'm working on where I've got the orientation (normal) vectors of planes. These vectors are all within a unit hemisphere where the $z$-coordinate is strictly positive. The ...
2
votes
1
answer
30
views
Analyzing lists and variables of multiple answers
My current issue lies within EMR extracted data for medications. There are multiple variables named: Medication_1, Medication_2, Medication_3, etc...
This data may overlap and analyzing each column ...
4
votes
1
answer
68
views
How is this formula derived
I have carried out an experiment with several repeats, and my program has returned the values of EC50 and its 95% confidence interval (CI), LogEC50, and the standard error of LogEC50. The logs are ...
0
votes
0
answers
23
views
Normalization of absorbance data by fresh weight of samples
A [protocol] says
For each sample the absorbance (Abs) reading was divided by the fresh weight of the sample in grams. The results were normalized using an arbitrary value of 1 for the control ...
0
votes
1
answer
27
views
Survival function for a certain population
If the survival function for a certain population is given by $$ s(x) = \left( \frac{1}{1+x} \right)^4 $$ for $x \ge 0$, how long would you expect a person who is $41$ years old to live?
(a) $14$ ...
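A worked sketch, assuming the question asks for the expected remaining lifetime at age 41, defined in the usual way as the integral of the conditional survival function:

$$ e(41) = \int_0^\infty \frac{s(41+t)}{s(41)}\,dt = \int_0^\infty \left( \frac{42}{42+t} \right)^4 dt = 42^4 \left[ -\frac{1}{3(42+t)^3} \right]_0^\infty = \frac{42}{3} = 14 $$

Under that reading the result agrees with option (a). If the question instead means expected age at death, the same calculation gives $41 + 14 = 55$.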
2
votes
0
answers
45
views
Standardization of summary statistic of group-linked values
Assume that you measure a summary statistic (e.g., arithmetic mean) in measurement windows of a fixed size along a long sequence of values and that these values are grouped into regions belonging to ...
0
votes
0
answers
23
views
OLS Model with Lags - logged coeff
I am building an OLS model using Python, where the dependent and independent variables are lagged. This is a form of econometric model where I want to figure out how much each independent variable ...
0
votes
0
answers
56
views
Comparing one value to mean or median of a group
I have very limited statistical knowledge or experience, so I am looking for some guidance on my approach for statistical analysis.
I have a group of 60 values. I am looking to compare the last value ...
3
votes
1
answer
61
views
Should we routinely conduct unsupervised learning when reporting descriptive statistics on data?
A standard approach prior to conducting a predictive or inferential analysis is to report some basic univariate descriptive statistics on the study variables: mean, median, minimum, maximum, variance, ...
0
votes
1
answer
20
views
how to summarize moving average
This question is about how to summarize moving averages.
Please assume the values in
column Pct is the % of people who have a negative opinion about the vaccine.
column MovgAvg is the two-year moving average
...
1
vote
1
answer
57
views
Which statistical model is suitable?
I have the results of a survey of $n=132$ patients with their socio-economic profile and their spending behavior on mobility-coins (my thesis topic).
In the survey, we asked people how they would ...
2
votes
2
answers
75
views
type I error in multiple comparisons summary statistics
I understand type I error can increase if we run 'multiple comparisons'. But does that refer to comparisons within a multi-category variable (or a group variable)? or multiple analyses using the same ...
1
vote
0
answers
99
views
Maximum Likelihood in High Dimensions [closed]
What are some examples of high-dimensional random variables for which MLEs are found using numerical methods because we are unable to solve the equations explicitly? The only example that comes ...
0
votes
0
answers
11
views
Variability of patients in different Hospitals
Just to let you know, my statistical background is not very strong. I made funnel graphs which show the effect of polypharmacy against practice size.
In the funnel graphs, a dot represents a single ...
2
votes
1
answer
28
views
Should a Better User Engagement Model Keep Outperforming Old Models Across Time?
Context
Let's say we are talking about a machine learning model that governs some user interaction (e.g. pricing model, recommendation model etc.) on an app. Let's say model v1 is champion (in ...
3
votes
2
answers
355
views
Is the exponent of the MAD (Median Absolute Deviation) of log transformed Data measuring the relative distance from median in the untransformed data?
I want to confirm whether taking the Exponent of the MAD of Log Transformed Data gives me a measure of relative distance from median of the original untransformed data.
So say I have a MAD of 0.2 for ...
1
vote
0
answers
28
views
Degrees of freedom for estimation
In the context of estimators, why is it that, in general, dividing by the degrees of freedom (instead of the sample size) leads to unbiasedness? I see the value in substituting degrees of freedom for ...
1
vote
1
answer
61
views
Percentiles of a distribution of weighted summary statistics
Suppose I have a collection of different independent probability distributions, $\{ P_i(X)\}_{i=1}^N$, each with their own support $I_i$. I know that the $10^{th}$ percentile of a given distribution ...
0
votes
0
answers
8
views
Looking for ideas to improve Weighted Moving Average result
I have two data sets:
...
0
votes
0
answers
44
views
Comparing averages of two groups
Say I want to check if the averages of 2 groups (A and B) are the same.
For each member in group A, there are some observations; same applies for group B.
If I want to take the average of A and B, I ...
3
votes
2
answers
75
views
Is it possible to calculate a standard deviation from the gini coefficient and mean?
I am looking to create an analysis showing how many people in a given country have more than X dollars in income.
I know the average income, population count, and Gini Coefficient of income ...
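The Gini coefficient and the mean alone do not determine a standard deviation; a distributional family has to be assumed. As a hedged illustration only, the sketch below assumes incomes are lognormal, in which case the Gini coefficient G satisfies G = 2Φ(σ/√2) − 1, where σ is the log-scale standard deviation. The function names and the example figures are hypothetical, not taken from the question.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its cdf and inverse cdf

def lognormal_sigma_from_gini(gini):
    """Log-scale sigma of a lognormal with the given Gini (0 < gini < 1)."""
    return sqrt(2) * STD_NORMAL.inv_cdf((gini + 1) / 2)

def lognormal_sd(mean_income, gini):
    """Standard deviation implied by the mean and Gini, ASSUMING lognormal incomes."""
    sigma = lognormal_sigma_from_gini(gini)
    return mean_income * sqrt(exp(sigma ** 2) - 1)

def share_above(threshold, mean_income, gini):
    """Fraction of people with income above threshold, under the same assumption."""
    sigma = lognormal_sigma_from_gini(gini)
    mu = log(mean_income) - sigma ** 2 / 2  # so that exp(mu + sigma^2 / 2) equals the mean
    return 1 - STD_NORMAL.cdf((log(threshold) - mu) / sigma)

# Hypothetical inputs: mean income 30,000 and Gini 0.40.
print(lognormal_sd(30_000, 0.40))
print(share_above(50_000, 30_000, 0.40))
```

Multiplying the share by the population count gives the estimated number of people above the threshold. Real income distributions often have heavier upper tails than a lognormal, so results far above the mean should be treated with caution.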
13
votes
8
answers
2k
views
Is descriptive statistics enough to compare test scores of students in a class?
I am reviewing the theory on hypothesis testing, and the book I am reading ("Hypothesis Testing" by Jim Frost) stresses the fact that we do hypothesis testing and inferential statistics when ...
1
vote
0
answers
21
views
How to Evaluate Interaction Effects in Propensity Score-Matched Samples
Suppose I want to study the association between an exposure X and outcome Y, and I have used the propensity score to match each exposed subject with those unexposed but with similar characteristics ...
1
vote
1
answer
47
views
How many samples should I test to be 95% sure that no error exists?
Suppose I have a population of a million products and will tolerate no errors in them. How many samples should I test to be 95% sure that no error exists?
I am new to statistics. I know you might need some ...
0
votes
0
answers
89
views
Difference between Cross-Sectional and Nested Case-Control Samples in Cohort Studies
I have a question regarding the cross-sectional sample from the most recent visit and the nested case-control sample extracted from a cohort study, especially when the exposure of interest was ...