Data Analysis: Descriptive Statistics
Data analysis is, in short, a method of using facts and figures to solve a research
problem. It is vital to finding the answers to the research question. Another
significant part of research is the interpretation of the data: drawing inferences and
conclusions from the results of the analysis. Raw data are often difficult to interpret
directly, in which case they must first be analysed so that results can be deduced
from the analysis.
The data obtained from a study may be numerical (quantitative) or non-numerical. If
they are not numerical, a qualitative analysis based on the experiences of the
participants can be carried out. Quantitative data analysis can be performed through
descriptive statistics.
Descriptive Statistics
Descriptive statistics are brief descriptive coefficients that summarize a data set,
which may represent either an entire population or a sample. Their main purpose is
to provide a summary of the samples and the measures taken in a study.
Descriptive statistics, coupled with graphical analysis, form a major component of
virtually all quantitative data analysis. Descriptive statistics are quite different from
inferential statistics: descriptive statistics describe what the data show, whereas
inferential statistics draw conclusions that extend beyond the existing data, for
example from a sample to a wider population. Descriptive statistics are mainly used to
describe the behaviour of sample data and to present a quantitative summary of a
given data set. A study usually measures numerous variables, and descriptive
statistics reduce this large amount of data to its simplest form. For example, it would
be interesting to find the average number of passes a footballer makes in a match.
Since a single game involves many different activities, descriptive statistics can be
used to reduce them to one simple figure. Descriptive statistics are commonly grouped
into measures of central tendency and measures of variability. To help people
understand the detailed meaning of the analysed data, both kinds of measure are
presented using tables, graphs or general discussion. There are different ways in
which the data can be described.
Measures of Central Tendency: Central tendency refers to a single number that
summarizes an entire set of data or measurements by identifying the value that is
central to it. It describes the centre position of the distribution of a given data set.
The distribution is described using the mean, median and mode, which capture the
most typical values of the analysed data set. A measure of central tendency is often
the most informative single description of a population. There are three measures of
central tendency:
1. Mean: the sum of all values divided by the number of values.
2. Median: the middle value when the data are arranged in order (or the average
of the two middle values when there is an even number of values).
3. Mode: the value that occurs most frequently.
Example: The ages of 5 randomly selected students applying to Ivy League colleges
are 23, 25, 27, 29 and 31 years.
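A minimal Python sketch of the three measures for the ages in this example, using only the standard-library `statistics` module (`multimode` needs Python 3.8 or later). Because every age occurs exactly once, these data have no single mode; the uniqueness check used to report that is an illustrative choice, not part of the original example.

```python
from statistics import mean, median, multimode

# Ages (in years) of the five students in the example
ages = [23, 25, 27, 29, 31]

age_mean = mean(ages)        # sum of the values divided by their count
age_median = median(ages)    # middle value of the sorted data
age_modes = multimode(ages)  # all values tied for the highest frequency

print(f"Mean: {age_mean} years, Median: {age_median} years")
if len(set(ages)) == len(ages):
    # Every value occurs exactly once, so no value is more common than another
    print("No mode: every age occurs exactly once")
else:
    print(f"Mode(s): {age_modes}")
```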
The mean is the most commonly used measure of central tendency. The median is
generally preferred when some values are extremely different from the rest (a skewed
distribution), because extreme values pull the mean towards them but barely affect the
median. For example, according to the National Centre for Children in Poverty (2019),
if you compare the salaries of randomly selected people, most will earn between $0
and $200,000, while a handful earn millions.
Measures of Variability: Measures of variability (dispersion) describe how spread out
the values of a data set are. The most common ones are listed below, together with
skewness, which describes the shape of the distribution.
1. Range: It is defined as the difference between the largest and the smallest value
of the complete data set.
2. Variance: It measures how far a set of numbers is spread out from its mean. It is
calculated as the average of the squared differences between each value and the
mean. (For a sample, the sum of squared differences is usually divided by n − 1
instead of n.)
3. Standard Deviation: It is the square root of the variance and measures the typical
distance between each value and the mean, that is, how much the data spread out
from the mean. A high standard deviation means that the data points are spread
over a wider range of values, whereas a low standard deviation means that the data
points are close to the mean.
4. Skewness: It is a measure of the asymmetry of the distribution of a data set. For
example, income is skewed, because most people earn within a typical range while
a few earn far more or far less than that range. Skewness can be positive, negative,
zero or undefined.
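The four measures above can be sketched in Python for the student ages from the central tendency example, again with the standard-library `statistics` module. `pvariance`/`pstdev` divide by n (population), while `variance`/`stdev` divide by n − 1 (sample). The helpers `data_range` and `skewness` are hypothetical names written for this illustration, and the skewness shown is the moment-based coefficient, which is only one of several definitions in use.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

def data_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed population SD."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # All values are equal, so skewness is undefined
        raise ValueError("skewness is undefined when all values are equal")
    return sum((x - m) ** 3 for x in values) / (len(values) * sd ** 3)

ages = [23, 25, 27, 29, 31]

print("Range:", data_range(ages))
print("Population variance (divide by n):", pvariance(ages))
print("Sample variance (divide by n - 1):", variance(ages))
print("Population standard deviation:", round(pstdev(ages), 3))
print("Sample standard deviation:", round(stdev(ages), 3))
print("Skewness:", skewness(ages))  # symmetric data, so this is 0
```

Reporting both the population and the sample versions makes explicit which divisor is being used, since the two are easy to confuse.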
For example, according to the National Centre for Children in Poverty (2019), the
incomes of five randomly selected people in the US are $10,000, $10,000, $45,000,
$60,000 and $1,000,000.
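A short sketch using these five incomes shows why the median is preferred for skewed data: the single $1,000,000 value pulls the mean far above the median. Skewness is computed here with the moment-based formula, which is one of several common definitions.

```python
from statistics import mean, median, mode, pstdev

# Incomes (US$) of the five people in the example
incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

income_mean = mean(incomes)
income_median = median(incomes)
income_mode = mode(incomes)  # the only value that occurs twice

# Moment-based skewness: a positive value indicates a long right tail
sd = pstdev(incomes)
income_skew = sum((x - income_mean) ** 3 for x in incomes) / (len(incomes) * sd ** 3)

print(f"Mean:   ${income_mean:,.0f}")
print(f"Median: ${income_median:,.0f}")
print(f"Mode:   ${income_mode:,.0f}")
print(f"Skewness: {income_skew:.2f}")
```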
Cross-tabulation is the analysis of data arranged in tables. It is also called
contingency table analysis and, for three-way and higher tables, “the elaboration
model.” Because it deals with tabular data, cross-tabulation is an analysis of
categorical variables.
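A minimal two-way cross-tabulation can be built with `collections.Counter` from the Python standard library. The helper `cross_tab` and the survey values below are hypothetical and only illustrate the counting; data-frame libraries such as pandas offer the same operation (`pandas.crosstab`).

```python
from collections import Counter

def cross_tab(row_values, col_values):
    """Count how often each (row category, column category) pair occurs."""
    if len(row_values) != len(col_values):
        raise ValueError("both variables need one value per observation")
    pair_counts = Counter(zip(row_values, col_values))
    rows = sorted(set(row_values))
    cols = sorted(set(col_values))
    # Nested dict: table[row][col] -> count (0 for pairs never observed)
    return {r: {c: pair_counts[(r, c)] for c in cols} for r in rows}

# Hypothetical survey: one entry per student for each categorical variable
gender = ["Male", "Female", "Female", "Male", "Female", "Male"]
passed = ["Yes", "No", "Yes", "Yes", "No", "No"]

table = cross_tab(gender, passed)

# Print a simple two-way contingency table with row labels and column headers
columns = sorted(set(passed))
print("".ljust(8) + "".join(c.ljust(5) for c in columns))
for row_label, cells in table.items():
    print(row_label.ljust(8) + "".join(str(cells[c]).ljust(5) for c in columns))
```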
Overall, descriptive statistics play a vital role in data analysis and provide the
foundation for comparing variables. As good research practice, it is recommended to
report the most suitable descriptive statistics, chosen systematically, to reduce the
chance of presenting misleading results. The type of calculation you use depends on
what you want to know.
Univariate, Bivariate and Multivariate Statistical Analysis
In terms of the level of analysis, there are three different statistical analysis
techniques. These are:
Univariate analysis
Univariate analysis is the most basic form of statistical data analysis. It is used when
the data contain only one variable and the analysis does not deal with cause-and-effect
relationships. For instance, in a survey of a classroom, the researcher may want to
count the number of boys and girls. In this instance, the data would simply reflect the
counts, i.e. a single variable and its quantity, as shown in the table below. The key
objective of univariate analysis is simply to describe the data and find patterns within
them. This is done by examining the mean, median, mode, range, variance, standard
deviation and other measures of dispersion.
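For the classroom example, a univariate frequency table can be sketched with `collections.Counter`. The class list below is hypothetical, since the original counts are not reproduced in the text.

```python
from collections import Counter

# Hypothetical class list: the single variable is each student's gender
students = ["Boy", "Girl", "Girl", "Boy", "Girl", "Boy", "Girl"]

counts = Counter(students)
total = sum(counts.values())

# Frequency table: count and percentage for each category, most common first
for category, count in counts.most_common():
    print(f"{category}: {count} ({count / total:.0%})")
```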
How univariate analysis is conducted
Univariate analysis can be conducted in several ways, most of which are descriptive
in nature.
Bivariate analysis
Bivariate analysis is slightly more analytical than univariate analysis. When the data
set contains two variables and researchers aim to compare them or examine how they
relate, bivariate analysis is the right technique. For example, in a survey of a
classroom, the researcher may want to analyse the proportion of students who scored
above 85% by gender. In this case, there are two variables: gender = X (independent
variable) and result = Y (dependent variable). A bivariate analysis will measure the
correlation between the two variables, as shown in the table below.
2. Regression analysis
Regression analysis is used to estimate the relationship between two or more
variables. It includes techniques for modelling and analysing several variables when
the focus is on the relationship between a dependent variable and one or more
independent variables. It helps to understand how the value of the dependent variable
changes when any one of the independent variables is varied while the others are held
fixed. Regression analysis is used for advanced data modelling purposes such as
prediction and forecasting. Different regression techniques are used depending on the
nature of the variables and the type of analysis the research requires. For example:
Linear regression
Simple regression
Polynomial regression
General linear model
Discrete choice
Binomial regression
Binary regression
Logistic regression
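Simple linear regression, the most basic technique in this list, can be sketched with ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the fitted line pass through the point of means. The function name `fit_simple_linear_regression` and the hours-versus-score data are hypothetical.

```python
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data: hours studied (independent) and exam score (dependent)
hours = [1, 2, 3, 4, 5]
scores = [52, 57, 61, 68, 72]

intercept, slope = fit_simple_linear_regression(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
# Use the fitted line for prediction, as described in the text
print(f"Predicted score after 6 hours: {intercept + slope * 6:.1f}")
```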
Multivariate analysis
Multivariate analysis is used when the data set contains more than two variables that
are analysed together. For example, a doctor has collected data on cholesterol, blood
pressure and weight. She has also collected data on the eating habits of the subjects
(e.g., how many ounces of red meat, fish, dairy products and chocolate they consume
per week). She wants to investigate the relationship between the three measures of
health and the eating habits. In this instance, a multivariate analysis would be required
to understand how each variable relates to the others. Common multivariate analysis
techniques include:
Factor Analysis
Cluster Analysis
Variance Analysis
Discriminant Analysis
Multidimensional Scaling
Principal Component Analysis
Redundancy Analysis
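As a simple exploratory first step for the doctor's data, one might compute the Pearson correlation between every pair of variables. This is not one of the techniques listed above, although principal component analysis and factor analysis often start from such a correlation matrix. The measurements, variable names and the `pearson` helper below are hypothetical.

```python
from itertools import combinations
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical measurements for six subjects (one list per variable)
data = {
    "cholesterol": [180, 210, 240, 195, 260, 225],
    "blood_pressure": [118, 125, 138, 121, 145, 130],
    "weight_kg": [65, 78, 90, 70, 98, 84],
    "red_meat_oz": [4, 8, 14, 6, 18, 10],
}

# Correlate every pair of variables (the upper triangle of a correlation matrix)
for a, b in combinations(data, 2):
    print(f"{a} vs {b}: r = {pearson(data[a], data[b]):.2f}")
```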