Data Analysis: Descriptive Statistics


Data Analysis

Data analysis is, in short, a method of applying facts and figures to solve a research
problem. It is vital to finding the answers to the research question. Another
significant part of research is the interpretation of the data, which builds on the
analysis to make inferences and draw conclusions. Raw data are often difficult to
interpret directly, in which case the data must first be analysed and the conclusions
drawn from the results of that analysis.

The data obtained from a study may be in numerical (quantitative) form. If they are
not in numerical form, a qualitative analysis based on the experience of the
participants can be carried out. Data analysis can be performed through descriptive
statistics.

Descriptive Statistics
Descriptive statistics are brief descriptive coefficients that summarize a data set,
which may represent either an entire population or a sample. Their main purpose is
to provide a summary of the sample and of the measures taken in a study.

Descriptive statistics, coupled with simple graphical analysis, form a major component
of virtually all quantitative data analysis. They differ from inferential statistics:
descriptive statistics describe what the data show, whereas inferential statistics
draw conclusions that reach beyond the existing data. Descriptive statistics are
mainly used to describe the behavior of sample data and to present a quantitative
summary of a given data set. A study usually measures numerous variables, and
descriptive statistics reduce this large amount of data to a simpler form. For
example, it would be interesting to find the average number of passes made by a
footballer in a match. Since a single game contains a great many events, descriptive
statistics can condense them into one figure. Descriptive statistics are expressed
through measures of central tendency or measures of variability. To help people
understand the meaning of the analyzed data, these measures are presented in tables,
graphs or general discussion. The different ways in which data can be described are
outlined below.
Measures of Central Tendency: Central tendency refers to a single number that
summarizes an entire set of data or measurements by identifying the value that is
central to the set. It describes the center position of the distribution for a given
data set. The frequency of each data point in the distribution is analyzed and
described using the mean, median and mode, which capture the most common patterns
in the data set. It is often the most informative single description of the
characteristics of a population. There are three measures of central tendency:

1. Mean: the sum of the values divided by the total number of values
2. Median: the middle value
3. Mode: the most often occurring value

Example: The ages of 5 randomly selected students applying to Ivy League colleges are
23 years, 26 years, 26 years, 29 years and 31 years.

Mean Age: (23 + 26 + 26 + 29 + 31) / 5 = 27 years

Median Age: 26 years

Modal Age: 26 years
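
The three measures can be checked with a minimal Python sketch. It assumes the five
ages that appear in the mean calculation above (23, 26, 26, 29, 31) and uses the
standard library's statistics module; the variable names are illustrative only.

```python
import statistics

# Ages taken from the mean calculation in the example above.
ages = [23, 26, 26, 29, 31]

mean_age = statistics.mean(ages)      # sum of the values / number of values
median_age = statistics.median(ages)  # middle value of the sorted list
modal_age = statistics.mode(ages)     # most frequently occurring value

print(mean_age, median_age, modal_age)  # prints: 27 26 26
```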

The mean is the most commonly used measure of central tendency. The median is
generally preferred when some values are extremely different from the rest (this is
called a skewed distribution). For example, according to the National Centre for
Children in Poverty (2019), if you compare the salaries of randomly selected people,
most individuals will be earning between $0 and $200,000 while a handful earn
millions. In such a case the few very high earners pull the mean upward, while the
median still reflects a typical salary.

Measures of Dispersion or Variation: These provide information about the range or
spread of the values of a variable. They describe how spread out the distribution of
a data set is. For example, the measures of central tendency can give the average of
a given data set, but they cannot describe how the data are distributed around that
average. The key measures of dispersion are as follows:

1. Range: The difference between the smallest and the largest value in the
complete data set.
2. Variance: A measure of how far a set of numbers is spread out from its
average value. It is calculated as the average of the squared differences
between each value and the mean.
3. Standard Deviation: The square root of the variance. It indicates the typical
distance between each value and the mean, that is, how the data spread out
from the mean. A high standard deviation means that the data points are
spread over a wider range of values, whereas a low standard deviation means
that the data points are close to the mean.
4. Skewness: A measure of the asymmetry of the distribution of a data set. For
example, income is skewed because most people earn within a standard range
while some earn far more or far less than that range. Skewness can be
positive, negative, zero or undefined.

- Positive Skew: When a distribution is skewed to the right, that is, the tail of
the curve on the right-hand side is longer than that on the left, and the mean
is typically greater than the mode, this is called a positive skew.

- Negative Skew: When a distribution is skewed to the left, that is, the tail of
the curve on the left-hand side is longer than that on the right, and the mean
is typically less than the mode, this is called a negative skew.

For a perfectly normal distribution, the tails on each side of the curve are exactly
the same.

For example, according to the National Centre for Children in Poverty (2019), the
incomes of five randomly selected people in the US are: $10,000, $10,000, $45,000,
$60,000 and $1,000,000.

Mean = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = 225,000

Range = 1,000,000 - 10,000 = 990,000

Variance = [(10,000 - 225,000)² + (10,000 - 225,000)² + (45,000 - 225,000)² +
(60,000 - 225,000)² + (1,000,000 - 225,000)²] / 5 = 150,540,000,000

Standard Deviation = Square Root (150,540,000,000) ≈ 387,995

Skew = Income is positively skewed
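
The income example can be reproduced with a short Python sketch. It assumes the
population formulas (dividing by n = 5, as in the calculation above) rather than the
sample formulas (dividing by n - 1). The moment-based skewness formula is one common
choice and is an assumption here, since the example above only states the direction
of the skew.

```python
import math
import statistics

# Incomes of the five people in the example above.
incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
income_range = max(incomes) - min(incomes)

# Population variance: average of the squared differences from the mean.
variance = statistics.pvariance(incomes)
std_dev = math.sqrt(variance)

# Moment-based skewness (assumed formula): third central moment / std_dev cubed.
n = len(incomes)
third_moment = sum((x - mean_income) ** 3 for x in incomes) / n
skewness = third_moment / std_dev ** 3

print(f"Mean:     {mean_income:,}")
print(f"Median:   {median_income:,}")
print(f"Range:    {income_range:,}")
print(f"Variance: {variance:,}")
print(f"Std dev:  {std_dev:,.0f}")
print("Skew:     positive" if skewness > 0 else "Skew:     not positive")
```

The single very large income pulls the mean (225,000) far above the median (45,000),
which is the pattern a positive skew produces.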


Tabulation: The orderly arrangement of data in a table or other summary format
showing the number of responses in each response category; also called tallying.

Frequency table: A table showing the different ways respondents answered a
question.
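
A frequency table can be sketched in a few lines of Python. The survey answers below
are hypothetical and serve only to illustrate the tallying; collections.Counter is a
standard-library class.

```python
from collections import Counter

# Hypothetical answers to a single survey question.
responses = ["Agree", "Disagree", "Agree", "Neutral", "Agree", "Disagree"]

frequency = Counter(responses)  # tallies how often each answer occurs
total = len(responses)

print(f"{'Response':<10}{'Count':>6}{'Percent':>9}")
for answer, count in frequency.most_common():
    print(f"{answer:<10}{count:>6}{count / total:>9.1%}")
```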

Cross-tabulation: Cross-tabulation is the appropriate technique for addressing
research questions involving relationships among multiple variables measured at less
than interval level (that is, nominal or ordinal variables).

Cross-tabulation is the analysis of data in tables. It is also called contingency
table analysis and, for three-way tables and higher, “the elaboration model.”
Because it deals with tabular data, cross-tabulation implies the analysis of
categorical variables.
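
As a rough illustration, a two-way contingency table can be built by counting pairs of
categories. The records below are hypothetical, and the row and column labels are
assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical records: (gender, answer) for each respondent.
records = [
    ("Female", "Yes"), ("Female", "No"), ("Male", "Yes"),
    ("Male", "Yes"), ("Female", "Yes"), ("Male", "No"),
]

# Each cell of the table is the count of one (row, column) combination.
cell_counts = Counter(records)
rows = sorted({gender for gender, _ in records})
cols = sorted({answer for _, answer in records})

print(f"{'':<8}" + "".join(f"{col:>6}" for col in cols))
for row in rows:
    print(f"{row:<8}" + "".join(f"{cell_counts[(row, col)]:>6}" for col in cols))
```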

Overall, descriptive statistics play a vital role in data analysis and provide the
foundation for comparing variables. It is good research practice to report the most
suitable descriptive statistics using a systematic approach, to reduce the chance of
presenting misleading results. The type of calculation you use depends on what you
want to know.
Univariate, Bivariate and Multivariate Statistical Analysis
When it comes to the level of analysis in statistics, there are three different
analysis techniques:

- Univariate analysis: Measurement made on one variable per subject
- Bivariate analysis: Measurement made on two variables per subject
- Multivariate analysis: Measurement made on many variables per subject

The selection of the data analysis technique depends on the number of variables, the
types of data and the focus of the statistical inquiry. The following sections
describe the three levels of data analysis.

Univariate analysis

Univariate analysis is the most basic form of statistical data analysis. When the
data contain only one variable and the analysis does not deal with cause-and-effect
relationships, a univariate analysis technique is used. For instance, in a survey of
a classroom, the researcher may be looking to count the number of boys and girls. In
this instance, the data would simply reflect the number, i.e. a single variable and
its quantity, as per the table below. The key objective of univariate analysis is
simply to describe the data and find patterns within them. This is done by looking at
the mean, median, mode, dispersion, variance, range, standard deviation, etc.
How Univariate analysis is conducted

Univariate analysis is conducted in several ways, most of which are descriptive in
nature:

- Frequency Distribution Tables
- Histograms
- Frequency Polygons
- Pie Charts
- Bar Charts
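
As a minimal sketch of the first two methods, the following Python groups a single
variable into class intervals and prints a text histogram. The exam scores and the
bin width of 10 are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical exam scores for a single variable.
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 93]
bin_width = 10

# Map each score to the lower bound of its class interval, e.g. 67 -> 60.
bins = Counter((score // bin_width) * bin_width for score in scores)

# Print one row per class interval, with a bar of '#' marks as a text histogram.
for lower in sorted(bins):
    upper = lower + bin_width - 1
    print(f"{lower}-{upper}: {'#' * bins[lower]} ({bins[lower]})")
```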

Bivariate analysis

Bivariate analysis is slightly more analytical than univariate analysis. When the
data set contains two variables and the researcher aims to compare the two, bivariate
analysis is the right technique. For example, in a survey of a classroom, the
researcher may want to analyse the proportion of students who scored above 85% by
gender. In this case, there are two variables: gender = X (independent variable) and
result = Y (dependent variable). A bivariate analysis will measure the correlation
between the two variables, as shown in the table below.

How Bivariate analysis is conducted


1. Correlation coefficients

Correlation is a statistical association technique that measures the strength of the
relationship between two variables. It classifies the relationship as strong or weak
and rates it on a scale of -1 to 1, where 1 is a perfect direct correlation, -1 is a
perfect inverse correlation, and 0 is no linear correlation.

2. Regression analysis

In the bivariate case, regression analysis estimates the relationship between two
variables. More generally, it includes techniques for modelling and analysing
several variables when the focus is on the relationship between a dependent variable
and one or more independent variables. It helps to understand how the value of the
dependent variable changes when any one of the independent variables is changed
while the others are held fixed. Regression analysis is used for advanced data
modelling purposes such as prediction and forecasting. A range of different
regression techniques can be used, depending on the nature of the variables and the
type of analysis sought by the research. For example:

- Linear regression
- Simple regression
- Polynomial regression
- General linear model
- Discrete choice
- Binomial regression
- Binary regression
- Logistic regression
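
The two bivariate techniques above can be sketched together in plain Python. The data
(hours studied and exam scores) are hypothetical, and the helper names pearson_r and
fit_line are illustrative rather than part of any library. The sketch assumes the
Pearson correlation coefficient and simple least-squares linear regression, which are
only one option among the techniques listed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: X = hours studied (independent), Y = exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 79]

r = pearson_r(hours, scores)
slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # predicted score for 6 hours of study

print(f"r = {r:.3f}")
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
print(f"predicted score for 6 hours: {predicted:.1f}")
```

A value of r close to 1 indicates a strong direct relationship in this made-up data,
and the fitted line is then used for prediction, which is the forecasting use of
regression mentioned above.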

Multivariate analysis

Multivariate analysis is a more complex form of statistical analysis and is used when
there are more than two variables in the data set. Here is an example:

A doctor has collected data on cholesterol, blood pressure, and weight. She also
collected data on the eating habits of the subjects (e.g., how many ounces of red
meat, fish, dairy products, and chocolate were consumed per week). She wants to
investigate the relationship between the three measures of health and eating habits.
In this instance, a multivariate analysis would be required to understand the
relationship of each variable with the others.

How Multivariate analysis is conducted

Commonly used multivariate analysis techniques include:

- Factor Analysis
- Cluster Analysis
- Variance Analysis
- Discriminant Analysis
- Multidimensional Scaling
- Principal Component Analysis
- Redundancy Analysis
