Data Presentation & Data Analysis
INTRODUCTION
Data analysis is the process of developing answers to questions through the examination and
interpretation of data. The basic steps in the analytic process consist of identifying issues,
determining the availability of suitable data, deciding on which methods are appropriate for
answering the questions of interest, applying the methods and evaluating, summarizing and
communicating the results.
Analytical results underscore the usefulness of data sources by shedding light on relevant issues.
Some Statistics Canada programs depend on analytical output as a major data product because,
for confidentiality reasons, it is not possible to release the microdata to the public. Data analysis
also plays a key role in data quality assessment by pointing to data quality problems in a given
survey. Analysis can thus influence future improvements to the survey process.
Data analysis is essential for understanding results from surveys, administrative sources and pilot
studies; for providing information on data gaps; for designing and redesigning surveys; for
planning new statistical activities; and for formulating quality objectives.
Results of data analysis are often published or summarized in official Statistics releases.
A statistical agency is concerned with the relevance and usefulness to users of the information
contained in its data. Analysis is the principal tool for obtaining information from the data.
Data from a survey can be used for descriptive or analytic studies. Descriptive studies are
directed at the estimation of summary measures of a target population, for example, the average
profits of owner-operated businesses in 2005 or the proportion of 2007 high school graduates
who went on to higher education in the next twelve months. Analytical studies may be used to
explain the behaviour of and relationships among characteristics; for example, a study of risk
factors for obesity in children would be analytic.
To be effective, the analyst needs to understand the relevant issues, both current and those
likely to emerge in the future, and how to present the results to the audience. The study of
background information allows the analyst to choose suitable data sources and appropriate
statistical methods. Any conclusions presented in an analysis, including those that can impact
public policy, must be supported by the data being analyzed. Before starting, the analyst should
be able to answer the following questions:
o Objectives. What are the objectives of this analysis? What issue am I addressing?
What question(s) will I answer?
o Justification. Why is this issue interesting? How will these answers contribute to
existing knowledge? How is this study relevant?
o Data. What data am I using? Why is it the best source for this analysis? Are there
any limitations?
o Analytical methods. What statistical techniques are appropriate? Will they satisfy
the objectives?
Ensure that the data are appropriate for the analysis to be carried out. This requires
investigation of a wide range of details such as whether the target population of the data
source is sufficiently related to the target population of the analysis, whether the source
variables and their concepts and definitions are relevant to the study, whether the
longitudinal or cross-sectional nature of the data source is appropriate for the analysis,
whether the sample size in the study domain is sufficient to obtain meaningful results and
whether the quality of the data, as outlined in the survey documentation or assessed
through analysis, is sufficient.
If more than one data source is being used for the analysis, investigate whether the
sources are consistent and how they may be appropriately integrated into the analysis.
Having determined the appropriate analytical method for the data, investigate the
software choices that are available to apply the method. If analyzing data from a
probability sample by design-based methods, use software designed specifically for survey
data, since standard analytical software packages that can produce weighted point estimates
do not correctly calculate variances for survey-weighted estimates.
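As a minimal sketch of this caveat, the survey-weighted point estimate itself is easy to compute, but its variance is not; the response values and design weights below are invented purely for illustration:

```python
# Hypothetical survey responses and design weights (illustrative values only).
y = [10.0, 12.0, 8.0, 15.0]      # observed values for four respondents
w = [100.0, 50.0, 200.0, 50.0]   # design weights: population units each respondent represents

# Survey-weighted point estimate of the population mean.
weighted_mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
print(weighted_mean)

# A standard package would compute the variance of this estimate as if the data
# were a simple random sample, ignoring the weights and the survey design --
# which is why dedicated survey software is needed for correct variance estimates.
```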
Determine whether it is necessary to reformat your data in order to use the selected
software.
Presentation of Results
Focus the article on the important variables and topics. Trying to be too comprehensive
will often interfere with a strong story line.
Arrange ideas in a logical order and in order of relevance or importance. Use headings,
subheadings and sidebars to strengthen the organization of the article.
Keep the language as simple as the subject permits. Depending on the targeted audience
for the article, some loss of precision may sometimes be an acceptable trade-off for more
readable text.
Use graphs in addition to text and tables to communicate the message. Use headings that
capture the meaning (e.g. "Women's earnings still trail men's") in preference to traditional
chart titles (e.g."Income by age and gender"). Always help readers understand the
information in the tables and charts by discussing it in the text.
When tables are used, take care that the overall format contributes to the clarity of the
data in the tables and prevents misinterpretation. This includes spacing; the wording,
placement and appearance of titles; row and column headings and other labeling.
Include information about the data sources used and any shortcomings in the data that
may have affected the analysis. Either have a section in the paper about the data or a
reference to where the reader can get the details.
Include information about the analytical methods and tools used. Either have a section
on methods or a reference to where the reader can get the details.
Ensure that all references are accurate, consistent and actually cited in the text.
Check for errors in the article. Check details such as the consistency of figures used in the
text, tables and charts, the accuracy of external data, and simple arithmetic.
Ensure that the intentions stated in the introduction are fulfilled by the rest of the article.
Make sure that the conclusions are consistent with the evidence.
a) Data Presentation can be a deal maker or a deal breaker, depending on how the content
is delivered through visual depiction.
b) Data Presentation tools are powerful communication tools: they simplify data by making
it easily understandable and readable, they attract and keep the interest of readers, and
they can effectively showcase large amounts of complex data in a simplified manner.
c) If the user can create an insightful presentation of the data in hand with the same sets of
facts and figures, then the results promise to be impressive.
d) There have been situations where the user has had a great amount of data and vision for
expansion but the presentation drowned his/her vision.
e) To impress the higher management and top brass of a firm, effective presentation of data
is needed.
f) Data Presentation helps clients or the audience quickly grasp the concept and the future
alternatives of the business, and helps convince them to invest in the company and turn it
profitable for both the investors and the company.
METHODS OF DATA PRESENTATION
1. Pictorial Presentation
It is the simplest form of data Presentation, often used in schools and universities to give
students a clearer picture; students capture concepts more effectively through a pictorial
Presentation of simple data.
2. Column chart
It is a refinement of the pictorial Presentation that can manage the larger amounts of data
shared during presentations while still providing suitable clarity on the insights in the
data.
3. Pie Charts
Pie charts provide a very descriptive 2D depiction of data, showing each category as a
proportional slice of the whole so that categories can be compared at a glance.
4. Bar charts
A bar chart shows the accumulation of data with rectangular bars whose lengths are directly
proportional to the values they represent. The bars can be placed either vertically or
horizontally depending on the data being represented.
5. Histograms
It is a Presentation of the spread of numerical data. The main feature that separates bar
graphs from histograms is the gaps: bar graphs leave gaps between bars, while histogram bars
touch because they cover continuous intervals.
6. Box plots
Box plot or Box-plot is a way of representing groups of numerical data through quartiles.
Data Presentation is easier with this style of graph, which makes even minute differences
between groups easy to pick out.
7. Maps
Map data graphs help with data Presentation over a geographic area, displaying the areas of
concern. Map graphs are useful for depicting data accurately across a vast region.
All these visual presentations share a common goal of creating meaningful insights and a
platform to understand and manage the data in relation to the growth and expansion of one’s in-
depth understanding of data & details to plan or execute future decisions or actions.
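As a small numeric sketch of what lies behind one of these graphs, a box plot is drawn from the five-number summary, which Python's standard library can compute; the scores below are illustrative:

```python
import statistics

# Illustrative data set (hypothetical exam scores).
scores = [52, 55, 60, 61, 63, 67, 70, 72, 75, 80, 95]

# A box plot is drawn from the five-number summary: min, Q1, median, Q3, max.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
five_number = (min(scores), q1, median, q3, max(scores))
print(five_number)
```

The "inclusive" method interpolates quartiles within the observed data, matching the default behaviour of most plotting libraries.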
Data analysis is defined as a process of cleaning, transforming, and modeling data to discover
useful information for business decision-making. The purpose of Data Analysis is to extract
useful information from data and to make decisions based upon that analysis.
A simple example of Data analysis: whenever we make a decision in our day-to-day life, we
think about what happened last time or what will happen if we choose a particular option.
This is nothing but analyzing our past or future and making decisions based on it. For that,
we gather memories of our past or dreams of our future. When an analyst does the same thing
for business purposes, it is called Data Analysis.
There are several types of Data Analysis techniques, depending on the business and the
technology involved. However, the major Data Analysis methods are:
Text Analysis
Statistical Analysis
Diagnostic Analysis
Predictive Analysis
Prescriptive Analysis
Text Analysis
Text Analysis is also referred to as Data Mining. It is a method of data analysis that
discovers patterns in large data sets using databases or data mining tools, and it is used to
transform raw data into business information. Business Intelligence tools available on the
market are used to take strategic business decisions. Overall, it offers a way to extract and
examine data, derive patterns from it, and finally interpret the data.
Statistical Analysis
Statistical Analysis shows “What happened?” by using past data in the form of dashboards.
Statistical Analysis includes collection, Analysis, interpretation, presentation, and modeling of
data. It analyses a set of data or a sample of data. There are two categories of this type of
Analysis – Descriptive Analysis and Inferential Analysis.
Descriptive Analysis
Descriptive Analysis analyses complete data or a sample of summarized numerical data. It
shows the mean and standard deviation for continuous data, and percentages and frequencies
for categorical data.
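A minimal sketch of these descriptive summaries, using hypothetical continuous and categorical data:

```python
import statistics
from collections import Counter

# Hypothetical continuous data: monthly incomes.
incomes = [3200, 4100, 3700, 5200, 2900]
mean = statistics.mean(incomes)        # central tendency
stdev = statistics.stdev(incomes)      # spread (sample standard deviation)

# Hypothetical categorical data: employment status.
status = ["employed", "employed", "student", "employed", "retired"]
freq = Counter(status)                                           # frequencies
percent = {k: 100 * v / len(status) for k, v in freq.items()}    # percentages
print(mean, stdev, dict(freq), percent)
```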
Inferential Analysis
Inferential Analysis analyses a sample drawn from the complete data. In this type of
Analysis, you can reach different conclusions from the same data by selecting different
samples.
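This sensitivity to the chosen sample can be sketched with simulated data; the population values below are randomly generated purely for illustration:

```python
import random
import statistics

random.seed(1)
# Hypothetical population of 1,000 values centred near 50.
population = [random.gauss(50, 10) for _ in range(1000)]

# Two different random samples from the same population give
# different estimates of the same underlying mean.
sample_a = random.sample(population, 30)
sample_b = random.sample(population, 30)
print(statistics.mean(sample_a), statistics.mean(sample_b))
```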
Diagnostic Analysis
Diagnostic Analysis shows “Why did it happen?” by finding the cause from the insights found
in Statistical Analysis. This Analysis is useful for identifying behavior patterns in data.
If a new problem arises in your business process, you can look into this Analysis to find
similar patterns of that problem, and there is a chance that similar prescriptions can be
applied to the new problem.
Predictive Analysis
Predictive Analysis shows “what is likely to happen” by using previous data. The simplest
example: if last year I bought two dresses based on my savings, and this year my salary has
doubled, then I can buy four dresses. Of course it is not that easy, because other
circumstances must be considered: the price of clothes may have increased this year, or
instead of dresses you may want to buy a new bike, or you may need to buy a house. This
Analysis makes predictions about future outcomes based on current or past data. Forecasting
is just an estimate; its accuracy depends on how much detailed information you have and how
deeply you dig into it.
Prescriptive Analysis
Prescriptive Analysis combines the insight from all previous Analysis to determine which action
to take in a current problem or decision. Most data-driven companies are utilizing Prescriptive
Analysis because predictive and descriptive Analysis are not enough to improve data
performance. Based on current situations and problems, they analyze the data and make
decisions.
The Data Analysis Process is essentially gathering information by using a proper application
or tool which allows you to explore the data and find a pattern in it. Based on that
information and data, you can make decisions, or you can reach final conclusions.
Correlation and regression are statistical measures used to describe the relationship
between two variables. For example, if a person drives an expensive car, it may be assumed
that she is financially well off. Correlation and regression are used to quantify such a
relationship numerically.
Correlation can be defined as a measure used to quantify the relationship between variables.
If an increase (or decrease) in one variable is accompanied by a corresponding increase (or
decrease) in another, the two variables are said to be directly correlated. Similarly, if an
increase in one is accompanied by a decrease in the other, or vice versa, the variables are
said to be indirectly (inversely) correlated. If a change in one variable is not associated
with a change in the other, the variables are uncorrelated. Thus, correlation can be
positive (direct correlation), negative (indirect correlation), or zero. This relationship
is given by the correlation coefficient.
Regression can be defined as a measure used to quantify how a change in one variable will
affect another variable; it models the relationship between an independent variable and a
dependent variable. Linear regression is the most commonly used type of regression because
it is the easiest to analyze: it finds the line that best fits the relationship between the
variables.
Correlation:
o Uses a signed numerical value to estimate the strength of the relationship between the
variables.
o The Pearson's coefficient is the best measure of correlation.

Regression:
o Shows the impact of a unit change in the independent variable on the dependent variable.
o The least-squares method is the best technique to determine the regression line.
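Both measures can be computed directly from their textbook definitions; the x and y values below are illustrative:

```python
# Pearson's correlation coefficient and the least-squares line, from first principles.
x = [1, 2, 3, 4, 5]          # hypothetical independent variable
y = [2, 4, 5, 4, 6]          # hypothetical dependent variable
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation of x and y
sxx = sum((a - mx) ** 2 for a in x)                    # deviation of x
syy = sum((b - my) ** 2 for b in y)                    # deviation of y

r = sxy / (sxx * syy) ** 0.5     # Pearson's coefficient, between -1 and 1
slope = sxy / sxx                # least-squares slope: change in y per unit change in x
intercept = my - slope * mx      # least-squares intercept
print(r, slope, intercept)
```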
T-test
A t-test compares the average values of two data sets and determines if they came from the
same population. In the above examples, a sample of students from class A and a sample of
students from class B would not likely have the same mean and standard deviation. Similarly,
samples taken from the placebo-fed control group and those taken from the drug prescribed
group should have a slightly different mean and standard deviation.
Using a T-Test
Consider that a drug manufacturer tests a new medicine. Following standard procedure, the drug
is given to one group of patients and a placebo to another group called the control group. The
placebo is a substance with no therapeutic value and serves as a benchmark to measure how the
other group, administered the actual drug, responds.
After the drug trial, the members of the placebo-fed control group reported an increase in
average life expectancy of three years, while the members of the group who are prescribed the
new drug reported an increase in average life expectancy of four years.
Initial observation indicates that the drug is working. However, it is also possible that
the observed difference is due to chance. A t-test can be used to determine whether the
difference is statistically significant and likely to hold for the entire population.
Four assumptions are made when using a t-test:
o The data follow a continuous or ordinal scale, such as the scores for an IQ test.
o The data are collected from a randomly selected portion of the total population.
o The data, when plotted, result in a normal, bell-shaped distribution curve.
o Equal or homogeneous variance exists, i.e. the standard deviations of the two samples are
approximately equal.
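A pooled-variance (equal-variance) two-sample t statistic can be sketched from its definition; the group values below are hypothetical improvements in life expectancy, not real trial data:

```python
import math
import statistics

# Hypothetical improvements in life expectancy (years) for the two trial groups.
control = [2.5, 3.1, 2.8, 3.4, 3.2]   # placebo-fed control group
treated = [3.9, 4.2, 3.8, 4.5, 3.6]   # group prescribed the new drug

n1, n2 = len(control), len(treated)
m1, m2 = statistics.mean(control), statistics.mean(treated)
v1, v2 = statistics.variance(control), statistics.variance(treated)

# Pooled variance assumes the two groups share the same underlying spread
# (the homogeneous-variance assumption above).
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m2 - m1) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(t)  # a large |t| suggests the difference is unlikely to be due to chance
```

The t statistic would then be compared against a t distribution with n1 + n2 - 2 degrees of freedom to obtain a p-value.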
FACTOR ANALYSIS
Factor analysis is a way to take a mass of data and shrink it to a smaller data set that is
more manageable and more understandable. It’s a way to find hidden patterns, show how those
patterns overlap and show what characteristics are seen in multiple patterns. It is also used to
create a set of variables for similar items in the set (these sets of variables are called dimensions).
It can be a very useful tool for complex sets of data involving psychological studies,
socioeconomic status and other involved concepts.
A “factor” is a set of observed variables that have similar response patterns; they are
associated with a hidden variable (called a latent variable) that isn’t directly measured.
Factors are listed according to their factor loadings, that is, how much of the variation in
the data they can explain.
Exploratory factor analysis is used when you do not have a prior idea of what structure
your data has or how many dimensions are in a set of variables.
Confirmatory factor analysis is used for verification, when you do have a specific idea of
what structure your data has or how many dimensions are in a set of variables.
FACTOR LOADINGS: Not all factors are created equal; some factors have more weight than
others. The factors that affect a variable the most are those with the highest factor
loadings. Factor loadings are similar to correlation coefficients in that they can vary
from -1 to 1. The closer a loading is to -1 or 1, the more the factor affects the variable.
A factor loading of zero indicates no effect.
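As a rough single-factor sketch (not a full factor analysis, which would use dedicated statistical routines), power iteration on an illustrative correlation matrix yields approximate loadings for one common factor:

```python
import math

# Hypothetical correlation matrix for three observed variables that
# are assumed to share one latent factor (values are made up).
R = [
    [1.00, 0.72, 0.63],
    [0.72, 1.00, 0.56],
    [0.63, 0.56, 1.00],
]

# Power iteration converges to the dominant eigenvector of R.
v = [1.0, 1.0, 1.0]
for _ in range(100):
    w = [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

# Scaled by the square root of the eigenvalue, the eigenvector approximates
# the loadings of a single common factor; each loading lies between -1 and 1.
eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(3)) for i in range(3))
loadings = [round(math.sqrt(eigenvalue) * x, 2) for x in v]
print(loadings)
```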