Week 9-Stat Data Processing

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 23

CD 314: Research Methods

and Skills

STATISTICAL DATA
PROCESSING

Mr. John Lanata


Scope

 Initial situation  Processes


• Set of data • manual editing &
• Recorded on forms coding
 Objective • data capture
• Description • machine editing
• Sample
• estimation
• Population
• Inference • summarization
• From sample to • inference
population
• presentation
Methodological analysis

 Sample size
 Response rate
 Non-response bias
 Missing data
Manual editing

 Less done now than in days


before personal computers
became widespread
 Only those tasks that are not
feasible to be done by computer
need be done, eg.
• checking that all forms are present
• assessing legibility
Manual coding

 Needs to be done where data are


in textual form
 Process consists of allocating a
suitable code to each such data
item
 It may be necessary to develop
suitable code lists based on data
if not already done
Data entry/capture

 Process consists of entering data into


computer records, usually through a
keyboard
 Storage options include:
• Spreadsheet (eg. Excel)
• Database (eg. Access)
• Other programs
 Choice depends on volume &
complexity of data
Data entry into Excel

 Raw data should be kept in one


or more sheets
 Analysis etc. should be done in
other sheets or in other
workbooks
Data entry into Access

 May be done directly into one or


more tables
 Usually better to design special
forms for data entry
Machine editing

 Process consists of detecting and


possibly amending data that is or
may be incorrect
 Two main types of edit check:
validity & consistency
 Simple editing may be done in Excel
 Access queries enable much more
complex editing
Estimation

 Process consists of producing estimates


relating to population from sample data
 For self-weighting sample, estimates are
calculated directly from sample data
 Otherwise, suitable weights must be
calculated and applied to sample data
 These calculations more easily done in
Access than in Excel
Summarization

 Forms include:
• Frequency distributions (one or more
variables)
• Measures of central tendency
• Measures of dispersion
• Positional measures
• Correlation
• Regression
 Extent of multivariate tabulation is
constrained by sample size
Summarizing in Excel

 Frequency distribution for one variable


may be constructed by using Histogram
option
 Cumulative distribution may be easily
constructed with COUNTIF function
 Statistical functions use raw data as input
 If data available only in form of frequency
distribution, calculation tables may be set
up
Multivariate analysis in
Excel
 Bivariate & multivariate frequency
distributions are difficult to
construct in Excel
 Data Analysis option of Tools menu
enables more advanced multivariate
analysis of raw data, eg.
• multiple regression
• moving averages
Summarizing in Access

 Only few statistical functions


available
 Frequency distribution for one
variable may be easily constructed
by using Total query
 Bivariate & multivariate
frequency distributions may be
constructed by using Crosstab query
Inference in Excel &
Access
 Excel has many suitable
functions, eg.
• CHIDIST
• FDIST
• NORMDIST
• TDIST
 Access does not facilitate
inference
Inference in other
programs
 Many specialized statistical
programs available, eg.
• SPSS
• SAS
• Minitab
 These require some knowledge of
inference theory in order to choose
suitable tests & interpret them
Tabular presentation

 Simple tabulations may be


constructed in Excel
 Complex tabulations may be
constructed in Access by using
wizards
Graphical presentation

 Excel may be used to produce


wide range of charts
 Access does not produce charts
so easily
Joint use of Excel &
Access
 Each program has its relative
advantages
 Easy to transfer data between
Excel & Access
 Best to use both & transfer data
etc. between them as needed
Strengths of Access

 Generally best to use Access


for:
• data entry & storage
• machine editing
• complex estimation
• frequency distributions
Strengths of Excel

 Generally best to use Excel for:


• summarization
• inference
• charts
Overall presentation

 Easy to transfer tables & charts


from Access & Excel into:
• Word for full report
• PowerPoint for summary
presentation
OpenOffice programs

 Similar remarks apply to equivalent


programs in the OpenOffice suite,
which is available in some ITS labs
 Equivalent programs are:
MS OFFICE OPENOFFICE
Word <> Writer
Excel <> Calc
Access <> Base
PowerPoint <> Impress

You might also like