This document provides an overview of the statistical software package SPSS. It discusses how SPSS can be used to perform statistical analysis on research data involving large datasets in an efficient manner. SPSS allows researchers to easily input data, select analyses, and obtain results without having to perform manual calculations. The document provides basic instructions on starting SPSS and importing data from Excel files. It also outlines some key considerations for working with SPSS, such as properly defining variable types and keeping backup copies of data.
Study Material on SPSS by Ms. Sumitha Achar (sumithaachar@yahoo.com) for One-Day National level Workshop on Applications of SPSS in Research Data Analysis, 4th March, 2014.
SPSS FOR BEGINNERS

SPSS, which stands for Statistical Package for the Social Sciences, is a powerful, user-friendly software package for the statistical analysis of data. It is particularly useful for researchers in psychology, sociology, psychiatry, and other behavioral sciences, as it covers an extensive range of the univariate and multivariate procedures widely used in these disciplines.

Practicing researchers and business managers often find it difficult to solve real-life problems that involve statistical methods. The available books on statistics give a comprehensive picture of statistics as a tool for decision making, but they rarely help the researcher or manager obtain results for practical problems. Using simple examples, these books successfully explain simple calculation procedures and the concepts behind them. However, manual calculations are cumbersome, tiresome, and error-prone; they are useful for explaining concepts, but not for solving real-life research problems involving large amounts of data. For this reason, most practical statistical analyses are done with the help of an appropriate software package. The researcher or manager only needs to prepare the input data and can then obtain the final result easily, so that attention can be focused on other aspects of problem solving and decision making.

A wide variety of software packages, such as SPSS, Minitab, SAS, STATA, and S-PLUS, are available for statistical analysis. Microsoft Excel can also be used successfully to solve a wide variety of problems. This study material is an effort to help researchers solve statistical problems using computers. The chosen statistical software is SPSS, which is a comprehensive and widely available package for statistical analysis.

-Sumitha Achar

ADDRESS: Assistant Professor, AIMIT, St. Aloysius College (Autonomous), Mangalore, Beeri, 575022. Phone: 99808 85896. Email: [email protected] [email protected]
MODULE-1 INTRODUCTION TO SPSS

INTRODUCTION:

SPSS (originally, Statistical Package for the Social Sciences) was released in its first version in 1968 after being developed by Norman H. Nie and C. Hadlai Hull. SPSS is among the most widely used programs for statistical analysis in social science. It is used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, and others. In addition to statistical analysis, the software offers data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the data file). SPSS is a very powerful and user-friendly program: anyone with a basic knowledge of statistics who is familiar with Microsoft Office can easily learn to run analyses in SPSS with a few clicks of the mouse.
NOTE: Statistical analysis is like a sewer. What you get out of it largely depends on what you put into it. Over 82 percent of all statistics are made up on the spot to try to prove a point. You can conclude just about anything if you're not careful with your data and with your calculations. SPSS takes care of performing the calculations for you, but the raw data, and the choice of which calculations to perform, are up to you.
SPSS works with numbers only, not words. If you cannot express your information as a number, you can't run it through SPSS. You will see names and descriptions seemingly being processed by SPSS, but that's because each name has been assigned a number. For example, suppose a survey question asks, "How much do you enjoy eating burgers?" and you have to select your answer from: very much, sort of, not really, hate the stuff. A number (code) is assigned to each of the possible answers, and these numbers are fed through the statistical process. You must keep accurate records describing your data, how you got the data, and what it means. SPSS can do all the calculations for you, but only you can decode and interpret what the results mean.

Identify the variables. You always begin by defining a set of variables, and then you enter data for the variables to create a number of cases. For example, if you are doing an analysis of automobiles, each car in your study would be a case. The variables that define the cases could be things such as the year of manufacture, horsepower, and cubic inches of displacement. Each car in the study is a single case, and each case is a set of values assigned to the collection of variables. Every case has a value for each variable. (Well, you can have missing values too.)
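The coding step described above can be sketched in a few lines of Python (not SPSS syntax). This is only an illustration: the particular codes 1 to 4 and the names `BURGER_CODES` and `encode` are assumptions made for the example, not something the workshop material prescribes.

```python
# Hypothetical codebook for the burger question; the numeric codes are assumed.
BURGER_CODES = {
    "Very much": 1,
    "Sort of": 2,
    "Not really": 3,
    "Hate the stuff": 4,
}


def encode(answers, codebook):
    """Replace each text answer with its numeric code."""
    return [codebook[answer] for answer in answers]


# Made-up responses from three respondents (three cases, one variable).
responses = ["Sort of", "Very much", "Hate the stuff"]
coded = encode(responses, BURGER_CODES)
print(coded)
```

Keeping the codebook next to the data is one way to follow the advice about accurate records: the numbers alone do not say what they mean.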
Variables have types. That is, each variable is defined as containing a specific kind of number. A scale variable is a numeric measurement, such as weight, salary, income, or miles per gallon. A categorical variable contains values that define a category; for example, a variable named gender could be a categorical variable defined to contain only the values 1 for female and 2 for male. Things that make sense for one type of variable don't necessarily make sense for another. For example, it makes sense to calculate the average miles per gallon, but not the average gender.
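A small Python sketch may make the distinction concrete. The numbers are made up for illustration, and the variable names are assumptions; the point is only that an average suits the scale variable while a count per category suits the categorical one.

```python
from collections import Counter
from statistics import mean

# Made-up data for illustration only.
mpg = [20.0, 30.0, 25.0]   # scale variable: miles per gallon
gender = [1, 2, 2, 1, 2]   # categorical variable: 1 = female, 2 = male

# A mean is meaningful for a scale variable.
average_mpg = mean(mpg)

# For a categorical variable, count the cases in each category instead.
gender_counts = Counter(gender)

print(average_mpg)
print(dict(gender_counts))
```

Python would happily compute `mean(gender)` as well, which is exactly why the analyst, not the software, has to know the type of each variable.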
Make sense with data. You can instruct SPSS to run any analysis and to draw graphs and charts. When preparing SPSS to run an analysis or draw a graph, the OK button is unavailable until you have made all the choices necessary to produce output. Not only does SPSS require that you select a sufficient number of variables to produce output, it also requires that you choose the right kinds of variables. For example, if a categorical variable is required for a certain slot, SPSS will not allow you to choose any other kind. Whether the output makes sense is up to you and your data, but SPSS makes certain that the choices you make can be used to produce some kind of result.
Obtaining output is not enough. Getting some output from any kind of data is not what matters; what matters is learning to interpret the right output for the right data.

Keep multiple copies of your data set. The most valuable possession you have in dealing with statistics is not your computer. It's not your SPSS software. You can lose either of those, but either of them can be replaced. Your most valuable possession is your data. Sure, you can always go and get more data, but you can't go and get the same data. The world doesn't hold still long enough. Make sure you make backup copies of your data.
____________________________________________________________________________________
STARTING SPSS

The SPSS program can be installed on a computer from a CD or from the network.
Once installed, SPSS can be opened like any other Windows-based application by clicking on the Start menu at the bottom left-hand corner of the screen and clicking on SPSS for Windows in the list of programs. Opening the SPSS program for the first time will produce a dialog box as shown in the following figure.
This dialog box is not of any particular use. Select "Don't show this dialog in the future" and click on the Cancel button. This activates a window as shown in the figure below.
This is the main Data Editor window, where all the data are entered; it is much like an Excel spreadsheet. At the top of the screen (figure above) there are different menus, which give access to the various functions of SPSS. At the bottom of the screen there is a status bar. In the figure, the status bar reads "SPSS Processor is ready". This implies that SPSS has been installed properly and the license is valid. The program can be closed by clicking on the Close button at the top right-hand corner, just as in any other Windows application.
READING DATA FROM EXCEL FILES

Suppose we want to import an Excel file into SPSS. First, open the Excel file and check how it is formatted: the first row has the variable names, and the data start in the second row. Close the Excel file before reading it into SPSS.

In SPSS, go to File > Open > Data. This brings up the Open Data dialog box as shown below. First set Files of type to Excel (*.xls). Then browse to the folder in Look in, select the Excel file you saved on your system (for example, xls_gss93.xls), and click Open. (See below for a visualized instruction.)

You should now see another dialog box, Opening Excel Data Source. As we checked earlier, the Excel file has variable names in the first row, so check the box "Read variable names from the first row of the data". Click OK. You now have a new, unsaved data set in another SPSS Data Editor window. To save the data in SPSS format, choose File > Save from the pull-down menu. Save it in your working directory with a name such as xls_gss93.
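The idea of treating the first row as variable names can be illustrated outside SPSS with a short Python sketch. It assumes, purely for the example, that the worksheet has been saved as comma-separated text; the column names and values below are made up and are not the contents of xls_gss93.xls.

```python
import csv
import io

# Hypothetical worksheet contents saved as CSV: names in row 1, data from row 2.
sheet = io.StringIO("id,age,sex\n1,43,1\n2,44,1\n3,43,2\n")

# DictReader reads the first row as field (variable) names.
reader = csv.DictReader(sheet)
cases = list(reader)               # one dictionary per case (row)
variable_names = reader.fieldnames

print(variable_names)
print(len(cases), "cases")
```

Note that the csv module returns every value as text; in SPSS the corresponding step is checking each variable's type after the import.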
OPENING AN SPSS DATA FILE:

To open an existing SPSS data file, go to File > Open > Data. Alternatively, you can use the Open File button on the toolbar. A dialog box for opening files is displayed. By default, SPSS-format data files (.sav extension) are displayed.

SPSS MAIN MENUS:

The File, Edit, and View menus are very similar to those in a spreadsheet program.
The File menu lets us open, save, print, and close files and provides access to recently used files.
The Edit menu lets us cut, copy, paste, and so on.
The View menu lets us customize the SPSS desktop. Using the View menu we can hide or show the toolbar, status bar, gridlines, etc.
The Data menu is an important tool in SPSS. It allows us to manipulate the data in various ways. We can define variables, go to a particular case, sort cases, transpose them, and merge cases as well as variables from another file. We can also select the cases on which we want to run the analysis and split the file to arrange the output of the analysis in a particular manner.
The Transform menu is another very useful tool, which lets us compute new variables and make changes to existing ones.
The Analyze menu lets us perform all the statistical analyses. It groups the various statistical tools under different categories.
The Graphs menu lets us make various types of plots from our data.
The Utilities menu gives us information about variables and files.
The Add-ons menu tells us about other programs of the SPSS family such as Amos and Clementine. Newly added functions can also be found under Add-ons.
The Window and Help menus are very similar to those in other Windows applications.

THE SPSS WINDOWS AND FILES

SPSS Statistics has three main windows, plus a menu bar at the top. These allow you to (1) see your data, (2) see your statistical output, and (3) see any programming commands you have written. Each window corresponds to a separate type of SPSS file. This material covers the first two: the Data Editor (.sav files) and the Output Viewer (.spv files).

1) SPSS DATA EDITOR (.sav files)

The Data Editor provides a convenient, spreadsheet-like method for creating and editing data files. The Data Editor window opens automatically when you start a session. The Data Editor provides two views of your data:
Data View. This view displays the actual data values or defined value labels.
Variable View. This view displays variable definition information, including defined variable and value labels, data type (for example, string, date, or numeric), measurement level (nominal, ordinal, or scale), and user-defined missing values.
In both views, you can add, change, and delete information that is contained in the data file.
The data file is displayed in the Data Editor. (File Name: demo.sav) The Data Editor lets you see and manipulate your data. You will always have at least one Data Editor open (even if you have not yet opened a data set). When you open an SPSS data file, what you see is a working copy of your data. Changes you make to your data are not permanent until you save them (click File, Save or Save As). Data files are saved with a file type of .sav, a file type that most other software cannot work with. When you close your last Data Editor you are shutting down SPSS, and you will be prompted to save all unsaved files. In SPSS 13.0 and earlier versions, only one Data Editor window could be open at a time; from SPSS 14.0 onwards, multiple Data Editor windows can be open simultaneously, much like Microsoft Excel. At the bottom of the Data Editor there are two tabs: Data View and Variable View. In Data View, the Data Editor works in much the same manner as an Excel spreadsheet. One can enter values in different cells, modify them, and even cut and paste to and from an Excel spreadsheet.
In Variable View, the data editor window looks as shown in figure below.
In addition to entering the values of the variables, we have to provide information about them in SPSS. This can be done when the data editor is in Variable View. Notice that there are 10 columns in the data editor window in the figure above:
1) NAME: variable name without any spaces (use an underscore instead).
2) TYPE: variables can be either numeric or string (statistical analysis cannot be performed on string variables).
3) WIDTH: set the width according to the number of digits in the values.
4) DECIMALS: by default you can keep it as 2.
5) LABEL: complete description of the name.
6) VALUES: used for coded data.
7) MISSING: by default, no missing values.
8) COLUMNS: display width of the column; by default, we do not change it.
9) ALIGN: left, right, or center.
10) MEASURE: nominal, ordinal, or scale.
We will explain the usage of each of them with the help of the following small data entry exercise. Suppose we want to enter the following data in SPSS:
We observe that we have three variables to enter: respondent number, gender, and age.
1) The first column in Variable View is Name. Earlier versions of SPSS (SPSS 12.0 and earlier) could take a maximum of eight characters, starting with a letter, to identify a variable. Later versions allow much longer variable names (up to 64 bytes). In this example, we will name respondent number as resp_id; gender and age can be named as they are.
2) The next column, titled Type, lets us define the variable type. If we click on the cell next to the variable name in the Type column, we get a dialog box as shown in the figure.
Data can be of several types, including numeric, date, and text.
An incorrect type definition may not always cause problems, but sometimes it does, and it should therefore be avoided. The most common type is Numeric, which means that the variable has a numeric value. The other common choice is String, which means that the variable is in text format. We cannot perform any statistical analysis on a numeric variable if it is specified as a string variable. Since all three of our variables are of the numeric type, we select Numeric from the dialog box shown in the figure above. We can also specify the width of the variable and the number of decimal places in this dialog box. These settings only affect the way variables are shown when the data editor is in Data View. Click on OK to return to the data editor.
3) The next two columns, titled Width and Decimals, also allow us to specify these settings for Data View. Please note that they have no impact on the actual values we enter in the data editor; they only affect the display of the data. For example, if the value of a variable in a particular cell is 100000000, which consists of 9 digits, and we have specified the width for this variable as 8, it will appear as ########. This simply means that the width of the variable column is not enough to display the value correctly.
4) Next, we have a column titled Label. Since the variable name in the first column can only be 8 characters long in the earlier versions of SPSS, it is sometimes difficult to identify a variable by its name. To avoid this problem, we can write the details about a particular variable in this column. For example, we can write "Respondent identification number" as the label for the resp_id variable. We can ask SPSS to show variable labels with or without the names in the output window. This option can be activated by selecting Names and Labels in the dialog box obtained by clicking Edit > Options > Output Labels.
5) Then, we have a column titled Values. If we click on the cell next to the variable name in the Values column, we get a dialog box as shown in the figure below. In this box, we can specify value labels for our variables.
In the example here, the variable gender has two values: 1 representing Male and 2 representing Female. Enter 1 in the empty box labeled Value and type its name (Male) in the next box, labeled Value Label. This will activate the Add button. Click on this button and repeat these steps to specify Female. This way we can keep track of what the codes mean for qualitative variables such as gender, nation, race, and color.
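What a value label does can be sketched in Python as a simple lookup from code to label. This is an illustration rather than SPSS behavior reproduced exactly: the data are made up, and `GENDER_LABELS` and `labelled_counts` are hypothetical names chosen for the example.

```python
from collections import Counter

# Value labels as defined in the example: code -> label.
GENDER_LABELS = {1: "Male", 2: "Female"}

# Made-up coded data for five respondents.
gender = [1, 2, 2, 1, 2]


def labelled_counts(values, labels):
    """Count cases per code and report them under the code's label.

    Codes without a label are reported as the code itself.
    """
    counts = Counter(values)
    return {labels.get(code, str(code)): n for code, n in sorted(counts.items())}


table = labelled_counts(gender, GENDER_LABELS)
print(table)
```

The stored data stay numeric; only the presentation changes, which mirrors how Data View can show either the values or their labels.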
6) After Values we have a column titled Missing. This is used to specify missing values. While coding data, we often assign certain numbers to variables for which some respondents have given no response. Unless we declare these numbers as missing values, SPSS will include them in the data analysis and produce wrong output. One way to handle this problem is to recode these numbers to missing values. The other way is to specify here the numbers that should be treated as missing. Clicking on the cell next to the variable name in the Missing column will produce a dialog box as shown in the figure below.
By default, No missing values is selected here. We can specify up to three discrete values to be treated as missing. Alternatively, we can specify a range, and all the values in the range will be treated as missing.
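The effect of declaring missing values can be illustrated with a short Python sketch. It assumes, for the example only, that the code 99 stands for "no response"; the function `valid_values` and the age data are hypothetical and simply mimic the two options described above (discrete values or a range).

```python
from statistics import mean


def valid_values(values, discrete=(), low=None, high=None):
    """Drop values declared missing: up to three discrete codes, or a range."""
    if len(discrete) > 3:
        raise ValueError("at most three discrete missing values")

    def is_missing(v):
        in_range = low is not None and high is not None and low <= v <= high
        return v in discrete or in_range

    return [v for v in values if not is_missing(v)]


# Made-up ages; 99 is assumed to be the code for "no response".
ages = [25, 31, 99, 40, 99]

naive_mean = mean(ages)                                  # treats 99 as a real age
clean_mean = mean(valid_values(ages, discrete=(99,)))    # excludes the missing code

print(naive_mean, clean_mean)
```

With these numbers the undeclared codes pull the average age from 32 up to 58.8, which is the kind of wrong output the text warns about.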
7) The next two columns, titled Columns and Align, let us modify the way the data are displayed on screen. In the Columns column we can specify the width of the column, and in the Align column we can specify whether the data should be right, left, or center aligned. These settings do not have any impact on the actual data analysis.
8) Finally, in the column titled Measure, we can specify whether our variable is scale, ordinal, or nominal. SPSS treats interval and ratio data as scale.
Once the variables are specified, you can switch to Data View and enter the data. This data file can be saved just like an MS Word or MS Excel file and reopened by double-clicking on the file in its saved location.

VARIABLE MEASUREMENT LEVEL:

Nominal. A variable can be treated as nominal when its values represent categories with no intrinsic ranking (for example, the department of the company in which an employee works). Examples of nominal variables include region, zip code, and religious affiliation.
Ordinal. A variable can be treated as ordinal when its values represent categories with some intrinsic ranking (for example, levels of service satisfaction from highly dissatisfied to highly satisfied). Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating scores.
Scale. A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. Examples of scale variables include age in years and income in thousands of dollars.
Note: For ordinal string variables, the alphabetic order of string values is assumed to reflect the true order of the categories. For example, for a string variable with the values of low, medium, high, the order of the categories is interpreted as high, low, medium, which is not in the correct order. In general, it is more reliable to use numeric codes to represent ordinal data.
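The ordering problem in this note is easy to reproduce in Python, where text is also sorted alphabetically. The numeric codes 1 to 3 in `ORDER` are an assumed coding chosen for the illustration.

```python
# Ordinal categories stored as text, in arbitrary order.
levels = ["medium", "high", "low"]

# Alphabetical order does not match the intended ranking.
alphabetical = sorted(levels)

# Hypothetical numeric codes that carry the true order.
ORDER = {"low": 1, "medium": 2, "high": 3}
by_code = sorted(levels, key=ORDER.get)

print(alphabetical)  # ['high', 'low', 'medium']
print(by_code)       # ['low', 'medium', 'high']
```

Storing the codes and attaching the words as value labels keeps both the correct order and the readable names.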
ENTERING DATA
In Data View, you can enter data directly in the Data Editor. You can enter data in any order. You can enter data by case or by variable, for selected areas or for individual cells. To enter anything other than simple numeric data, you must define the variable type first. If you enter a value in an empty column, the Data Editor automatically creates a new variable and assigns a variable name.
SPSS-format data files are organized by cases (rows) and variables (columns).
In this data file, cases (rows in data view) represent individual respondents to a survey.
Variables (columns in Data View) represent the respondents' answers to each question asked in the survey.
2) SPSS VIEWER [Output Viewer (.spv files)]

The SPSS Viewer opens automatically to show the output when you run SPSS commands. The Viewer window has its own menu and toolbars. The window itself is divided into two parts. The left-hand side shows a tree-structured outline (list) of the output elements shown in the right-hand part, i.e. a structured table of contents. Each SPSS command produces a set of output elements (called frames) that are grouped hierarchically: a yellow icon in the outline identifies the command, and one level below it you will find the various frames it has produced. All statistical commands produce a Title frame, a Notes frame (technical notes on the current command), and an Active Dataset frame (containing the name of the dataset used). Most commands add a frame about the number of observations included in the analysis (often labeled Case Processing Summary; in the example here, Statistics), followed by the frames specific to the command, i.e. results, tables, and graphs.
The Output Viewer shows you tables of statistical output and any graphs you create. By default it also shows you the syntax. The Output Viewer also allows you to edit and print your results. The contents of the Output Viewer are saved (click File, Save or Save As) with a file type of .spv, which can only be opened with SPSS software. As with Data Editors, it is possible to open more than one Output Viewer to look at more than one output file. The active Viewer, marked with a tiny blue plus sign, will receive the results of any commands that you issue. If you close all the Output Viewers and then issue a new command, a fresh Output Viewer is started.
EXPORTING OUTPUT

To export your output, you go through a special procedure. In the Output Viewer click File, Export to invoke the Export dialog box. There are three main settings to look at. First, pick the type of file to which you want to export: useful file types include Excel, PDF, PowerPoint, and Word. Next, check that you are exporting as much of your output as you want, using Objects to Export at the top of the dialog. If you have part of your output selected, this option will default to exporting just your selection; otherwise you will typically export all your visible output. Finally, change the default file name to something meaningful, and save your file to a location where you will be able to keep it, like your U:\ drive. Once your options are set, click OK.