Topic 1 Data Management in R EDUC 216


EDUC 216 DATA MANAGEMENT IN R

GETTING STARTED WITH R

What is R?

R is a free software environment for statistical computing and graphics. It is supported by the
R Foundation for Statistical Computing.

R is a GNU package and is available under the GNU General Public License, which means it is
free, open-source software.

The R-chitecture

R exists as a base package with a reasonable amount of functionality. The software R and its
packages are stored in a central location known as CRAN, the Comprehensive R Archive
Network. Once a package is stored on CRAN, anyone with an internet connection can download
it and install it for use within their own copy of R.

Pros and Cons of R

Advantages
-free
-versatile
-rapidly expanding, so it can respond quickly to new developments
Disadvantages
-ease of use: you type instructions rather than pointing, clicking, and dragging things with
a mouse
-you work with a command line rather than a graphical user interface

Downloading and Installing R

To install R onto your computer you need to visit the project website (http://www.R-project.org).
The figure below shows the process of obtaining the installation files. On the main project page,
on the left-hand side, click on the link labelled ‘CRAN’.

Ivy Corazon A. Mangaya-ay,PhD [email protected]



There are various copies (mirrors) of CRAN across the globe; therefore the link to CRAN will
take you to a page of links to the various ‘mirror’ sites. Scroll down this list to find a mirror
near you.

Once you have been redirected to the CRAN mirror that you selected, you will see a web page
that asks you which platform you use (Linux, MacOS or Windows). Click the link that applies
to you.


If you click on the ‘Windows’ link, you’ll be taken to another page with some more links.
Click on ‘base’, which will direct you to the webpage with the link to the setup file. Once there,
click on the link that says ‘Download R ___ for Windows’, which will initiate the download of the
R setup file. Once this file has been downloaded, double-click on it and you will enter a (hopefully)
familiar install procedure.

If you click on the ‘MacOS’ link, you will be taken directly to a page from where you can download
the install package by clicking on the link labelled ‘R-__.pkg’. Clicking this link will download the
install file; once downloaded, double-click on it and you will enter the normal MacOS install
procedure.

RSTUDIO: The Integrated Development Environment (IDE) for R

An IDE is a software application that helps programmers develop software more easily and more
productively. An IDE typically consists of a code editor, a compiler, and debugging tools.

RStudio is an integrated development environment (IDE) for R. It includes a console, a
syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history,
debugging and workspace management. It makes R easier to use.


To Install RStudio
1. Go to www.rstudio.com and click on the "Download RStudio" button.
2. Click on "Download RStudio Desktop."
3. For Windows : Click on the version recommended for your system, or the latest
Windows version, and save the executable file. Run the .exe file and follow the
installation instructions.
4. For MacOS: Click on the version recommended for your system, or the latest Mac
version, save the .dmg file on your computer, double-click it to open, and then
drag and drop the RStudio icon into your Applications folder.

[Figure: the RStudio window and its four panes: Editor/Script/Data Pane, Environment/History,
Console, and Files/Plots/Packages/Help]

The Main Windows in RStudio

Code Editor/Source – a separate window where you can write your commands rather than
writing directly to the console. Here, you can enter multiple lines of code, save your script file to
disk, and perform other tasks on your script.
It’s Smart – it recognizes and highlights various elements of the code; it helps you find matching
brackets in your scripts.

Console – It is the main window where you can both type commands and see the results of
executing these commands. This is where you do all the interactive work with R.

Files, Plots, Packages, and Help – Files: this is where you can browse the folders and files on
your computer. Plots: this is where R displays your plots. Packages: this is where you can
view a list of all the installed packages. Help: this is where you can browse the built-in Help
system of R.


Menus in R for Windows

File – It allows you to do general things such as saving the workspace. Likewise, you can open
previously saved files and print graphs, data or output. In essence, it contains all the options that
are customarily found in File menus.

Edit – This menu contains edit functions such as cut and paste. From here, you can also clear the
console, activate a rudimentary data editor, and change how the Graphical User Interface looks.

View – This menu lets you select whether or not to see the toolbar and whether to show a status
bar at the bottom of the window.

Misc – This menu contains options to stop ongoing computations, to list any objects in your
working environment, and also to select whether R autocompletes words and filenames for you.

Packages – This menu is very important because it is where you load, install and update packages.

Window – If you have multiple windows, this menu allows you to change how the windows in R
are arranged.

Help – It routes you to online help (links to frequently asked questions, the R webpage, etc.) and
offers offline help (PDF manuals and system help files).

Resize – This menu is for resizing the image in the graphics window so that it is a fixed size, is
scaled to fit the window but retains its aspect ratio (fit to window), or expands to fit the window
but does not maintain its aspect ratio.

Commands, Objects and Functions

Everything you want to do has to be typed into the console.

Commands in R are generally made up of two parts: objects and functions. These are separated by
‘<-’, which you can think of as meaning ‘is created from’. As such, the general form of a
command is: object <- function, which means ‘object is created from function’.

An object is anything created in R. It could be a variable, a collection of variables, a statistical
model, etc.

Functions are the things that you do in R to create your objects.

R is case sensitive, which means that if the same thing is written in upper case and in lower case, R
treats the two as completely different things.
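As a small illustration of the object <- function form and of case sensitivity, here is a minimal sketch. The object names scores and average are made up for this example and do not come from the course files.

```r
# object <- function: 'scores' is created from the function c()
scores <- c(80, 90, 100)

# 'average' is created from the function mean() applied to 'scores'
average <- mean(scores)
print(average)

# R is case sensitive: 'scores' was created above, but 'Scores' is a
# different name that has not been created in this sketch
exists("scores")
exists("Scores")
```

Typing Scores at the console at this point would produce an "object not found" error, because only the lower-case name exists.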


Installing Packages

A package is a self-contained set of code that adds functionality to R, similar to the way that an
add-in adds functionality to Microsoft Excel.

Most packages do not come pre-installed with R.

Two Ways to Install Packages


1. Through menus.
2. Using a command.

In Windows, if you select Packages => Install package(s)…, the window that opens first asks
you to select a CRAN mirror and then to choose the package you want to install.

If you know the package you want to install, then the simplest way is to execute the command
install.packages("package.name"), in which ‘package.name’ is replaced by the name of the
package that you’d like to install. Note that the name of the package must be enclosed in speech
marks (quotation marks).

Once a package is installed, you need to reference (load) it so that R knows you’re using it. You need
to install a package only once, but you need to reference it each time you start a new session
of R.

To reference a package, we simply execute this general command: library(package.name)
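The install-once, load-every-session pattern can be sketched as follows. This is only an illustration: the package name foreign is borrowed from the SPSS section of these notes, and the repos URL is an assumed choice of CRAN mirror that you may replace with one near you.

```r
pkg <- "foreign"   # example package name; replace with the package you need

# Install only if the package is not already present.
# This step needs an internet connection; the repos value is an assumption.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Load (reference) the package; this must be repeated in every new R session.
# character.only = TRUE is needed because the name is stored in a variable.
library(pkg, character.only = TRUE)
```

When you type the name directly, the shorter form library(foreign) does the same job.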

R Workspace
The collection of objects and things you have created in a session is known as your workspace.

Before you import data into R, you must first decide on your working directory. You should
always set the working directory at the start of a session.

Setting a Working Directory

A working directory is the directory where you store your data files.

- Create a folder and place the data files you’ll be using in that folder.

To set the working directory to this newly created folder, we use the setwd() command.
Example: setwd("D:/R Training/Files")

By executing this command, we can now access files in that folder directly without having
to refer to the full file path.


If you want to check what the current working directory is, execute the command getwd().
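The following sketch shows setwd() and getwd() working together. So that it runs on any computer, it uses a temporary folder rather than a real path such as D:/R Training/Files; the folder name r_training_files and the file scores.csv are made up for illustration.

```r
# Remember where we started so we can return afterwards
old_dir <- getwd()

# Create a hypothetical data folder inside R's temporary directory
data_dir <- file.path(tempdir(), "r_training_files")
dir.create(data_dir, showWarnings = FALSE)

# Make it the working directory and confirm the change
setwd(data_dir)
getwd()

# Files can now be referred to by name only, without the full path
writeLines(c("id,score", "1,90"), "scores.csv")
file.exists("scores.csv")

# Go back to the original working directory
setwd(old_dir)
```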

GETTING DATA INTO R

A <- 1
B <- 2

Here A is a variable, and B is a variable.

<- means ‘assign’.
A <- 1 means variable A is assigned a value of 1.

Creating Variables

Use the c( ) function to create objects that contain data.

Example:
R_name <- c("Renan", "John", "Chlea", "Jean")
Province <- c("Bohol", "Cebu", "Negros Oriental", "Siquijor")
Age <- c(28, 32, 27, 30)

The quotes tell R that the data are not numeric.

Variables that consist of data that are text are known as string variables. Variables that contain
data that are numbers are known as numeric variables.

Creating a Date Variable


We can convert dates written as text into date objects using the as.Date() function. This
function takes strings of text and converts them into dates.

Example:
Birthdate <- as.Date(c("1990-06-21", "1986-07-16", "1991-09-08", "1988-05-24"))
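To see what the conversion changes, the sketch below compares the text version with the date version. The name birth_text and the day/month/year example are additions for illustration only.

```r
# Dates typed as text in year-month-day order
birth_text <- c("1990-06-21", "1986-07-16", "1991-09-08", "1988-05-24")
Birthdate <- as.Date(birth_text)

# The text version is of class "character"; the converted one is of class "Date"
class(birth_text)
class(Birthdate)

# Date objects can be compared and sorted chronologically
min(Birthdate)
sort(Birthdate)

# Text written in another layout needs a format string
as.Date("21/06/1990", format = "%d/%m/%Y")
```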

Creating Dataframes

We can think of a dataframe as a spreadsheet. It is an object containing variables.

If we want to combine R_name, Province, Birthdate, and Age and create a dataframe, we can use the
data.frame() function.


Example:
Profile <- data.frame(R_name, Province, Birthdate, Age)
In this command, we create a new object (called Profile). Our dataframe consists of four
variables (names of the respondents, their provinces, birthdates and ages).

Now that the dataframe has been created we can refer to these variables at any point using the
general form:
dataframe$variableName

Check:
Profile$Province

Profile$Age
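Putting the pieces together, a self-contained sketch of building the dataframe and referring to its variables could look like this. It re-creates the four variables from the earlier examples so that it can be run on its own.

```r
# Re-create the four variables from the earlier examples
R_name    <- c("Renan", "John", "Chlea", "Jean")
Province  <- c("Bohol", "Cebu", "Negros Oriental", "Siquijor")
Birthdate <- as.Date(c("1990-06-21", "1986-07-16", "1991-09-08", "1988-05-24"))
Age       <- c(28, 32, 27, 30)

# Combine them into one dataframe: one row per respondent, one column per variable
Profile <- data.frame(R_name, Province, Birthdate, Age)

# Inspect the structure, then refer to single variables with dataframe$variableName
str(Profile)
Profile$Province
mean(Profile$Age)
```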

Dataframes are not the only way to combine variables in R. You can also use the list( ) and cbind()
functions to combine variables.

The list() function creates a list of separate objects; you can imagine it as your handbag (or manbag),
but nicely organized. Your handbag contains lots of different objects: wallet, phone, iPod, pen,
etc. Those objects can be different, but this doesn’t stop them from being collected into the same
bag. The list() function creates a sort of bag into which you can place objects that you have created
in R.

Profile1 <- list(Province, Age)


The function cbind() is used simply for pasting columns of data together (you can also use rbind()
to combine rows of data together).

Profile2<-cbind(Province, Age)

Notice that the numbers are in quotes; this is because the variable containing provinces is text,
so it causes the ages to be text as well. For this reason, cbind() is most useful for combining
variables of the same type.
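The difference between list() and cbind() can be checked directly. The sketch below re-creates Province and Age so that it runs on its own, and the object names Profile1 and Profile2 are used only for illustration.

```r
Province <- c("Bohol", "Cebu", "Negros Oriental", "Siquijor")
Age      <- c(28, 32, 27, 30)

# list() keeps each object as it is: the ages stay numeric
Profile1 <- list(Province, Age)
class(Profile1[[1]])
class(Profile1[[2]])

# cbind() builds a single matrix, and a matrix can hold only one type,
# so the ages are converted to text alongside the province names
Profile2 <- cbind(Province, Age)
Profile2
class(Profile2[, "Age"])
```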

CREATING CODING VARIABLES/FACTORS

A coding variable (also known as a grouping variable or factor) is a variable that uses numbers to
represent different groups of data. As such, it is a numeric variable, but these numbers represent names
(i.e., it is a nominal variable).

First we can enter the data and then worry about turning these data into a coding variable.


Example
sex<-c(0,0,1,1)

In situations like this, in which all cases in the same group are grouped together in the data file,
we could do the same thing more quickly using the rep() function. This function takes the
general form rep(number to repeat, how many repetitions).

sex1 <- c(rep(0, 2), rep(1, 2))

To turn this variable into a factor, we use the factor() function. This function takes the general
form:
factor(variable, levels = c(x, y, …, z), labels = c("label1", "label2", …, "label3"))

- levels = c(1, 2, 3, 4, …) specifies which values we used to denote the different groups
- labels = c("label", …) assigns labels to these levels

If we have used a regular series such as 1, 2, 3, 4, we can abbreviate this as c(1:4), where the colon
simply means ‘all the values between’; so c(1:4) is the same as c(1, 2, 3, 4).

Example: sex <- factor(sex1, levels = c(0, 1), labels = c("male", "female"))

The levels here are 0 and 1 because those are the codes stored in sex1.
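A runnable sketch of the whole sequence is given below. It assumes that the codes are 0 and 1 and that 0 stands for male and 1 for female; that mapping follows the order of the labels in the example and is otherwise an assumption.

```r
# Two cases coded 0 followed by two cases coded 1
sex1 <- c(rep(0, 2), rep(1, 2))
sex1

# Turn the numeric codes into a factor.
# The levels must match the codes actually present in the data (0 and 1 here);
# any value not listed in 'levels' would become NA.
sex <- factor(sex1, levels = c(0, 1), labels = c("male", "female"))

sex
levels(sex)
table(sex)   # number of cases in each group
```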

Missing Values

Missing data can occur for a variety of reasons: in long questionnaires participants accidentally
miss out questions; in experimental procedures mechanical faults can lead to a datum not being
recorded; and in research on delicate topics(e.g., sexual behavior) participants may exert their right
not to answer a question.

In R, the code used for a missing value is NA (in capital letters), which stands for ‘not available’.
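As a brief sketch of how NA behaves, suppose one respondent did not report an age. The values below are illustrative and are not real data.

```r
# The second respondent's age is missing
Age <- c(28, NA, 27, 30)

# is.na() marks which values are missing
is.na(Age)
sum(is.na(Age))   # how many values are missing

# Many functions return NA when any value is missing ...
mean(Age)

# ... unless they are told to remove missing values first
mean(Age, na.rm = TRUE)
```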

Reading Data Files

Reading A CSV File

You can read a CSV data file using the read.csv() function:
> data <- read.csv(file = "data.csv", header = TRUE)
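To try read.csv() without an existing file, the sketch below first writes a tiny CSV file to R's temporary folder and then reads it back. The file contents and the column names name and age are made up for illustration.

```r
# Write a small example CSV file to a temporary location
csv_path <- file.path(tempdir(), "data.csv")
writeLines(c("name,age", "Renan,28", "John,32"), csv_path)

# header = TRUE tells R that the first line holds the variable names
data <- read.csv(file = csv_path, header = TRUE)

str(data)
data$age
```

If the file sits in your working directory, the file name alone (for example "data.csv") is enough.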

Reading an Excel File


To read an Excel file, you need to use the xlsx package. To install the xlsx package, type
> install.packages("xlsx")
To use the xlsx package, use the require() function:


> require("xlsx")
Loading required package: xlsx

> data <- read.xlsx(file = "data.xlsx", 1)

- file is the location of the Excel file
- 1 refers to sheet number 1

Reading an SPSS File

To read an SPSS file, you need to use the foreign package.

You can install the foreign package using the install.packages() function:

> install.packages("foreign")

To use the foreign package, use the require() function:

> require(foreign)

To read the SPSS file into a data frame, use the read.spss() function:
Data <- read.spss(file = "data.spss", to.data.frame = TRUE)

- file is the path or location of the SPSS file to read
- to.data.frame is a logical value; TRUE reads the SPSS file into a data frame

Entering Data with R Commander

It is possible to do some basic data editing (and analysis) using a package called Rcmdr (short
for R Commander). This package loads a window style interface for basic data manipulation.

To install Rcmdr, execute:

install.packages("Rcmdr", dependencies = TRUE)

To load the package, execute:

library(Rcmdr)

Creating Variables and Entering Data with R Commander

R Commander offers a basic spreadsheet-style interface for entering data (i.e., like Excel).
To create a new dataframe, select Data => New data set…, which opens a dialog box that enables
you to name the dataframe.


We will create the same dataframe as in our previous example.

Creating Coding Variables with R Commander

To convert a numeric variable to a factor or coding variable, select Data => Manage variables in
active data set => Convert numeric variables to factors…

Select the variable that you want to convert. If you want to type some labels for the levels of
your coding variable, then select “Supply level names” and click OK.


Importing Data with R Commander

To activate a submenu that enables you to open a text file, SPSS, Stata or Excel file, select Data
=> Import Data.
