R Programming

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 77

Hadoop & R

Programming
Hadoop & R Programmming

Strings, Vectors, List, Arrays


Hadoop & R Programmming

Strings
Any value written within a pair of single quote or double quotes in R is treated as a string.
Internally R stores every string within double quotes, even when you create them with a single
quote.

Rules Applied in String Construction

•The quotes at the beginning and end of a string should be both double quotes or both single
quotes. They can not be mixed.

•Double quotes can be inserted into a string starting and ending with a single quote.

•Single quotes can be inserted into a string starting and ending with double quotes.

•Double quotes can not be inserted into a string starting and ending with double quotes.

•Single quote can not be inserted into a string starting and ending with a single quote.
Hadoop & R Programmming

Strings
Examples of Valid Strings

Following examples clarify the rules about creating a string in R.


a <- 'Start and end with single quote'
print(a)
b <- "Start and end with double quotes"
print(b)
c <- "single quote ' in between’ double quotes"
print(c)
d <- 'Double quotes " in between” single quote'
print(d)
Hadoop & R Programmming

Strings
Examples of Invalid Strings

Following examples clarify the rules about creating a string in R.


e <- 'Mixed quotes"
print(e)
f <- 'Single quote ' inside single quote'
print(f)
g <- "Double quotes " inside double quotes"
print(g)
Output
Error: unexpected symbol in:
"print(e)
f <- 'Single"
Execution halted
Hadoop & R Programmming

Strings
String Manipulation
Concatenating Strings - paste() function
Many strings in R are combined using the paste() function. It can take any number of
arguments to be combined together.
Syntax
The basic syntax for paste function is −
paste(..., sep = " ", collapse = NULL)
Following is the description of the parameters used −
1. ... represents any number of arguments to be combined.
2. sep represents any separator between the arguments. It is optional.
3. collapse is used to eliminate the space in between two strings. But not the space within two
words of one string.
Hadoop & R Programmming

Strings
String Manipulation
Program:
a <- "Hello"
b <- 'How'
c <- "are you? "
print(paste(a,b,c))
print(paste(a,b,c, sep = "-"))
print(paste(a,b,c, sep = "", collapse = ""))
Program output:
[1] "Hello How are you? "
[1] "Hello-How-are you? "
[1] "HelloHoware you? "
Hadoop & R Programmming

Strings
Formatting numbers & strings - format() function
Numbers and strings can be formatted to a specific style using format() function.
Syntax
The basic syntax for format function is −
format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))
Following is the description of the parameters used −
x is the vector input.
digits is the total number of digits displayed.
nsmall is the minimum number of digits to the right of the decimal point.
scientific is set to TRUE to display scientific notation.
width indicates the minimum width to be displayed by padding blanks in the beginning.
justify is the display of the string to left, right or center.
Hadoop & R Programmming

Strings
Formatting numbers & strings - format() function
# Total number of digits displayed. Last digit rounded off.
result <- format(23.123456789, digits = 9)
print(result)
# Display numbers in scientific notation.
result <- format(c(6, 13.14521), scientific = TRUE)
print(result)
# The minimum number of digits to the right of the decimal point.
result <- format(23.47, nsmall = 5)
print(result)
# Format treats everything as a string.
result <- format(6)
print(result)
Hadoop & R Programmming

Strings
Formatting numbers & strings - format() function
# Numbers are padded with blank in the beginning for width.
result <- format(13.7, width = 6)
print(result)
# Left justify strings.
result <- format("Hello", width = 8, justify = "l")
print(result)
# Justify string with center.
result <- format("Hello", width = 8, justify = "c")
print(result)
Hadoop & R Programmming

Strings
Formatting numbers & strings - format() function

[1] "23.1234568"

[1] "6.000000e+00" "1.314521e+01"

[1] "23.47000"

[1] "6"

[1] " 13.7"

[1] "Hello "

[1] " Hello "


Hadoop & R Programmming

Strings
4. Changing the case - toupper() & tolower() functions

These functions change the case of characters of a string.

Syntax

The basic syntax for toupper() & tolower() function is −

toupper(x)

tolower(x)

Following is the description of the parameters used −

x is the vector input.


Hadoop & R Programmming

Strings
4. Changing the case - toupper() & tolower() functions
Example

# Changing to Upper case.

result <- toupper("Changing To Upper")

print(result)

# Changing to lower case.

result <- tolower("Changing To Lower")

print(result)

1] "CHANGING TO UPPER"

[1] "changing to lower"


Hadoop & R Programmming

Strings
5. Extracting parts of a string - substring() function

This function extracts parts of a String.

Syntax

The basic syntax for substring() function is −

substring(x,first,last)

Following is the description of the parameters used −

x is the character vector input.

first is the position of the first character to be extracted.

last is the position of the last character to be extracted.


Hadoop & R Programmming

Strings
5. Extracting parts of a string - substring() function

Example:

•# Extract characters from 5th to 7th position.


•result <- substring("Extract", 5, 7)

print(result)

[1] "act"
Hadoop & R Programmming

Vectors?
A. Vectors are the most basic R data objects and there are six types of atomic vectors. They are
logical, integer, double, complex, character and raw.

Vector Creation

Single Element Vector

Even when you write just one value in R, it becomes a vector of length 1 and belongs to one of
the above vector types.
# Atomic vector of type character.
print("abc");
# Atomic vector of type double.
print(12.5)
# Atomic vector of type integer.
print(63L)
Hadoop & R Programmming

Vectors?
# Atomic vector of type logical.
print(TRUE)
# Atomic vector of type complex.
print(2+3i)
# Atomic vector of type raw.
print(charToRaw('hello’))
When we execute the above code, it produces the following result −
•[1] "abc"
•[1] 12.5
•[1] 63
•[1] TRUE
•[1] 2+3i
•[1] 68 65 6c 6c 6f
Hadoop & R Programmming

Vectors?
Multiple Elements Vector
Using colon operator with numeric data
# Creating a sequence from 5 to 13.
v <- 5:13
print(v)
# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)
# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)
•[1] 5 6 7 8 9 10 11 12 13
•[1] 6.6 7.6 8.6 9.6 10.6 11.6 12.6
•[1] 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8
Hadoop & R Programmming

Vectors?
Multiple Elements Vector

Using sequence (Seq.) operator

# Create a vector with elements from 5 to 9 incrementing by 0.4.

print(seq(5, 9, by = 0.4))

When we execute the above code, it produces the following result −

[1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0
Hadoop & R Programmming

Vectors?
Multiple Elements Vector

Using the c() function

The non-character values are coerced to character type if one of the elements is a character.

# The logical and numeric values are converted to characters.

s <- c('apple','red',5,TRUE)

print(s)

When we execute the above code, it produces the following result −

[1] "apple" "red" "5" "TRUE"


Hadoop & R Programmming

Vectors
Accessing Vector Elements

•Elements of a Vector are accessed using indexing.

•The [ ] brackets are used for indexing. Indexing starts with position 1.

•Giving a negative value in the index drops that element from result.

•TRUE, FALSE or 0 and 1 can also be used for indexing.


# Accessing vector elements using position.
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)
# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)
Hadoop & R Programmming

Vectors
Accessing Vector Elements
# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)
# Accessing vector elements using 0/1 indexing.
y <- t[c(0,0,0,0,0,0,1)]
print(y)
When we execute the above code, it produces the following result −
[1] "Mon" "Tue" "Fri"
[1] "Sun" "Fri"
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
[1] "Sun"
Hadoop & R Programmming

Vector Manipulation
Vector arithmetic
Two vectors of same length can be added, subtracted, multiplied or divided giving the result as a
vector output.
# Create two vectors.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)
# Vector addition.
add.result <- v1+v2
print(add.result)
# Vector subtraction.
sub.result <- v1-v2
print(sub.result)
# Vector multiplication.
multi.result <- v1*v2
print(multi.result)
Hadoop & R Programmming

Vector Manipulation
# Vector division.
divi.result <- v1/v2
print(divi.result)

When we execute the above code, it produces the following result −


[1] 7 19 4 13 1 13
[1] -1 -3 4 -3 -1 9
[1] 12 88 0 40 0 22
[1] 0.7500000 0.7272727 Inf 0.6250000 0.0000000 5.5000000
Hadoop & R Programmming

Vector Manipulation
# Vector Element Recycling
If we apply arithmetic operations to two vectors of unequal length, then the elements of the
shorter vector are recycled to complete the operations.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)
# V2 becomes c(4,11,4,11,4,11)
add.result <- v1+v2
print(add.result)
sub.result <- v1-v2
print(sub.result)
When we execute the above code, it produces the following result −
[1] 7 19 8 16 4 22
[1] -1 -3 0 -6 -4 0
Hadoop & R Programmming

Vector Manipulation
# Vector Element Sorting
Elements in a vector can be sorted using the sort() function.
v <- c(3,8,4,5,0,11, -9, 304)
# Sort the elements of the vector.
sort.result <- sort(v)
print(sort.result)
# Sort the elements in the reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)
# Sorting character vectors.
v <- c("Red","Blue","yellow","violet")
sort.result <- sort(v)
print(sort.result)
Hadoop & R Programmming

Vector Manipulation
# Vector Element Sorting
# Sorting character vectors in reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)

When we execute the above code, it produces the following result −


[1] -9 0 3 4 5 8 11 304
[1] 304 11 8 5 4 3 0 -9
[1] "Blue" "Red" "violet" "yellow"
[1] "yellow" "violet" "Red" "Blue"
Hadoop & R Programmming

Explain about the list?


Lists are the R objects which contain elements of different types like − numbers, strings, vectors
and another list inside it. A list can also contain a matrix or a function as its elements. List is
created using list() function.

Creating a List

Following is an example to create a list containing strings, numbers, vectors and a logical values
# Create a list containing strings, numbers, vectors and a logical
# values.
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data) .
When we execute the above code, it produces the following result −
Hadoop & R Programmming

Explain about the list?


[[1]]
[1] "Red"
[[2]]
[1] "Green"
[[3]]
[1] 21 32 11
[[4]]
[1] TRUE
[[5]]
[1] 51.23
[[6]]
[1] 119.1
Hadoop & R Programmming

Naming List Elements

The list elements can be given names and they can be accessed using these names.

# Create a list containing a vector, a matrix and a list.

list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3))

# Give names to the elements in the list.

names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Show the list.

print(list_data)
Hadoop & R Programmming

Naming List Elements


When we execute the above code, it produces the following result −
•$`1st_Quarter`
•[1] "Jan" "Feb" "Mar"
•$A_Matrix
• [,1] [,2] [,3]
•[1,] 3 5 -2
•[2,] 9 1 8
•$A_Inner_list
•$A_Inner_list[[1]]
•[1] "green"
•$A_Inner_list[[2]]
•[1] 12.3
Hadoop & R Programmming

Accessing List Elements


Elements of the list can be accessed by the index of the element in the list. In case of named
lists it can also be accessed using the names.
We continue to use the list in the above example −
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Access the first element of the list.
print(list_data[1])
# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])
# Access the list element using the name of the element.
print(list_data$A_Matrix)
Hadoop & R Programmming

Accessing List Elements


When we execute the above code, it produces the following result −
$`1st_Quarter`
[1] "Jan" "Feb" "Mar"
$A_Inner_list
$A_Inner_list[[1]]
[1] "green"
$A_Inner_list[[2]]
[1] 12.3
[,1] [,2] [,3]
[1,] 3 5 -2
[2,] 9 1 8
Hadoop & R Programmming

Manipulating List Elements


We can add, delete and update list elements as shown below. We can add and delete elements
only at the end of a list. But we can update any element.
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Add an element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
# Remove the last element.
list_data[4] <- NULL
# Print the 4th Element.
print(list_data[4])
Hadoop & R Programmming

Manipulating List Elements


# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])
When we execute the above code, it produces the following result −
[[1]]
[1] "New element"
$<NA>
NULL
$`A Inner list`
[1] "updated element"
Hadoop & R Programmming

Manipulating List Elements


Merging Lists
You can merge many lists into one list by placing all the lists inside one list() function.
# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# Merge the two lists.
merged.list <- c(list1,list2)
# Print the merged list.
print(merged.list)
Hadoop & R Programmming

Manipulating List Elements


•Merging Lists
When we execute the above code, it produces the following result −
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] "Sun"
[[5]]
[1] "Mon"
[[6]]
[1] "Tue"
Hadoop & R Programmming

Manipulating List Elements

Converting List to Vector

A list can be converted to a vector so that the elements of the vector can be used for further
manipulation. All the arithmetic operations on vectors can be applied after the list is converted
into vectors. To do this conversion, we use the unlist() function. It takes the list as input and
produces a vector.
# Create lists.
list1 <- list(1:5)
print(list1)
list2 <-list(10:14)
print(list2)
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
Hadoop & R Programmming

Manipulating List Elements

Converting List to Vector


print(v1)
print(v2)
# Now add the vectors
result <- v1+v2
print(result)
When we execute the above code, it produces the following result −
[[1]]
[1] 1 2 3 4 5
[[1]]
[1] 10 11 12 13 14
[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19
Hadoop & R Programmming

Matrices
Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular
layout. They contain elements of the same atomic types. Though we can create a matrix
containing only characters or only logical values, they are not of much use. We use matrices
containing numeric elements to be used in mathematical calculations.
A Matrix is created using the matrix() function.
Syntax
The basic syntax for creating a matrix in R is −
matrix(data, nrow, ncol, byrow, dimnames)
Following is the description of the parameters used −
Data is the input vector which becomes the data elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical clue. If TRUE then the input vector elements are arranged by row.
dimname is the names assigned to the rows and columns.
Hadoop & R Programmming

Matrices
Create a matrix taking a vector of numbers as input.
# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)
# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)
# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)
Hadoop & R Programmming

Matrices
When we execute the above code, it produces the following result −
[,1] [,2] [,3]
[1,] 3 4 5
[2,] 6 7 8
[3,] 9 10 11
[4,] 12 13 14
[,1] [,2] [,3]
[1,] 3 7 11
[2,] 4 8 12
[3,] 5 9 13
[4,] 6 10 14
col1 col2 col3
row1 3 4 5
row2 6 7 8
row3 9 10 11
row4 12 13 14
Hadoop & R Programmming
Matrices
Accessing Elements of a Matrix

Elements of a matrix can be accessed by using the column and row index of the element. We
consider the matrix P above to find the specific elements below.
# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
# Create the matrix.
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
# Access the element at 3rd column and 1st row.
print(P[1,3])
# Access the element at 2nd column and 4th row.
print(P[4,2])
# Access only the 2nd row.
print(P[2,])
Hadoop & R Programmming
Matrices
Accessing Elements of a Matrix
# Access only the 3rd column.
print(P[,3])
When we execute the above code, it produces the following result −
[1] 5
[1] 13
col1 col2 col3
6 7 8
row1 row2 row3 row4
5 8 11 14
Hadoop & R Programmming
Matrices
Matrix Computations
Various mathematical operations are performed on the matrices using the R operators. The result
of the operation is also a matrix.
The dimensions (number of rows and columns) should be the same for the matrices involved in
the operation.
Matrix Addition & Subtraction
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
Hadoop & R Programmming
Matrices
Matrix Computations
# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
print(result)
# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
print(result)
Hadoop & R Programmming
Matrices
Matrix Computations
When we execute the above code, it produces the following result −
[,1] [,2] [,3]
[1,] 3 -1 2
[2,] 9 4 6
[,1] [,2] [,3]
[1,] 5 0 3
[2,] 2 9 4
Result of addition
[,1] [,2] [,3]
[1,] 8 -1 5
[2,] 11 13 10
Result of subtraction
[,1] [,2] [,3]
[1,] -2 -1 -1
[2,] 7 -5 2
Hadoop & R Programmming
Matrices
Matrix Multiplication & Division
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
# Multiply the matrices.
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result)
# Divide the matrices
result <- matrix1 / matrix2
cat("Result of division","\n")
print(result)
Hadoop & R Programmming
Matrices
When we execute the above code, it produces the following result −
[,1] [,2] [,3]
[1,] 3 -1 2
[2,] 9 4 6
[,1] [,2] [,3]
[1,] 5 0 3
[2,] 2 9 4
Result of multiplication
[,1] [,2] [,3]
[1,] 15 0 6
[2,] 18 36 24
Result of division
[,1] [,2] [,3]
[1,] 0.6 -Inf 0.6666667
[2,] 4.5 0.4444444 1.5000000
Hadoop & R Programmming

Data Frames

A data frame is a table or a two-dimensional array-like structure in which each column contains
values of one variable and each row contains one set of values from each column.

Following are the characteristics of a data frame.

•The column names should be non-empty.

•The row names should be unique.

•The data stored in a data frame can be of numeric, factor or character type.

•Each column should contain the same number of data items.


Hadoop & R Programmming

Data Frames
# Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)
Hadoop & R Programmming

Data Frames
When we execute the above code, it produces the following result −
emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
Hadoop & R Programmming

Data Frames
Get the Structure of the Data Frame
The structure of the data frame can be seen by using str() function.
# Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Get the structure of the data frame.
str(emp.data)
Hadoop & R Programmming

Data Frames
Summary of Data in Data Frame
The statistical summary and nature of the data can be obtained by applying summary()
function.
# Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Print the summary.
print(summary(emp.data))
Hadoop & R Programmming

Data Frames
When we execute the above code, it produces the following result

emp_id emp_name salary start_date


Min. :1 Length:5 Min. :515.2 Min. :2012-01-01
1st Qu.:2 Class :character 1st Qu.:611.0 1st Qu.:2013-09-23
Median :3 Mode :character Median :623.3 Median :2014-05-11
Mean :3 Mean :664.4 Mean :2014-01-14
3rd Qu.:4 3rd Qu.:729.0 3rd Qu.:2014-11-15
Max. :5 Max. :843.2 Max. :2015-03-27
Hadoop & R Programmming

Data Frames
Extract Data from Data Frame
Extract specific columns from a data frame using column name.
# Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Extract Specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)
Hadoop & R Programmming

Data Frames

When we execute the above code, it produces the following result −

emp.data.emp_name emp.data.salary
1 Rick 623.30
2 Dan 515.20
3 Michelle 611.00
4 Ryan 729.00
5 Gary 843.25
Hadoop & R Programmming

Data Frames
Extract the first two rows and then all columns
Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Extract the first two rows.
result <- emp.data[1:2,]
print(result)
Hadoop & R Programmming

Data Frames
When we execute the above code, it produces the following result −

emp_id emp_name salary start_date


1 1 Rick 623.3 2012-01-01
2 2 Dan 515.2 2013-09-23
Hadoop & R Programmming

Data Frames
Extract 3rd and 5th row with 2nd and 4th column
# Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)
Hadoop & R Programmming

Data Frames

When we execute the above code, it produces the following result −

emp_name start_date
3 Michelle 2014-11-15
5 Gary 2015-03-27
Hadoop & R Programmming

Data Frames
Expand Data Frame
A data frame can be expanded by adding columns and rows.
Add Column
Just add the column vector using a new column name.
# Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
Hadoop & R Programmming

Data Frames
# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

When we execute the above code, it produces the following result −


emp_id emp_name salary start_date dept
1 1 Rick 623.30 2012-01-01 IT
2 2 Dan 515.20 2013-09-23 Operations
3 3 Michelle 611.00 2014-11-15 IT
4 4 Ryan 729.00 2014-05-11 HR
5 5 Gary 843.25 2015-03-27 Finance
Hadoop & R Programmming

Data Frames
Add Row
To add more rows permanently to an existing data frame, we need to bring in the new rows in
the same structure as the existing data frame and use the rbind() function.
In the example below we create a data frame with new rows and merge it with the existing data
frame to create the final data frame.
# Create the first data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
dept = c("IT","Operations","IT","HR","Finance"),
stringsAsFactors = FALSE
)
Hadoop & R Programmming

Data Frames
# Create the second data frame
emp.newdata <- data.frame(
emp_id = c (6:8),
emp_name = c("Rasmi","Pranab","Tusar"),
salary = c(578.0,722.5,632.8),
start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
dept = c("IT","Operations","Finance"),
stringsAsFactors = FALSE
)
# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
Hadoop & R Programmming

Data Frames
When we execute the above code, it produces the following result −
emp_id emp_name salary start_date dept
1 1 Rick 623.30 2012-01-01 IT
2 2 Dan 515.20 2013-09-23 Operations
3 3 Michelle 611.00 2014-11-15 IT
4 4 Ryan 729.00 2014-05-11 HR
5 5 Gary 843.25 2015-03-27 Finance
6 6 Rasmi 578.00 2013-05-21 IT
7 7 Pranab 722.50 2013-07-30 Operations
8 8 Tusar 632.80 2014-06-17 Finance
Hadoop & R Programmming

Reshaping?

A. Data Reshaping in R is about changing the way data is organized into rows and columns. Most
of the time data processing in R is done by taking the input data as a data frame. It is easy to
extract data from the rows and columns of a data frame but there are situations when we need
the data frame in a format that is different from the format in which we received it. R has many
functions to split, merge and change the rows to columns and vice-versa in a data frame.

Joining Columns and Rows in a Data Frame

We can join multiple vectors to create a data frame using the cbind() function. Also we can
merge two data frames using rbind() function.
Hadoop & R Programmming
Reshaping?
# Create vector objects.
city <- c("Tampa","Seattle","Hartford","Denver")
state <- c("FL","WA","CT","CO")
zipcode <- c(33602,98104,06161,80294)
# Combine above three vectors into one data frame.
addresses <- cbind(city,state,zipcode)
# Print a header.
cat("# # # # The First data frame\n")
# Print the data frame.
print(addresses)
# Create another data frame with similar columns
new.address <- data.frame(
city = c("Lowry","Charlotte"),
state = c("CO","FL"),
zipcode = c("80230","33949"),
stringsAsFactors = FALSE
)
Hadoop & R Programmming
Reshaping?
# Print a header.
cat("# # # The Second data frame\n")
# Print the data frame.
print(new.address)
# Combine rows form both the data frames.
all.addresses <- rbind(addresses,new.address)
# Print a header.
cat("# # # The combined data frame\n")
# Print the result.
print(all.addresses) )
Hadoop & R Programmming
Reshaping?
# # # # The First data frame
city state zipcode
[1,] "Tampa" "FL" "33602"
[2,] "Seattle" "WA" "98104"
[3,] "Hartford" "CT" "6161"
[4,] "Denver" "CO" "80294"
# # # The Second data frame
city state zipcode
1 Lowry CO 80230
2 Charlotte FL 33949
Hadoop & R Programmming
Explain about Packages?

A. R packages are a collection of R functions, compiled code and sample data. They are stored
under a directory called "library" in the R environment. By default, R installs a set of packages
during installation. More packages are added later, when they are needed for some specific
purpose. When we start the R console, only the default packages are available by default. Other
packages which are already installed have to be loaded explicitly to be used by the R program
that is going to use them.

All the packages available in R language are listed at R Packages.

Below is a list of commands to be used to check, verify and use the R packages.

Check Available R Packages

Get library locations containing R packages

.libPaths()
Hadoop & R Programmming
Explain about Packages?

When we execute the above code, it produces the following result. It may vary depending on the
local settings of your pc.

[2] "C:/Program Files/R/R-3.2.2/library"

Get the list of all the packages installed

library()

When we execute the above code, it produces the following result. It may vary depending on the
local settings of your pc.
Hadoop & R Programmming
Explain about Packages?
Packages in library ‘C:/Program Files/R/R-3.2.2/library’:
 
base The R Base Package
boot Bootstrap Functions (Originally by Angelo Canty
for S)
class Functions for Classification
cluster "Finding Groups in Data": Cluster Analysis
Extended Rousseeuw et al.
codetools Code Analysis Tools for R
compiler The R Compiler Package
datasets The R Datasets Package
foreign Read Data Stored by 'Minitab', 'S', 'SAS',
'SPSS', 'Stata', 'Systat', 'Weka', 'dBase', ...
graphics The R Graphics Package
grDevices The R Graphics Devices and Support for Colours
and Fonts
Hadoop & R Programmming
Explain about Packages?
grid The Grid Graphics Package
KernSmooth Functions for Kernel Smoothing Supporting Wand
& Jones (1995)
lattice Trellis Graphics for R
MASS Support Functions and Datasets for Venables and
Ripley's MASS
Matrix Sparse and Dense Matrix Classes and Methods
methods Formal Methods and Classes
mgcv Mixed GAM Computation Vehicle with GCV/AIC/REML
Smoothness Estimation
nlme Linear and Nonlinear Mixed Effects Models
nnet Feed-Forward Neural Networks and Multinomial
Log-Linear Models
parallel Support for Parallel computation in R
rpart Recursive Partitioning and Regression Trees
spatial Functions for Kriging and Point Pattern Analysis
Hadoop & R Programmming
Explain about Packages?
splines Regression Spline Functions and Classes
stats The R Stats Package
stats4 Statistical Functions using S4 Classes
survival Survival Analysis
tcltk Tcl/Tk Interface
tools Tools for Package Development
utils The R Utils Package
Hadoop & R Programmming
Explain about Packages?

Install a New Package

There are two ways to add new R packages. One is installing directly from the CRAN directory
and another is downloading the package to your local system and installing it manually.

Install directly from CRAN

The following command gets the packages directly from CRAN webpage and installs the package
in the R environment. You may be prompted to choose the nearest mirror. Choose the one
appropriate to your location.

install.packages("Package Name")

# Install the package named "XML".

install.packages("XML")
Explain about Packages? Hadoop & R Programmming

Install package manually


Go to the link R Packages to download the package needed. Save the package as a .zip file in a
suitable location in the local system.
Now you can run the following command to install this package in the R environment.
install.packages(file_name_with_path, repos = NULL, type = "source")
# Install the package named "XML"
install.packages("E:/XML_3.98-1.3.zip", repos = NULL, type = "source")
Load Package to Library
Before a package can be used in the code, it must be loaded to the current R environment. You
also need to load a package that is already installed previously but not available in the current
environment.
A package is loaded using the following command −
library("package Name", lib.loc = "path to library")
# Load the package named "XML"
install.packages("E:/XML_3.98-1.3.zip", repos = NULL, type = "source")

You might also like