R Language Notes
Answer: R can be used as a powerful calculator by entering expressions directly at the prompt
in the command console. Simply type an arithmetic expression and press ENTER. R will evaluate
the expression and respond with the result.
[1] 64
6. Input:
v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v%%t)
7. Input:
v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v%/%t)
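The two inputs above use the modulus (%%) and integer-division (%/%) operators. A minimal sketch of what they compute, reusing the same vectors (the names remainder and quotient are illustrative only):

```r
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)

remainder <- v %% t    # element-wise remainder of v divided by t
quotient  <- v %/% t   # element-wise integer quotient of v divided by t

print(remainder)       # [1] 2.0 2.5 2.0
print(quotient)        # [1] 0 1 1

# The two results are linked by: v == t * quotient + remainder
print(all.equal(v, t * quotient + remainder))
```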
Operator Precedence in R
(4 + 3) ^ 2
[1] 49
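To see why the parentheses matter, the following sketch compares a few expressions with and without them. The variable names are illustrative only.

```r
a  <- 4 + 3 ^ 2     # ^ binds tighter than +, so this is 4 + 9
b  <- (4 + 3) ^ 2   # parentheses force the addition first
c1 <- -2 ^ 2        # ^ binds tighter than unary minus, so this is -(2 ^ 2)
d  <- 10 - 4 / 2    # / is applied before -, so this is 10 - 2

print(c(a, b, c1, d))   # [1] 13 49 -4  8
```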
Numeric Functions
abs() Absolute value
sqrt() Square root
round(), floor(), ceiling() Rounding, up and down
sum(), prod() Sum and product
log(), log10(), log2() Logarithms
exp() Exponential function
sin(), cos(), tan() Trigonometric functions
max() Returns maximum value
min() Returns minimum value
3. ceiling(x): It returns the smallest integer which is larger than or equal to x. The ceiling ()
function rounds a number upwards to its nearest integer
Input: x<- 1.4
print(ceiling(x))
Output
[1] 2
4. floor(x): It rounds a number downwards to its nearest integer and returns the
result.
Input: x<- 1.4
print(floor(x))
Output
[1] 1
Output
[1] 1 2 8
11. max (5, 10, 15): the min () and max () functions can be used to find the lowest or highest
number in a set.
Input: max (5, 10, 15) / x<- c (4,5,6); max(x)
Output
[1] 15 / [1] 6
12. min (5, 10, 15): the min () function can be used to find the lowest number in a set.
Input: min (5, 10, 15) / x<- c (4,5,6); min(x)
Output
[1] 5 / [1] 4
13. sum(2,3)
Output
[1] 5
14. prod(2,3)
Output
[1] 6
Question 3: What are the different operators in R? Explain different Logical Operators and
Assignment Operators in R.
Answer:
There are following types of operators in R programming −
Arithmetic Operators
Relational Operators
Logical Operators
Assignment Operators
R Logical Operators
Logical operators are used to carry out Boolean operations like AND, OR etc.
Logical Operators in R
Operator Description
! Logical NOT
& Element-wise logical AND
&& Logical AND
| Element-wise logical OR
|| Logical OR
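Operators & and | perform element-wise operations, producing a result with the length of the
longer operand.
&& and || work on single (length-one) values and return a single logical value. Versions of R
before 4.3.0 silently examined only the first element of longer operands; from R 4.3.0 onward,
an operand longer than one is an error.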
Zero is considered FALSE and non-zero numbers are taken as TRUE.
Examples for various logical operators in R
> x <- c(TRUE,FALSE,0,6)
> y <- c(FALSE,TRUE,FALSE,TRUE)
1) NOT (!) Operator: It is called Logical NOT operator. Takes each element of the vector and
gives the opposite logical value.
Input: > !x
[1] FALSE TRUE TRUE FALSE
2) AND (&) Operator: It is called Element-wise Logical AND operator. It combines each
element of the first vector with the corresponding element of the second vector and gives
an output TRUE if both the elements are TRUE.
Input: > x&y
[1] FALSE FALSE FALSE TRUE
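3) Logical AND (&&) Operator: It is called the Logical AND operator. It compares two single
values and gives TRUE only if both are TRUE. R versions before 4.3.0 used only the first element
of each vector for x&&y; from R 4.3.0 onward, vectors longer than one are an error, so the first
elements are selected explicitly here.
Input: > x[1]&&y[1]
[1] FALSE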
4) Element-wise Logical OR (|): It is called the Element-wise Logical OR operator. It combines
each element of the first vector with the corresponding element of the second vector and
gives an output TRUE if at least one of the elements is TRUE.
Input: > x|y
[1] TRUE TRUE FALSE TRUE
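5) Logical OR (||) Operator: It is called the Logical OR operator. It compares two single
values and gives TRUE if at least one of them is TRUE. As with &&, vectors longer than one are
an error from R 4.3.0 onward, so the first elements are selected explicitly here.
Input: > x[1]||y[1]
[1] TRUE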
Assignment Operators in R
There are various assignment operators in R as follows:
Operator Description
<-, <<-, = Leftward assignment
->, ->> Rightward assignment
The operators <- and = can be used, almost interchangeably, to assign to a variable in the
same environment.
Examples:
Input: > x = 9
>x
Output: [1] 9
The <<- operator is used for assigning to variables in the parent environments (more like
global assignments). The rightward assignments, although available are rarely used.
Examples:
c(3,1,TRUE,2) -> v1
c(3,1,TRUE,2) ->> v2
print(v1)
print(v2)
it produces the following result −
[1] 3 1 1 2
[1] 3 1 1 2
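The difference between <- and <<- only shows up inside a function. The sketch below is illustrative (the names counter and increment are not from the original notes): <- would create a new local variable, while <<- modifies the variable that already exists in the enclosing environment.

```r
counter <- 0

increment <- function() {
  # <<- looks for 'counter' in the enclosing environments and updates it there
  counter <<- counter + 1
  # <- would only create a variable local to this function call
  local_note <- "this name disappears when the function returns"
  invisible(counter)
}

increment()
increment()
print(counter)   # [1] 2
```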
Answer: Matrices in R are a bunch of values, either real or complex numbers, arranged in a
group of fixed number of rows and columns. Matrices are used to depict the data in a
structured and well-organized format. It is necessary to enclose the elements of a matrix in
parentheses or brackets.
Example of A matrix with 9 elements is shown below.
This Matrix [M] has 3 rows and 3 columns. Each element of matrix [M] can be referred to by
its row and column number. For example, a23 = 2.
Order of a Matrix: The order of a matrix is defined in terms of its number of rows and
columns. Order of a matrix = No. of rows × No. of columns, Therefore Matrix [M] is a matrix
of order 3 × 3.
Operations on Matrices
There are four basic operations i.e. DMAS (Division, Multiplication, Addition, Subtraction)
that can be done with matrices. Both the matrices involved in the operation should have the
same number of rows and columns.
Matrices Addition
The addition of two same-ordered matrices M (r × c) and N (r × c) yields a matrix R (r × c) where
every element is the sum of the corresponding elements of the input matrices.
# Creating 1st Matrix
B = matrix(c(1, 2, 5.4, 3, 4, 5), nrow = 2, ncol = 3)
Matrices Subtraction
The subtraction of two same-ordered matrices M (r × c) and N (r × c) yields a matrix R (r × c) where
every element is the corresponding element of the first matrix minus the corresponding element
of the second.
print(B - C)
Matrices Multiplication
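The element-wise multiplication (operator *) of two same-ordered matrices M (r × c) and N (r × c)
yields a matrix R (r × c) where every element is the product of the corresponding elements of the
input matrices. The linear-algebra matrix product uses the %*% operator instead.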
# Creating 1st Matrix
B = matrix(c(1, 2, 5.4), nrow = 1, ncol = 3)
# Creating 2nd Matrix
Matrices Division
The division of two same-ordered matrices M (r × c) and N (r × c) yields a matrix R (r × c) where
every element is the quotient of the corresponding element of the first matrix divided by
that of the second.
# Creating 1st Matrix
B = matrix(c(4, 6, -1), nrow = 1, ncol = 3)
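The four element-wise operations described above can be sketched together as follows. The matrix B comes from the addition example; the values of C are assumed purely for illustration.

```r
# Two matrices of the same order (2 x 3); values are filled column by column
B <- matrix(c(1, 2, 5.4, 3, 4, 5), nrow = 2, ncol = 3)
C <- matrix(c(2, 1, 0.1, 3, 4, 5), nrow = 2, ncol = 3)   # assumed values

print(B + C)   # element-wise sum
print(B - C)   # element-wise difference
print(B * C)   # element-wise product (not the matrix product)
print(B / C)   # element-wise quotient
```

All four results have the same order (2 × 3) as the inputs.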
While using a programming language, different variables are essential to store different
data. These variables are reserved in a memory location for storing values. Once a variable is
created, some area in the memory is reserved.
Data structures are the objects that are manipulated regularly in R. They are used to store
data in an organized fashion to make data manipulation and other data operations more
efficient. The following section discusses them in detail.
R has many data structures, which include:
Vector
List
Array
Matrices
Data Frame
Factors
Vector in R
Vector is the simplest data structure in the R programming language. A vector is a sequence of
data items of the same data type. The length() function returns the number of elements in a
vector. Elements in a vector are officially called components. A vector can hold integer,
double, character, logical, complex, or raw data.
For example, character vector can be created as…
vt <- c("Millie", "Noah", "Finn", "Sadie", "Gaten", "Caleb", "Winona", "David", "Natalia")
vt is the vector variable, which is created using character values.
List in R
Lists can be initialized by the command list and can grow dynamically. It is important to
understand that list elements should be accessed by the name of the entry via the dollar
sign or using double brackets. Lists are the R objects which contain elements of different
types like − numbers, strings, vectors and another list inside it. A list can also contain a
matrix or a function as its elements. List is created using list() function.
For example, List can be created as…
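A minimal sketch of such a list follows. The element names (name, scores, passed, city) are hypothetical and chosen only to show mixed types, both access styles, and dynamic growth.

```r
# A list holding elements of different types
person <- list(name = "Millie", scores = c(90, 85), passed = TRUE)

print(person$name)          # access by name with the dollar sign
print(person[["scores"]])   # access with double brackets

# Lists can grow dynamically: assigning to a new name adds an element
person$city <- "Unknown"
print(length(person))       # [1] 4
```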
R arrays are objects which can store data in more than two dimensions. Uni-dimensional
arrays are called Vectors. A two-dimensional array is called Matrix. More than two-
dimensional objects are called Arrays, which consist of all elements of the same data type.
For example, Arrays can be created as…
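A minimal sketch, assuming a three-dimensional array of 2 rows, 3 columns, and 4 layers filled with the integers 1 to 24 (the name arr is illustrative):

```r
# array() fills the values column by column, layer by layer
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))       # [1] 2 3 4
print(arr[2, 3, 1])   # row 2, column 3, layer 1 -> 6
print(arr[1, 1, 2])   # first element of the second layer -> 7
```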
A data frame is the standard data structure for storing a data set with rows as observations
and columns as variables. A data frame is conceptually not much different from a matrix and
can either be initialized by reading a data set from an external file or by binding several
column vectors. As an example, consider three variables (age, favourite hobby, and
favourite animal) each with five observations,
Dataframe can be created as….
Input:
age <- c(25,33,30,40,28)
hobby <- c("Reading","Sports","Games","Reading","Games")
animal <- c("Elephant", "Giraffe", NA, "Monkey", "Cat")
dat <- data.frame(age,hobby,animal)
vt <- c("Millie", "Noah", "Finn", "Sadie", "Gaten", "Caleb", "Winona", "David", "Natalia")
mtrx <- matrix(vt, nrow = 3, ncol = 3)
Question 6: Explain the statistical functions in R.
Answer: The various statistical functions in ‘R’ are as follows:
mean(x) Mean of x
median(x) Median of x
var(x) Variance of x
sd(x) Standard deviation of x
scale(x) Standard scores (z-scores) of x
quantile(x) The quantiles (including the quartiles) of x
summary(x) Summary of x: mean, min, max etc.
cov(a,b,method) Covariance of a and b; method is "pearson" (default), "kendall" or "spearman"
mean(x)
It is calculated by taking the sum of the values and dividing with the number of values in a
data series. The function mean() is used to calculate this in R.
For example, mean can be calculated in R as….
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
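The snippet above stops before the call itself. One way the remaining step could look is sketched here; the variable names result.mean and manual are assumptions, not taken from the original notes.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# mean() divides the sum of the values by the number of values
result.mean <- mean(x)
print(result.mean)                 # [1] 8.22

# The same value computed by hand
manual <- sum(x) / length(x)
print(all.equal(result.mean, manual))
```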
median(x)
The middle most value in a data series is called the median. The median() function is used in
R to calculate this value.
For example, median can be calculated in R as….
# Create the vector.
x <- c(2,3,4,5,6)
# Find the median.
median.result <- median(x)
print(median.result)
Standard Deviation, sd(x)
The standard deviation of a population is the square root of the population variance. It is
a measure of how widely the values are spread. The higher the standard deviation, the wider
the spread of values. The lower the standard deviation, the narrower the spread of values.
The ‘sd’ in R is a built-in function that accepts the input object and computes the standard
deviation of the values provided in the object. The sd() function accepts a numerical vector
and a logical argument (na.rm) and returns the sample standard deviation, which divides by
n - 1 rather than n.
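A short sketch of sd() follows, assuming an illustrative vector x. It also checks the result against the n - 1 formula that R uses, and shows the logical na.rm argument.

```r
x <- c(5, 5, 8, 12, 15, 16)

s <- sd(x)
print(s)

# sd() uses the sample formula: divide the squared deviations by n - 1
manual <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
print(all.equal(s, manual))

# na.rm = TRUE drops missing values before computing
s_na <- sd(c(x, NA), na.rm = TRUE)
print(all.equal(s, s_na))
```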
In R programming, we make use of cov() function to calculate the covariance between two
data frames or vectors.
For example, covariance can be calculated in R as….
Input: a <- c(2,4,6,8,10)
b <- c(1,11,3,33,5)
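The input above ends before cov() is called. A possible completion is sketched below; the variable names cov_pearson and cov_spearman are assumptions added for illustration.

```r
a <- c(2, 4, 6, 8, 10)
b <- c(1, 11, 3, 33, 5)

# Default method is "pearson": covariance of the raw values
cov_pearson <- cov(a, b)
print(cov_pearson)    # [1] 15

# "spearman" computes the covariance of the ranks instead
cov_spearman <- cov(a, b, method = "spearman")
print(cov_spearman)   # [1] 1.25
```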
Question Bank
Data Science Using R
Module-II
Question 1: What are conditional statements? Explain the if and if-else conditional statements
in R.
Answer: There are many situations where we do not just want to execute one statement
after another; we also have to control the flow of execution. Usually this means that
we want to execute some code only if a condition is fulfilled. In that case control flow
statements are used within the R program.
• If statement
• If-else statement
If control statement in R
a=8
b=7
if(a<b){
print(paste(a, "is smaller than ", paste(b)))
}
print(paste(a ,"is greater than",paste(b)))
print("The condition (a<b) is not satisfied")
Output:
[1] "8 is greater than 7"
[1] "The condition (a<b) is not satisfied"
Input:
a=5
b=7
if(a<b){
print(paste(a, "is smaller than ", paste(b)))
print("Terminated if condition")
}else{
print(paste(a ,"is greater than",paste(b)))
print("The condition (a<b) is not satisfied")
print("Executed else condition")
}
Output:
[1] "5 is smaller than 7"
[1] "Terminated if condition"
be TRUE immediately control jumps to nested if condition and starts evaluating the
condition given inside the if statement. After evaluation, if the result is TRUE or satisfied the
block of codes just below the inner if condition enclosed within curly braces gets executed.
On the other hand, if the inner if condition falls as FALSE the else part code region gets
executed. Till now we discussed when the outer if condition is evaluated to be TRUE. When
the outer if condition is FALSE after its evaluation the control jumps to the else part at the
end section and executes the instruction given inside its curly braces.
For example, the program shows when the outer if condition is satisfied inner if condition
gets executed. In the example, x is assigned a value of 23.
• If (x>20) is evaluated as TRUE control jumps to immediate if condition if (x > 0).
• Evaluates if (x > 0) if it is TRUE print statements inside {} otherwise execute else
region followed by the inner if condition.
• Terminates control structure execution.
Input:
x <-23
if (x > 20) {
print("x is greater than 20")
if (x > 0) {
print("x is greater than 0")
} else {
print("x is not greater than 0")
}
} else {
print("x is not greater than 20")
}
Output
[1] "x is greater than 20"
[1] "x is greater than 0"
Switch control statement in R
The switch control structure is another control statement and is somewhat similar to the
else-if functionality. A switch statement offers a number of options (cases) and executes the
one that matches the value of the provided expression. When the expression matches a case,
only that case is executed. Thus the switch() statement in R evaluates an expression against
a list of given alternatives and returns the first matching one. The alternatives, such as
case1 and case2, are supplied as arguments inside the parentheses ().
For example,
a <- switch(
4, #condition
#(...... or case1,case2..)
"red", #case1
"blue", #case2
"pink", #case3
"orange" #case4
)
print(a)
Output:
[1] "orange"
The switch evaluates the condition, which is 4, and returns the element at the matching
position in the list of arguments. Here condition 4 matches case4, so "orange" is returned.
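switch() can also match on a character string instead of a position. The sketch below uses a hypothetical helper pick_colour(); the names r and b are illustrative. With a string, an unnamed final argument acts as the default. With a number that is out of range, switch() returns NULL.

```r
pick_colour <- function(code) {
  switch(code,
         r = "red",
         b = "blue",
         "unknown")   # unnamed last argument: used when no name matches
}

print(pick_colour("b"))   # [1] "blue"
print(pick_colour("x"))   # [1] "unknown"

# A numeric selector outside the range of cases yields NULL
out_of_range <- switch(5, "red", "blue")
print(is.null(out_of_range))   # [1] TRUE
```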
Question 3: What is a loop? Explain the ‘For’ loop and the ifelse function in R.
for (value in sequence)
{
statement
}
ifelse(condition, TRUE, FALSE)
Where,
• condition is the expression to evaluate or test; it may be a vector. (parameter 1)
• TRUE denotes the value returned for the elements where the condition is satisfied.
(parameter 2)
• FALSE denotes the value returned for the elements where the condition is not
satisfied. (parameter 3)
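Because ifelse() works element by element, it returns a vector of the same length as the condition. A minimal sketch, with an assumed vector of marks and an assumed pass mark of 50:

```r
marks <- c(35, 72, 48, 90)

# Each element is tested separately; the result has one entry per element
result <- ifelse(marks >= 50, "pass", "fail")
print(result)   # [1] "fail" "pass" "fail" "pass"
```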
For example: Program to display numbers from 1 to 5 using for loop in R.
Input:
# R program to demonstrate the use of for loop
# using for loop
for (val in 1: 5)
{
# statement
print(val)
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Here, for loop is iterated over a sequence having numbers from 1 to 5. In each iteration,
each item of the sequence is displayed.
Question 4: Explain ‘While’ loop and ‘Repeat’ loop in R.
repeat
{
statement
if( condition )
{
break
}
}
Repeat loop Flow Diagram:
To terminate the repeat loop, we use a jump statement that is the break keyword. Below
are some programs to illustrate the use of repeat loops in R programming.
For example, Program to display numbers from 1 to 5 using repeat loop in R.
Input:
# R program to demonstrate the use of repeat loop
val = 1
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
In the above program, the variable val is initialized to 1, then in each iteration of the repeat
loop the value of val is displayed and then it is incremented until it becomes greater than 5.
If the value of val becomes greater than 5 then break statement is used to terminate the
loop.
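The program described above can be sketched as follows. This is a reconstruction from the description, not the original listing.

```r
# Display numbers from 1 to 5 using a repeat loop
val <- 1

repeat {
  print(val)        # display the current value
  val <- val + 1    # increment it

  # leave the loop once val is greater than 5
  if (val > 5) {
    break
  }
}
```

Without the break statement the repeat loop would never terminate, because repeat has no condition of its own.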
While Loop in R
It is a type of control statement which runs a statement or a set of statements repeatedly
until the given condition becomes false. It is an entry-controlled loop: the test condition
is checked first and then the body of the loop is executed, so the loop body is not executed
at all if the test condition is false at the start.
R – While loop Syntax:
while ( condition )
{
statement
}
While loop Flow Diagram:
Initially, the variable val is initialized to 1. In each iteration of the while loop the
condition is checked, the value of val is displayed, and val is then incremented. When val
reaches the limit used in the condition (5 in this example), the condition becomes false and
the loop is terminated.
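A while loop matching this description might look like the sketch below. The exact condition (val <= 5) is an assumption, since the original listing is not shown.

```r
# Display numbers from 1 to 5 using a while loop
val <- 1

while (val <= 5) {   # condition is tested before every iteration
  print(val)
  val <- val + 1
}
```

If val started above 5, the body would not run even once, which is what makes the loop entry controlled.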
Question 5: Describe different character functions in R.
Answer: The different character functions in R are as follows:
1. Convert object into character type
The as.character function converts argument to character type. In the example below, we
are storing 25 as a character.
Y = as.character(25)
class(Y)
The class(Y) returns character as 25 is stored as a character in the previous line of code.
3. Concatenate Strings
The paste function is used to join two or more strings. It is one of the most important string
manipulation tasks. Every analyst performs it almost daily to structure data.
For example,
x = "Deepanshu"
y ="Bhalla"
paste(x, y)
Output : Deepanshu Bhalla
paste(x, y, sep = ",")
Output : Deepanshu,Bhalla
4. String Formatting
Suppose the value is stored in fraction and you need to convert it to percent. The sprintf is
used to perform C-style string formatting.
The keyword ‘fmt’ denotes string format. The format starts with the symbol % followed by
numbers and letters.
x = 0.25
sprintf("%.0f%%",x*100)
Output : 25%
Note : '%.0f' indicates 'fixed point' decimal notation with 0 decimal. The extra % sign after 'f'
tells R to add percentage sign after the number.
If you change the code to sprintf("%.2f%%",x*100), it would return 25.00%.
x = "abcdef"
substr(x, 1, 3)
Output : abc
In the above example, we are telling R to extract the substring from the 1st through the 3rd character.
6. String Length
The nchar function is used to compute the length of a character value.
x = "I love R Programming"
nchar(x)
Output : 20
It returns 20 as the string 'x' contains 20 characters (including 3 spaces).
We often need to change the case of a word, for example to convert it to
uppercase or lowercase.
Examples
x = "I love R Programming"
tolower(x)
Output : "i love r programming"
The str_to_title() function (from the stringr package) converts the first letter of each word
in a string to uppercase and the remaining letters to lowercase.
Symbol Meaning Example
%d day as a number (01-31) 01-31
%a abbreviated weekday Mon
%A unabbreviated weekday Monday
%m month (01-12) 01-12
%b abbreviated month Jan
%B unabbreviated month January
%y 2-digit year 07
%Y 4-digit year 2007
Weekday:
The %a, %A, and %u specifiers which give the abbreviated weekday, full weekday, and
numbered weekday starting from Monday.
For example,
# today date
date<-Sys.Date()
# abbreviated weekday
format(date,format="%a")
# full weekday
format(date,format="%A")
# weekday
format(date,format="%u")
Output
[1] "Sat"
[1] "Saturday"
[1] "6"
Date:
The day, month, and year format specifiers to represent dates in different formats.
# today date
date<-Sys.Date()
# default format yyyy-mm-dd
date
# day in month
format(date,format="%d")
# month in year
format(date,format="%m")
# abbreviated month
format(date,format="%b")
# full month
format(date,format="%B")
# Date
format(date,format="%D")
format(date,format="%d-%b-%y")
Output:
[1] "2022-04-02"
[1] "02"
[1] "04"
[1] "Apr"
[1] "April"
[1] "04/02/22"
[1] "02-Apr-22"
Year:
We can also format the year in different forms. %y, %Y, and %C are format
specifiers that return the year without century, the year with century, and the century of the
given date respectively.
# today date
date<-Sys.Date()
# year without century
format(date,format="%y")
# year with century
format(date,format="%Y")
# century
format(date,format="%C")
Output
[1] "22"
[1] "2022"
[1] "20"
Question 7: Describe how sorting will be executed in R programming?
Answer: There are various sort functions:
Method-I
sort () function in R Language is used to sort a vector by its values. It takes Boolean value as
argument to sort in ascending or descending order.
Syntax:
sort(x, decreasing, na.last)
Parameters:
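x: Vector to be sorted
decreasing: Boolean value; TRUE sorts in descending order (default is FALSE)
na.last: Controls missing values; TRUE puts NA last, FALSE puts NA first, NA (the default) removes them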
The major drawback of the sort() function is that it cannot sort data frames.
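A short sketch of sort() with its arguments, using an illustrative vector that contains a missing value:

```r
x <- c(3, 1, NA, 7, 2)

asc  <- sort(x)                      # ascending; NA is dropped by default
desc <- sort(x, decreasing = TRUE)   # descending
keep <- sort(x, na.last = TRUE)      # ascending, with NA kept at the end

print(asc)    # [1] 1 2 3 7
print(desc)   # [1] 7 3 2 1
print(keep)   # [1]  1  2  3  7 NA
```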
Method-II
order() function
To overcome the drawback in method 1, we use the order() function, which also sorts data
frames according to the specified column. To sort in decreasing order add negative sign.
Data can also be sorted with multiple criteria. Suppose if the age of two persons is the same
then, we can sort them on the basis of their names i.e. lexicographically.
For example,
# define dataframe
df <- data.frame("Age" = c(12, 21, 15, 5, 25),
"Name" = c("Johnny", "Glen", "Alfie",
"Jack", "Finch"))
Output:
Age Name
4 5 Jack
1 12 Johnny
3 15 Alfie
2 21 Glen
5 25 Finch
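The output above is what sorting the data frame by Age would give. A sketch of calls that could produce it, together with descending and multi-criteria variants, follows. The data frame tied is hypothetical and only added to show a tie on Age.

```r
df <- data.frame(Age = c(12, 21, 15, 5, 25),
                 Name = c("Johnny", "Glen", "Alfie", "Jack", "Finch"))

# order() returns the row positions in sorted order; use them to index rows
by_age      <- df[order(df$Age), ]    # ascending by Age
by_age_desc <- df[order(-df$Age), ]   # negative sign: descending by Age
print(by_age)

# Multiple criteria: ties on Age are broken by Name
tied <- data.frame(Age = c(12, 12, 5),
                   Name = c("Johnny", "Alfie", "Jack"))
by_age_name <- tied[order(tied$Age, tied$Name), ]
print(by_age_name)
```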
Method-III
arrange() from the dplyr package is used to sort a dataframe in increasing order of the
values in a given column of the dataframe.
Syntax:
arrange(dataframe,column)
where
• dataframe is the dataframe input
• column is the column name, based on this column dataframe is sorted
Input:
# load the package
library("dplyr")
data = data.frame(rollno = c(1, 5, 4, 2, 3),
names = c("sravan", "bobby", "pinkey", "rohith", "gnanesh"),
subjects = c("java", "python", "php", "sql", "c"))
# sort the data based on subjects
print(arrange(data, subjects))
Output:
rollno names subjects
1 3 gnanesh c
2 1 sravan java
3 4 pinkey php
4 5 bobby python
5 2 rohith sql
Question Bank
Data Science Using R
Module-III
Question 1: Explain Factors in R programming.
Answer: Factors are the data objects which are used to categorize the data and store it as
levels. They can store both strings and integers. They are useful in columns which have a
limited number of unique values, like "Male", "Female" and True, False etc. They are useful
in data analysis for statistical modeling.
Factors are created using the factor () function by taking a vector as input.
Generating Factor Levels: We can generate factor levels by using the gl() function. It takes
two integers as input which indicates how many levels and how many times each level.
Syntax
gl(n, k, labels)
Following is the description of the parameters used −
• n is an integer giving the number of levels.
• k is an integer giving the number of replications.
• labels is a vector of labels for the resulting factor levels.
For Example,
Input:
v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)
Output:
When we execute the above code, it produces the following result −
 [1] Tampa Tampa Tampa Tampa Seattle Seattle Seattle Seattle Boston
[10] Boston Boston Boston
Levels: Tampa Seattle Boston
Creating a Factor: The command used to create or modify a factor in R language is – factor()
with a vector as input.
The two steps to creating a factor are:
• Creating a vector
• Converting the vector created into a factor using function factor()
Input:
# Creating a vector
x <- c("female", "male", "male", "female")
print(x)
# Converting the vector x into a factor
# named gender
gender <- factor(x)
print(gender)
Output:
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
print() function
print() function is used to print a string and a variable. When we use a variable inside print(),
it prints the value stored inside the variable.
Example: 1 strings with quote
Input:
# text string
my_string <- "programming with data is fun"
print(my_string)
#[1] "programming with data is fun"
Output:
[1] "programming with data is fun"
Input:
my_string <- "programming with data is fun"
# print without quotes
print(my_string, quote = FALSE)
#> [1] programming with data is fun
Output:
[1] programming with data is fun
cat() function
R programming also provides the cat() function to print variables. However, unlike print(),
the cat() function is only used with basic types like logical, integer, character, etc. The cat()
function in R can be used to concatenate several objects together and print them.
Example 1:
Input:
# print using Cat
cat("R Tutorials\n")
# print a variable using Cat
message <- "Programiz"
cat("Welcome to", message)
Output
R Tutorials
Welcome to Programiz
In the example above, we have used the cat() function to display a string along with a
variable. The \n is used as a newline character.
Input:
#concatenate three strings
cat("hey", "there", "everyone")
Output:
hey there everyone
Input:
# R program for String Creation
# creating a string with double quotes
str1 <- "OK1"
cat ("String 1 is : ", str1)
Output:
String 1 is : OK1
Output:
5
Output:
6
Output:
"L"
substr() or substring() function in R extracts substrings out of a string beginning with the
start index and ending with the end index.
Case Conversion
The string characters can be converted to upper or lower case by R’s inbuilt function
toupper() which converts all the characters to upper case, tolower() which converts all the
characters to lower case, and casefold(…, upper=TRUE/FALSE) which converts on the basis
of the value specified to the upper argument. All these functions can take in as arguments
multiple strings too.
Input:
# R program to Convert case of a string
str <- "Hi LeArn CodiNG"
print(toupper(str))
print(tolower(str))
print(casefold(str, upper = TRUE))
Output:
[1] "HI LEARN CODING"
[1] "hi learn coding"
[1] "HI LEARN CODING"
Input:
> s = c("aa", "bb", "cc", "dd", "ee")
> s[3]
Output:
[1] "cc"
Unlike other programming languages, the square bracket operator returns more than just
individual members. In fact, the result of the square bracket operator is another vector, and
s[3] is a vector slice containing a single member "cc".
Negative Index
If the index is negative, it would strip the member whose position has the same absolute
value as the negative index. For example, the following creates a vector slice with the third
member removed.
Input:
s = c("aa", "bb", "cc", "dd", "ee")
s[-3]
Output:
[1] "aa" "bb" "dd" "ee"
Out-of-Range Index
If an index is out-of-range, a missing value will be reported via the symbol NA.
Input:
s = c("aa", "bb", "cc", "dd", "ee")
s[10]
Output:
[1] NA
Duplicate Indexes
The index vector allows duplicate values. Hence the following retrieves a member twice in
one operation.
Input:
s = c("aa", "bb", "cc", "dd", "ee")
s[c(2, 3, 3)]
Output:
[1] "bb" "cc" "cc"
Range Index
To produce a vector slice between two indexes, we can use the colon operator ":". This can
be convenient for situations involving large vectors.
Input:
s = c("aa", "bb", "cc", "dd", "ee")
s[2:4]
Output:
[1] "bb" "cc" "dd"
Retrieving a vector slice. Here it shows how to retrieve a vector slice containing the second
and third members of a given vector s.
Input:
> s = c("aa", "bb", "cc", "dd", "ee")
> s[c(2, 3)]
Output:
[1] "bb" "cc"
> s[c(FALSE, TRUE, FALSE, TRUE, FALSE)]
[1] "bb" "dd"
Example: Retrieving using equal operator
Input:
c <- 10:3
c[c == 5]
Output:
[1] 5
Output:
[1] 10 9 8 7 6 5 4 3
[1] 5
[1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
Output:
[1] 10 9 8 7 6
Input:
c <- 10:3
c[c <= 7]
Output:
[1] 7 6 5 4 3
Example: We can also use boolean operators (i.e., AND &, OR |) to combine multiple
criteria:
Input:
c <- 10:3
c<9&c>4
Output:
[1] FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE
Example: Because 0 and 1 values can be coerced to logicals, we can also use some
shorthand to get the same indices as logical values:
Input:
as.logical(c(1, 1, 0))
Output:
[1] TRUE TRUE FALSE
Example:
Input:
d <- 1:3
d[c(TRUE, TRUE, FALSE)]
Output:
[1] 1 2
Centre Justify
# justify options
format(c("A", "BB", "CCC"), width = 5, justify = "centre")
#> [1] " A " " BB " " CCC "
Left Justify
format(c("A", "BB", "CCC"), width = 5, justify = "none")
#> [1] "A" "BB" "CCC"
Use of digits’ widths
# digits
format(1/1:5, digits = 2)
#> [1] "1.00" "0.50" "0.33" "0.25" "0.20"
Use of digits’ widths and justify
# use of 'digits', widths and justify
Input:
# use of 'nsmall'
format(13.7, nsmall = 3)
#> [1] "13.700"
# use of 'digits'
format(c(6.0, 13.1), digits = 2)
#> [1] " 6" "13"
Output:
[1] "13.700"
[1] " 6" "13"
[1] " 6.0" "13.1"
Question Bank
Data Science Using R
Module-IV
Question 1: Explain paste function in R.
Answer: paste() function takes multiple elements from the multiple vectors and
concatenates them into a single element.
A simple paste() will take multiple elements as inputs and concatenate those inputs into a
single string. The elements will be separated by a space as the default option. But we can
also change the separator value using the ‘sep’ parameter.
Example: Simple paste function
Input:
paste(1,'two',3,'four',5,'six')
Output = “1 two 3 four 5 six”
Input:
paste(1,'two',3,'four',5,'six',sep = "_")
Output = “1_two_3_four_5_six”
Input:
paste(1,'two',3,'four',5,'six',sep = "&")
Output = “1&two&3&four&5&six”
Input:
paste(c(1,2,3,4,5,6,7,8),collapse = "_")
Output = “1_2_3_4_5_6_7_8”
Input:
paste(c('Rita','Sam','John','Jat','Cook','Reaper'), collapse = ' and ')
Output = “Rita and Sam and John and Jat and Cook and Reaper”
Example: The paste() function with both separator and collapse arguments
The separator will deal with the values which are to be placed in between the set of
elements and the collapse argument will make use of specific value to concatenate the
elements into single -string.
Input:
paste(c('a','b'),1:10,sep = '_',collapse = ' and ')
Output = "a_1 and b_2 and a_3 and b_4 and a_5 and b_6 and a_7 and b_8 and a_9 and b_10"
Input:
paste(c('John','Ray'),1:5,sep = '=',collapse = ' and ')
Output = “John=1 and Ray=2 and John=3 and Ray=4 and John=5”
Input:
#create some vector of data values for an illustration
data <- c(5, 6, 8, 2, 1, 2, 18, 19)
#Now we can define a vector of groupings
groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')
#Yes, It’s ready to split vector of data values into groups
split(x = data, f = groups)
Output:
$A
[1] 5 6 8
$B
[1] 2
$C
[1] 1 2 18 19
Score=c(303, 128, 341, 319, 54, 74),
Quality=c(38, 27, 224, 228, 32, 41))
#Let’s view the data frame
df
#Let’s split the data frame into groups based on ‘product’
split(df, f = df$Product)
Output:
Product Condition Score Quality
1 X T 303 38
2 X T 128 27
3 Y F 341 224
4 Y F 319 228
5 Y T 54 32
6 Z F 74 41
$X
Product Condition Score Quality
1 X T 303 38
2 X T 128 27
$Y
Product Condition Score Quality
3 Y F 341 224
4 Y F 319 228
5 Y T 54 32
$Z
Product Condition Score Quality
6 Z F 74 41
replace(x, list, values)
where:
x: Name of vector
list: Index (positions or a logical vector) of the elements to replace
values: Replacement values
Input:
#define vector of values
data <- c(3, 6, 8, 12, 14, 15, 16, 19, 22)
#define new vector with a different value in position 2
data_new <- replace(data, 2, 50)
#view new vector
data_new
Output:
[1] 3 50 8 12 14 15 16 19 22
Notice that the element in position 2 has changed, but every other value in the original
vector remained the same.
Input:
#define vector of values
data <- c(2, 4, 6, 8, 10, 12, 14, 16)
#define new vector with different values in position 1, 2, and 8
data_new <- replace(data, c(1, 2, 8), c(50, 100, 200))
#view new vector
data_new
Output:
[1] 50 100 6 8 10 12 14 200
Input:
#define data frame
df <- data.frame(x=c(1, 2, 4, 4, 5, 7),
y=c(6, 6, 8, 8, 10, 11))
#view data frame
df
#replace values in column 'x' greater than 4 with a new value of 50
df$x <- replace(df$x, df$x > 4, 50)
#view updated data frame
df
Output:
Before replacement function
x y
1 1 6
2 2 6
3 4 8
4 4 8
5 5 10
6 7 11
# R program to illustrate
# grep function
# Creating string vector
x <- c("GFG", "gfg", "Geeks", "GEEKS")
# Calling grep() function
grep("gfg", x)
grep("Geeks", x)
grep("gfg", x, ignore.case = FALSE)
grep("Geeks", x, ignore.case = TRUE)
Output:
[1] 2
[1] 3
[1] 2
[1] 3 4
Example:2
Input:
# R program to illustrate
# grep function
# Creating string vector
x <- c("GFG", "gfg", "Geeks", "GEEKS")
# Calling grep() function
grep("gfg", x, ignore.case = TRUE, value = TRUE)
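With value = TRUE, grep() returns the matching strings themselves rather than their positions. The sketch below shows that result next to grepl(), which returns one logical value per element. The names matches and flags are illustrative only.

```r
x <- c("GFG", "gfg", "Geeks", "GEEKS")

# value = TRUE: return the matching elements instead of their indices
matches <- grep("gfg", x, ignore.case = TRUE, value = TRUE)
print(matches)   # [1] "GFG" "gfg"

# grepl(): TRUE/FALSE for every element of x
flags <- grepl("gfg", x, ignore.case = TRUE)
print(flags)     # [1]  TRUE  TRUE FALSE FALSE
```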
Input:
#Author DataFlair
str = "Splitting sentence into words"
str
strsplit(str, " ")
Output:
[1] "Splitting sentence into words"
[[1]]
substr(num, 5, 7)
Output:
[1] "45"
[1] "567"
Question Bank
Data Science Using R
Module-V
Question 1: Write down about mean and median functions for central tendency in R.
Answer:
mean(x)
It is calculated by taking the sum of the values and dividing with the number of values in a
data series. The function mean() is used to calculate this in R.
Output:
Training Pulse Duration
1 Strength 100 40
2 Stamina 100 35
3 Bulky 120 50
4 Lean 100 34
5 Athlete 120 50
6 Boxer 112 42
[1] "Mean of Duration is"
[1] 41.83333
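median(x)
The middle value of a sorted data series is called the median. When the series has an even
number of values, the median is the average of the two middle values. The median() function
is used in R to calculate this value.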
For example, the median can be calculated in R as follows:
# Create the vector.
x <- c(2,3,4,5,6)
# Find the median.
median.result <- median(x)
print(median.result)
Output (for the Duration column of the training data frame shown earlier, not for the vector x):
Training Pulse Duration
1 Strength 100 40
2 Stamina 100 35
3 Bulky 120 50
4 Lean 100 34
5 Athlete 120 50
6 Boxer 112 42
[1] "Median of Duration is"
[1] 41
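As a small sketch of how median() treats odd and even counts, the code below uses the vector x from the example and the Duration values from the printed table. The variable name duration is an assumption made for illustration.

```r
# Odd number of values: the median is the single middle value
x <- c(2, 3, 4, 5, 6)
print(median(x))          # [1] 4

# Even number of values: the median is the average of the two middle values
duration <- c(40, 35, 50, 34, 50, 42)   # Duration column from the table above
print(sort(duration))     # [1] 34 35 40 42 50 50
print(median(duration))   # [1] 41, the average of 40 and 42
```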
Question 2: What is variability? What are the different statistical functions for variability in
R?
Answer:
Variability (also known as statistical dispersion) is another feature of descriptive statistics.
Measures of central tendency and measures of variability together make up descriptive statistics.
Variability shows the spread of a data set around a point.
Example: Suppose, there exist 2 data sets with the same mean value:
A = 4, 4, 5, 6, 6
Mean(A) = 5
B = 1, 1, 5, 9, 9
Mean(B) = 5
Both data sets have the same mean, yet B is far more spread out than A. To distinguish
between such data sets, R offers various measures of variability.
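The two data sets can be checked directly. The sketch below confirms that A and B share a mean and uses var(), one of the measures described in this answer, to show how differently they are spread.

```r
A <- c(4, 4, 5, 6, 6)
B <- c(1, 1, 5, 9, 9)

# Same central value
print(mean(A))   # [1] 5
print(mean(B))   # [1] 5

# Very different spread
print(var(A))    # [1] 1
print(var(B))    # [1] 16
```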
Measures of Variability
Following are some of the measures of variability that R offers to differentiate between
data sets:
Variance
Standard Deviation
Range
Mean Deviation
Interquartile Range
Variance
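Variance is a measure of how far each value lies from a particular point, usually the mean.
Mathematically, it is defined as the average of the squared differences from the mean.
Note that R's var() function computes the sample variance, which divides the sum of squared
differences by n - 1 rather than n.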
Example: 1
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# Print variance of x
print(var(x))
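To make the definition concrete, the sketch below recomputes the variance by hand and compares it with var(). The names sample_var and population_var are illustrative only.

```r
x <- c(5, 5, 8, 12, 15, 16)
n <- length(x)

# Sum of squared differences from the mean
ss <- sum((x - mean(x))^2)

sample_var     <- ss / (n - 1)   # divisor used by var()
population_var <- ss / n         # plain average of the squared differences

print(var(x))           # [1] 23.76667
print(sample_var)       # same value as var(x)
print(population_var)   # smaller, because the divisor is n
```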
Standard Deviation
Standard deviation measures the spread of data values around the mean and is calculated
as the square root of the variance.
R provides the built-in sd() function for this. The example below computes the same value
by hand, as the square root of var(x).
Example: 2
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# Standard deviation
d <- sqrt(var(x))
# Print standard deviation of x
print(d)
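As a quick check, the built-in sd() function from the stats package gives the same result as taking the square root of the variance:

```r
x <- c(5, 5, 8, 12, 15, 16)

d_manual  <- sqrt(var(x))   # square root of the sample variance
d_builtin <- sd(x)          # built-in standard deviation

print(d_manual)
print(d_builtin)
print(isTRUE(all.equal(d_manual, d_builtin)))   # [1] TRUE
```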
Range
Range is the difference between the maximum and minimum values of a data set. In R,
it is computed as max(x) - min(x). Note that the range() function does not return this
difference; it returns the minimum and maximum values themselves.
Example: 3
Input:
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# range() function output
print(range(x))
# Using max() and min() function
# to calculate the range of data set
print(max(x)-min(x))
Output:
[1] 5 16
[1] 11
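An equivalent one-liner, offered as an alternative sketch and not as part of the original example, applies diff() to the two values that range() returns:

```r
x <- c(5, 5, 8, 12, 15, 16)

# range(x) returns c(min, max); diff() subtracts the first from the second
spread <- diff(range(x))
print(spread)   # [1] 11
```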
Mean Deviation
Mean deviation is the arithmetic mean of the absolute differences between each value and a
central value. The central value can be the mean, median, or mode.
Example: 4
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# Mean deviation
md <- sum(abs(x-mean(x)))/length(x)
# Print mean deviation
print(md)
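The same quantity can be written more compactly with mean(), and the central value can be swapped for the median. The sketch below shows both variants. The helper mean_deviation() is hypothetical, written here for illustration, and is not a base R function.

```r
# Hypothetical helper: mean absolute deviation of x about a chosen center
mean_deviation <- function(x, center = mean(x)) {
  mean(abs(x - center))
}

x <- c(5, 5, 8, 12, 15, 16)

md_mean   <- mean_deviation(x)              # about the mean
md_median <- mean_deviation(x, median(x))   # about the median

print(md_mean)
print(md_median)
```

For this particular vector the two results happen to coincide (both equal 25/6), which is not true in general.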
Interquartile Range
Interquartile range is based on splitting a sorted data set into parts called quartiles. There are 3
quartile values (Q1, Q2, Q3) that divide the whole data set into 4 equal parts. Q2 is
the median of the whole data set.
Mathematically, the interquartile range is defined as:
IQR = Q3 – Q1
where,
Q3 is the third quartile, the median of the upper half of the data
Q1 is the first quartile, the median of the lower half of the data
Example: 5
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# Print Interquartile range
print(IQR(x))
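The sketch below shows where the value returned by IQR() comes from. One caveat: IQR() is based on quantile(), whose default method interpolates between data points, so Q1 and Q3 can differ from the plain median of each half described above.

```r
x <- c(5, 5, 8, 12, 15, 16)

# Quartiles using quantile()'s default interpolation
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
print(q)   # Q1 = 5.75, Q2 = 10, Q3 = 14.25

# IQR is Q3 - Q1
iqr_manual <- unname(q[3] - q[1])
print(iqr_manual)   # [1] 8.5
print(IQR(x))       # [1] 8.5

# Median-of-halves version, for comparison
q1_half <- median(x[1:3])   # lower half: 5, 5, 8
q3_half <- median(x[4:6])   # upper half: 12, 15, 16
print(q3_half - q1_half)    # [1] 10
```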
# Taking two numeric
# Vectors with same length
x = c(1, 2, 3, 4, 5, 6, 7)
y = c(1, 3, 6, 2, 7, 4, 5)
# Calculating
# Correlation coefficient
# Using cor() method
result = cor(x, y, method = "pearson")
# Print the result
cat("Pearson correlation coefficient is:", result)
Output:
Pearson correlation coefficient is: 0.5357143
data: x and y
t = 1.4186, df = 5, p-value = 0.2152
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3643187 0.9183058
sample estimates:
cor
0.5357143
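The lines from "data: x and y" onward look like the printout of a significance test, not of cor() alone. Assuming they were produced by cor.test() from the stats package (the call itself does not appear in the notes), a minimal sketch that yields those numbers is:

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# Pearson correlation with a test of the null hypothesis that it is 0
res <- cor.test(x, y, method = "pearson")
print(res)

# Individual pieces of the result
print(res$estimate)    # correlation coefficient
print(res$statistic)   # t statistic
print(res$parameter)   # degrees of freedom
print(res$p.value)     # p-value
print(res$conf.int)    # 95 percent confidence interval
```

With a p-value of about 0.22, the test does not reject the hypothesis of zero correlation at the usual 5 percent level, even though the sample coefficient is moderately positive.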
Answer: In statistics, skewness and kurtosis are measures that describe the shape of
a data distribution. Both are numerical methods for analyzing the shape of a data set,
unlike histograms and other plots, which are graphical methods. They are often used to
check how far a distribution departs from normality in its asymmetry and tails. Base R has no
skewness or kurtosis function, so the moments package is used to calculate them.
Skewness
Skewness is a statistical numerical method to measure the asymmetry of the distribution or
data set. It tells about the position of the majority of data values in the distribution around
the mean value.
There exist 3 types of skewness values on the basis of which asymmetry of the graph is
decided. These are as follows:
Positive Skew
If the coefficient of skewness is greater than 0, i.e. $\gamma_{1} > 0$, then
the graph is said to be positively skewed, with the majority of data values
less than the mean. Most of the values are concentrated on the left side of the
graph, and the longer tail extends to the right.
Zero Skewness or Symmetric
If the coefficient of skewness is equal to 0 or approximately close to 0, i.e.
$\gamma_{1} = 0$, then the graph is said to be symmetric, as it is for
normally distributed data.
Negative Skew
If the coefficient of skewness is less than 0, i.e. $\gamma_{1} < 0$, then the
graph is said to be negatively skewed, with the majority of data values greater
than the mean. Most of the values are concentrated on the right side of the
graph, and the longer tail extends to the left.
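The three cases can be illustrated in base R without installing any package. The sketch below assumes the common moment-based definition of the skewness coefficient, the third central moment divided by the second central moment raised to the power 3/2. The function name skewness_moment() and the sample vectors are hypothetical, chosen only to show the sign of the result.

```r
# Hypothetical helper: moment-based skewness coefficient
skewness_moment <- function(x) {
  n  <- length(x)
  m2 <- sum((x - mean(x))^2) / n   # second central moment
  m3 <- sum((x - mean(x))^3) / n   # third central moment
  m3 / m2^(3 / 2)
}

right_tailed <- c(1, 2, 2, 3, 3, 3, 4, 10)   # long tail to the right
symmetric    <- c(1, 2, 3, 4, 5)             # mirror image around 3
left_tailed  <- -right_tailed                # long tail to the left

print(skewness_moment(right_tailed))   # positive
print(skewness_moment(symmetric))      # 0
print(skewness_moment(left_tailed))    # negative
```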
Mesokurtic
If the coefficient of kurtosis is equal to 3 or approximately close to 3, i.e. $\gamma_{2} = 3$,
then the data distribution is mesokurtic. For a normal distribution, the kurtosis is
equal to 3.
Leptokurtic
If the coefficient of kurtosis is greater than 3, i.e. $\gamma_{2} > 3$, then the data distribution
is leptokurtic and shows a sharp peak on the graph.
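As a rough base R sketch of these two cases, the code below assumes the moment-based definition of the kurtosis coefficient, the fourth central moment divided by the square of the second. The function name kurtosis_moment() and both sample vectors are hypothetical and serve only as illustration.

```r
# Hypothetical helper: moment-based kurtosis coefficient
kurtosis_moment <- function(x) {
  n  <- length(x)
  m2 <- sum((x - mean(x))^2) / n   # second central moment
  m4 <- sum((x - mean(x))^4) / n   # fourth central moment
  m4 / m2^2
}

# Large normal sample: kurtosis close to 3 (mesokurtic)
set.seed(42)
normal_sample <- rnorm(10000)
print(kurtosis_moment(normal_sample))

# Values piled up at the center with two extreme points:
# kurtosis above 3 (leptokurtic)
peaked <- c(-10, rep(0, 8), 10)
print(kurtosis_moment(peaked))   # [1] 5
```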