R Language Notes


Question Bank

Data Science Using R


Module-I
Question 1: Explain how R can be used as a calculator to execute different arithmetic
expressions. Explain operator precedence in R.

Answer: R can be used as a powerful calculator by entering expressions directly at the prompt
in the command console. Simply type an arithmetic expression and press ENTER. R will evaluate
the expression and print the result.

Simple arithmetic operators


The operators R uses for basic arithmetic are:
+ Addition
- Subtraction
* Multiplication
/ Division
^ Exponentiation
%% Modulus: the remainder after dividing the first operand by the second
%/% Integer division: the quotient of the first operand divided by the second, rounded down

Some arithmetic expressions using R.

1. Input: 4 + 8 will return the result 12


Output:
> 4+8
[1] 12

2. Input: 5 * 14 will return the result 70


Output:
> 5*14
[1] 70

3. Input: 7 / 4 will return the result 1.75


Output:
> 7/4
[1] 1.75

4. Input: 4 + 5 + 3 will return the result 12


Output:
> 4+5+3
[1] 12

5. Input: 4 ^ 3 will return the result 64


Output:
> 4^3

[1] 64

6. Input:
v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v%%t)

It produces the following result:


[1] 2.0 2.5 2.0

7. Input:
v <- c( 2,5.5,6)
t <- c(8, 3, 4)
print(v%/%t)

It produces the following result:


[1] 0 1 1
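The two operators above fit together. For any nonzero divisor, (v %/% t) * t + v %% t gives back v. The minimal sketch below reuses the vectors v and t from the examples to check this. It also shows how R treats a negative dividend: %/% rounds the quotient down, so the remainder takes the sign of the divisor.

```r
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)
q <- v %/% t        # integer quotients: 0 1 1
r <- v %% t         # remainders: 2.0 2.5 2.0
print(q * t + r)    # rebuilds v: 2.0 5.5 6.0

# With a negative dividend the quotient is rounded down (towards -Inf),
# so the remainder has the same sign as the divisor
print(-7 %/% 3)     # -3
print(-7 %% 3)      #  2
```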

Operator Precedence in R

The table lists the operators from highest to lowest precedence.

Operator Description Associativity

^ Exponent Right to Left

-x, +x Unary minus, Unary plus Left to Right

%%, %/% Modulus, Integer division Left to Right

*, / Multiplication, Division Left to Right

+, - Addition, Subtraction Left to Right

Examples of Operator Precedence

1. Input: 4 + 5 * 3 will return the result 19
Output:
> 4 + 5 * 3
[1] 19

2. Input: 4 + 3 ^ 2 will return the result 13
Output:
> 4 + 3 ^ 2
[1] 13

3. Input: (4 + 3) ^ 2 will return the result 49
Output:
> (4 + 3) ^ 2
[1] 49
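The sketch below tries a few more precedence cases from the table; the variable names a, b, c1, d and e are arbitrary. It shows three things: ^ is applied before unary minus, ^ groups right to left, and %% is applied before * and before binary minus.

```r
a  <- -2^2               # ^ before unary minus: -(2^2) = -4
b  <- (-2)^2             # parentheses force the minus first: 4
c1 <- 2^3^2              # ^ groups right to left: 2^(3^2) = 2^9 = 512
d  <- 10 - 7 %% 3 * 2    # %% first, then *, then -: 10 - ((7 %% 3) * 2) = 8
e  <- 100 / 10 / 2       # / groups left to right: (100 / 10) / 2 = 5
print(c(a, b, c1, d, e))
```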

Question 2: Explain different numeric functions in R.

Answer: R provides many built-in numeric functions, including the following:

Numeric Functions
abs() Absolute value
sqrt() Square root
round(), floor(), ceiling() Round to nearest, round down, round up
trunc() Truncate towards zero
sum(), prod() Sum and product
log(), log10(), log2() Logarithms
exp() Exponential function
sin(), cos(), tan() Trigonometric functions
max() Returns maximum value
min() Returns minimum value

1. abs(x): It returns the absolute (non-negative) value of input x.


Input: x<- -4
print(abs(x))
Output
[1] 4

2. sqrt(x): It returns the square root of input x.


Input: x<- 4
print(sqrt(x))
Output
[1] 2

3. ceiling(x): It returns the smallest integer that is greater than or equal to x, i.e., it
rounds a number up to the nearest integer.
Input: x<- 1.4
print(ceiling(x))
Output
[1] 2

4. floor(x): It returns the largest integer that is less than or equal to x, i.e., it rounds a
number down to the nearest integer.
Input: x<- 1.4
print(floor(x))
Output
[1] 1

5. trunc(x): It truncates x towards zero, dropping the fractional part.


Input: x<- c (1.2,2.5,8.1)
print(trunc(x))

Output
[1] 1 2 8

6. round(x, 2): It rounds x to 2 decimal places. The second argument (digits) sets the number of decimal places; the default is 0.


Input: x<- -4.5678 / x<- c (3.567, 4.875)
round (x, 2)
Output
[1] -4.57 / [1] 3.57 4.88
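The four rounding functions give the same results for the positive inputs above, but they differ for negative numbers. The sketch below compares them on a small made-up vector. It also shows how R's round() handles exact halves: it follows the IEC 60559 "round half to even" rule, so 0.5 and 2.5 are not rounded up.

```r
x <- c(-1.7, -1.2, 1.2, 1.7)
print(floor(x))     # -2 -2  1  1  (always down)
print(ceiling(x))   # -1 -1  2  2  (always up)
print(trunc(x))     # -1 -1  1  1  (towards zero)
print(round(x))     # -2 -1  1  2  (to nearest)

# Halfway cases go to the even digit
print(round(c(0.5, 1.5, 2.5)))   # 0 2 2
```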

7. cos(x), sin(x), tan(x): They return the cosine, sine and tangent of x, where x is an angle in radians.


Input: x<- 4
print(cos(x))
print(sin(x))
print(tan(x))

8. log(x): It returns the natural logarithm (base e) of input x.


Input: x<- 4
print(log(x))

9. log10(x): It returns the common logarithm (base 10) of input x.


Input: x<- 4
print(log10(x))

10. exp(x): It returns the exponential of x, that is, e raised to the power x.


Input: x<- 4
print(exp(x))

11. max(5, 10, 15): The max() function returns the highest number in a set of values or a
vector.
Input: max (5, 10, 15) / x<- c (4,5,6); max(x)
Output
[1] 15 / [1] 6

12. min(5, 10, 15): The min() function returns the lowest number in a set of values or a vector.
Input: min (5, 10, 15) / x<- c (4,5,6); min(x)
Output
[1] 5 / [1] 4

13. sum(2,3): The sum() function returns the sum of its arguments.
Output
[1] 5

14. prod(2,3): The prod() function returns the product of its arguments.
Output
[1] 6
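A few more calls to the functions in the table are sketched below with made-up inputs. log() and exp() undo each other, and log() also takes a base argument. The summary functions sum(), prod(), max() and min() return NA when the input contains a missing value, unless na.rm = TRUE tells them to drop it first.

```r
# log() and exp() are inverses; log() also accepts a base argument
print(log(exp(2)))        # 2
print(log(8, base = 2))   # 3, same as log2(8)
print(log10(1000))        # 3

# A missing value makes the summary NA unless na.rm = TRUE
x <- c(4, NA, 6)
print(sum(x))                 # NA
print(sum(x, na.rm = TRUE))   # 10
print(max(x, na.rm = TRUE))   # 6
print(prod(1:5))              # 120
```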

Question 3: What are the different operators in R? Explain different Logical Operators and
Assignment Operators in R.

Answer:
R provides the following types of operators:

Arithmetic Operators
Relational Operators
Logical Operators
Assignment Operators
R Logical Operators

Logical operators are used to carry out Boolean operations like AND, OR etc.
Logical Operators in R

Operator Description

! Logical NOT

& Element-wise logical AND

&& Logical AND (single values, short-circuit)

| Element-wise logical OR

|| Logical OR (single values, short-circuit)

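Operators & and | work element-wise, producing a result as long as the longer operand.
The && and || operators work on single logical values. They evaluate from left to right and
stop as soon as the result is known (short-circuit evaluation). Older versions of R silently
used only the first element of longer vectors; since R 4.3.0, giving && or || an operand
longer than one is an error.
Zero is considered FALSE and non-zero numbers are taken as TRUE.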
Examples for various logical operators in R
> x <- c(TRUE,FALSE,0,6)
> y <- c(FALSE,TRUE,FALSE,TRUE)

1) NOT (!) Operator: It is called Logical NOT operator. Takes each element of the vector and
gives the opposite logical value.

Input: > !x
[1] FALSE TRUE TRUE FALSE

2) AND (&) Operator: It is called Element-wise Logical AND operator. It combines each
element of the first vector with the corresponding element of the second vector and gives
an output TRUE if both the elements are TRUE.
Input: > x&y
[1] FALSE FALSE FALSE TRUE
3) Logical AND (&&) Operator: It is called Logical AND operator. It takes two single logical
values and gives TRUE only if both are TRUE. With vectors, select the first elements explicitly.
Input: > x[1] && y[1]

[1] FALSE
4) Element-wise Logical OR (|): It is called Element-wise Logical OR operator. It combines
each element of the first vector with the corresponding element of the second vector and
gives an output TRUE if at least one of the elements is TRUE.
Input: > x|y
[1] TRUE TRUE FALSE TRUE
5) Logical OR (||) Operator: It is called Logical OR operator. It takes two single logical
values and gives TRUE if at least one of them is TRUE.
Input: > x[1] || y[1]
[1] TRUE
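The sketch below puts the element-wise and single-value operators side by side, reusing x and y from above. It also illustrates short-circuit evaluation: stop() on the right-hand side is never reached. Finally, it uses any() and all(), which reduce a logical vector to the single TRUE or FALSE that && and || expect.

```r
x <- c(TRUE, FALSE, 0, 6)
y <- c(FALSE, TRUE, FALSE, TRUE)

print(x & y)          # element-wise: FALSE FALSE FALSE TRUE
print(x[1] && y[1])   # single values: FALSE
print(x[1] || y[1])   # single values: TRUE

# Short-circuit: the right-hand side is skipped when the left decides the result
print(FALSE && stop("never evaluated"))   # FALSE, no error
print(TRUE || stop("never evaluated"))    # TRUE, no error

# any() / all() turn a logical vector into one value
print(any(x & y))     # TRUE
print(all(x | y))     # FALSE
```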

Assignment Operators in R
There are various assignment operators in R as follows:

Operator Description

<-, <<-, = Leftwards assignment

->, ->> Rightwards assignment

The operators <- and = can be used, almost interchangeably, to assign a value to a variable in
the current environment. Inside a function call, = is used to name arguments instead.
Examples:

Input: > x <- 5


>x
Output: [1] 5

Input: > x = 9

>x
Output: [1] 9

Input: > 10 -> x


>x
Output: [1] 10

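The <<- operator assigns to a variable in an enclosing (parent) environment. R searches the
enclosing environments for an existing variable with that name and, if none is found, assigns
in the global environment. The rightward assignments, although available, are rarely used.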
Examples:

c(3,1,TRUE,2) -> v1
c(3,1,TRUE,2) ->> v2
print(v1)
print(v2)
It produces the following result:

[1] 3 1 1 2
[1] 3 1 1 2
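A common use of <<- is a function that keeps state between calls. The sketch below is an illustration: make_counter, counter and f are hypothetical names. The inner function updates count in its enclosing environment with <<-, while an ordinary <- inside a function only creates a local variable.

```r
make_counter <- function() {
  count <- 0                # lives in the environment enclosing the inner function
  function() {
    count <<- count + 1     # updates the enclosing 'count', not a local copy
    count
  }
}

counter <- make_counter()
counter()
counter()
third <- counter()
print(third)   # 3

# For comparison, <- inside a function assigns locally only
z <- 1
f <- function() { z <- 100; z }
f()
print(z)       # still 1
```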

Question 4: What is Matrix? Explain Matrix operations in R.

Answer: A matrix is a collection of values arranged in a fixed number of rows and columns.
In R, all elements of a matrix must be of the same type, for example numeric, complex,
logical or character. Matrices are used to present data in a structured and well-organized
format. In mathematical notation, the elements of a matrix are enclosed in parentheses or
brackets.
An example of a matrix with 9 elements is shown below.

This matrix [M] has 3 rows and 3 columns. Each element of matrix [M] can be referred to by
its row and column number. For example, a23, the element in row 2 and column 3, equals 2.
Order of a Matrix: The order of a matrix is defined in terms of its number of rows and
columns. Order of a matrix = No. of rows × No. of columns; therefore, matrix [M] is a matrix
of order 3 × 3.
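In R, the order of a matrix is returned by dim(), and an element is referenced with [row, column]. The sketch uses a made-up 3 × 3 matrix rather than [M] above. matrix() fills values column by column, so M[2, 3] is the 8th value.

```r
M <- matrix(1:9, nrow = 3, ncol = 3)   # filled column by column
print(M)
print(dim(M))    # 3 3: the order, rows x columns
print(nrow(M))   # 3
print(ncol(M))   # 3
print(M[2, 3])   # element in row 2, column 3: 8
```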
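Operations on Matrices

There are four basic operations, DMAS (Division, Multiplication, Addition, Subtraction), that
can be done with matrices. In R the operators /, *, + and - work element by element, so both
matrices involved in the operation must have the same number of rows and columns. Note that
* gives the element-wise product; the matrix product of linear algebra is computed with the
%*% operator.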
Matrices Addition
The addition of two matrices M and N of the same order r × c yields a matrix R of order r × c,
where every element is the sum of the corresponding elements of the input matrices.
# Creating 1st Matrix
B = matrix(c(1, 2, 5.4, 3, 4, 5), nrow = 2, ncol = 3)

# Creating 2nd Matrix


C = matrix(c(2, 0, 0.1, 3, 4, 5), nrow = 2, ncol = 3)
# Printing the resultant matrix
print(B + C)
Matrices Subtraction

The subtraction of two matrices M and N of the same order r × c yields a matrix R of order
r × c, where every element is the corresponding element of the first matrix minus the
corresponding element of the second.

# Creating 1st Matrix


B = matrix(c(1, 2, 5.4, 3, 4, 5), nrow = 2, ncol = 3)
# Creating 2nd Matrix
C = matrix(c(2, 0, 0.1, 3, 4, 5), nrow = 2, ncol = 3)
# Printing the resultant matrix

print(B - C)
Matrices Multiplication
The element-wise multiplication (*) of two matrices M and N of the same order r × c yields a
matrix R of order r × c, where every element is the product of the corresponding elements of
the input matrices.
# Creating 1st Matrix
B = matrix(c(1, 2, 5.4), nrow = 1, ncol = 3)
# Creating 2nd Matrix

C = matrix(c(2, 1, 0.1), nrow = 1, ncol = 3)


# Printing the resultant matrix
print (B * C)

Matrices Division
The division of two matrices M and N of the same order r × c yields a matrix R of order r × c,
where every element is the corresponding element of the first matrix divided by the
corresponding element of the second.
# Creating 1st Matrix
B = matrix(c(4, 6, -1), nrow = 1, ncol = 3)

# Creating 2nd Matrix


C = matrix(c(2, 2i, 0), nrow = 1, ncol = 3)
# Printing the resultant matrix
print (B / C)
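Because * on matrices is element-wise, the row-by-column matrix product needs a different operator, %*%. The sketch below uses small made-up matrices A, B and C to compare the two. Values are listed in column-major order. %*% only requires the inner dimensions to agree, and t() gives the transpose.

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # columns (1, 2) and (3, 4)
B <- matrix(c(5, 6, 7, 8), nrow = 2)   # columns (5, 6) and (7, 8)

elementwise <- A * B    # matching cells multiplied: 5, 12, 21, 32
product <- A %*% B      # row-by-column sums:        23, 34, 31, 46
print(elementwise)
print(product)

# (2 x 2) %*% (2 x 3) is allowed and gives a 2 x 3 result
C <- matrix(1:6, nrow = 2)
print(dim(A %*% C))     # 2 3
print(t(C))             # transpose: 3 x 2
```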

Question 5: What is data structure? Explain data structure in R.

Answer: A data structure is a way to organize data in a system so that it can be used
effectively. The idea is to reduce the space and time complexity of various tasks.

While using a programming language, variables are needed to store different data. Each
variable refers to a location in memory where its value is stored; once a variable is
created, some area in memory is reserved for it.
Data structures are the objects that are manipulated regularly in R. They are used to store
data in an organized fashion to make data manipulation and other data operations more
efficient. R has many data structures, which include:

• Vector
• List
• Array
• Matrices
• Data Frame
• Factors
Vector in R
A vector is the simplest data structure in R. A vector can contain multiple elements, but all
the elements must be of the same data type. The length() function returns the number of
elements in a vector.
An R vector is a sequence of data items of the same data type; its elements are also called
components. It can hold integer, double, character, logical, complex, or raw data.

For example, character vector can be created as…
vt <- c("Millie", "Noah", "Finn", "Sadie", "Gaten", "Caleb", "Winona", "David", "Natalia")
vt is the vector variable, which is created using character values.
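The sketch below works with the vt vector: it gets its length and selects elements by position. Because all elements of a vector must share one type, c() silently coerces mixed inputs. The vectors mixed and nums are made-up illustrations of that coercion.

```r
vt <- c("Millie", "Noah", "Finn", "Sadie", "Gaten", "Caleb", "Winona", "David", "Natalia")
print(length(vt))    # 9
print(vt[2])         # "Noah"
print(vt[c(1, 3)])   # "Millie" "Finn"
print(class(vt))     # "character"

# Mixing types: everything becomes the most flexible type present
mixed <- c(1, "two", TRUE)
print(mixed)         # "1" "two" "TRUE"
nums <- c(1, TRUE, FALSE)
print(nums)          # 1 1 0: logical values become numbers
```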

List in R
Lists are created with the list() function and can grow dynamically. List elements can be
accessed by name using the dollar sign ($) or with double brackets [[ ]]. Lists are R objects
that can contain elements of different types, such as numbers, strings, vectors and even
another list. A list can also contain a matrix or a function as its elements.
For example, List can be created as…

list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)


print(list_data)
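Named entries make the $ and [[ ]] access described above easier to see. The sketch below uses a hypothetical list called person. It also shows that single brackets return a smaller list rather than the element itself, and that assigning to a new name grows the list.

```r
person <- list(name = "Finn", scores = c(21, 32, 11), active = TRUE)

print(person$name)          # "Finn"
print(person[["scores"]])   # 21 32 11
print(person[[3]])          # TRUE: position works too
sub <- person["active"]     # single brackets: a list of length 1
print(class(sub))           # "list"

# Lists grow dynamically
person$city <- "Pune"
print(length(person))       # 4
```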
Array in R
An array in R is a vector with two or more dimensions. An array is like a stack of matrices,
and a matrix is a two-dimensional array.
R Array

R arrays are objects that can store data in more than two dimensions. A one-dimensional array
is a vector and a two-dimensional array is a matrix; objects with more than two dimensions
are called arrays. All elements of an array are of the same data type.
For example, arrays can be created as…

rv <- c(11, 19, 18)
rv2 <- c(21, 6, 29, 46, 37, 38)
# Take these vectors as input to the array.
# The 9 values fill a 3 x 3 x 2 array (18 cells), so they are recycled once.
result <- array(c(rv, rv2), dim = c(3, 3, 2))
print(result)
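Elements of an array are addressed with one index per dimension: row, column and matrix. The sketch below rebuilds the array above, inspects its dimensions and reads one element. It also confirms that the 9 supplied values were recycled to fill both 3 × 3 matrices.

```r
rv <- c(11, 19, 18)
rv2 <- c(21, 6, 29, 46, 37, 38)
result <- array(c(rv, rv2), dim = c(3, 3, 2))

print(dim(result))        # 3 3 2: rows, columns, matrices
print(result[, , 1])      # the first 3 x 3 matrix
print(result[1, 2, 2])    # row 1, column 2 of the second matrix: 21
print(identical(result[, , 1], result[, , 2]))   # TRUE: values were recycled
```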
Data Frame in R

A data frame is the standard data structure for storing a data set, with rows as observations
and columns as variables. A data frame is similar to a matrix, but different columns can hold
different data types. It can be created by reading a data set from an external file or by
binding several column vectors. As an example, consider three variables (age, favourite
hobby, and favourite animal), each with five observations.
A data frame can be created as…

Input:
age <- c(25,33,30,40,28)

hobby <- c("Reading","Sports","Games","Reading","Games")
animal <- c("Elephant", "Giraffe", NA, "Monkey", "Cat")
dat <- data.frame(age,hobby,animal)

names(dat) <- c("Age","Favourite.hobby","Favourite.animal")


dat
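Once created, a data frame can be inspected and subset by rows and columns. The sketch below rebuilds dat, naming the columns directly in data.frame(). It then looks at its structure, averages one column and keeps the rows that satisfy a condition. The variable older is a hypothetical name.

```r
age <- c(25, 33, 30, 40, 28)
hobby <- c("Reading", "Sports", "Games", "Reading", "Games")
animal <- c("Elephant", "Giraffe", NA, "Monkey", "Cat")
dat <- data.frame(Age = age, Favourite.hobby = hobby, Favourite.animal = animal)

str(dat)                        # one line per column with its type
print(nrow(dat))                # 5 observations
print(mean(dat$Age))            # 31.2: a column is an ordinary vector
older <- dat[dat$Age > 28, ]    # rows with Age above 28, all columns
print(older)
print(sum(is.na(dat$Favourite.animal)))   # 1 missing value
```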
Matrices in R
An R matrix is a vector with a dimension attribute and, optionally, dimension names
attached to the Vector. Matrix is a two-dimensional data structure in R programming. A
matrix in R is a collection of elements arranged in a two-dimensional rectangular layout.
For example, matrix can be created as….
Input:

vt <- c("Millie", "Noah", "Finn", "Sadie", "Gaten", "Caleb", "Winona", "David", "Natalia")
mtrx <- matrix(vt, nrow = 3, ncol = 3)
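matrix() fills its values column by column unless byrow = TRUE is given, so the same vector can produce two different layouts. The sketch below compares both layouts for vt. It then attaches the optional dimension names mentioned above; the names r1..r3 and c1..c3 are arbitrary.

```r
vt <- c("Millie", "Noah", "Finn", "Sadie", "Gaten", "Caleb", "Winona", "David", "Natalia")
by_col <- matrix(vt, nrow = 3, ncol = 3)                 # default: column by column
by_row <- matrix(vt, nrow = 3, ncol = 3, byrow = TRUE)   # row by row
print(by_col[1, ])   # "Millie" "Sadie" "Winona"
print(by_row[1, ])   # "Millie" "Noah" "Finn"

# Optional dimension names label rows and columns
dimnames(by_row) <- list(c("r1", "r2", "r3"), c("c1", "c2", "c3"))
print(by_row["r2", "c3"])   # "Caleb"
```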
Question 6: Explain statistical functions in R.
Answer: The various statistical functions in R are as follows:

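mean(x) Mean of x
median(x) Median of x
var(x) Variance of x
sd(x) Standard deviation of x
scale(x) Standard scores (z-scores) of x
quantile(x) Quantiles of x (by default the minimum, the three quartiles and the maximum)
summary(x) Summary of x: minimum, quartiles, median, mean and maximum
cov(a,b,method) Covariance of a and b; method is "pearson" (default), "kendall" or "spearman"
cor(a,b,method) Correlation of a and b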

mean(x)

It is calculated by taking the sum of the values and dividing by the number of values in a
data series. The function mean() is used to calculate this in R.
For example, mean can be calculated in R as….
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.

result.mean <- mean(x)


print(result.mean)

median(x)
The middlemost value in a sorted data series is called the median. The median() function is
used in R to calculate this value.
For example, median can be calculated in R as….
# Create the vector.

x <- c(2,3,4,5,6)
# Find the median.
median.result <- median(x)
print(median.result)
Standard Deviation, sd(x)

The standard deviation is the square root of the variance. It measures the spread of the
values: the higher the standard deviation, the wider the spread of values; the lower the
standard deviation, the closer the values are to the mean.

sd() is a built-in R function that computes the sample standard deviation of the values in a
numeric vector, that is, the square root of var(x), which divides by n - 1. It also accepts
the logical argument na.rm to remove missing values first.

For example, standard deviation can be calculated in R as….


x <- c(34,56,87,65,34,56,89) # creates vector 'x' with some values in it
sd(x) # calculates the standard deviation of the values in 'x'
Variance in R
To calculate the variance in R, use the var() function. var() is a built-in function that
computes the sample variance of a vector. It measures how far the values are spread out
from their mean.
For example, variance can be calculated in R as….
Input:

weights <- c(60, 55, 50, 65, 59)


var(weights)
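Recomputing the sample variance by hand shows exactly what var() does. It sums the squared deviations from the mean and divides by n - 1. sd() is the square root of that value. The sketch uses the weights vector above. It also applies scale(), from the table, to turn the values into z-scores with mean 0 and standard deviation 1.

```r
weights <- c(60, 55, 50, 65, 59)
n <- length(weights)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((weights - mean(weights))^2) / (n - 1)
print(manual_var)      # 31.7
print(var(weights))    # 31.7, the same value
print(sd(weights))     # about 5.63, the square root of the variance

# z-scores: (x - mean) / sd
z <- as.vector(scale(weights))
print(round(mean(z), 10))   # 0
print(sd(z))                # 1
```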
Covariance
In statistics, covariance measures how two variables of a dataset vary together. A positive
covariance means the variables tend to move in the same direction; a negative covariance
means they tend to move in opposite directions. Covariance is useful in data pre-processing
prior to modelling in data science and machine learning.

In R programming, we make use of cov() function to calculate the covariance between two
data frames or vectors.
For example, covariance can be calculated in R as….
Input: a <- c(2,4,6,8,10)
b <- c(1,11,3,33,5)

print(cov(a, b, method = "spearman"))


Correlation
Correlation is a statistical measure of how strongly two variables move together; the
correlation coefficient ranges from -1 to 1. It helps us analyze how changes in one variable
are associated with changes in another variable of the dataset. When two variables are
highly (positively) correlated, they carry largely the same information. The cor() function
in R calculates the correlation between the variables of a data set or between two vectors.

For example, correlation can be calculated in R as….


Input: a <- c(2,4,6,8,10)
b <- c(1,11,3,33,5)
corr = cor(a,b)
print(corr)
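The sketch below ties the two measures together, using the a and b vectors above. Correlation is covariance rescaled by both standard deviations. The Spearman method applies the same formulas to the ranks of the values instead of the raw values.

```r
a <- c(2, 4, 6, 8, 10)
b <- c(1, 11, 3, 33, 5)

cov_p <- cov(a, b)                      # Pearson covariance (default): 15
cor_p <- cor(a, b)                      # about 0.363
cor_manual <- cov_p / (sd(a) * sd(b))   # same value as cor_p

# Spearman: computed on ranks instead of raw values
cov_s <- cov(a, b, method = "spearman") # 1.25
cor_s <- cor(a, b, method = "spearman") # 0.5
cor_rank <- cor(rank(a), rank(b))       # Pearson correlation of the ranks: 0.5

print(c(cov_p, cor_p, cor_manual, cov_s, cor_s, cor_rank))
```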

Module-II
Question 1: What are conditional statements? Explain if and if-else conditional statements
in R.

Answer: There are a lot of situations where we do not just want to execute one statement
after another: we also have to control the flow of execution. Usually this means that
we want to execute some code only if a condition is fulfilled. In such cases, control flow
statements are used in an R program.

The conditional statements in R are as follows:

• If statement
• If-else statement
If control statement in R

The if control structure in R evaluates whether a given condition or expression is
satisfied. The block of statements inside the curly braces {} is executed only when the
condition following the if keyword is satisfied. The condition must evaluate to a single
TRUE or FALSE.
If the condition is TRUE, the statements inside the {} just after the condition are
executed; otherwise, control passes to the next statement after the if block.
A general structure for the control flow of the if statement is depicted in the flow chart
below.
For example: let’s compare the two numbers using R.
Input:
a=8
b=7
if(a<b){
  print(paste(a, "is smaller than", b))
}
print(paste(a, "is greater than", b))
print("The condition (a<b) is not satisfied")
Output:
[1] "8 is greater than 7"
[1] "The condition (a<b) is not satisfied"

If-else control statement in R

If-else is another control structure, similar to the if statement, but here you can provide
an else part after the if condition. The condition inside the parentheses () after if is
evaluated to TRUE or FALSE. If it is TRUE, the block of code inside the following curly
braces {} is executed. If the condition is FALSE, the code inside the else part is executed
instead.
Let us understand the if-else control structure in R
programming with a flow chart.
For example: let’s compare the two numbers using R.

Input:
a=5
b=7
if(a<b){
  print(paste(a, "is smaller than", b))
  print("Terminated if condition")
}else{
  print(paste(a, "is greater than", b))
  print("The condition (a<b) is not satisfied")
  print("Executed else condition")
}
Output:
[1] "5 is smaller than 7"
[1] "Terminated if condition"
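When there are more than two possible outcomes, the if-else statement can be chained with else if. The conditions are tested in order, and only the first branch whose condition is TRUE runs. The sketch below uses a hypothetical function, classify, to label a number.

```r
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

print(classify(-3))   # "negative"
print(classify(0))    # "zero"
print(classify(5))    # "positive"
```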

Question 2: Describe Nested if and Switch conditional statements in R.
Answer: Nested if in R

In R programming, an if statement inside another if statement is called a nested if
statement. When the outer if condition evaluates to TRUE, control moves to the nested
(inner) if and evaluates its condition. If the inner condition is TRUE, the block of code
just below the inner if, enclosed within curly braces, is executed; if it is FALSE, the inner
else part is executed. When the outer if condition is FALSE, control jumps to the outer else
part at the end and executes the instructions inside its curly braces.
For example, the program shows that when the outer if condition is satisfied, the inner if
condition is evaluated. In the example, x is assigned a value of 23.
• If (x > 20) is evaluated as TRUE, control jumps to the inner if condition if (x > 0).
• It evaluates if (x > 0): if it is TRUE, it prints the statement inside {}; otherwise it
executes the else region that follows the inner if condition.
• The control structure then terminates.
Input:
x <- 23
if (x > 20) {
  print("x is greater than 20")
  if (x > 0) {
    print("x is greater than 0")
  } else {
    print("x is zero or negative")
  }
} else {
  print("x is 20 or less")
}
Output
[1] "x is greater than 20"
[1] "x is greater than 0"
Switch control statement in R

The switch control structure is another control statement and is somewhat similar to an
else-if chain. A switch statement offers a number of options (cases) and executes only the
one that matches the given expression. When the first argument is a number, switch()
returns the case at that position; when it is a character string, switch() returns the value
of the first case whose name matches it. The switch control structure in R evaluates the
expression and accordingly selects among the arguments, such as case1 and case2, inside
the parentheses ().

For example,

a <- switch(
4, #condition
#(...... or case1,case2..)
"red", #case1
"blue", #case2
"pink", #case3
"orange" #case4
)

print(a)

Output:
[1] "orange"

The switch evaluates the condition 4 and returns the argument at that position in the list of
cases. Here, 4 selects case4, so "orange" is printed.
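switch() also works with a character string, matching it against the names of the cases. The sketch below uses a hypothetical function, day_type. A case with an empty value falls through to the next non-empty one, and a single unnamed final argument acts as the default. With a numeric first argument, a position beyond the last case returns NULL invisibly.

```r
day_type <- function(day) {
  switch(day,
         "Sat" = ,                 # empty value: falls through to "Sun"
         "Sun" = "weekend",
         "Mon" = , "Tue" = , "Wed" = , "Thu" = ,
         "Fri" = "weekday",
         "unknown")                # unnamed last argument: the default
}

print(day_type("Sun"))   # "weekend"
print(day_type("Sat"))   # "weekend", via fall-through
print(day_type("Wed"))   # "weekday"
print(day_type("Xyz"))   # "unknown"

# Numeric selector out of range: NULL
print(is.null(switch(5, "red", "blue", "pink", "orange")))   # TRUE
```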
Question 3: What is a loop? Explain the ‘for’ loop and the ifelse() function in R.

Answer: In R programming, we need a control structure to run a block of code multiple
times. Loops are among the most fundamental and powerful programming concepts.
A loop is a control statement that allows a statement or a set of statements to be executed
repeatedly. The word ‘looping’ means cycling or iterating.
A loop repeatedly checks a condition. If the condition calls for an action, the loop body is
executed and the condition is checked again; this continues until the condition no longer
holds. Each pass through the loop is called an iteration. A loop has two components: the
control statement and the loop body. The control statement controls the execution of
statements depending on the condition, and the loop body consists of the set of statements
to be executed. To execute the same lines of code numerous times in a program, a
programmer can simply use a loop.
For Loop in R
It is a type of control statement that makes it easy to construct a loop that runs a
statement or a set of statements multiple times. The for loop is commonly used to iterate
over the items of a sequence. It is an entry-controlled loop: before each pass, the loop
checks whether the sequence has another item, so if the sequence is empty, the body is not
executed at all.
R – For loop Syntax:

for (value in sequence)
{
statement
}
For example, Program to display numbers from 1 to 5 using for loop in R.


Input:
# R program to demonstrate the use of for loop
# using for loop
for (val in 1: 5)
{
# statement
print(val)
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Here, for loop is iterated over a sequence having numbers from 1 to 5. In each iteration,
each item of the sequence is displayed.
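The sequence in a for loop can be any vector. The sketch below accumulates a running total over 1:5 and iterates over a character vector. It then checks the entry-controlled behaviour with seq_along() on an empty vector: the body never runs. The names total, empty and count are arbitrary.

```r
# Running total over a sequence
total <- 0
for (val in 1:5) {
  total <- total + val
}
print(total)   # 15

# Any vector can be the sequence, e.g. characters
for (name in c("Millie", "Noah")) {
  print(paste("Hello", name))
}

# seq_along() gives 1..n, and no iterations at all for an empty vector
empty <- numeric(0)
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1
}
print(count)   # 0: the body never ran
```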
ifelse() function in R
The ifelse() function is a very useful function in the R programming language. It can be
considered an immediate if statement. It evaluates the test in its first argument and, where
the test is TRUE, returns the value from the second argument; where it is FALSE, it returns
the value from the third argument. ifelse() is vectorized: the test is evaluated for every
element of a vector, and the result is a vector of the same length. In R, the ifelse()
function is a shorthand representation of the if-else conditional statement.
The syntax of ifelse() is shown below.
Syntax

ifelse(test, yes, no)

Where,
• test is the condition to evaluate or test. (parameter 1)
• yes gives the value returned where test is TRUE. (parameter 2)
• no gives the value returned where test is FALSE. (parameter 3)
For example: Program to label numbers as even or odd using ifelse() in R.
Input:
# R program to demonstrate the use of ifelse()
x <- c(5, 2, 9, 12)
ifelse(x %% 2 == 0, "even", "odd")
Output:
[1] "odd"  "even" "odd"  "even"

Here, the condition x %% 2 == 0 is checked for every element of x, and each element is
labelled "even" or "odd" accordingly.
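A single ifelse() call does the same work as a for loop with an if-else inside, one element at a time. The sketch below computes both versions on a made-up vector and checks that they agree. It also shows that an NA in the test produces NA in the result.

```r
x <- c(5, 2, 9, 12)

# Vectorized: one call handles the whole vector
vectorized <- ifelse(x %% 2 == 0, "even", "odd")

# Equivalent for loop with if-else
looped <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] %% 2 == 0) {
    looped[i] <- "even"
  } else {
    looped[i] <- "odd"
  }
}

print(vectorized)                      # "odd" "even" "odd" "even"
print(identical(vectorized, looped))   # TRUE

# NA in the test gives NA in the result
print(ifelse(c(1, NA, 3) > 2, "big", "small"))   # "small" NA "big"
```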
Question 4: Explain ‘While’ loop and ‘Repeat’ loop in R.

Answer: Repeat Loop in R


It is a simple loop that runs the same statement or group of statements repeatedly until a
stop condition is met. The repeat loop has no built-in condition to terminate it, so the
programmer must place a condition inside the loop body and use a break statement to exit the
loop. If no such condition is present in the body of the repeat loop, it will iterate
infinitely.

R – Repeat loop Syntax:


repeat
{
statement

if( condition )
{
break
}
}

Repeat loop Flow Diagram:

To terminate the repeat loop, we use the jump statement break. Below is a program to
illustrate the use of the repeat loop in R programming.
For example, Program to display numbers from 1 to 5 using repeat loop in R.

Input:
# R program to demonstrate the use of repeat loop

val = 1

# using repeat loop


repeat
{
# statements
print(val)
val = val + 1

# checking stop condition


if(val > 5)
{
# using break statement
# to terminate the loop
break
}
}

Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

In the above program, the variable val is initialized to 1, then in each iteration of the repeat
loop the value of val is displayed and then it is incremented until it becomes greater than 5.
If the value of val becomes greater than 5 then break statement is used to terminate the
loop.

While Loop in R
It is a type of control statement that runs a statement or a set of statements repeatedly as
long as the given condition is TRUE. It is also an entry-controlled loop: the test condition
is checked first and then the body of the loop is executed, so the body is not executed at
all if the condition is FALSE at the start.
R – While loop Syntax:
while ( condition )
{

statement
}
While loop Flow Diagram:

For example, Program to display numbers from 1 to 5 using while loop in R.


Input:
# R program to demonstrate the use of while loop
val = 1
# using while loop
while (val <= 5)
{
# statements
print(val)
val = val + 1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Initially, the variable val is initialized to 1. In each iteration of the while loop the
condition is checked, the value of val is displayed, and then val is incremented. When val
becomes 6, the condition val <= 5 becomes FALSE and the loop terminates.
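The sketch below runs both loops with a starting value for which the stop condition already holds. The while loop checks first, so its body never runs. The repeat loop checks inside its body after the statements, so the body runs once. The last part is a typical while loop whose number of iterations is not known in advance. The counters while_runs, repeat_runs and halvings are arbitrary names.

```r
# while checks first: the body never runs
val <- 10
while_runs <- 0
while (val <= 5) {
  while_runs <- while_runs + 1
  val <- val + 1
}
print(while_runs)    # 0

# repeat checks inside the body: it runs at least once
val <- 10
repeat_runs <- 0
repeat {
  repeat_runs <- repeat_runs + 1
  val <- val + 1
  if (val > 5) break
}
print(repeat_runs)   # 1

# Halve 100 until it drops below 1
n <- 100
halvings <- 0
while (n >= 1) {
  n <- n / 2
  halvings <- halvings + 1
}
print(halvings)      # 7
```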

Question 5: Describe different character functions in R.
Answer: The different character functions in R are as follows:
1. Convert object into character type

The as.character function converts its argument to character type. In the example below, we
are storing 25 as a character.
Y = as.character(25)
class(Y)
class(Y) returns "character" because 25 is stored as a character in the previous line of code.

2. Check the character type


To check whether a vector is a character or not, use is.character function.
x = "I love R Programming"
is.character(x)
Output : TRUE
Like is.character, there are other functions such as is.numeric, is.integer and is.array for
checking whether an object is numeric, integer or an array.

3. Concatenate Strings
The paste function is used to join two or more strings, by default with a space between them.
It is one of the most common string manipulation tasks, used almost daily to structure data.
For example,
x = "Deepanshu"
y ="Bhalla"
paste(x, y)
Output : Deepanshu Bhalla
paste(x, y, sep = ",")
Output : Deepanshu,Bhalla
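Two related forms of paste are often useful. paste0() joins strings with no separator. The collapse argument joins the elements of one vector into a single string. The sketch below also shows that paste() is vectorized, producing one string per element.

```r
x <- "Deepanshu"
y <- "Bhalla"
print(paste0(x, y))                              # "DeepanshuBhalla"
print(paste(c("a", "b", "c"), collapse = "-"))   # "a-b-c"
print(paste("Item", 1:3))                        # "Item 1" "Item 2" "Item 3"
```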

4. String Formatting
Suppose a value is stored as a fraction and you need to convert it to a percentage. The
sprintf function performs C-style string formatting.
Its first argument, fmt, is the format string. Each format specification starts with the
symbol % followed by numbers and letters.
x = 0.25
sprintf("%.0f%%",x*100)
Output : 25%

Note : '%.0f' indicates 'fixed point' decimal notation with 0 decimals. The %% after 'f' is an
escaped percent sign, which tells R to print a literal % after the number.
If you change the code to sprintf("%.2f%%",x*100), it would return 25.00%.

5. Extract or replace substrings


substr Syntax - substr(x, start, stop)

x = "abcdef"
substr(x, 1, 3)
Output : abc
In the above example, we are telling R to extract the substring from the 1st through the 3rd
character.
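The heading also mentions replacing substrings. Assigning to substr() overwrites characters in place. The sketch below extracts a later slice, shows that substr() works element by element on a character vector, and then performs a replacement on a made-up string.

```r
x <- "abcdef"
print(substr(x, 4, 6))                     # "def"
print(substr(c("apple", "banana"), 1, 3))  # "app" "ban"

# Replacement form: overwrite characters 1 to 3
substr(x, 1, 3) <- "XYZ"
print(x)                                   # "XYZdef"
```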

6. String Length
The nchar function is used to compute the length of a character value.
x = "I love R Programming"
nchar(x)
Output : 20
It returns 20 because the string in 'x' contains 20 characters (including 3 spaces).

7. Convert Character to Uppercase / Lowercase / Title Case

Often we need to change the case of a word, for example, to convert it to uppercase or
lowercase.

Examples
x = "I love R Programming"
tolower(x)
Output : "i love r programming"

The tolower() function converts letters in a string to lowercase.


toupper(x)
Output : "I LOVE R PROGRAMMING"

The toupper() function converts letters in a string to uppercase.


library(stringr)
str_to_title(x)
Output : "I Love R Programming"

The str_to_title() function, from the stringr package, converts the first letter of each word
to uppercase and the remaining letters to lowercase.

8. Repeat the character N times


To repeat a string a given number of times, use the base R function strrep().
strrep("x",3)
Output: "xxx"
Question 6: What are the Date functions in R?
Answer: The following symbols can be used with the format() function to print dates.

Symbol Meaning Example
%d day of the month as a number (01-31) 01-31
%a abbreviated weekday Mon
%A unabbreviated weekday Monday
%m month as a number (01-12) 01-12
%b abbreviated month Jan
%B unabbreviated month January
%y 2-digit year 07
%Y 4-digit year 2007

Weekday:
The %a, %A, and %u specifiers give the abbreviated weekday, the full weekday, and the weekday
as a number (1-7, with Monday as 1).
For example,
# today's date
date<-Sys.Date()

# abbreviated weekday
format(date,format="%a")
# full weekday
format(date,format="%A")
# weekday as a number
format(date,format="%u")
Output (for a run on 2022-04-02, a Saturday)
[1] "Sat"
[1] "Saturday"
[1] "6"

Date:
The day, month, and year format specifiers can be combined to represent dates in different
formats.
# today's date
date<-Sys.Date()
# default format yyyy-mm-dd
date
# day in month
format(date,format="%d")

# month in year
format(date,format="%m")
# abbreviated month
format(date,format="%b")
# full month

format(date,format="%B")
# %D is shorthand for %m/%d/%y
format(date,format="%D")
format(date,format="%d-%b-%y")
Output:

[1] "2022-04-02"
[1] "02"
[1] "04"
[1] "Apr"
[1] "April"

[1] "04/02/22"
[1] "02-Apr-22"
Year:
We can also format the year in different forms. %y, %Y, and %C are format specifiers that
return the year without century, the year with century, and the century of the given date,
respectively.
# today date
date<-Sys.Date()

# year without century


format(date,format="%y")
# year with century
format(date,format="%Y")

# century
format(date,format="%C")
Output

[1] "22"
[1] "2022"
[1] "20"
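The same format symbols work in the other direction: as.Date() uses them to parse text into dates. The sketch below uses fixed example dates so the results do not depend on the day it is run. It also shows that Date objects support arithmetic in days. Month and weekday names depend on the locale, so the checks use numbers only.

```r
d  <- as.Date("2022-04-02")                        # yyyy-mm-dd is the default input format
d2 <- as.Date("02/05/2022", format = "%d/%m/%Y")  # 2 May 2022, parsed with format symbols

print(format(d, "%d-%b-%Y"))    # "02-Apr-2022" in an English locale
print(format(d2, "%Y-%m-%d"))   # "2022-05-02"
print(as.numeric(d2 - d))       # 30 days apart
print(d + 7)                    # "2022-04-09": adding a number adds days
print(format(d, "%u"))          # "6": 2 April 2022 was a Saturday
```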
Question 7: Describe how sorting is performed in R programming.
Answer: There are various ways to sort data:

Method-I
The sort() function in R is used to sort a vector by its values. Its decreasing argument is a
Boolean value that chooses ascending (FALSE, the default) or descending (TRUE) order.
Syntax:
sort(x, decreasing, na.last)
Parameters:
x: Vector to be sorted

decreasing: Boolean value; TRUE sorts in descending order

na.last: Controls missing values. By default (NA) they are removed; TRUE puts them at the end
and FALSE at the beginning
For example:
# R program to sort a vector
# Creating a vector

x <- c(7, 4, 3, 9, 1.2, -4, -5, -8, 6, NA)


# Calling sort() function
sort(x)
Output:
[1] -8.0 -5.0 -4.0 1.2 3.0 4.0 6.0 7.0 9.0

The major drawback of the sort() function is that it cannot sort data frames.
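The sketch below tries the decreasing and na.last arguments on the vector from the example above. It also previews order(), which returns the positions that would sort the vector rather than the sorted values. order() keeps NA at the end by default, so indexing with it matches sort(x, na.last = TRUE).

```r
x <- c(7, 4, 3, 9, 1.2, -4, -5, -8, 6, NA)

print(sort(x, decreasing = TRUE))   # 9 7 6 4 3 1.2 -4 -5 -8 (NA dropped)
print(sort(x, na.last = TRUE))      # ascending, NA kept at the end

idx <- order(x)                     # positions, not values
print(idx)                          # 8 7 6 5 3 2 9 1 4 10
print(x[idx])                       # same as sort(x, na.last = TRUE)
```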
Method-II
order() function
To overcome this drawback of Method-I, we use the order() function. It returns the positions
that would sort a column, and indexing the data frame with them sorts its rows by that
column. To sort a numeric column in decreasing order, put a minus sign in front of it (or use
decreasing = TRUE).
Data can also be sorted with multiple criteria. For example, if two persons have the same
age, we can sort them by name, i.e. lexicographically.
For example,
# define dataframe
df <- data.frame("Age" = c(12, 21, 15, 5, 25),
"Name" = c("Johnny", "Glen", "Alfie",
"Jack", "Finch"))

# sort the data frame on the basis of the
# Age column and store it in newdf
newdf <- df[order(df$Age), ]

# print sorted data frame
print(newdf)

Output:
Age Name
4 5 Jack
1 12 Johnny
3 15 Alfie
2 21 Glen
5 25 Finch
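The multiple-criteria case described above can be sketched as follows. The data frame here is a hypothetical variation of the one above, with repeated ages so that the second sort key matters.

```r
# Hypothetical data with repeated ages so that the Name column breaks ties
df <- data.frame(Age  = c(21, 12, 21, 5, 12),
                 Name = c("Glen", "Johnny", "Alfie", "Jack", "Finch"),
                 stringsAsFactors = FALSE)

# Sort by Age, then by Name within equal ages
by_age_name <- df[order(df$Age, df$Name), ]
print(by_age_name)

# Oldest first (negate the numeric column); ties are still broken by Name
by_age_desc <- df[order(-df$Age, df$Name), ]
print(by_age_desc)
```

In the first result, Finch comes before Johnny (both aged 12) and Alfie before Glen (both aged 21).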

Method-III

Using the arrange() function from dplyr.

arrange() sorts the rows of a data frame in ascending order of one or more of its columns; wrapping a column in desc() sorts it in descending order.

Syntax:
arrange(dataframe, column)
where
• dataframe is the input data frame
• column is the name of the column by which the data frame is sorted

For example: an R program to sort a data frame by a column

In this program, we create a data frame with three columns from vectors and sort it by the subjects column.

Input:
# load the package
library("dplyr")

# create a data frame with roll no, names
# and subjects columns
data <- data.frame(rollno = c(1, 5, 4, 2, 3),
                   names = c("sravan", "bobby", "pinkey", "rohith", "gnanesh"),
                   subjects = c("java", "python", "php", "sql", "c"))
# sort the data based on subjects
print(arrange(data, subjects))

Output:
rollno names subjects
1 3 gnanesh c
2 1 sravan java
3 4 pinkey php
4 5 bobby python
5 2 rohith sql

Question Bank
Data Science Using R
Module-III
Question 1: Explain Factors in R programming.
Answer: Factors are data objects used to categorize data and store it as levels. They can store both strings and integers. They are useful for columns that have a limited number of unique values, such as "Male"/"Female" or TRUE/FALSE, and are widely used in data analysis for statistical modeling.

Factors are created using the factor() function, which takes a vector as input.
Generating Factor Levels: We can generate factor levels by using the gl() function. It takes two integers as input: the number of levels and the number of times each level is repeated.

Syntax
gl(n, k, labels)
Following is the description of the parameters used:
• n is an integer giving the number of levels.
• k is an integer giving the number of replications.
• labels is a vector of labels for the resulting factor levels.

For Example,
Input:
v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)

Output:
When we execute the above code, it produces the following result:
[1] Tampa Tampa Tampa Tampa Seattle Seattle Seattle Seattle Boston
[10] Boston Boston Boston
Levels: Tampa Seattle Boston

Creating a Factor: The function used to create or modify a factor in R is factor(), which takes a vector as input.
The two steps to creating a factor are:
• Creating a vector
• Converting the vector into a factor using the factor() function
Input:
# Creating a vector
x <- c("female", "male", "male", "female")
print(x)
# Converting the vector x into a factor
# named gender
gender <- factor(x)
print(gender)

Output:
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
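By default factor() sorts the levels alphabetically, which is why female comes first above. A minimal sketch of setting the level order explicitly and inspecting the factor with base R helpers:

```r
x <- c("female", "male", "male", "female")

# 'levels' fixes the order of the categories explicitly
gender <- factor(x, levels = c("male", "female"))

levels(gender)      # the categories, in the chosen order
nlevels(gender)     # number of categories
as.integer(gender)  # integer codes stored internally: 2 1 1 2
table(gender)       # count of each level
```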

Question 2: Describe print and cat formatting functions in R.


Answer:

print() function
The print() function displays the value of an object, such as a string or a variable. When we pass a variable to print(), it prints the value stored in the variable.
Example: 1 strings with quote
Input:
# text string
my_string <- "programming with data is fun"
print(my_string)
#[1] "programming with data is fun"

Output:
[1] "programming with data is fun"

Example: 2 strings without quote


To be more precise, print() is a generic function: it dispatches to a method chosen by the class of its argument, which is why new classes can define their own print methods. As the previous example shows, print() displays character strings in quotes by default. To print character strings without quotes, set the argument quote = FALSE.

Input:
my_string <- "programming with data is fun"
# print without quotes
print(my_string, quote = FALSE)
#> [1] programming with data is fun

Output:
[1] programming with data is fun
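Because print() is generic, defining a function named print.<class> gives objects of that class their own printing behaviour. A minimal sketch, where the class name "temperature" and the method print.temperature() are hypothetical examples rather than part of base R:

```r
# A print method for the hypothetical S3 class "temperature"
print.temperature <- function(x, ...) {
  cat("Temperature:", unclass(x), "degrees C\n")
  invisible(x)  # print methods conventionally return their input invisibly
}

t1 <- structure(21.5, class = "temperature")
print(t1)  # dispatches to print.temperature()
t1         # auto-printing at the console also uses the method
```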

cat() function

R also provides the cat() function to print variables. However, unlike print(), cat() works only with basic (atomic) types such as logical, integer, numeric, and character vectors; it cannot print data frames. The cat() function can also concatenate several objects into a single output.

Example 1:
Input:
# print using cat
cat("R Tutorials\n")
# print a variable using cat (cat() inserts a space between its arguments)
message <- "Programiz"
cat("Welcome to", message)

Output
R Tutorials
Welcome to Programiz

In the example above, we have used the cat() function to display a string along with a
variable. The \n is used as a newline character.

Example 2: Use cat() to Concatenate Objects


We can use the cat() function to concatenate three strings in R:

Input:
#concatenate three strings
cat("hey", "there", "everyone")

Output:
hey there everyone
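The separator that cat() places between its items is controlled by the sep argument (a single space by default). A short sketch:

```r
# sep controls what goes between the items
cat("hey", "there", "everyone", sep = "-")
cat("\n")

# cat() flattens vectors, so sep also goes between vector elements
cat(1:3, sep = ", ")
cat("\n")

# Unlike print(), cat() adds no [1] prefix, no quotes and no trailing newline
cat("Done\n")
```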

Question 3: What is a string? Explain string functions in R.


Answer: A string is a sequence of characters; in R, strings are stored as elements of character vectors. One or more characters enclosed in a pair of matching single or double quotes form a string. Strings represent textual content and can contain numbers, spaces, and special characters. An empty string is written as "". Whichever quotes are used to create it, R prints a string in double quotes. A double-quoted string can contain single quotes, and a single-quoted string can contain double quotes; a quote of the same kind as the enclosing quotes must be escaped with a backslash, as in "say \"hi\"".
Example: Creating a string

Input:
# R program for String Creation
# creating a string with double quotes
str1 <- "OK1"
cat("String 1 is:", str1)

Output:
String 1 is: OK1
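The quoting rules described above can be checked directly. A small sketch; the variable names are only for illustration:

```r
s1 <- "It's easy"        # single quote inside double quotes
s2 <- 'She said "hi"'    # double quotes inside single quotes
s3 <- "She said \"hi\""  # same text, with escaped double quotes
empty <- ""              # the empty string

identical(s2, s3)  # TRUE: both store the same characters
nchar(empty)       # 0
cat(s2, "\n")      # cat() shows the raw text
print(s2)          # print() shows the escaped, double-quoted form
```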

Example: Length of string using str_length() from the stringr package

Input:
library(stringr)
str_length("hello")

Output:
[1] 5

Example: Using the base R nchar() function (the single quote counts as a character)


Input:
nchar("hel'lo")

Output:
[1] 6

Example: Using substr() function

Syntax: substr(x, start, stop)

Input:
substr("Learn Code Tech", 1, 1)

Output:
[1] "L"
The substr() and substring() functions extract a substring from a string, beginning at the start index and ending at the stop index.
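Both functions are vectorized, and substring() lets the last position default to the end of the string. A short sketch:

```r
words <- c("Learn", "Code", "Tech")

# substr() applies the same start/stop to every element
substr(words, 1, 2)

# substring() with only a first position extracts to the end of the string
substring("Learn Code Tech", 7)
```

The calls return "Le" "Co" "Te" and "Code Tech".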

Case Conversion
The characters of a string can be converted to upper or lower case with R's built-in functions: toupper() converts all characters to upper case, tolower() converts all characters to lower case, and casefold(…, upper = TRUE/FALSE) converts according to the value of the upper argument. All of these functions are vectorized, so they also accept a vector of several strings.

Input:
# R program to Convert case of a string
str <- "Hi LeArn CodiNG"
print(toupper(str))
print(tolower(str))
print(casefold(str, upper = TRUE))

Output:
[1] "HI LEARN CODING"
[1] "hi learn coding"
[1] "HI LEARN CODING"

Question 4: Explain Vector Indexing in R.


Answer:
We retrieve values in a vector by placing an index inside the single square bracket "[]" operator. For example, the following shows how to retrieve a vector member. Since vector indices in R start at 1, we use index position 3 to retrieve the third member.

Input:
> s = c("aa", "bb", "cc", "dd", "ee")
> s[3]

Output:
[1] "cc"

Unlike in many other programming languages, the square bracket operator does not return a bare element: its result is always another vector, so s[3] is a vector slice of length one containing the member "cc".

Negative Index

If the index is negative, the member at the position equal to its absolute value is dropped. For example, the following creates a vector slice with the third member removed.

Input:
s = c("aa", "bb", "cc", "dd", "ee")
s[-3]

Output:
[1] "aa" "bb" "dd" "ee"

Out-of-Range Index

If an index is out-of-range, a missing value will be reported via the symbol NA.

Input:
s = c("aa", "bb", "cc", "dd", "ee")
s[10]

Output:
[1] NA

Duplicate Indexes
The index vector allows duplicate values. Hence the following retrieves a member twice in
one operation.

Input:
s = c("aa", "bb", "cc", "dd", "ee")
s[c(2, 3, 3)]

Output:
[1] "bb" "cc" "cc"

Range Index

To produce a vector slice between two indices, we can use the colon operator ":". This is convenient when working with large vectors.

Input:
s = c("aa", "bb", "cc", "dd", "ee")
s[2:4]

Output:
[1] "bb" "cc" "dd"

Vector Slice Indexing

To retrieve a vector slice, pass a vector of indices. The following retrieves the second and third members of a given vector s.

Input:
> s = c("aa", "bb", "cc", "dd", "ee")
> s[c(2, 3)]
Output:
[1] "bb" "cc"
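A few more indexing forms follow the same rules. A short sketch using the same vector s plus a hypothetical named vector v:

```r
s <- c("aa", "bb", "cc", "dd", "ee")

s[-c(1, 5)]   # drop the first and last members
s[length(s)]  # last member
s[0]          # index 0 selects nothing: character(0)
# Note: positive and negative indices cannot be mixed, e.g. s[c(-1, 2)] is an error

v <- c(a = 1, b = 2, c = 3)
v["b"]          # index by name
v[c("c", "a")]  # several names, in any order
```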

Question 5: Describe logical vector indexing in R.


Answer:
A new vector can be sliced from a given vector with a logical index vector, which has the
same length as the original vector. Its members are TRUE if the corresponding members in
the original vector are to be included in the slice, and FALSE if otherwise.
Example: Consider the following vector s of length 5.
Input:

s = c("aa", "bb", "cc", "dd", "ee")


To retrieve the second and fourth members of s, we define a logical vector L of the same
length, and have its second and fourth members set as TRUE.
> L = c(FALSE, TRUE, FALSE, TRUE, FALSE)
> s[L]
Output:
[1] "bb" "dd"
The code can be abbreviated into a single line.

> s[c(FALSE, TRUE, FALSE, TRUE, FALSE)]
[1] "bb" "dd"
Example: Retrieving values using the equality operator ==

Input:
c <- 10:3
c[c == 5]

Output:
[1] 5

Example: The logical vector produced by c == 5 (TRUE only where the value is 5)


Input:
c <- 10:3
c
c[c == 5]
c == 5

Output:
[1] 10 9 8 7 6 5 4 3
[1] 5
[1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE

Example: Using the > operator to keep only values greater than 5


Input:
c <- 10:3
c[c > 5]

Output:
[1] 10 9 8 7 6

Example: Using the <= operator to keep only values less than or equal to 7

Input:
c <- 10:3
c[c <= 7]

Output:
[1] 7 6 5 4 3

Example: We can also use the boolean operators & (AND) and | (OR) to combine multiple criteria:

Input:
c <- 10:3
c < 9 & c > 4

Output:
[1] FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE

Example: Because 0 and 1 values can be coerced to logicals, we can also use some
shorthand to get the same indices as logical values:

Input:
as.logical(c(1, 1, 0))

Output:
[1] TRUE TRUE FALSE

Example: Indexing with a logical vector

Input:
d <- 1:3
d[c(TRUE, TRUE, FALSE)]

Output:
[1] 1 2
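Logical conditions are usually used directly as an index, and two details are worth knowing: a shorter logical index is recycled, and NA in a condition produces NA in the result. A short sketch (the name v is used instead of c to avoid masking the c() function):

```r
v <- 10:3

v[v < 9 & v > 4]   # values strictly between 4 and 9: 8 7 6 5
which(v > 5)       # positions (not values) where the condition is TRUE
v[c(TRUE, FALSE)]  # the index is recycled, so every other value: 10 8 6 4

x <- c(1, NA, 3)
x[x > 2]           # NA in the condition gives NA in the result: NA 3
x[which(x > 2)]    # which() skips NA, leaving only 3
```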

Question 6: Explain different justify formatting options in R.


Answer:

Centre Justify
# justify options
format(c("A", "BB", "CCC"), width = 5, justify = "centre")
#> [1] "  A  " " BB  " " CCC "
Left Justify
format(c("A", "BB", "CCC"), width = 5, justify = "left")
#> [1] "A    " "BB   " "CCC  "
Right Justify
format(c("A", "BB", "CCC"), width = 5, justify = "right")
#> [1] "    A" "   BB" "  CCC"
No Justify (the strings are left as they are, without padding)
format(c("A", "BB", "CCC"), width = 5, justify = "none")
#> [1] "A"   "BB"  "CCC"
Use of digits
# digits
format(1/1:5, digits = 2)
#> [1] "1.00" "0.50" "0.33" "0.25" "0.20"
Use of digits, width and justify
# use of 'digits', width and justify ("c" is a partial match for "centre")
format(format(1/1:5, digits = 2), width = 6, justify = "c")
#> [1] " 1.00 " " 0.50 " " 0.33 " " 0.25 " " 0.20 "
Output:
[1] "  A  " " BB  " " CCC "
[1] "A    " "BB   " "CCC  "
[1] "    A" "   BB" "  CCC"
[1] "A"   "BB"  "CCC"
[1] "1.00" "0.50" "0.33" "0.25" "0.20"
[1] " 1.00 " " 0.50 " " 0.33 " " 0.25 " " 0.20 "
Some useful arguments in formatting

Input:
# use of 'nsmall': minimum number of digits after the decimal point
format(13.7, nsmall = 3)
#> [1] "13.700"

# use of 'digits': number of significant digits
format(c(6.0, 13.1), digits = 2)
#> [1] " 6" "13"

# use of 'digits' and 'nsmall'
format(c(6.0, 13.1), digits = 2, nsmall = 1)
#> [1] " 6.0" "13.1"

Output:
[1] "13.700"
[1] " 6" "13"
[1] " 6.0" "13.1"
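Two related base R tools are often used alongside format(): the big.mark argument for thousands separators, and formatC(), which takes C-style width, format and flag arguments. A short sketch:

```r
format(1234567, big.mark = ",")                        # "1,234,567"
formatC(3.14159, format = "f", digits = 2, width = 8)  # "    3.14": 2 decimals, right-justified
formatC(42L, width = 6, flag = "0")                    # "000042": zero-padded
formatC(c("A", "BB", "CCC"), width = 5, flag = "-")    # left-justified text
```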

Question Bank
Data Science Using R
Module-IV
Question 1: Explain paste function in R.
Answer: The paste() function converts its arguments to character strings and concatenates them element by element; with the collapse argument, the results are further joined into a single string.

The syntax of the paste() function is:

paste(..., sep = " ", collapse = NULL)
Here:
... = one or more R objects (typically vectors) to be concatenated.
sep = the string used to separate the terms taken from different arguments.
collapse = an optional string used to join the results into a single string.

A simple paste() takes multiple elements as inputs and concatenates them into a single string. The elements are separated by a space by default, but we can change the separator using the sep parameter.
Example: Simple paste function
Input:
paste(1,'two',3,'four',5,'six')
Output:
[1] "1 two 3 four 5 six"

Example: Using paste() with a separator argument


The sep parameter specifies the value or symbol placed between the elements that paste() takes as input.

Input:
paste(1,'two',3,'four',5,'six',sep = "_")
Output:
[1] "1_two_3_four_5_six"

Input:
paste(1,'two',3,'four',5,'six',sep = "&")
Output:
[1] "1&two&3&four&5&six"

Example: The paste() function with collapse argument


When a single vector is passed to paste(), there is only one argument, so sep has nothing to separate. This is where the collapse parameter is useful: it specifies the string placed between the elements of the vector when they are combined into one string.

Input:
paste(c(1,2,3,4,5,6,7,8),collapse = "_")
Output:
[1] "1_2_3_4_5_6_7_8"

Input:
paste(c('Rita','Sam','John','Jat','Cook','Reaper'), collapse = ' and ')
Output:
[1] "Rita and Sam and John and Jat and Cook and Reaper"

Example: The paste() function with both separator and collapse arguments
The sep value is placed between the corresponding elements of the arguments (the shorter vector is recycled), and the collapse value is then used to join the results into a single string.

Input:
paste(c('a','b'),1:10,sep = '_',collapse = ' and ')
Output:
[1] "a_1 and b_2 and a_3 and b_4 and a_5 and b_6 and a_7 and b_8 and a_9 and b_10"

Input:
paste(c('John','Ray'),1:5,sep = '=',collapse = ' and ')
Output:
[1] "John=1 and Ray=2 and John=3 and Ray=4 and John=5"
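Without collapse, paste() is vectorized and returns one string per element; paste0() is the base R shorthand for paste() with sep = "". A short sketch:

```r
paste("x", 1:3)                   # "x 1" "x 2" "x 3"
paste0("x", 1:3)                  # "x1" "x2" "x3"
paste0("x", 1:3, collapse = "+")  # "x1+x2+x3"
```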

Question 2: Describe split function in R.


Answer:
split() is a built-in R function that divides a vector or data frame into groups defined by a factor (the f argument). It returns a list with one element per group.

Example: To divide a vector into groups, use the split() function

Input:
# create a vector of data values for illustration
data <- c(5, 6, 8, 2, 1, 2, 18, 19)
# define a vector of groupings, one per data value
groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')
# split the vector of data values into groups
split(x = data, f = groups)

Output:
$A
[1] 5 6 8
$B
[1] 2
$C
[1] 1 2 18 19

Example: Using split function in Dataframe


Input:
df <- data.frame(Product=c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
Condition=c('T', 'T', 'F', 'F', 'T', 'F'),
Score=c(303, 128, 341, 319, 54, 74),
Quality=c(38, 27, 224, 228, 32, 41))
# view the data frame
df
# split the data frame into groups based on 'Product'
split(df, f = df$Product)

Output:
Product Condition Score Quality
1 X T 303 38
2 X T 128 27
3 Y F 341 224
4 Y F 319 228
5 Y T 54 32
6 Z F 74 41

$X
Product Condition Score Quality
1 X T 303 38
2 X T 128 27

$Y
Product Condition Score Quality
3 Y F 341 224
4 Y F 319 228
5 Y T 54 32

$Z
Product Condition Score Quality
6 Z F 74 41
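Because split() returns a list with one element per group, it combines naturally with sapply() to compute a statistic per group, and unsplit() reverses the split. A short sketch using the Score column from the data frame above:

```r
df <- data.frame(Product = c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Score   = c(303, 128, 341, 319, 54, 74))

# one list element per product
score_groups <- split(df$Score, df$Product)

# mean score of each group: X = 215.5, Y = 238, Z = 74
sapply(score_groups, mean)

# unsplit() puts the values back in their original order
unsplit(score_groups, df$Product)
```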

Question 3: Explain the replacement function in R.


Answer: The replace() function returns a copy of the vector x in which the elements at the positions given by list are replaced by values; the original vector is not modified.
This function uses the following syntax:

replace(x, list, values)

where:
x: name of the vector
list: indices (or a logical vector) of the elements to replace
values: replacement values

Example 1: Replace One Value in Vector


The following code shows how to replace the element in position 2 of a vector with a new
value of 50:

Input:
#define vector of values
data <- c(3, 6, 8, 12, 14, 15, 16, 19, 22)
#define new vector with a different value in position 2
data_new <- replace(data, 2, 50)
#view new vector
data_new
Output:
[1] 3 50 8 12 14 15 16 19 22
Notice that the element in position 2 has changed, but every other value in the original
vector remained the same.

Example 2: Replace Multiple Values in Vector


The following code shows how to replace the values of multiple elements in a vector with
new values:

Input:
#define vector of values
data <- c(2, 4, 6, 8, 10, 12, 14, 16)
#define new vector with different values in position 1, 2, and 8
data_new <- replace(data, c(1, 2, 8), c(50, 100, 200))
#view new vector
data_new

Output:
[1] 50 100 6 8 10 12 14 200

Example 3: Replace Values in Data Frame


The following code shows how to replace the values in a certain column of a data frame that
meet a specific condition:

Input:
#define data frame
df <- data.frame(x=c(1, 2, 4, 4, 5, 7),
y=c(6, 6, 8, 8, 10, 11))
#view data frame
df
#replace values in column 'x' greater than 4 with a new value of 50
df$x <- replace(df$x, df$x > 4, 50)
#view updated data frame
df

Output:
Before replacement function
x y
1 1 6
2 2 6
3 4 8
4 4 8
5 5 10
6 7 11

After replacement function
x y
1 1 6
2 2 6
3 4 8
4 4 8
5 50 10
6 50 11
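Two points from the description above can be checked directly: replace() leaves its input unchanged, and a logical vector works as the list argument, which gives a common idiom for filling in missing values. A short sketch:

```r
data <- c(3, 6, 8, 12)
data_new <- replace(data, 2, 50)
data      # still 3 6 8 12: replace() returns a modified copy
data_new  # 3 50 8 12

# replace missing values with 0
scores <- c(10, NA, 7, NA)
scores_filled <- replace(scores, is.na(scores), 0)
scores_filled  # 10 0 7 0
```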

Question 4: Explain different functions related to manipulation of alphabets.

Answer: Manipulation with Alphabets


Syntax: grep() function
grep(pattern, x, ignore.case = FALSE, value = FALSE)
Parameters:
pattern: a regular expression to be matched against each element of x.
x: a character vector to search.
ignore.case: if TRUE, case is ignored when matching (default FALSE).
value: if TRUE, grep() returns the matching elements; otherwise it returns their indices (default FALSE).
Example:1 grep():
Input:

# R program to illustrate
# grep function
# Creating string vector
x <- c("GFG", "gfg", "Geeks", "GEEKS")
# Calling grep() function

grep("gfg", x)
grep("Geeks", x)
grep("gfg", x, ignore.case = FALSE)
grep("Geeks", x, ignore.case = TRUE)

Output:
[1] 2
[1] 3

[1] 2
[1] 3 4
Example:2
Input:
# R program to illustrate
# grep function
# Creating string vector
x <- c("GFG", "gfg", "Geeks", "GEEKS")
# Calling grep() function
grep("gfg", x, ignore.case = TRUE, value = TRUE)
grep("G", x, ignore.case = TRUE, value = TRUE)
grep("Geeks", x, ignore.case = FALSE, value = FALSE)
grep("GEEKS", x, ignore.case = FALSE, value = FALSE)
Output:
[1] "GFG" "gfg"
[1] "GFG" "gfg" "Geeks" "GEEKS"
[1] 3
[1] 4
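Base R has related pattern functions: grepl() returns a logical vector instead of indices, and sub() and gsub() replace the first or every match. A short sketch on the same vector:

```r
x <- c("GFG", "gfg", "Geeks", "GEEKS")

grepl("gfg", x, ignore.case = TRUE)  # TRUE TRUE FALSE FALSE
sub("e", "E", "Geeks")               # first match only: "GEeks"
gsub("e", "E", "Geeks")              # every match: "GEEks"
```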
Question 5: Describe strsplit(), sprintf() and substr() functions for manipulation of alphabets.
Answer:
Example 1: The strsplit() function in R splits each element of a character vector at matches of a pattern and returns a list.

Input:
#Author DataFlair
str = "Splitting sentence into words"
str
strsplit(str, " ")

Output:
[1] "Splitting sentence into words"
[[1]]
[1] "Splitting" "sentence" "into" "words"
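Because strsplit() returns a list with one element per input string, unlist() is often used to flatten the result. The split pattern is a regular expression unless fixed = TRUE is given. A short sketch:

```r
parts <- strsplit(c("a,b,c", "d,e"), ",")
parts[[2]]     # "d" "e"
unlist(parts)  # "a" "b" "c" "d" "e"

strsplit("a1b22c", "[0-9]+")[[1]]          # regex split: "a" "b" "c"
strsplit("1.5.2", ".", fixed = TRUE)[[1]]  # literal ".": "1" "5" "2"
```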


Example 2: sprintf() formats values using C-style format specifications, such as %s for a string and %.2f for a number with two decimal places.
Input:
sprintf("%s scored %.2f percent", "Matthew", 72.3)
Output:
[1] "Matthew scored 72.30 percent"
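A few more C-style specifications, and the fact that sprintf() is vectorized over its arguments, are shown in this short sketch:

```r
sprintf("%05d", 42L)      # zero-padded integer, width 5: "00042"
sprintf("%-6s|", "ab")    # left-justified string, width 6: "ab    |"
sprintf("%e", 12345.678)  # scientific notation: "1.234568e+04"
sprintf("%s has %d items", c("A", "B"), c(1L, 2L))  # one result per element
```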

Example 3: substr() extracts substrings from a character vector; its replacement form, substr(x, start, stop) <- value, replaces substrings in a character vector.
Input:
#Author DataFlair
num <- "12345678"
substr(num, 4, 5)

substr(num, 5, 7)
Output:
[1] "45"
[1] "567"
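The replacement form of substr() mentioned above overwrites characters in place. A short sketch:

```r
num <- "12345678"
substr(num, 4, 5) <- "XY"  # overwrite characters 4 to 5
num                        # "123XY678"

# a longer replacement is truncated to the selected range
s <- "abcdef"
substr(s, 2, 3) <- "ZZZZ"
s                          # "aZZdef"
```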

Question Bank
Data Science Using R
Module-V
Question 1: Describe the mean and median functions for measuring central tendency in R.
Answer:
mean(x)

The mean is calculated by taking the sum of the values and dividing it by the number of values in a data series. The function mean() is used to calculate it in R.

For example, the mean can be calculated in R as follows:

# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x)
print(result.mean)
Output:
[1] 8.22

Example: Calculating mean from Dataframe


Input:
df <- data.frame (
Training = c("Strength", "Stamina", "Bulky", "Lean", "Athlete", "Boxer"),
Pulse = c(100, 100, 120, 100, 120, 112),
Duration = c(40, 35, 50, 34, 50, 42)
)
df
# Compute the mean value (the name mean_duration avoids masking the mean() function)
mean_duration <- mean(df$Duration)
print("Mean of Duration is")
print(mean_duration)

Output:
Training Pulse Duration
1 Strength 100 40
2 Stamina 100 35
3 Bulky 120 50
4 Lean 100 34
5 Athlete 120 50
6 Boxer 112 42
[1] "Mean of Duration is"
[1] 41.83333
median(x)

The median is the middle value of a data series once it is sorted; for an even number of values, it is the average of the two middle values. The median() function is used in R to calculate this value.

For example, the median can be calculated in R as follows:
# Create the vector.
x <- c(2,3,4,5,6)
# Find the median.
median.result <- median(x)
print(median.result)
Output:
[1] 4

Example: Calculating Median from Dataframe


Input:
df <- data.frame (
Training = c("Strength", "Stamina", "Bulky", "Lean", "Athlete", "Boxer"),
Pulse = c(100, 100, 120, 100, 120, 112),
Duration = c(40, 35, 50, 34, 50, 42)
)
df
# Compute the median value (the name median_duration avoids masking the median() function)
median_duration <- median(df$Duration)
print("Median of Duration is")
print(median_duration)

Output:
Training Pulse Duration
1 Strength 100 40
2 Stamina 100 35
3 Bulky 120 50
4 Lean 100 34
5 Athlete 120 50
6 Boxer 112 42
[1] "Median of Duration is"
[1] 41
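Two arguments matter in practice: na.rm = TRUE ignores missing values (otherwise a single NA makes the result NA), and the trim argument of mean() drops a fraction of observations from each end before averaging, which limits the influence of outliers. A short sketch:

```r
y <- c(2, 4, NA, 6)
mean(y)                  # NA: a missing value propagates
mean(y, na.rm = TRUE)    # 4, computed from 2, 4, 6
median(y, na.rm = TRUE)  # 4

z <- c(1, 2, 3, 4, 100)
mean(z)                  # 22, pulled up by the outlier
mean(z, trim = 0.2)      # 3, the mean of 2, 3, 4
median(z)                # 3
```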

Question 2: What is variability? What are the different statistical functions for variability in R?
Answer:
Variability (also known as statistical dispersion) is another feature of descriptive statistics. Measures of central tendency and variability together make up descriptive statistics. Variability shows how spread out a data set is around a central point.

Example: Suppose, there exist 2 data sets with the same mean value:
A = 4, 4, 5, 6, 6
Mean(A) = 5
B = 1, 1, 5, 9, 9
Mean(B) = 5
So, to differentiate between the two data sets, R offers various measures of variability.
Measures of Variability
Following are some of the measures of variability that R offers to differentiate between data sets:
• Variance
• Standard Deviation
• Range
• Mean Deviation
• Interquartile Range
Variance
Variance measures how far each value lies from a central point, usually the mean. Mathematically, it is the average of the squared differences from the mean. Note that var() in R computes the sample variance, which divides the sum of squared differences by n - 1 rather than n.
Example: 1
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# Print variance of x
print(var(x))

Standard Deviation
The standard deviation measures the spread of data values around the mean and is calculated as the square root of the variance.
R provides the built-in sd() function for this; it gives the same result as taking the square root of var().
Example: 2
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# Standard deviation as the square root of the variance
d <- sqrt(var(x))
# Print standard deviation of x
print(d)
# The built-in sd() function gives the same value
print(sd(x))

Range
The range is the difference between the maximum and minimum values of a data set. In R, the range() function returns the minimum and maximum values themselves rather than their difference, so the range is computed as max(x) - min(x).
Example: 3
Input:
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# range() function output
print(range(x))
# Using max() and min() function
# to calculate the range of data set
print(max(x)-min(x))
Output:
[1] 5 16
[1] 11

Mean Deviation
Mean deviation is the average of the absolute differences between each value and a central value. The central value can be the mean, median, or mode.
Example: 4
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# Mean deviation
md <- sum(abs(x-mean(x)))/length(x)
# Print mean deviation
print(md)

Interquartile Range
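The interquartile range is based on splitting a data set into parts called quartiles. The three quartiles (Q1, Q2, Q3) divide the sorted data set into four equal parts; Q2 is the median of the whole data set.
Mathematically, the interquartile range is:
IQR = Q3 - Q1
where,
Q3 is the upper quartile (roughly, the median of the upper half of the data)
Q1 is the lower quartile (roughly, the median of the lower half of the data)
R's IQR() function obtains the quartiles from quantile(), whose default method interpolates between data values, so its result can differ slightly from the halves method.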
Example: 5
# Defining vector
x <- c(5, 5, 8, 12, 15, 16)
# Print Interquartile range
print(IQR(x))
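Applying all five measures to the data sets A and B from the start of this answer shows how they separate two data sets with the same mean. A minimal sketch; spread() is a hypothetical helper name, not a base R function.

```r
A <- c(4, 4, 5, 6, 6)
B <- c(1, 1, 5, 9, 9)

# hypothetical helper collecting the measures described above
spread <- function(v) {
  c(variance = var(v),
    sd       = sd(v),
    range    = diff(range(v)),
    mean_dev = mean(abs(v - mean(v))),
    iqr      = IQR(v))
}

rbind(A = spread(A), B = spread(B))

# The quartiles behind IQR() for the vector used above (default quantile type 7)
x <- c(5, 5, 8, 12, 15, 16)
quantile(x, c(0.25, 0.75))  # 5.75 and 14.25
IQR(x)                      # 8.5
```

Every measure is larger for B (variance 16, sd 4, range 8, mean deviation 3.2, IQR 8) than for A (1, 1, 2, 0.8, 2). For x, the halves method would give 15 - 5 = 10 rather than 8.5.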

Question 3: Explain correlation in R?


Answer: Correlation is a statistical measure that indicates how strongly two variables are related; it can also be computed for every pair among several variables. For instance, if one is interested in whether there is a relationship between the heights of fathers and sons, a correlation coefficient can be calculated to answer this question. The (Pearson) correlation coefficient always lies between -1 and +1. It is a scaled version of covariance and gives both the direction and the strength of a linear relationship.
R provides two functions to calculate the Pearson correlation coefficient: cor() and cor.test(). cor() computes the correlation coefficient, whereas cor.test() performs a test for association (correlation) between paired samples and returns both the correlation coefficient and the significance level (p-value) of the correlation.

Example: 1 Using cor() method


Input:
# R program to illustrate
# pearson Correlation Testing
# Using cor()

# Taking two numeric
# Vectors with same length
x = c(1, 2, 3, 4, 5, 6, 7)
y = c(1, 3, 6, 2, 7, 4, 5)
# Calculating
# Correlation coefficient
# Using cor() method
result = cor(x, y, method = "pearson")
# Print the result
cat("Pearson correlation coefficient is:", result)
Output:
Pearson correlation coefficient is: 0.5357143

Example: 2 Using cor.test() method


Input:
# R program to illustrate
# pearson Correlation Testing
# Using cor.test()
# Taking two numeric
# Vectors with same length
x = c(1, 2, 3, 4, 5, 6, 7)
y = c(1, 3, 6, 2, 7, 4, 5)
# Calculating
# Correlation coefficient
# Using cor.test() method
result = cor.test(x, y, method = "pearson")
# Print the result
print(result)
Output:
Pearson's product-moment correlation

data: x and y
t = 1.4186, df = 5, p-value = 0.2152
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3643187 0.9183058
sample estimates:
cor
0.5357143
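The statement that correlation is a scaled covariance can be checked directly, and the method argument of cor() also offers rank-based coefficients. A short sketch with the same vectors:

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# Pearson's r is the covariance divided by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_manual  # same value as cor(x, y)

# rank-based alternatives
cor(x, y, method = "spearman")
cor(x, y, method = "kendall")

# with a data frame, cor() returns the matrix of pairwise correlations
cor(data.frame(x, y))
```

Here the Spearman coefficient equals the Pearson one (0.5357143) because both vectors already consist of the ranks 1 to 7.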

Question 4: Explain skewness and its types.

Answer: In statistics, skewness and kurtosis are measures that describe the shape of a data distribution. Both are numerical methods for analyzing the shape of a data set, unlike plots and histograms, which are graphical methods. They are often used to check a distribution for asymmetry and departures from normality. Base R has no built-in functions for them; the moments package provides skewness() and kurtosis().
Skewness
Skewness is a numerical measure of the asymmetry of a distribution or data set. It tells where the majority of data values lie relative to the mean.
There are 3 types of skewness, based on the value of the skewness coefficient $\gamma_1$:
Positive Skew
If the coefficient of skewness is greater than 0, i.e. $\gamma_1 > 0$, the distribution is said to be positively skewed: the majority of data values are typically less than the mean, most values are concentrated on the left side of the graph, and the long tail extends to the right.
Zero Skewness or Symmetric
If the coefficient of skewness is equal to or approximately 0, i.e. $\gamma_1 = 0$, the distribution is said to be symmetric, as a normal distribution is.

Negatively skewed
If the coefficient of skewness is less than 0, i.e. $\gamma_1 < 0$, the distribution is said to be negatively skewed: the majority of data values are typically greater than the mean, most values are concentrated on the right side of the graph, and the long tail extends to the left.
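A common definition of the sample skewness coefficient is the moment coefficient g1 = m3 / m2^(3/2), where m_k is the mean of (x - mean(x))^k. The sketch below computes it in base R with a hypothetical helper skew_g1(); it assumes this moment definition, which we believe is also the one used by moments::skewness(), though that package's documentation should be checked before relying on it.

```r
# Moment coefficient of skewness: g1 = m3 / m2^(3/2),
# where m_k is the k-th central moment (dividing by n)
skew_g1 <- function(x) {
  d  <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3 / 2)
}

skew_g1(c(1, 2, 3, 4, 5))    # 0: symmetric
skew_g1(c(1, 1, 1, 2, 10))   # positive: long right tail, most values below the mean
skew_g1(c(-10, 1, 2, 2, 2))  # negative: long left tail, most values above the mean
```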

Question 5: Explain kurtosis and its types.


Answer: Kurtosis
Kurtosis is a numerical measure in statistics that describes the sharpness of the peak of a data distribution.
There are 3 types of kurtosis, based on the value of the kurtosis coefficient $\gamma_2$ (on the scale where a normal distribution has kurtosis 3):
Platykurtic
If the coefficient of kurtosis is less than 3, i.e. $\gamma_2 < 3$, the data distribution is platykurtic. Being platykurtic doesn't mean that the graph is flat-topped.

Mesokurtic
If the coefficient of kurtosis is equal to or approximately 3, i.e. $\gamma_2 = 3$, the data distribution is mesokurtic. A normal distribution has a kurtosis of exactly 3, so sample values from normal data are approximately 3.

Leptokurtic
If the coefficient of kurtosis is greater than 3, i.e. $\gamma_2 > 3$, the data distribution is leptokurtic and shows a sharp peak on the graph.
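On the scale used above (normal distribution = 3), the sample kurtosis is commonly computed as b2 = m4 / m2^2, where m_k is the mean of (x - mean(x))^k. A minimal base R sketch with a hypothetical helper kurt_b2(); we assume this matches the definition in moments::kurtosis(), but the package documentation is the authority on that.

```r
# Pearson's kurtosis: b2 = m4 / m2^2 (3 for a normal distribution)
kurt_b2 <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

kurt_b2(c(1, 2, 3, 4, 5))       # 1.7: below 3, platykurtic
kurt_b2(c(rep(0, 8), -10, 10))  # 5: above 3, leptokurtic
```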
