Cda Lab
Theory:
R is a popular programming language used for statistical computing and graphical presentation.
Its most common use is to analyze and visualize data.
Why Use R?
● It is a great resource for data analysis, data visualization, data science and machine
learning
● It provides many statistical techniques (such as statistical tests, classification, clustering
and data reduction)
● It is easy to draw graphs in R, like pie charts, histograms, box plots, scatter plots, etc.
● It works on different platforms (Windows, Mac, Linux)
● It is open-source and free
● It has a large community support
● It has many packages (libraries of functions) that can be used to solve different problems
How to Install R
To install R, go to https://cloud.r-project.org/ and download the latest version of R for Windows,
Mac or Linux.
When you have downloaded and installed R, you can run R on your computer.
Syntax
To output text in R, use single or double quotes:
Example
"Hello World!"
Comments
Comments start with a #. When executing code, R will ignore anything that starts with #.
This example uses a comment before a line of code:
Example
# This is a comment
"Hello World!"
Creating Variables in R
Variables are containers for storing data values. R does not have a command for declaring a
variable. A variable is created the moment you first assign a value to it. To assign a value to a
variable, use the <- sign. To output (or print) the variable value, just type the variable name:
Example
name <- "John"
age <- 40
name # output "John"
age # output 40
However, R does have a print() function available if you want to use it. This might be useful if
you are familiar with other programming languages, such as Python, which often use a print()
function to output variables.
Example
name <- "John Doe"
print(name) # print the value of the name variable
Numbers
There are three number types in R:
● numeric
● integer
● complex
Variables of number types are created when you assign a value to them:
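A minimal sketch of the three number types, with values chosen only for illustration:

```r
x <- 10.5   # numeric (stored as a double)
y <- 10L    # integer: the L suffix marks an integer literal
z <- 1i     # complex

class(x)    # "numeric"
class(y)    # "integer"
class(z)    # "complex"
```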
Strings
Strings are used for storing text.
A string is surrounded by either single quotation marks, or double quotation marks:
"hello" is the same as 'hello':
Booleans (Logical Values)
In programming, you often need to know if an expression is true or false.
You can evaluate any expression in R, and get one of two answers, TRUE or FALSE.
When you compare two values, the expression is evaluated and R returns the logical answer:
Example
10 > 9 # TRUE because 10 is greater than 9
10 == 9 # FALSE because 10 is not equal to 9
10 < 9 # FALSE because 10 is greater than 9
R Arithmetic Operators
Arithmetic operators are used with numeric values to perform common mathematical operations:
Operator   Name             Example
+          Addition         x + y
-          Subtraction      x - y
*          Multiplication   x * y
/          Division         x / y
^          Exponent         x ^ y
R Assignment Operators
Assignment operators are used to assign values to variables:
Example
my_var <- 3
my_var <<- 3
3 -> my_var
3 ->> my_var
R Comparison Operators
Comparison operators are used to compare two values:
Operator   Name        Example
==         Equal       x == y
!=         Not equal   x != y
R Logical Operators
Logical operators are used to combine conditional statements:
Operator   Description
&          Element-wise logical AND operator. Returns TRUE where both elements are TRUE
&&         Logical AND operator. Returns TRUE if both statements are TRUE
R Miscellaneous Operators
Miscellaneous operators are used to manipulate data:
Operator Description Example
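The table above has no rows. As an illustration, and not necessarily the operators the original table named, three operators commonly grouped under this heading in R are :, %in% and %*%. The variable names below are hypothetical:

```r
x <- 1:5                      # : creates the sequence 1, 2, 3, 4, 5
in_x <- 3 %in% x              # %in% tests membership; TRUE here
m <- matrix(1:4, nrow = 2)    # 2 x 2 matrix, filled column by column
prod_m <- m %*% m             # %*% is matrix multiplication (not element-wise)
prod_m
```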
The if Statement
An "if statement" is written with the if keyword, and it is used to specify a block of code to be
executed if a condition is TRUE:
Example
a <- 33
b <- 200
if (b > a) {
print("b is greater than a")
}
In this example we use two variables, a and b, which are used as a part of the if statement to
test whether b is greater than a. As a is 33, and b is 200, we know that 200 is greater than 33,
and so we print to screen that "b is greater than a".
R uses curly brackets { } to define the scope in the code.
Else If
The else if keyword is R's way of saying "if the previous conditions were not true, then try this
condition":
Example
a <- 33
b <- 33
if (b > a) {
print("b is greater than a")
} else if (a == b) {
print ("a and b are equal")
}
In this example a is equal to b, so the first condition is not true, but the else if condition is true,
so we print to screen that "a and b are equal".
You can use as many else if statements as you want in R.
If Else
The else keyword catches anything which isn't caught by the preceding conditions:
Example
a <- 200
b <- 33
if (b > a) {
print("b is greater than a")
} else if (a == b) {
print("a and b are equal")
} else {
print("a is greater than b")
}
In this example, a is greater than b, so the first condition is not true. The else if condition is
also not true, so we go to the else condition and print to screen that "a is greater than b".
You can also use else without else if:
Example
a <- 200
b <- 33
if (b > a) {
print("b is greater than a")
} else {
print("b is not greater than a")
}
Loops
Loops can execute a block of code as long as a specified condition holds.
Loops are handy because they save time, reduce errors, and make code more readable.
The two most common loop commands in R are:
● while loops
● for loops
R While Loops
With the while loop we can execute a set of statements as long as a condition is TRUE:
Example
Print i as long as i is less than 6:
i <- 1
while (i < 6) {
print(i)
i <- i + 1
}
For Loops
A for loop is used for iterating over a sequence:
Example
for (x in 1:10) {
print(x)
}
R Functions
A function is a block of code which only runs when it is called.
You can pass data, known as parameters, into a function.
A function can return data as a result.
Creating a Function
To create a function, use the function() keyword:
Example
my_function <- function() { # create a function with the name my_function
print("Hello World!")
}
Call a Function
To call a function, use the function name followed by parentheses, like my_function():
Example
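my_function <- function() {
  print("Hello World!")
}
my_function() # call the function named my_function
# A function that declares a parameter accepts an argument when called:
my_function <- function(fname) {
  paste(fname, "Griffin")
}
my_function("Peter")
my_function("Lois")
my_function("Stewie")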
Parameters or Arguments?
The terms "parameter" and "argument" can be used for the same thing: information that is
passed into a function.
From a function's perspective:
A parameter is the variable listed inside the parentheses in the function definition.
An argument is the value that is sent to the function when it is called.
Number of Arguments
By default, a function must be called with the correct number of arguments, meaning that if your
function expects 2 arguments, you have to call the function with 2 arguments, not more, and not
less:
Example
This function expects 2 arguments, and gets 2 arguments:
my_function <- function(fname, lname) {
paste(fname, lname)
}
my_function("Peter", "Griffin")
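Default Parameter Value
If a function is called without an argument, it uses the default value given in its definition:
Example
my_function <- function(country = "Norway") {
  paste("I am from", country) # the message text is illustrative
}
my_function("Sweden")
my_function("India")
my_function() # will get the default value, which is Norway
my_function("USA")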
Return Values
To let a function return a result, use the return() function:
Example
my_function <- function(x) {
return (5 * x)
}
print(my_function(3))
print(my_function(5))
print(my_function(9))
Code
first_str<-"Hello World"
first_str
#identifiers and constants
typeof(3)
typeof(3L)
typeof(3i)
typeof("Apple")
typeof('mango')
LETTERS
letters
pi
month.name
month.abb
pi<-20
pi
#Variable assigning
#assignment using equal operator
variable.1=c(1,2,3)
#assignment using leftward operator
variable.2 <- c("Lotus","Rose")
#assignment using rightward operator.
c(FALSE,1) -> variable.3
variable.1
cat("variable.1 is ",variable.1,"\n")
cat("variable.2 is ",variable.2,"\n")
cat("variable.3 is ",variable.3,"\n")
#Operators
a<-c(10,20,30,40)
b<-c(2,2,4,3)
cat("Sum=",(a+b),"\n")
cat("Difference=",(a-b),"\n")
cat("Product=",(a*b),"\n")
cat("Quotient=",(a/b),"\n")
cat("Remainder=",(a%%b),"\n")
cat("Integer Division=",(a%/%b),"\n")
cat("Exponentiation=",(a^b),"\n")
#Relational Operations
a<-c(10,20,30,40)
b<-c(25,2,30,3)
cat("Less than",b,(a<b),"\n")
cat("Greater than",b,(a>b),"\n")
cat("Less than or equal to",b,(a<=b),"\n")
cat("Greater than or equal to",b,(a>=b),"\n")
cat("Equal to",b,(a==b),"\n")
cat("Not equal to",b,(a!=b),"\n")
#Logical Operators
a<-c(0,20,30,56)
b<-c(2,2,30,0)
cat("Logical NOT",(!a),"\n")
cat("Element-wise Logical AND",b,(a&b),"\n")
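cat("Logical AND",b,(a[1]&&b[1]),"\n") # && needs single values; longer vectors are an error in R 4.3.0 and later
cat("Element-wise Logical OR",b,(a|b),"\n")
cat("Logical OR",b,(a[1]||b[1]),"\n") # the same applies to ||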
#leftwards Assignment
var.a=c(0,20,TRUE)
var.b<-c(0,20,TRUE)
var.c<<-c(0,20,TRUE)
var.a
var.b
var.c
#Rightwards Assignment
c(1,2,TRUE)->v1
c(1,2,TRUE)->>v2
v1
v2
Output
[1] "Hello World"
[1] "double"
[1] "integer"
[1] "complex"
[1] "character"
[1] "character"
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
[1] 3.141593
[1] "January" "February" "March" "April" "May" "June"
[7] "July" "August" "September" "October" "November" "December"
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[1] 20
[1] 1 2 3
variable.1 is 1 2 3
variable.2 is Lotus Rose
variable.3 is 0 1
The class of var_1 is integer
The class of var_1 is complex
The class of var_1 is numeric
The class of var_1 is character
[1] "first_str" "pi" "var_1" "variable.1" "variable.2"
[6] "variable.3"
[1] "var_1" "variable.1" "variable.2" "variable.3"
Sum= 12 22 34 43
Difference= 8 18 26 37
Product= 20 40 120 120
Quotient= 5 10 7.5 13.33333
Remainder= 0 0 2 1
Integer Division= 5 10 7 13
Exponentiation= 100 400 810000 64000
Less than 25 2 30 3 TRUE FALSE FALSE FALSE
Greater than 25 2 30 3 FALSE TRUE FALSE TRUE
Less than or equal to 25 2 30 3 TRUE FALSE TRUE FALSE
Greater than or equal to 25 2 30 3 FALSE TRUE TRUE TRUE
Equal to 25 2 30 3 FALSE FALSE TRUE FALSE
Not equal to 25 2 30 3 TRUE TRUE FALSE TRUE
Logical NOT TRUE FALSE FALSE FALSE
Element-wise Logical AND 2 2 30 0 FALSE TRUE TRUE FALSE
Logical AND 2 2 30 0 FALSE
Element-wise Logical OR 2 2 30 0 TRUE TRUE TRUE TRUE
Logical OR 2 2 30 0 TRUE
[1] 0 20 1
[1] 0 20 1
[1] 0 20 1
[1] 1 2 1
[1] 1 2 1
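The output above includes four "The class of var_1 is ..." lines and two listings of object names that no statement in the Code section produces. A sketch of statements that would print lines of that shape follows. The values assigned to var_1 and the use of ls() are assumptions, not the original code:

```r
# Hypothetical values: any integer, complex, numeric and character value would do.
var_1 <- 5L
cat("The class of var_1 is", class(var_1), "\n")
var_1 <- 2+3i
cat("The class of var_1 is", class(var_1), "\n")
var_1 <- 3.14
cat("The class of var_1 is", class(var_1), "\n")
var_1 <- "text"
cat("The class of var_1 is", class(var_1), "\n")

print(ls())                  # all object names in the current environment
print(ls(pattern = "var"))   # only the names that contain "var"
```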
Conclusion:
Thus we successfully studied basic concepts of R programming.
Exp 2
Aim: To understand vectors, lists, matrices, arrays, factors and data frames in R
programming.
Theory:
To understand vectors, lists, matrices, arrays, factors, and data frames in R programming, we
can break down each concept with detailed explanations and small functions.
1. Vectors
Theory:
- A vector is a one-dimensional array that contains elements of the same type, such as numbers
or characters.
- Vectors are the most basic data structure in R.
- Creating Vectors:
numeric_vector <- c(1, 2, 3, 4, 5) # Numeric vector
char_vector <- c("a", "b", "c") # Character vector
- Accessing Elements:
numeric_vector[1] # Access the first element (1)
- Vector Length:
length(numeric_vector) # Get the number of elements (5)
- Vector Operations:
numeric_vector * 2 # Multiply each element by 2
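The vector snippets above can be run together as one short sketch. The names first, n, doubled and mixed are introduced here for illustration:

```r
numeric_vector <- c(1, 2, 3, 4, 5)

first <- numeric_vector[1]       # R indexes from 1, so this is 1
n <- length(numeric_vector)      # 5
doubled <- numeric_vector * 2    # element-wise: 2 4 6 8 10

# A vector holds one type only, so mixed input is coerced to a common type.
mixed <- c(1, "a", TRUE)
class(mixed)                     # "character"
```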
2. Lists
Theory:
- A list is an R object that can contain elements of different types, including vectors, other lists,
or even data frames.
- Creating Lists:
my_list <- list(name = "John", age = 30, scores = c(85, 90, 78))
- Accessing Elements:
my_list$name # Access the element "name"
my_list[[2]] # Access the second element (30)
- Adding Elements:
my_list$city <- "New York"
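A short sketch that runs the list snippets above in order and shows the difference between single and double brackets. The names single and value are introduced here for illustration:

```r
my_list <- list(name = "John", age = 30, scores = c(85, 90, 78))
my_list$city <- "New York"       # adds a fourth element

names(my_list)                   # "name" "age" "scores" "city"
single <- my_list["age"]         # [ ] returns a list of length 1
value <- my_list[["age"]]        # [[ ]] returns the element itself, 30
```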
3. Matrices
Theory:
- A matrix is a two-dimensional array where all elements must be of the same type.
- Matrices are useful for mathematical operations.
- Creating Matrices:
matrix_data <- matrix(1:9, nrow = 3, ncol = 3)
- Accessing Elements:
matrix_data[2, 3] # Access element in 2nd row, 3rd column
- Matrix Operations:
t(matrix_data) # Transpose of the matrix
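A sketch that shows how matrix() fills its values, which determines what the indexing above returns. The names transposed and by_row are introduced here for illustration:

```r
matrix_data <- matrix(1:9, nrow = 3, ncol = 3)   # filled column by column
matrix_data[2, 3]                                # 8

transposed <- t(matrix_data)                     # rows and columns swapped
transposed[3, 2]                                 # the same element, 8

by_row <- matrix(1:9, nrow = 3, byrow = TRUE)    # filled row by row instead
by_row[2, 3]                                     # 6
```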
4. Arrays
Theory:
- An array is a multi-dimensional extension of matrices. It can have more than two dimensions.
- Creating Arrays:
array_data <- array(1:12, dim = c(2, 3, 2)) # 3D array
- Accessing Elements:
array_data[1, 2, 2] # Access element in first row, second column of the second matrix (9)
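The three indices of a 3D array are row, column and layer, in that order. A minimal sketch using the same array:

```r
array_data <- array(1:12, dim = c(2, 3, 2))   # 2 rows, 3 columns, 2 layers

dim(array_data)         # 2 3 2
array_data[1, 2, 2]     # row 1, column 2, layer 2: 9
array_data[, , 1]       # the whole first layer, a 2 x 3 matrix
```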
5. Factors
Theory:
- Factors are used to represent categorical data. A factor stores each value as an integer code
with an associated label (level).
- Creating Factors:
factor_data <- factor(c("male", "female", "female", "male"))
- Levels of Factors:
levels(factor_data) # Get levels ("female", "male"), sorted alphabetically by default
- Converting Factors:
as.character(factor_data) # Convert factor to character
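A minimal sketch of what a factor holds internally, using the same data as above:

```r
factor_data <- factor(c("male", "female", "female", "male"))

levels(factor_data)        # "female" "male"
as.integer(factor_data)    # 2 1 1 2: the integer codes behind the labels
table(factor_data)         # counts per level: female 2, male 2
```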
6. Data Frames
Theory:
- A data frame is a table-like structure where each column can be of different types (numeric,
character, etc.). It is similar to a matrix but allows columns of different types.
- Accessing Data:
df$name # Access the "name" column
df[1, ] # Access the first row
- Adding Columns:
df$city <- c("New York", "Los Angeles")
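The snippets above use a data frame named df that is never created. A minimal sketch, assuming two rows; the names and ages are hypothetical placeholders:

```r
# Hypothetical data: two rows, so that the two-element city column fits.
df <- data.frame(
  name = c("Alice", "Bob"),
  age  = c(25, 30)
)

df$name                                    # the "name" column
df[1, ]                                    # the first row
df$city <- c("New York", "Los Angeles")    # new column; length must match nrow(df)
str(df)                                    # 2 observations of 3 variables
```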
In R, vectors, lists, matrices, arrays, factors, and data frames are fundamental structures for
handling and analyzing data. Each has specific use cases and functions that allow manipulation
and retrieval of data in various forms. Understanding these structures and their associated
functions is crucial for effective data analysis in R.
Code:
Output:
Conclusion:
Thus we successfully studied vectors, lists, matrices, arrays, factors and data frames in R
programming.
Exp 3
Aim: Implement random sampling with and without replacement and stratified sampling
in R with reproducible samples.
Theory:
Sampling is the practice of selecting a subset of individuals from a population in order to study
the whole population.
● Simple random sampling: In simple random sampling, the researcher selects the
participants randomly, so every member of the population has an equal chance of being chosen.
Tools such as random number generators and random number tables make the selection
depend entirely on chance.
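A minimal base-R sketch of the three techniques named in the aim. The population, the records data frame and the sample sizes are hypothetical. set.seed() fixes the random number generator so the same sample is drawn on every run:

```r
set.seed(42)                      # makes the draws below reproducible
population <- 1:20

# Simple random sampling without and with replacement.
without_repl <- sample(population, size = 5, replace = FALSE)  # no value repeats
with_repl    <- sample(population, size = 5, replace = TRUE)   # values may repeat

# Stratified sampling: draw the same number of rows from each group (stratum).
records <- data.frame(id = 1:12, group = rep(c("A", "B", "C"), each = 4))
strata <- split(records, records$group)
picked <- lapply(strata, function(s) s[sample(nrow(s), size = 2), ])
stratified <- do.call(rbind, picked)

table(stratified$group)           # 2 rows from each of A, B and C
```

Re-running the script from the set.seed(42) line gives the same samples again.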
Theory:
Data visualization is the technique of delivering insights from data using visual cues such as
graphs, charts, and maps. It makes large quantities of data easier to understand intuitively
and so supports better decisions about that data. R is
a language designed for statistical computing, graphical data analysis, and scientific
research. It is often preferred for data visualization because its packages offer flexibility and
require little code.
● Bar plot: There are two types of bar plots, horizontal and vertical, which represent data
points as horizontal or vertical bars whose lengths are proportional to the value of the data
item. They are generally used to plot a numeric value for each level of a categorical variable.
● Histogram: A histogram is like a bar chart as it uses bars of varying height to represent
data distribution.
● Box plot: The statistical summary of the given data is presented graphically using a
boxplot. A box plot depicts information like the minimum and maximum data point, the
median value, first and third quartile, and interquartile range.
● Scatter plot: A scatter plot is composed of many points on a Cartesian plane. Each point
denotes the value taken by two parameters and helps us easily identify the relationship
between them.
● Map visualization: Here we use the maps package to visualize and display
geographical maps in R.
● 3D graph: Here we use the persp() function, which creates 3D surfaces
in perspective view by drawing perspective plots of a surface over the x-y
plane.
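A minimal base-R sketch of the plot types listed above. The counts and values data are made up for illustration, and mtcars is a dataset that ships with R. The map example is left out because it needs the separate maps package. The plots are written to a temporary PDF so the script also runs without a display:

```r
pdf(file.path(tempdir(), "plots.pdf"))   # drop this line and dev.off() to draw on screen

counts <- c(apples = 3, pears = 5, plums = 2)        # hypothetical categories
barplot(counts, main = "Vertical bar plot")
barplot(counts, horiz = TRUE, main = "Horizontal bar plot")

set.seed(1)
values <- rnorm(100)                                 # 100 random normal values
h <- hist(values, main = "Histogram")                # bars show counts per bin
b <- boxplot(values, main = "Box plot")              # median, quartiles, whiskers

plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon",
     main = "Scatter plot")

# 3D surface: z is evaluated on a grid over the x-y plane.
x <- seq(-3, 3, length.out = 30)
y <- x
z <- outer(x, y, function(a, b) exp(-(a^2 + b^2) / 2))
persp(x, y, z, theta = 30, phi = 30, main = "3D surface")

invisible(dev.off())                                 # close the PDF device
```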
Code:
Conclusion:
Thus we successfully implemented data visualization in R programming.