Cda Lab


EXP 1: Basics of R programming.

Aim: To study basics of R programming.

Theory:
R is a popular programming language used for statistical computing and graphical presentation.
Its most common use is to analyze and visualize data.

Why Use R?
● It is a great resource for data analysis, data visualization, data science and machine
learning
● It provides many statistical techniques (such as statistical tests, classification, clustering
and data reduction)
● It is easy to draw graphs in R, like pie charts, histograms, box plots, scatter plots, etc.
● It works on different platforms (Windows, Mac, Linux)
● It is open-source and free
● It has large community support
● It has many packages (libraries of functions) that can be used to solve different problems

How to Install R
To install R, go to https://cloud.r-project.org/ and download the latest version of R for Windows,
Mac or Linux.
When you have downloaded and installed R, you can run R on your computer.

Syntax
To output text in R, use single or double quotes:
Example
"Hello World!"

To output numbers, just type the number (without quotes):


Example
5
10
25

To do simple calculations, add numbers together:


Example
5+5

Comments
Comments start with a #. When executing code, R will ignore anything that starts with #.
This example uses a comment before a line of code:
Example
# This is a comment
"Hello World!"

Creating Variables in R
Variables are containers for storing data values. R does not have a command for declaring a
variable. A variable is created the moment you first assign a value to it. To assign a value to a
variable, use the <- operator. To output (or print) the variable value, just type the variable name:
Example
name <- "John"
age <- 40

name # output "John"
age # output 40

Compared to many other programming languages, you do not have to use a function to
print/output variables in R. You can just type the name of the variable.

However, R does have a print() function available if you want to use it. This might be useful if
you are familiar with other programming languages, such as Python, which often use a print()
function to output variables.
Example
name <- "John Doe"

print(name) # print the value of the name variable

Basic Data Types


Basic data types in R can be divided into the following types:
● numeric - (10.5, 55, 787)
● integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
● complex - (9 + 3i, where "i" is the imaginary part)
● character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
● logical (a.k.a. boolean) - (TRUE or FALSE)
We can use the class() function to check the data type of a variable.
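As a small illustration of class(), the sketch below assigns one value of each basic type and checks it. The variable names (x_num, x_int, and so on) are made up for this example.

```r
# One value of each basic data type (variable names are illustrative)
x_num <- 10.5             # numeric
x_int <- 55L              # integer (the L suffix requests an integer)
x_cpx <- 9 + 3i           # complex
x_chr <- "R is exciting"  # character
x_lgl <- TRUE             # logical

class(x_num)  # "numeric"
class(x_int)  # "integer"
class(x_cpx)  # "complex"
class(x_chr)  # "character"
class(x_lgl)  # "logical"
```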

Numbers
There are three number types in R:
● numeric
● integer
● complex
Variables of number types are created when you assign a value to them.
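A minimal sketch of creating and converting number types follows. The variable names are illustrative; as.integer() and as.numeric() are standard base R conversion functions.

```r
x <- 10.5   # numeric (stored as a double)
y <- 10L    # integer
z <- 1i     # complex

# A whole number written without the L suffix is still numeric, not integer
is.integer(55)    # FALSE
is.integer(55L)   # TRUE

# Converting between number types
a <- as.integer(x)  # 10, the fractional part is dropped
b <- as.numeric(y)  # 10, now stored as a double
class(a)  # "integer"
class(b)  # "numeric"
```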

Strings
Strings are used for storing text.
A string is surrounded by either single or double quotation marks, so "hello" is the same as
'hello'.

Booleans (Logical Values)
In programming, you often need to know if an expression is true or false.
You can evaluate any expression in R, and get one of two answers, TRUE or FALSE.
When you compare two values, the expression is evaluated and R returns the logical answer:
Example
10 > 9 # TRUE because 10 is greater than 9
10 == 9 # FALSE because 10 is not equal to 9
10 < 9 # FALSE because 10 is greater than 9

R divides operators into the following groups:


● Arithmetic operators
● Assignment operators
● Comparison operators
● Logical operators
● Miscellaneous operators

R Arithmetic Operators
Arithmetic operators are used with numeric values to perform common mathematical operations:
Operator  Name                               Example
+         Addition                           x + y
-         Subtraction                        x - y
*         Multiplication                     x * y
/         Division                           x / y
^         Exponent                           x ^ y
%%        Modulus (remainder from division)  x %% y
%/%       Integer division                   x %/% y
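To make the operator table concrete, here is a short worked example. The values 7 and 2 are arbitrary choices for illustration.

```r
x <- 7
y <- 2

x + y    # 9
x - y    # 5
x * y    # 14
x / y    # 3.5
x ^ y    # 49
x %% y   # 1  (remainder of 7 divided by 2)
x %/% y  # 3  (whole-number part of 7 divided by 2)

# Integer division and modulus together reconstruct the original value
(x %/% y) * y + (x %% y) == x  # TRUE
```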

R Assignment Operators
Assignment operators are used to assign values to variables:
Example
my_var <- 3

my_var <<- 3

3 -> my_var
3 ->> my_var

my_var # print my_var
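At the top level, <- and <<- behave the same way. The difference shows up inside a function, where <- creates a local variable and <<- assigns to a variable in an enclosing environment. The sketch below uses hypothetical names (counter, bump_local, bump_global) to show this.

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1   # creates a local copy; the outer counter is untouched
  counter
}

bump_global <- function() {
  counter <<- counter + 1  # updates counter in the enclosing environment
  counter
}

local_result <- bump_local()   # 1
after_local  <- counter        # still 0
global_result <- bump_global() # 1
after_global <- counter        # now 1
```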

R Comparison Operators
Comparison operators are used to compare two values:
Operator  Name                      Example
==        Equal                     x == y
!=        Not equal                 x != y
>         Greater than              x > y
<         Less than                 x < y
>=        Greater than or equal to  x >= y
<=        Less than or equal to     x <= y

R Logical Operators
Logical operators are used to combine conditional statements:
Operator  Description
&         Element-wise logical AND. Returns TRUE where both elements are TRUE
&&        Logical AND for single values. Returns TRUE if both operands are TRUE
|         Element-wise logical OR. Returns TRUE where at least one of the elements is TRUE
||        Logical OR for single values. Returns TRUE if at least one of the operands is TRUE
!         Logical NOT. Returns FALSE if the operand is TRUE, and TRUE if it is FALSE
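The difference between the element-wise and single-value forms is easiest to see on vectors. The sketch below uses two made-up logical vectors. Note that from R 4.3.0 onward, passing a vector longer than one element to && or || is an error, so the example indexes single elements.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b   # TRUE FALSE FALSE  (compared position by position)
a | b   # TRUE TRUE TRUE
!a      # FALSE TRUE FALSE

# && and || work on single values, which is what an if condition needs
a[1] && b[1]  # TRUE
a[2] || b[3]  # FALSE
```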

R Miscellaneous Operators
Miscellaneous operators are used to manipulate data:
Operator  Description                                Example
:         Creates a series of numbers in a sequence  x <- 1:10
%in%      Find out if an element belongs to a vector x %in% y
%*%       Matrix multiplication                      x <- Matrix1 %*% Matrix2
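A brief sketch of the three miscellaneous operators in use. The vector and matrix here are arbitrary illustrative values.

```r
x <- 1:10                  # the sequence 1, 2, ..., 10

3 %in% x                   # TRUE
found <- c(0, 5, 11) %in% x  # FALSE TRUE FALSE

# matrix() fills column by column, so m1 has rows (1, 3) and (2, 4)
m1 <- matrix(1:4, nrow = 2)
prod_m <- m1 %*% m1        # true matrix product, not element-wise
prod_m                     # rows (7, 15) and (10, 22)
```

For comparison, m1 * m1 would multiply element by element instead, squaring each entry.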

The if Statement
An "if statement" is written with the if keyword, and it is used to specify a block of code to be
executed if a condition is TRUE:
Example
a <- 33
b <- 200

if (b > a) {
  print("b is greater than a")
}

In this example we use two variables, a and b, which are used as a part of the if statement to
test whether b is greater than a. As a is 33, and b is 200, we know that 200 is greater than 33,
and so we print to screen that "b is greater than a".
R uses curly brackets { } to define the scope in the code.

Else If
The else if keyword is R's way of saying "if the previous conditions were not true, then try this
condition":
Example
a <- 33
b <- 33

if (b > a) {
  print("b is greater than a")
} else if (a == b) {
  print("a and b are equal")
}

In this example a is equal to b, so the first condition is not true, but the else if condition is true,
so we print to screen that "a and b are equal".
You can use as many else if statements as you want in R.

If Else
The else keyword catches anything which isn't caught by the preceding conditions:
Example
a <- 200
b <- 33

if (b > a) {
  print("b is greater than a")
} else if (a == b) {
  print("a and b are equal")
} else {
  print("a is greater than b")
}

In this example, a is greater than b, so the first condition is not true, also the else if condition is
not true, so we go to the else condition and print to screen that "a is greater than b".
You can also use else without else if:
Example
a <- 200
b <- 33

if (b > a) {
  print("b is greater than a")
} else {
  print("b is not greater than a")
}

Loops
Loops execute a block of code repeatedly for as long as a specified condition holds.
Loops are handy because they save time, reduce errors, and make code more readable.
This section covers R's two most common loop commands:
● while loops
● for loops

R While Loops
With the while loop we can execute a set of statements as long as a condition is TRUE:
Example
Print i as long as i is less than 6:
i <- 1
while (i < 6) {
  print(i)
  i <- i + 1
}

For Loops
A for loop is used for iterating over a sequence:
Example
for (x in 1:10) {
  print(x)
}

R Functions
A function is a block of code which only runs when it is called.
You can pass data, known as parameters, into a function.
A function can return data as a result.
Creating a Function
To create a function, use the function() keyword:
Example
my_function <- function() { # create a function with the name my_function
  print("Hello World!")
}

Call a Function
To call a function, use the function name followed by parentheses, like my_function():
Example
my_function <- function() {
  print("Hello World!")
}

my_function() # call the function named my_function


Arguments
Information can be passed into functions as arguments.
Arguments are specified after the function name, inside the parentheses. You can add as many
arguments as you want, just separate them with a comma.
The following example has a function with one argument (fname). When the function is called,
we pass along a first name, which is used inside the function to print the full name:
Example
my_function <- function(fname) {
  paste(fname, "Griffin")
}

my_function("Peter")
my_function("Lois")
my_function("Stewie")
Parameters or Arguments?
The terms "parameter" and "argument" can be used for the same thing: information that is
passed into a function.
From a function's perspective:
A parameter is the variable listed inside the parentheses in the function definition.
An argument is the value that is sent to the function when it is called.

Number of Arguments
By default, a function must be called with the correct number of arguments. If your function
expects 2 arguments, you have to call it with 2 arguments, no more and no fewer:
Example
This function expects 2 arguments, and gets 2 arguments:
my_function <- function(fname, lname) {
  paste(fname, lname)
}

my_function("Peter", "Griffin")
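To see what happens when the argument count is wrong, the sketch below wraps the bad calls in tryCatch() so the script keeps running and the error message can be inspected. The variable names ok, too_few and too_many are illustrative.

```r
my_function <- function(fname, lname) {
  paste(fname, lname)
}

ok <- my_function("Peter", "Griffin")  # "Peter Griffin"

# One argument too few: R raises an error when lname is first used
too_few <- tryCatch(
  my_function("Peter"),
  error = function(e) conditionMessage(e)
)
too_few  # the message names the missing argument, lname

# One argument too many: R rejects the call as having an unused argument
too_many <- tryCatch(
  my_function("Peter", "Lois", "Griffin"),
  error = function(e) "error"
)
```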

Default Parameter Value


The following example shows how to use a default parameter value.
If we call the function without an argument, it uses the default value:
Example
my_function <- function(country = "Norway") {
  paste("I am from", country)
}

my_function("Sweden")
my_function("India")
my_function() # will get the default value, which is Norway
my_function("USA")

Return Values
To let a function return a result, use the return() function:
Example
my_function <- function(x) {
  return(5 * x)
}

print(my_function(3))
print(my_function(5))
print(my_function(9))

The output of the code above will be:


[1] 15
[1] 25
[1] 45

Code
first_str<-"Hello World"
first_str
#identifiers and constants
typeof(3)
typeof(3L)
typeof(3i)

typeof("Apple")
typeof('mango')
LETTERS
letters
pi
month.name
month.abb
pi<-20
pi

#Variable assigning
#assignment using equal operator
variable.1=c(1,2,3)
#assignment using leftward operator
variable.2 <- c("Lotus","Rose")
#assignment using rightward operator.
c(FALSE,1) -> variable.3
variable.1
cat("variable.1 is ",variable.1,"\n")
cat("variable.2 is ",variable.2,"\n")
cat("variable.3 is ",variable.3,"\n")

#identifying the class of variable


var_1 <- 10L
cat("The class of var_1 is ",class(var_1), "\n")
var_1 <- 20+10i
cat("The class of var_1 is ",class(var_1), "\n")
var_1 <- 90.86
cat("The class of var_1 is ",class(var_1), "\n")
var_1 <- "Good Morning"
cat("The class of var_1 is ",class(var_1), "\n")

#find all variables in workspace


print(ls())

#match variable names with patterns using ls() function


print(ls(pattern="v"))
#delete variables using rm() function
rm(variable.3)

#Operators
a<-c(10,20,30,40)
b<-c(2,2,4,3)
cat("Sum=",(a+b),"\n")
cat("Difference=",(a-b),"\n")
cat("Product=",(a*b),"\n")
cat("Quotient=",(a/b),"\n")
cat("Remainder=",(a%%b),"\n")
cat("Integer Division=",(a%/%b),"\n")
cat("Exponentiation=",(a^b),"\n")

#Relational Operations
a<-c(10,20,30,40)
b<-c(25,2,30,3)
cat("Less than",b,(a<b),"\n")
cat("Greater than",b,(a>b),"\n")
cat("Less than or equal to",b,(a<=b),"\n")
cat("Greater than or equal to",b,(a>=b),"\n")
cat("Equal to",b,(a==b),"\n")
cat("Not equal to",b,(a!=b),"\n")

#Logical Operators
a<-c(0,20,30,56)
b<-c(2,2,30,0)
cat("Logical NOT",(!a),"\n")
cat("Element-wise Logical AND",b,(a&b),"\n")
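cat("Logical AND",b,(a[1]&&b[1]),"\n") # && takes single values; R 4.3+ errors on longer vectors
cat("Element-wise Logical OR",b,(a|b),"\n")
cat("Logical OR",b,(a[1]||b[1]),"\n") # || likewise takes single values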

#leftwards Assignment
var.a=c(0,20,TRUE)
var.b<-c(0,20,TRUE)
var.c<<-c(0,20,TRUE)
var.a
var.b
var.c

#Rightwards Assignment
c(1,2,TRUE)->v1
c(1,2,TRUE)->>v2
v1
v2

Output
[1] "Hello World"
[1] "double"
[1] "integer"
[1] "complex"
[1] "character"
[1] "character"
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
[1] 3.141593
[1] "January" "February" "March" "April" "May" "June"
[7] "July" "August" "September" "October" "November" "December"
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[1] 20
[1] 1 2 3
variable.1 is 1 2 3
variable.2 is Lotus Rose
variable.3 is 0 1
The class of var_1 is integer
The class of var_1 is complex
The class of var_1 is numeric
The class of var_1 is character
[1] "first_str" "pi" "var_1" "variable.1" "variable.2"
[6] "variable.3"
[1] "var_1" "variable.1" "variable.2" "variable.3"
Sum= 12 22 34 43
Difference= 8 18 26 37
Product= 20 40 120 120
Quotient= 5 10 7.5 13.33333
Remainder= 0 0 2 1
Integer Division= 5 10 7 13
Exponentiation= 100 400 810000 64000
Less than 25 2 30 3 TRUE FALSE FALSE FALSE
Greater than 25 2 30 3 FALSE TRUE FALSE TRUE
Less than or equal to 25 2 30 3 TRUE FALSE TRUE FALSE
Greater than or equal to 25 2 30 3 FALSE TRUE TRUE TRUE
Equal to 25 2 30 3 FALSE FALSE TRUE FALSE
Not equal to 25 2 30 3 TRUE TRUE FALSE TRUE
Logical NOT TRUE FALSE FALSE FALSE
Element-wise Logical AND 2 2 30 0 FALSE TRUE TRUE FALSE
Logical AND 2 2 30 0 FALSE
Element-wise Logical OR 2 2 30 0 TRUE TRUE TRUE TRUE
Logical OR 2 2 30 0 TRUE
[1] 0 20 1
[1] 0 20 1
[1] 0 20 1
[1] 1 2 1
[1] 1 2 1


Conclusion:
Thus we successfully studied basic concepts of R programming.

Exp 2
Aim: To understand vectors, lists, matrices, arrays, factors and data frames in R
programming.
Theory:
To understand vectors, lists, matrices, arrays, factors, and data frames in R programming, we
can break down each concept with detailed explanations and small functions.

1. Vectors
Theory:
- A vector is a one-dimensional array that contains elements of the same type, such as numbers
or characters.
- Vectors are the most basic data structure in R.

Functions and Examples:

- Creating Vectors:
numeric_vector <- c(1, 2, 3, 4, 5) # Numeric vector
char_vector <- c("a", "b", "c") # Character vector

- Accessing Elements:
numeric_vector[1] # Access the first element (1)

- Vector Length:
length(numeric_vector) # Get the number of elements (5)

- Vector Operations:
numeric_vector * 2 # Multiply each element by 2

2. Lists
Theory:
- A list is an R object that can contain elements of different types, including vectors, other lists,
or even data frames.

Functions and Examples:

- Creating Lists:
my_list <- list(name = "John", age = 30, scores = c(85, 90, 78))

- Accessing Elements:
my_list$name # Access the element "name"
my_list[[2]] # Access the second element (30)

- Adding Elements:
my_list$city <- "New York"

3. Matrices
Theory:
- A matrix is a two-dimensional array where all elements must be of the same type.
- Matrices are useful for mathematical operations.

Functions and Examples:

- Creating Matrices:
matrix_data <- matrix(1:9, nrow = 3, ncol = 3)

- Accessing Elements:
matrix_data[2, 3] # Access element in 2nd row, 3rd column (8, because matrix() fills column by column)

- Matrix Operations:
t(matrix_data) # Transpose of the matrix

4. Arrays
Theory:
- An array is a multi-dimensional extension of matrices. It can have more than two dimensions.

Functions and Examples:

- Creating Arrays:
array_data <- array(1:12, dim = c(2, 3, 2)) # 3D array

- Accessing Elements:
array_data[1, 2, 2] # Access element in first row, second column of the second matrix (9)

5. Factors
Theory:
- Factors are used to represent categorical data. A factor stores each value as an integer code
with an associated label (level), and can be created from strings or numbers.

Functions and Examples:

- Creating Factors:
factor_data <- factor(c("male", "female", "female", "male"))

- Levels of Factors:
levels(factor_data) # Get levels ("female", "male"; sorted alphabetically by default)

- Converting Factors:
as.character(factor_data) # Convert factor to character

6. Data Frames
Theory:
- A data frame is a table-like structure of rows and columns. It is similar to a matrix, but each
column can hold a different type (numeric, character, etc.).

Functions and Examples:

- Creating Data Frames:


df <- data.frame(name = c("John", "Alice"), age = c(30, 25), scores = c(85, 90))

- Accessing Data:
df$name # Access the "name" column
df[1, ] # Access the first row

- Adding Columns:
df$city <- c("New York", "Los Angeles")

In R, vectors, lists, matrices, arrays, factors, and data frames are fundamental structures for
handling and analyzing data. Each has specific use cases and functions that allow manipulation
and retrieval of data in various forms. Understanding these structures and their associated
functions is crucial for effective data analysis in R.
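The structures described above can be exercised together in one short script. This is a sketch that reuses the example objects from the theory section; the extra names doubled and mean_score are illustrative.

```r
# Vector: same-type elements, arithmetic is applied to every element
numeric_vector <- c(1, 2, 3, 4, 5)
doubled <- numeric_vector * 2        # 2 4 6 8 10

# List: elements of different types, accessed by name or position
my_list <- list(name = "John", age = 30, scores = c(85, 90, 78))
mean_score <- mean(my_list$scores)

# Matrix: filled column by column, so row 2, column 3 holds 8
matrix_data <- matrix(1:9, nrow = 3, ncol = 3)
matrix_data[2, 3]                    # 8

# Array: indices are [row, column, matrix]
array_data <- array(1:12, dim = c(2, 3, 2))
array_data[1, 2, 2]                  # 9

# Factor: levels are sorted alphabetically by default
factor_data <- factor(c("male", "female", "female", "male"))
levels(factor_data)                  # "female" "male"
table(factor_data)                   # counts per level

# Data frame: columns of different types, new columns added with $
df <- data.frame(name = c("John", "Alice"), age = c(30, 25), scores = c(85, 90))
df$city <- c("New York", "Los Angeles")
str(df)                              # compact summary of the structure
```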

Code:
Output:
Conclusion:
Thus we successfully studied vectors, lists, matrices, arrays, factors and data frames in R
programming.

EXP 3

Aim: Implement random sampling with and without replacement and stratified sampling
in R with reproducible samples.

Theory:
Sampling is the practice of selecting a subset of individuals from a population in order to study
the whole population.

Sampling methods fall into two broad categories:

● Probability sampling - Probability sampling gives every member of the population a
chance of being selected. It is mainly used in quantitative research when you want to
produce results representative of the whole population.

● Non-probability sampling - In non-probability sampling, not every individual has a
chance of being included in the sample. This sampling method is easier and cheaper but
also carries a high risk of sampling bias. It is often used in exploratory and qualitative
research with the aim of developing an initial understanding of the population.

Types of Probability sampling:

● Simple random sampling: In simple random sampling, the researcher selects the
participants randomly. Tools such as random number generators and random number
tables, which are based entirely on chance, are used to make the selection.

● Stratified sampling: In stratified sampling, the population is subdivided into subgroups,
called strata, based on some characteristic (age, gender, income, etc.). After forming the
subgroups, you can use random or systematic sampling to select a sample from each
subgroup. This method allows you to draw more precise conclusions because it ensures
that every subgroup is properly represented. In other words, a subset of observations is
selected randomly from each group defined by the value of a stratifying variable, and once
an observation is selected it cannot be selected again.

● Sampling without replacement: A method in which a subset of the observations is
selected randomly, and once an observation is selected it cannot be selected again.

● Sampling with replacement: A method in which a subset of the observations is
selected randomly, and an observation may be selected more than once.
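The sampling methods above can be sketched in base R as follows. Calling set.seed() before sampling is what makes a sample reproducible; the seed value 42 and the names population, without_repl, with_repl, again and strata are arbitrary choices for this illustration. The stratified step uses split() and lapply() as one possible base R approach, not the only one.

```r
population <- 1:20

# Fix the random seed so the same sample is drawn on every run
set.seed(42)
without_repl <- sample(population, size = 10)                  # each value at most once
with_repl    <- sample(population, size = 10, replace = TRUE)  # values may repeat

# Resetting the same seed reproduces the first sample exactly
set.seed(42)
again <- sample(population, size = 10)
identical(without_repl, again)  # TRUE

# Stratified sampling: draw 10 rows without replacement from each grade
df <- data.frame(
  grade = rep(c("Freshman", "Sophomore", "Junior", "Senior"), each = 100),
  gpa   = rnorm(400, mean = 85, sd = 3)
)
strata <- split(df, df$grade)   # one data frame per stratum
strat_sample <- do.call(
  rbind,
  lapply(strata, function(g) g[sample(nrow(g), 10), ])
)
table(strat_sample$grade)       # 10 rows from every grade
```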
Code:
> sample(1:20,10)
[1] 11 16 9 15 18 4 19 13 8 6
> sample(1:6,4, replace=TRUE)
[1] 2 5 2 1
> sample(1:6,4, replace=FALSE)
[1] 3 4 1 2
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> sample(LETTERS)
[1] "V" "C" "A" "B" "Y" "P" "S" "U" "W" "L" "T" "F" "N" "I" "G" "Z" "D" "M" "O"
[20] "Q" "E" "J" "K" "X" "R" "H"
> sample(LETTERS)
[1] "S" "K" "D" "R" "J" "Y" "E" "M" "P" "X" "O" "L" "I" "G" "B" "Z" "Q" "C" "U"
[20] "V" "F" "N" "W" "H" "T" "A"
> data<-c(1,3,5,6,7,8,9,10,11,12,14)
> sample(x=data,size=5)
[1] 3 12 9 7 1
> data<-c(1,3,5,6,7,8,9,10,11,12,14)
> sample(x=data,size=5, replace=TRUE)
[1] 5 5 1 11 11
> df<-data.frame(x=c(3,5,6,6,8,12,14), y=c(12,6,4,23,25,8,9), z=c(2,7,8,8,15,17,29))
> df
   x  y  z
1  3 12  2
2  5  6  7
3  6  4  8
4  6 23  8
5  8 25 15
6 12  8 17
7 14  9 29
> install.packages("dplyr")
Installing package into ‘C:/Users/tulas/AppData/Local/R/win-library/4.4’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
also installing the dependencies ‘fansi’, ‘utf8’, ‘pkgconfig’, ‘withr’, ‘cli’, ‘generics’, ‘glue’, ‘lifecycle’, ‘magrittr’, ‘pillar’, ‘R6’, ‘rlang’, ‘tibble’, ‘tidyselect’, ‘vctrs’
There is a binary version available but the source version is later:
binary source needs_compilation
withr 3.0.0 3.0.1 FALSE
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/fansi_1.0.6.zip'
Content type 'application/zip' length 323615 bytes (316 KB)
downloaded 316 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/utf8_1.2.4.zip'
Content type 'application/zip' length 150973 bytes (147 KB)
downloaded 147 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/pkgconfig_2.0.3.zip'
Content type 'application/zip' length 22762 bytes (22 KB)
downloaded 22 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/cli_3.6.3.zip'
Content type 'application/zip' length 1361491 bytes (1.3 MB)
downloaded 1.3 MB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/generics_0.1.3.zip'
Content type 'application/zip' length 83128 bytes (81 KB)
downloaded 81 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/glue_1.7.0.zip'
Content type 'application/zip' length 163502 bytes (159 KB)
downloaded 159 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/lifecycle_1.0.4.zip'
Content type 'application/zip' length 141079 bytes (137 KB)
downloaded 137 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/magrittr_2.0.3.zip'
Content type 'application/zip' length 229491 bytes (224 KB)
downloaded 224 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/pillar_1.9.0.zip'
Content type 'application/zip' length 663013 bytes (647 KB)
downloaded 647 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/R6_2.5.1.zip'
Content type 'application/zip' length 85019 bytes (83 KB)
downloaded 83 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/rlang_1.1.4.zip'
Content type 'application/zip' length 1621613 bytes (1.5 MB)
downloaded 1.5 MB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/tibble_3.2.1.zip'
Content type 'application/zip' length 695385 bytes (679 KB)
downloaded 679 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/tidyselect_1.2.1.zip'
Content type 'application/zip' length 228228 bytes (222 KB)
downloaded 222 KB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/vctrs_0.6.5.zip'
Content type 'application/zip' length 1361597 bytes (1.3 MB)
downloaded 1.3 MB
trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/dplyr_1.1.4.zip'
Content type 'application/zip' length 1583274 bytes (1.5 MB)
downloaded 1.5 MB
package ‘fansi’ successfully unpacked and MD5 sums checked
package ‘utf8’ successfully unpacked and MD5 sums checked
package ‘pkgconfig’ successfully unpacked and MD5 sums checked
package ‘cli’ successfully unpacked and MD5 sums checked
package ‘generics’ successfully unpacked and MD5 sums checked
package ‘glue’ successfully unpacked and MD5 sums checked
package ‘lifecycle’ successfully unpacked and MD5 sums checked
package ‘magrittr’ successfully unpacked and MD5 sums checked
package ‘pillar’ successfully unpacked and MD5 sums checked
package ‘R6’ successfully unpacked and MD5 sums checked
package ‘rlang’ successfully unpacked and MD5 sums checked
package ‘tibble’ successfully unpacked and MD5 sums checked
package ‘tidyselect’ successfully unpacked and MD5 sums checked
package ‘vctrs’ successfully unpacked and MD5 sums checked
package ‘dplyr’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\tulas\AppData\Local\Temp\RtmpiUu1X1\downloaded_packages
installing the source package ‘withr’
trying URL 'https://cloud.r-project.org/src/contrib/withr_3.0.1.tar.gz'
Content type 'application/x-gzip' length 103375 bytes (100 KB)
downloaded 100 KB
* installing *source* package 'withr' ...
** package 'withr' successfully unpacked and MD5 sums checked
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (withr)
The downloaded source packages are in
‘C:\Users\tulas\AppData\Local\Temp\RtmpiUu1X1\downloaded_packages’
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
> set.seed(1)
> df<-data.frame(grade=rep(c('Freshman','Sophomore','Junior','Senior'), each=100),
gpa=rnorm(400,
mean=85, sd=3))
> head(df)
grade gpa
1 Freshman 83.12064
2 Freshman 85.55093
3 Freshman 82.49311
4 Freshman 89.78584
5 Freshman 85.98852
6 Freshman 82.53859
> strat_sample<-df%>%
+ group_by(grade)%>%
+ sample_n(size=10)
> table(strat_sample$grade)
Freshman Junior Senior Sophomore
10 10 10 10
> library(dplyr)
> strat_sample<-df%>%
+ group_by(grade)%>%
+ sample_frac(size=.15)
> table(strat_sample$grade)
Freshman Junior Senior Sophomore
15 15 15 15

Conclusion: Thus we successfully implemented random sampling with and without replacement and stratified sampling in R.


Exp 4

Aim: To implement data visualization in R.

Theory:
Data visualization is the technique of conveying insights from data using visual cues such as
graphs, charts, maps, and many others. It makes large quantities of data easier to understand
intuitively and so supports better decisions about that data. R is
a language designed for statistical computing, graphical data analysis, and scientific
research. It is often preferred for data visualization because its packages offer flexibility and
require little code.

Types of Data Visualizations:

● Bar plot: There are two types of bar plots, horizontal and vertical, which represent data
points as horizontal or vertical bars whose lengths are proportional to the value of the data
item. They are generally used to plot a continuous variable against a categorical variable.

● Histogram: A histogram is like a bar chart as it uses bars of varying height to represent
data distribution.

● Box plot: The statistical summary of the given data is presented graphically using a
boxplot. A box plot depicts information like the minimum and maximum data point, the
median value, first and third quartile, and interquartile range.

● Scatter plot: A scatter plot is composed of many points on a Cartesian plane. Each point
denotes the value taken by two parameters and helps us easily identify the relationship
between them.

● Heatmap: A heatmap is a graphical representation of data that uses colors to
visualize the values of a matrix.

● Map visualization: Here we use the maps package to visualize and display
geographical maps in R.

● 3D graph: Here we use the persp() function, which creates 3D surfaces
in perspective view. It draws perspective plots of a surface over the x-y
plane.
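Most of the plot types listed above can be drawn with base R graphics alone. The sketch below is one possible illustration, not the lab's own code: it assumes the built-in mtcars and volcano datasets and writes all plots to a temporary PDF (the name out_file is made up) so that it also runs without a display. The map example is left out because it needs the separate maps package.

```r
# Send all plots to a temporary PDF file, one plot per page
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Bar plot: number of cars for each cylinder count
barplot(table(mtcars$cyl), main = "Cars by cylinder count",
        xlab = "Cylinders", ylab = "Count")

# Histogram: distribution of fuel efficiency
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Box plot: fuel efficiency grouped by cylinder count
boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cylinder count",
        xlab = "Cylinders", ylab = "Miles per gallon")

# Scatter plot: relationship between weight and fuel efficiency
plot(mtcars$wt, mtcars$mpg, main = "Weight vs mpg",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Heatmap: colors encode the values of a numeric matrix, scaled per column
heatmap(as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")]), scale = "column")

# 3D graph: perspective plot of a surface over the x-y plane
persp(z = volcano, theta = 30, phi = 30, main = "Volcano surface")

dev.off()  # close the PDF device so the file is written out
```

In an interactive session the pdf() and dev.off() calls can be dropped so the plots appear in the graphics window instead.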
Code:
Conclusion:
Thus we successfully implemented data visualization in R programming.
