STATISTICAL COMPUTING & R PROGRAMMING
UNIT 1
Introduction
- R is a programming language and also a software environment for statistical computing and data analysis.
- R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
- R is an open-source programming language and it is available on widely used platforms, e.g. Windows, Linux, and Mac.
- R is an interpreted language that supports both procedural programming and object-oriented programming.
Why R Programming Language?
- R is used as a leading tool for machine learning, statistics, and data analysis. Objects, functions, and packages can easily be created in R.
- It's a platform-independent language, which means it runs on all major operating systems.
- It's a free, open-source language, so anyone can install it in any organization without purchasing a license.
- R is not only a statistics package; it also integrates with other languages (C, C++), so you can easily interact with many data sources and statistical packages.
- R has a vast community of users, and it is growing day by day.
- R is currently one of the most requested programming languages in the data science job market.
Features of R Programming Language
Statistical Features of R:
- Basic Statistics: The most common basic statistics are the mean, mode, and median. These are all known as "Measures of Central Tendency." Using R, we can measure central tendency very easily.
- Static graphics: R is rich with facilities for creating static graphics. It contains functionality for many plot types, including graphic maps, mosaic plots, biplots, and more.
- Probability distributions: Probability distributions play a vital role in statistics, and R can easily handle various types of probability distribution such as the Binomial distribution, Normal distribution, Chi-squared distribution, and many more.
- Data analysis: R provides a large, coherent, and integrated collection of tools for data analysis.
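As a small illustration of the central tendency measures mentioned above, the sketch below uses a made-up sample (the `scores` vector is hypothetical). Base R provides `mean` and `median` directly; it has no built-in function for the statistical mode, so one common workaround is to count the values with `table`.

```r
# Hypothetical sample data, chosen only for illustration
scores <- c(4, 8, 6, 5, 3, 8, 9, 8)

mean_score   <- mean(scores)     # arithmetic mean
median_score <- median(scores)   # middle value of the sorted data

# No built-in statistical mode in base R: count each value
# and keep the most frequent one
counts     <- table(scores)
mode_score <- as.numeric(names(counts)[which.max(counts)])

mean_score
median_score
mode_score
```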
Programming Features of R:
- R Packages: One of the major features of R is its wide availability of libraries. R has CRAN (Comprehensive R Archive Network), which is a repository holding more than 10,000 packages.
- Distributed Computing: Distributed computing is a model in which components of a software system are shared among multiple computers to improve efficiency and performance.
2.1 Arithmetic
In R, standard mathematical rules apply throughout and follow the usual left-to-right order of operations: parentheses, exponents, multiplication, division, addition, subtraction (PEMDAS).
R> 14/6+5
[1] 7.333333
R> 14/(6+5)
[1] 1.272727
R> 3*2
[1] 6
Logarithms and Exponentials
When supplied a given number x and a value referred to as a base, the logarithm calculates the power to which you must raise the base to get to x.
For example, the logarithm of x = 243 to base 3 (written mathematically as $\log_3 243$) is 5, because $3^5 = 243$. In R, the log transformation is achieved with the log function:
R> log(x=243,base=3)
[1] 5
- Both x and the base must be positive.
- The log of any number x when the base is equal to x is 1.
- The log of x = 1 is always 0, regardless of the base.
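The three rules above can be checked directly at the console. The following is a minimal sketch; the particular numbers (7.5 and base 10) are arbitrary choices made for illustration.

```r
# Power to which 3 must be raised to give 243
log_243 <- log(x = 243, base = 3)

# When the base equals x, the result is 1
log_same <- log(x = 7.5, base = 7.5)

# The log of 1 is 0 whatever the base
log_one <- log(x = 1, base = 10)

# all.equal compares with a small numerical tolerance
all.equal(log_243, 5)
all.equal(log_same, 1)
all.equal(log_one, 0)
```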
Euler's number e gives rise to the exponential function, defined as e raised to the power of x, where x can be any number (negative, zero, or positive). The exponential function, $f(x) = e^x$, is written as exp(x) and represents the inverse of the natural log such that $\exp(\log_e x) = \log_e \exp(x) = x$.
R> exp(x=3)
[1] 20.08554
R> log(x=20.08554)
[1] 3
E-Notation
The e-notation is common to most programming languages, and even many desktop calculators, to allow easier interpretation of extreme values. In e-notation, any number x can be expressed as xey, which represents exactly $x \times 10^y$. Consider the number 2,342,151,012,900. It could, for example, be represented as follows:
- 2.3421510129e12, which is equivalent to writing $2.3421510129 \times 10^{12}$
Other representations are possible, but standard e-notation uses the power that places a decimal point just after the first significant digit. Put simply, for a positive power +y, the e-notation can be interpreted as "move the decimal point y positions to the right." For a negative power -y, the interpretation is "move the decimal point y positions to the left." This is exactly how R presents e-notation:
R> 2342151012900
[1] 2.342151e+12
R> 0.0000002533
[1] 2.533e-07
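E-notation can also be typed as input, not only read from output. The sketch below shows that a value written in e-notation is the same number as its fully written form, and uses `format` to request either style when printing. The variable names are illustrative only.

```r
# Numbers can be entered in e-notation directly
big   <- 2.342151e12   # decimal point moves 12 places to the right
small <- 2.533e-07     # decimal point moves 7 places to the left

# Same values as the fully written forms
big == 2342151000000
all.equal(small, 0.0000002533)

# format() returns a character string in the requested style
format(2342151012900, scientific = TRUE)
format(0.0000002533, scientific = FALSE)
```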
Assigning Objects
You can specify an assignment in R in two ways: using arrow notation (<-) and using a single equal sign (=). Both methods are shown here:
R> x <- -5
R> x
[1] -5
R> x = x + 1 # this overwrites the previous value of x
R> x
[1] -4
R will display the value assigned to an object when you enter the name of the object into the console. When you use the object in subsequent operations, R will substitute the value you assigned to it.
Objects can be named almost anything as long as the name begins with a letter (in other words, not a number), avoids symbols (though underscores and periods are fine), and avoids the handful of "reserved" words such as those used for defining special values.
2.2 Vectors
R vectors are similar to arrays in the C language: they are used to hold multiple data values of the same type. One key difference is that in R the indexing of a vector starts from 1 and not from 0. We can create numeric vectors and character vectors as well.
Vectors in R
More complicated data structures may consist of several vectors. The function for creating a vector is the single letter c, with the desired entries in parentheses separated by commas.
R> X <- c(61, 4, 21, 67, 89, 2)
R> cat('using c function', X, '\n')
Sequences, Repetition, Sorting, and Lengths
Sequences with seq
You can use the seq command, which allows for more flexible creation of sequences. This ready-to-use function takes in a from value, a to value, and a by value, and it returns the corresponding sequence as a numeric vector.
R> seq(from=3,to=27,by=3)
[1]  3  6  9 12 15 18 21 24 27
Sequences will always start at the from number but will not always include the to number, depending on what you are asking R to increase (or decrease) them by.
Instead of providing a by value, however, you can specify a length.out value to produce a vector with that many numbers, evenly spaced between the from and to values.
R> seq(from=2,to=10,length.out=10)
 [1]  2.000000  2.888889  3.777778  4.666667  5.555556  6.444444  7.333333
 [8]  8.222222  9.111111 10.000000
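The two ways of calling seq are related: with length.out = n, the implied step is (to - from)/(n - 1). The sketch below checks this for the example above and also shows a case where the to value is not reached. The variable names are illustrative.

```r
from <- 2
to   <- 10
n    <- 10

# Step size implied by asking for n evenly spaced values
step <- (to - from) / (n - 1)

by_length <- seq(from = from, to = to, length.out = n)
by_step   <- seq(from = from, to = to, by = step)

# Both calls describe the same sequence (up to rounding error)
all.equal(by_length, by_step)

# The to value is not always included: 1, 5, 9 stops before 10
short_seq <- seq(from = 1, to = 10, by = 4)
short_seq
```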
Repetition with rep
R> rep(x=1,times=4)
[1] 1 1 1 1
R> rep(x=c(3,62,8.3),times=3)
[1]  3.0 62.0  8.3  3.0 62.0  8.3  3.0 62.0  8.3
R> rep(x=c(3,62,8.3),each=2)
[1]  3.0  3.0 62.0 62.0  8.3  8.3
R> rep(x=c(3,62,8.3),times=3,each=2)
 [1]  3.0  3.0 62.0 62.0  8.3  8.3  3.0  3.0 62.0 62.0  8.3  8.3  3.0  3.0 62.0
[16] 62.0  8.3  8.3
The rep function is given a single value or a vector of values as its argument x, as well as a value for the arguments times and each. The value for times provides the number of times to repeat x, and each provides the number of times to repeat each element of x.
Sorting with sort
Sorting a vector in increasing or decreasing order of its elements is another simple
operation.
R> sort(x=c(2.5,-1,-10,3.44),decreasing=FALSE)
[1] -10.00  -1.00   2.50   3.44
R> sort(x=c(2.5,-1,-10,3.44),decreasing=TRUE)
[1]   3.44   2.50  -1.00 -10.00
The decreasing argument indicates the order in which you want to sort. It takes a logical value, which can be only one of two specific, case-sensitive values: TRUE or FALSE. Set decreasing=FALSE to sort from smallest to largest, and decreasing=TRUE to sort from largest to smallest.
Finding a Vector Length with length
The length function determines how many entries exist in a vector given as the argument x.
R> length(x=5:13)
[1] 9
Subsetting and Element Extraction
Immediately to the left of the output there is a square-bracketed [1]. When the output is a long vector that spans the width of the console and wraps onto the following line, another square-bracketed number appears to the left of the new line. These numbers represent the index of the entry directly to the right.
These indexes allow you to retrieve specific elements from a vector, which is known as subsetting.
R> A <- c(1,4,5,6,7,3,10)
R> A[3]
[1] 5
Another subsetting tool is the colon operator, which creates a sequence of indexes:
R> a <- c(1,4,5,6,7,8,3,9,10)
R> b <- a[3:6]
R> b
[1] 5 6 7 8
R> A <- c(5,6,7.8,3,19)
R> indexes <- c(4,rep(x=2,times=3),1,1,2,3:1)
R> indexes
 [1] 4 2 2 2 1 1 2 3 2 1
R> A[indexes]
 [1] 3.0 6.0 6.0 6.0 5.0 5.0 6.0 7.8 6.0 5.0
Subsetting with an index vector can create an entirely new vector of any length consisting of some or all of the elements in the original vector. As shown here, the index vector can contain the desired element positions in any order and can repeat indexes.
Vector-Oriented Behavior
Vectors are so useful because they allow R to carry out operations on multiple elements simultaneously with speed and efficiency. This vector-oriented, vectorized, or element-wise behavior is a key feature of the language, one that you will briefly examine here through some examples of rescaling measurements.
R> S1 <- 5.5:0.5
R> S1
[1] 5.5 4.5 3.5 2.5 1.5 0.5
R> S1-c(2,4,6,8,10,12)
[1]   3.5   0.5  -2.5  -5.5  -8.5 -11.5
This code creates a sequence of six values between 5.5 and 0.5, in increments of 1. From this vector you subtract another vector containing 2, 4, 6, 8, 10, and 12. The subtraction is carried out element by element.
Another benefit of vector-oriented behavior is that you can use vectorized functions to complete potentially laborious tasks. For example, if you want to sum or multiply all the entries in a numeric vector, you can just use a built-in function.
R> S1
[1] 5.5 4.5 3.5 2.5 1.5 0.5
We can find the sum of these six elements with
R> sum(S1)
[1] 18
and their product with
R> prod(S1)
[1] 162.4219
Vectorized functions are faster and more efficient than an explicitly coded iterative approach like a loop.
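To make the comparison concrete, the sketch below computes the same total twice: once with an explicit for loop and once with the vectorized sum function. It only demonstrates that the results agree; no timings are claimed here, and the variable names are illustrative.

```r
S1 <- 5.5:0.5

# Explicit loop: add the entries one at a time
loop_total <- 0
for (value in S1) {
  loop_total <- loop_total + value
}

# Vectorized equivalent: one function call on the whole vector
vec_total <- sum(S1)

loop_total == vec_total
```

The loop needs three lines of bookkeeping for what sum expresses in a single call, which is the main practical argument for preferring vectorized functions where they exist.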
Lastly, this vector-oriented behavior applies in the same way to overwriting multiple elements. Again using S1, examine the following:
R> S1
[1] 5.5 4.5 3.5 2.5 1.5 0.5
R> S1[c(1,3,5,6)] <- c(-99,99)
R> S1
[1] -99.0   4.5  99.0   2.5 -99.0  99.0
MATRICES AND ARRAYS
A matrix is simply several vectors stored together. Arrays extend this idea to more than two dimensions, and their elements are stored at contiguous memory locations. An array in R can be created with the array() function. A vector of elements is passed to array() along with the required dimensions.
Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
nrow : Number of rows
ncol : Number of columns
nmat : Number of matrices of dimensions nrow * ncol
dimnames : Names for the dimensions (default value = NULL)
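As a minimal sketch of the syntax above, the following builds a three-dimensional array holding two 3 x 4 matrices. The data 1:24 and the object name arr are arbitrary choices for illustration.

```r
# Two matrices ("layers"), each with 3 rows and 4 columns
arr <- array(data = 1:24, dim = c(3, 4, 2))

dim(arr)

# Elements are addressed as [row, column, layer]
first_layer_value  <- arr[2, 3, 1]
second_layer_value <- arr[2, 3, 2]

# A whole layer is an ordinary matrix
arr[, , 1]
```

Like matrices, arrays are filled column by column, layer after layer, so the second layer starts at the value 13.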
Example
To create a matrix in R, use the aptly named matrix command, providing the entries of
the matrix to the data argument as a vector:
R> A <- matrix(data=c(3,7,6,8),nrow=2,ncol=2)
R> A
     [,1] [,2]
[1,]    3    6
[2,]    7    8
Filling Direction
By default, a matrix is filled in a column-by-column fashion when reading the data entries from left to right. We can control how R fills in data using the argument byrow, as shown in the following examples:
R> matrix(data=c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=FALSE)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Here, R is asked to provide a 2 × 3 matrix containing the digits 1 through 6. With the optional argument byrow set to FALSE, R fills this 2 × 3 structure in a column-wise fashion, filling each column before moving to the next, reading the data argument vector from left to right.
The code below repeats the example with byrow=TRUE, which fills each row before moving to the next:
R> matrix(data=c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
Row and Column Bindings
If multiple vectors have the same length, you can build a matrix by binding these vectors together using the built-in R functions rbind and cbind.
R> rbind(1:3,4:6)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
Here, rbind has bound together the vectors as two rows of a matrix, with the top-to-bottom order of the rows matching the order of the vectors supplied to rbind. The same matrix could be constructed as follows, using cbind:
R> cbind(c(1,4),c(2,5),c(3,6))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
Here, you have three vectors each of length 2. You use cbind to glue together these three vectors in the order they were supplied, and each vector becomes a column of the resulting matrix.
3.1.3 Matrix Dimensions
Another useful function, dim, provides the dimensions of a matrix stored in your
workspace.
R> A <- rbind(c(1,3,4),5:3,c(100,20,90),11:13)
R> A
     [,1] [,2] [,3]
[1,]    1    3    4
[2,]    5    4    3
[3,]  100   20   90
[4,]   11   12   13
R> dim(A)
[1] 4 3
R> nrow(A)
[1] 4
R> ncol(A)
[1] 3
R> dim(A)[2]
[1] 3
Subsetting
Extracting and subsetting elements from matrices in R is much like extracting elements from vectors. Element extraction still uses the square-bracket operator, but now it must be performed with both a row and a column position, given strictly in the order of [row,column].
R> A <- matrix(c(1,2,6,3,7,8,9,1,2),nrow=3,ncol=3)
R> A
     [,1] [,2] [,3]
[1,]    1    3    9
[2,]    2    7    1
[3,]    6    8    2
For the third row and second column of A:
R> A[3,2]
[1] 8
Row, Column, and Diagonal Extractions
To extract an entire row or column from a matrix, specify the desired row or column number and leave the other value blank. It is important that the comma still separates the row and column positions. The following returns the second column of A and then the first row of A:
R> A[,2]
[1] 3 7 8
R> A[1,]
[1] 1 3 9
Consider the following subset:
R> A[2:3,]
     [,1] [,2] [,3]
[1,]    2    7    1
[2,]    6    8    2
This command returns the second and third rows of A.
We can identify the values along the diagonal of a square matrix (that is, a matrix with
an equal number of rows and columns) using the diag command.
R> diag(x=A)
[1] 1 7 2
Omitting and Overwriting
To delete or omit elements from a matrix, you again use square brackets, but this time with negative indexes. The following provides A without its second column:
R> A[,-2]
     [,1] [,2]
[1,]    1    9
[2,]    2    1
[3,]    6    2
The following removes the first row from A and retrieves the third and second column values, in that order, from the remaining two rows:
R> A[-1,3:2]
     [,1] [,2]
[1,]    1    7
[2,]    2    8
The following produces A without its first row and second column:
R> A[-1,-2]
     [,1] [,2]
[1,]    2    1
[2,]    6    2
To overwrite particular elements, or entire rows or columns, you identify the elements to be replaced and then assign the new values. First, make a copy of A so that the original is preserved:
R> B <- A
R> B
     [,1] [,2] [,3]
[1,]    1    3    9
[2,]    2    7    1
[3,]    6    8    2
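The notes stop at the copy step, so the following sketch illustrates what overwriting might look like in practice. The replacement values (1:3 and 900) are arbitrary and chosen only for illustration.

```r
A <- matrix(c(1, 2, 6, 3, 7, 8, 9, 1, 2), nrow = 3, ncol = 3)
B <- A   # work on a copy so A stays unchanged

# Overwrite an entire row: the second row becomes 1, 2, 3
B[2, ] <- 1:3

# Overwrite selected elements: rows 1 and 3 of column 2
# (the single value 900 is recycled to fill both positions)
B[c(1, 3), 2] <- 900

B
A   # the original matrix is untouched
```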
Matrix Operations and Algebra
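For any m × n matrix A, its transpose, $A^\top$, is the n × m matrix obtained by writing either its columns as rows or its rows as columns. For example, if
$$A = \begin{bmatrix} 2 & 5 & 2 \\ 6 & 1 & 4 \end{bmatrix}, \quad \text{then} \quad A^\top = \begin{bmatrix} 2 & 6 \\ 5 & 1 \\ 2 & 4 \end{bmatrix}.$$
In R, the transpose of a matrix is found with the function t. Let's create a new matrix and transpose it twice, which returns the original matrix:
R> A <- rbind(c(2,5,2),c(6,1,4))
R> t(t(A))
     [,1] [,2] [,3]
[1,]    2    5    2
[2,]    6    1    4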
Identity Matrix
The identity matrix, written as $I_m$, is a particular kind of matrix used in mathematics. It's a square m × m matrix with ones on the diagonal and zeros elsewhere.
Here's an example:
$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
In R, the diag function is used to create the identity matrix.
R> A <- diag(x=3)
R> A
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
3.2.3 Scalar Multiple of a Matrix
A scalar value is just a single, univariate value. Multiplication of any matrix A by a scalar value a results in a matrix in which every individual element is multiplied by a.
Here's an example:
$$2 \times \begin{bmatrix} 2 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 6 \end{bmatrix}$$
Scalar multiplication of a matrix is carried out using the standard arithmetic * operator.
R> A <- rbind(c(2,5,2),c(6,1,4))
R> a <- 2
R> a*A
     [,1] [,2] [,3]
[1,]    4   10    4
[2,]   12    2    8
Matrix Addition and Subtraction
Addition or subtraction of two matrices of equal size is also performed in an element-wise fashion. Corresponding elements are added or subtracted from one another, depending on the operation.
You can add or subtract any two equally sized matrices with the standard + and - symbols.
R> A <- cbind(c(2,5,2),c(6,1,4))
R> A
     [,1] [,2]
[1,]    2    6
[2,]    5    1
[3,]    2    4
R> B <- cbind(c(-2,3,6),c(8.1,8.2,-9.8))
R> B
     [,1] [,2]
[1,]   -2  8.1
[2,]    3  8.2
[3,]    6 -9.8
R> A-B
     [,1] [,2]
[1,]    4 -2.1
[2,]    2 -7.2
[3,]   -4 13.8
Matrix Multiplication
In order to multiply two matrices A and B of size m × n and p × q, it must be true that n = p. The resulting matrix A × B will have the size m × q. In R, matrix multiplication uses the %*% operator.
R> A <- rbind(c(2,5,2),c(6,1,4))
R> dim(A)
[1] 2 3
R> B <- cbind(c(3,-1,1),c(-3,1,5))
R> dim(B)
[1] 3 2
This confirms the two matrices are compatible for multiplication, so you can proceed.
R> A%*%B
     [,1] [,2]
[1,]    3    9
[2,]   21    3
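The size rule above also explains why the order of multiplication matters. In the sketch below, the same A and B are multiplied in both orders: A %*% B is 2 × 2, while B %*% A is 3 × 3. The object names AB and BA are illustrative.

```r
A <- rbind(c(2, 5, 2), c(6, 1, 4))       # 2 x 3
B <- cbind(c(3, -1, 1), c(-3, 1, 5))     # 3 x 2

# Compatible in this order because ncol(A) equals nrow(B)
ncol(A) == nrow(B)
AB <- A %*% B

# Also compatible in the reverse order, but the result has a different size
BA <- B %*% A

dim(AB)
dim(BA)
```

Note that the * operator would attempt element-wise multiplication instead, which fails here because A and B do not have the same dimensions.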
Matrix Inversion
Some square matrices can be inverted. The inverse of a matrix A is denoted $A^{-1}$. An invertible matrix satisfies the following equation:
$$A A^{-1} = I_m$$
You can use the R function solve as one option for inverting a matrix.
R> A <- matrix(data=c(3,4,1,2),nrow=2,ncol=2)
R> A
     [,1] [,2]
[1,]    3    1
[2,]    4    2
R> solve(A)
     [,1] [,2]
[1,]    1 -0.5
[2,]   -2  1.5
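One way to check an inverse is to multiply it back against the original matrix and compare the result with the identity matrix. The sketch below does this for the example above, using all.equal because floating-point arithmetic may leave tiny rounding errors.

```r
A     <- matrix(data = c(3, 4, 1, 2), nrow = 2, ncol = 2)
A_inv <- solve(A)

# A multiplied by its inverse should give the 2 x 2 identity matrix
product <- A %*% A_inv

# Compare with a numerical tolerance rather than exact equality
all.equal(product, diag(x = 2))
```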
NON-NUMERIC VALUES
Statistical programming also requires non-numeric values.
Logical Values
A logical-valued object can only be either TRUE or FALSE. These can be interpreted as yes/no, one/zero, satisfied/not satisfied, and so on.
A Logical Outcome: Relational Operators
Logicals are commonly used to check relationships between values.
Operator   Interpretation
==         Equal to
!=         Not equal to
>          Greater than
<          Less than
>=         Greater than or equal to
<=         Less than or equal to
R> 1==2
[1] FALSE
R> 1>2
[1] FALSE
Multiple Comparisons: Logical Operators
Logical operators are used to compare two TRUE or FALSE objects. These operators are based on the statements AND (&&) and OR (||); the ! operator negates a logical value.
R> FALSE||((T&&TRUE)||FALSE)
[1] TRUE
R> !TRUE&&TRUE
[1] FALSE
Logicals Are Numbers!
In R, TRUE is treated like 1, and FALSE is treated like 0.
R> TRUE+TRUE
[1] 2
R> FALSE-TRUE
[1] -1
R> 1&&1
[1] TRUE
R> 1||0
[1] TRUE
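Because TRUE counts as 1 and FALSE as 0, summing a logical vector counts how many conditions are satisfied, and averaging it gives the proportion. The sketch below illustrates this with made-up data (the vector x and the threshold 4 are arbitrary).

```r
x <- c(3, 8, 1, 9, 5)   # hypothetical measurements

# Element-wise comparison gives one logical value per entry
above <- x > 4

count_above      <- sum(above)    # how many entries exceed 4
proportion_above <- mean(above)   # what fraction of entries exceed 4

above
count_above
proportion_above
```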
Character
Character strings are another common data type, and are used to represent text. In R, strings are often used to specify folder locations.
There are three different string formats in the R environment. The default string format is called an extended regular expression; the other variants are named Perl and literal regular expressions. The intricacies of these variants are beyond the scope of these notes, so any mention of character strings from here on refers to an extended regular expression. For more technical details about other string formats, enter ?regex at the prompt.
R> str <- "This is string"
R> nchar(x=str)
[1] 14
Almost any combination of characters, including numbers, can be a valid character
string.
R> bar <- "23.3"
R> bar
[1] "23.3"
Strings can be compared in several ways, the most common comparison being a check
for equality.
R> "alpha"=="alpha"
[1] TRUE
R> "alpha"!="beta"
[1] TRUE
R> c("alpha","beta","gamma")=="beta"
[1] FALSE  TRUE FALSE
Concatenation
R> qux <- c("awesome","R","is")
R> length(x=qux)
[1] 3
R> qux
[1] "awesome" "R"       "is"
When calling cat or paste, you pass arguments to the function in the order you want
them combined. The following lines show identical usage yet different types of output
from the two functions:
R> cat(qux[2],qux[3],"totally",qux[1],"!")
R is totally awesome !
R> paste(qux[2],qux[3],"totally",qux[1],"!")
[1] "R is totally awesome !"
In the output, note that cat has simply concatenated and printed the text to the screen. This means you cannot directly assign the result to a new variable and treat it as a character string. For paste, however, the [1] to the left of the output and the presence of the " quotes indicate the returned item is a vector containing a character string, and this can be assigned to an object and used in other functions.
These two functions have an optional argument, sep, that's used as a separator between strings as they're concatenated. You pass sep a character string, and it will place this string between all other strings you've provided to paste or cat. For example:
R> paste(qux[2],qux[3],"totally",qux[1],"!",sep="---")
[1] "R---is---totally---awesome---!"
R> paste(qux[2],qux[3],"totally",qux[1],"!",sep="")
[1] "Ristotallyawesome!"
The empty string separator can be used to achieve correct sentence spacing; note the gap between awesome and the exclamation mark in the previous code when you first used paste and cat. If the sep argument isn't included, R will insert a space between strings by default.
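One possible way to remove the unwanted gap before the exclamation mark is to combine two calls to paste: an inner call with an empty separator, and an outer call with the default space. This is a sketch of one approach, not the only one; the names ending and sentence are illustrative.

```r
qux <- c("awesome", "R", "is")

# Glue the last word and the exclamation mark with no separator
ending <- paste(qux[1], "!", sep = "")

# Join everything else with the default single space
sentence <- paste(qux[2], qux[3], "totally", ending)

sentence
```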
Substrings and Matching
Pattern matching lets you inspect a given string to identify smaller strings within it.
The function substr takes a string x and extracts the part of the string between two character positions (inclusive), indicated with numbers passed as start and stop arguments. Let's try it on a string object named foo:
R> foo <- "This is a character string!"
R> substr(x=foo,start=21,stop=27)
[1] "string!"
The function substr can also be used with the assignment operator to directly substitute in a new set of characters:
R> substr(x=foo,start=1,stop=4) <- "Here"
R> foo
[1] "Here is a character string!"
Substitution is more flexible using the functions sub and gsub. The sub function searches a given string x for a smaller string pattern contained within. It then replaces the first instance with a new string, given as the argument replacement. The gsub function does the same thing, but it replaces every instance of pattern. Here's an example:
R> bar <- "How much wood could a woodchuck chuck"
R> sub(pattern="chuck",replacement="hurl",x=bar)
[1] "How much wood could a woodhurl chuck"
R> gsub(pattern="chuck",replacement="hurl",x=bar)
[1] "How much wood could a woodhurl hurl"
LISTS AND DATA FRAMES
Lists of Objects
The list is an incredibly useful data structure. It can be used to group together any mix
of R structures and objects. A single list could contain a numeric matrix, a logical array,
a single character string, and a factor object.
5.1.1 Definition and Component Access
Creating a list is much like creating a vector. You supply the elements that you want to include to the list function, separated by commas.
R> foo <- list(matrix(data=1:4,nrow=2,ncol=2),c(T,F,T,T),"hello")
R> foo
[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
[[2]]
[1]  TRUE FALSE  TRUE  TRUE
[[3]]
[1] "hello"
Use the length function to check the number of components in a list.
R> length(x=foo)
[1] 3
You can retrieve components from a list using indexes, which are entered in double square brackets:
R> foo[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
R> foo[[3]]
[1] "hello"
This action is known as a member reference. When you've retrieved a component this way, you can treat it just like a stand-alone object in the workspace; there's nothing special that needs to be done.
R> foo[[1]] + 5.5
     [,1] [,2]
[1,]  6.5  8.5
[2,]  7.5  9.5
R> foo[[1]][1,2]
[1] 3
R> foo[[1]][2,]
[1] 2 4
R> cat(foo[[3]],"you!")
hello you!
Nesting
You can add components to any existing list by using the dollar operator and a new name. Here's an example that stores foo (with named components) as a new component of an existing named list baz:
R> baz$bobby <- foo
R> baz
$tom
[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
$dick
[1] "g'day mate"
$harry
     [,1] [,2]
[1,]    2    6
[2,]    4    8
$bobby
$bobby$mymatrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4
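The output above relies on list components having names. The following sketch shows how a named list might be created and extended with the dollar operator. The name mymatrix appears in the output above; the other names (mylogicals, mystring, mynumber) are hypothetical, chosen for illustration.

```r
# A list whose components are named at creation
foo <- list(mymatrix   = matrix(data = 1:4, nrow = 2, ncol = 2),
            mylogicals = c(TRUE, FALSE, TRUE, TRUE),
            mystring   = "hello")

names(foo)

# Named access and positional access reach the same component
foo$mymatrix[1, 2]
identical(foo$mymatrix, foo[[1]])

# The dollar operator with a new name adds a component
foo$mynumber <- 42
length(foo)
```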
Data Frames
A data frame is R's most natural way of presenting a data set with a collection of
recorded observations for one or more variables.
Data frames have no restriction on the data types of the variables; you can store numeric data, factor data, and so on. The R data frame can be thought of as a list with some extra rules attached. The data frame is one of the most important and frequently used tools in R for statistical data analysis.
Construction
To create a data frame from scratch, use the data.frame function. You supply your data, grouped by variable, as vectors of the same length, in the same way you would construct a named list. Consider the following example data set:
R> mydata <- data.frame(person=c("Peter","Lois","Meg","Chris","Stewie"),
age=c(42,40,17,14,1), sex=factor(c("M","F","F","M","M")))
R> mydata
  person age sex
1  Peter  42   M
2   Lois  40   F
3    Meg  17   F
4  Chris  14   M
5 Stewie   1   M
R> mydata[3:5,3]
[1] F M M
Levels: F M
This returns a factor vector with the sex of Meg, Chris, and Stewie.
R> mydata[,c(3,1)]
  sex person
1   M  Peter
2   F   Lois
3   F    Meg
4   M  Chris
5   M Stewie
This results in another data frame giving the sex and then the name of each person.
You can also extract a single column as a vector with the dollar operator and subset that vector, too:
R> mydata$age[2]
[1] 40
You can report the size of a data frame (the number of records and variables):
R> nrow(mydata)
[1] 5
R> ncol(mydata)
[1] 3
R> dim(mydata)
[1] 5 3
Adding Data Columns and Combining Data Frames
Say you want to add data to an existing data frame. This could be a set of observations
for a new variable (adding to the number of columns), or it could be more records
(adding to the number of rows). Once again, you can use some of the functions you've
already seen applied to matrices
R> newrecord <- data.frame(person="Brian",age=7,
sex=factor("M",levels=levels(mydata$sex)))
R> newrecord
  person age sex
1  Brian   7   M
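The new record can then be attached to the existing data frame. The sketch below shows one way to add a row with rbind and a column with the dollar operator; the column name age.mon (age in months) is a hypothetical addition made for illustration.

```r
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
                     age    = c(42, 40, 17, 14, 1),
                     sex    = factor(c("M", "F", "F", "M", "M")))

newrecord <- data.frame(person = "Brian", age = 7,
                        sex = factor("M", levels = levels(mydata$sex)))

# Adding a row: rbind stacks data frames with matching column names
mydata <- rbind(mydata, newrecord)

# Adding a column: assign a vector with one entry per row
mydata$age.mon <- mydata$age * 12

dim(mydata)
mydata
```

Giving newrecord the same factor levels as mydata$sex keeps the sex column a consistent factor after the rows are combined.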
SPECIAL VALUES, CLASSES, AND COERCION
Infinity
When a number is too large for R to represent, the value is deemed to be infinite.
R> foo <- Inf
R> foo
[1] Inf
R> bar <- c(3401,Inf,3.1,-555,Inf,43)
R> bar
[1] 3401.0    Inf    3.1 -555.0    Inf   43.0
R> baz <- 90000^100
R> baz
[1] Inf
R> qux <- c(-42,565,-Inf,-Inf,Inf,-45632.3)
R> qux
[1]    -42.0    565.0     -Inf     -Inf      Inf -45632.3
R> -Inf==Inf
[1] FALSE
NaN
In some situations, it's impossible to express the result of a calculation using a number, Inf, or -Inf. These difficult-to-quantify special values are labeled NaN in R, which stands for Not a Number.
R> foo <- NaN
R> foo
[1] NaN
R> bar <- c(NaN,54.3,-2,NaN,90094.123,-Inf,55)
R> bar
[1]      NaN    54.30    -2.00      NaN 90094.12     -Inf    55.00
R> -Inf+Inf
[1] NaN
R> Inf/Inf
[1] NaN
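To locate these special values inside a vector, base R provides the functions is.nan, is.infinite, and is.finite. The sketch below applies them to a small made-up vector.

```r
x <- c(NaN, 54.3, -Inf, Inf, 0/0)   # 0/0 also evaluates to NaN

nan_flags      <- is.nan(x)        # TRUE only for NaN entries
infinite_flags <- is.infinite(x)   # TRUE only for Inf and -Inf
finite_flags   <- is.finite(x)     # TRUE only for ordinary numbers

nan_flags
infinite_flags
finite_flags
```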
NA
R provides a standard special term to represent missing values, NA, which reads as Not Available.
NA entries are not the same as NaN entries. Whereas NaN is used only with respect to numeric operations, missing values can occur for any type of observation. As such, NAs can exist in both numeric and non-numeric settings. Here's an example:
R> foo <- c("character","a",NA,"with","string",NA)
R> foo
[1] "character" "a"         NA          "with"      "string"    NA
R> bar <- factor(c("blue",NA,NA,"blue","green","blue",NA,"red","red",NA,"green"))
R> bar
 [1] blue  <NA>  <NA>  blue  green blue  <NA>  red   red   <NA>  green
Levels: blue green red
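In practice, missing values usually need to be detected or skipped before summarizing data. The sketch below uses is.na to find them and the na.rm argument of mean to ignore them; the data vector is made up for illustration.

```r
x <- c(12, NA, 7, NA, 5)   # hypothetical measurements with gaps

missing_flags <- is.na(x)          # TRUE where a value is missing
n_missing     <- sum(missing_flags)

mean_with_na    <- mean(x)                 # NA: a missing input makes the result unknown
mean_without_na <- mean(x, na.rm = TRUE)   # NAs dropped before averaging

n_missing
mean_with_na
mean_without_na
```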
NULL
Finally, you'll look at the null value, written as NULL. This value is often used to explicitly define an "empty" entity.
R> c(2,4,NA,8)
[1]  2  4 NA  8
R> c(2,4,NULL,8)
[1] 2 4 8
The first line creates a vector of length 4, with the third position coded as NA. The second line creates a similar vector but using NULL instead of NA. The result is a vector with a length of only 3. That's because NULL cannot take up a position in the vector. As such, it makes no sense to assign NULL to multiple positions in a vector (or any other structure).
R> c(NA,NA,NA)
[1] NA NA NA
R> c(NULL,NULL,NULL)
NULL
Understanding Types, Classes, and coercion
Attributes
Each R object you create has additional information about the nature of the object itself.
This additional information is referred to as the object's attributes.
R> foo <- matrix(data=1:9,nrow=3,ncol=3)
R> foo
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
R> attributes(foo)
$dim
[1] 3 3
Object Class
An object's class is one of the most useful attributes for describing an entity in R. Every object you create is identified, either implicitly or explicitly, with at least one class. R is an object-oriented programming language, meaning entities are stored as objects and have methods that act upon them. In such a language, class identification is formally referred to as inheritance.
R> num.vec1 <- 1:4
R> num.vec1
[1] 1 2 3 4
R> num.vec2 <- seq(from=1,to=4,length=6)
R> num.vec2
[1] 1.0 1.6 2.2 2.8 3.4 4.0
R> class(num.vec1)
[1] "integer"
Multiple Classes
Certain objects will have multiple classes. A variant on a standard form of an object,
such as an ordered factor vector, will inherit the usual factor class and also contain the
additional ordered class. Both are returned if you use the class function.
R> ordfac.vec <- factor(x=c("Small","Large","Large","Regular","Small"),
levels=c("Small","Regular","Large"), ordered=TRUE)
R> ordfac.vec
[1] Small   Large   Large   Regular Small
Levels: Small < Regular < Large
R> class(ordfac.vec)
[1] "ordered" "factor"
As-Dot Coercion Functions
Converting from one object or data type to another is referred to as coercion. Coercion is performed either implicitly or explicitly.
R> as.numeric(c(T,F,F,T))
[1] 1 0 0 1
R> 1:4+as.numeric(c(T,F,F,T))
[1] 2 2 3 5
R> as.logical(c("1","0","1","0","0"))
[1] NA NA NA NA NA
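The last result above shows that character strings such as "1" and "0" cannot be coerced directly to logical values. A possible workaround is to coerce in two steps, first to numeric and then to logical, as sketched below. The variable names are illustrative.

```r
codes <- c("1", "0", "1", "0", "0")

# Direct coercion fails: every entry becomes NA
direct <- as.logical(codes)

# Two-step coercion: character -> numeric -> logical
flags <- as.logical(as.numeric(codes))

# Other as-dot functions work the same way
number <- as.numeric("32.4")          # character to numeric
labels <- as.character(c(1, 2, 3))    # numeric to character

direct
flags
```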
BASIC PLOTTING
The R function plot takes in two vectors (one vector of x locations and one vector of y locations) and opens a graphics device where it displays the result. If a graphics device is already open, R's default behavior is to refresh the device, overwriting the current contents with the new plot.
R> foo <- c(1.1,2,3.5,3.9,4.2)
R> bar <- c(2,2.2,-1.3,0,0.2)
R> plot(foo,bar)
Graphical Parameters
type : Tells R how to plot the supplied coordinates (for example, as stand-alone points or joined by lines or both dots and lines).
main, xlab, ylab : Options to include the plot title, the horizontal axis label, and the vertical axis label, respectively.
col : Color (or colors) to use for plotting points and lines.
pch : Stands for point character. This selects which character to use for plotting individual points.
cex : Stands for character expansion. This controls the size of plotted point characters.
lty : Stands for line type. This specifies the type of line to use to connect the points (for example, solid, dotted, or dashed).
lwd : Stands for line width. This controls the thickness of plotted lines.
xlim, ylim : These provide limits for the horizontal range and vertical range (respectively) of the plotting region.
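The parameters listed above can be combined in a single call to plot. The sketch below reuses the foo and bar vectors from earlier; the title, axis labels, color, and numeric settings are arbitrary choices made for illustration.

```r
foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)

plot(foo, bar,
     type = "b",                  # both points and connecting lines
     main = "Example plot",       # plot title
     xlab = "x axis label",       # horizontal axis label
     ylab = "y axis label",       # vertical axis label
     col  = "blue",               # color of points and lines
     pch  = 19,                   # solid circle as the point character
     cex  = 1.2,                  # points drawn 20% larger than default
     lty  = 2,                    # dashed line
     lwd  = 2,                    # line twice the default thickness
     xlim = c(0, 5),              # horizontal plotting range
     ylim = c(-2, 3))             # vertical plotting range
```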