Crash Course
Contents
1 Getting help
2 Entering data
  2.1 By hand
  2.2 Regular sequences
3 Spreadsheet data
  3.1 Delimited files
  3.2 Clipboard
4 Using data frames
  4.1 Sorting data frames
5 Interpolation
  5.1 Linear Interpolation
6 A graph with two y-axes
7 Nonlinear curve fitting
8 Multiple graphs
  8.1 Subplots
9 Using colors
10 Mixing of watermasses - Linear algebra
1 Getting help
To get help on a function, type ?function; for example, ?plot shows the documentation for the plot function, which draws scatter plots. If you don't know the name of the function you need, try help.search("useful phrase").
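If you remember only part of a function name, apropos (from the utils package) lists the functions on the search path whose names match a regular expression. A minimal sketch; the pattern "^read\\." is just an example:

```r
# List functions whose names start with "read." (regular-expression match)
read_funs <- apropos("^read\\.")
print(read_funs)

# Full-text search of the installed help pages (can be slow the first time):
# help.search("linear interpolation")
```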
2 Entering data
There are several ways to get your data into R, including copying and pasting from the clipboard. Importing large data sets is quite easy, and entering small data sets by hand is not too difficult either.
2.1 By hand
To create a vector, use the function c:
> n = c(1, 5, 7)
> n
[1] 1 5 7
Alternatively, use scan(), which reads values typed at the console until an empty line is entered.
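scan can also read values from a character string through its text argument, which avoids interactive input and is handy in scripts. A small sketch with made-up numbers:

```r
# Build a vector with c() and then extend it
n <- c(1, 5, 7)
n <- c(n, 9)          # c() also concatenates existing vectors

# scan() reads whitespace-separated numbers; text = avoids typing at the console
m <- scan(text = "1 5 7 9", quiet = TRUE)
print(m)
```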
2.2 Regular sequences
The function seq creates regular sequences, and rep repeats values:
> y = seq(0, 1, length = 11)
> y
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> F = rep("A", 2)
> F
[1] "A" "A"
Combining rep with c repeats a whole vector:
> G = rep(c("A", "B"), 3)
> G
[1] "A" "B" "A" "B" "A" "B"
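Two further variants that are often useful: seq with a step size (by) instead of a length, and rep with each, which repeats every element before moving on to the next. A short sketch:

```r
# seq() with a step size instead of a number of elements
s <- seq(0, 1, by = 0.25)           # 0.00 0.25 0.50 0.75 1.00

# rep() with 'each' repeats every element in turn
e <- rep(c("A", "B"), each = 3)     # "A" "A" "A" "B" "B" "B"

# The colon operator is shorthand for integer sequences with step 1
k <- 1:5
```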
3 Spreadsheet data
3.1 Delimited files
To import data from a spreadsheet (e.g. MS Excel), first save the file as comma-separated values (in Excel: File -> Save As -> CSV). To read a CSV file (data.csv) into an R data frame, use:
> data = read.csv("data.csv")
> data
A V Q N 1 Pelle 5 1.50 2 Kalle 6 0.90 3 Nisse 7 0.70 4 Eva 12 0.90 5 Gunnar 18 1.08
1 2 3 4 5
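A self-contained sketch of the same round trip: it writes a small CSV file to a temporary location and reads it back with read.csv. The column names and values are hypothetical, chosen only for illustration:

```r
# Hypothetical example data, written to a temporary CSV file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("name,value",
             "Pelle,5",
             "Kalle,6",
             "Nisse,7"), csv_file)

# read.csv expects a header row and comma-separated fields by default
data <- read.csv(csv_file)
str(data)     # column names and types
nrow(data)    # number of rows read
```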
3.2 Clipboard
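For smaller data sets you can also read data from the clipboard with read.delim(file = "clipboard"). The "clipboard" connection is platform-dependent; on macOS, read.delim(pipe("pbpaste")) is the usual alternative.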
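Because clipboard access depends on the operating system, this sketch instead passes tab-separated text (the format spreadsheet programs typically place on the clipboard) to read.delim through the text argument, which read.delim forwards to read.table. The values are hypothetical:

```r
# Tab-separated text, as it might look after copying spreadsheet cells
pasted <- "name\tvalue\nEva\t12\nGunnar\t18"

# read.delim uses sep = "\t" and header = TRUE by default
d <- read.delim(text = pasted)
print(d)
```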
4 Using data frames
In section 3 you imported data into a data frame. To select a subset of a data frame, use the scheme subset = data[rows, columns]. Rows can be selected either by row number, e.g. 1:10, or by a logical selection criterion.
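A sketch of the data[rows, columns] scheme on a small hypothetical data frame, selecting rows both by number and by a logical criterion:

```r
# Hypothetical data frame for illustration
data <- data.frame(name  = c("Pelle", "Kalle", "Nisse", "Eva"),
                   value = c(5, 6, 7, 12))

first_two <- data[1:2, ]                   # rows by number, all columns
big       <- data[data$value > 6, ]        # rows by a logical criterion
names_big <- data[data$value > 6, "name"]  # criterion plus a single column
```

Leaving the column position empty, as in data[1:2, ], keeps all columns.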
4.1 Sorting data frames
To sort the following generated data first by stn, then by pos and then by time, use order:
> data <- expand.grid(stn = c(140, 300), pos = c(1, 2), time = c(2, 5),
+     depth = c("shallow", "deep"))
> i <- order(data$stn, data$pos, data$time)
> data[i, ]
   stn pos time   depth
1  140   1    2 shallow
9  140   1    2    deep
5  140   1    5 shallow
13 140   1    5    deep
3  140   2    2 shallow
11 140   2    2    deep
7  140   2    5 shallow
15 140   2    5    deep
2  300   1    2 shallow
10 300   1    2    deep
6  300   1    5 shallow
14 300   1    5    deep
4  300   2    2 shallow
12 300   2    2    deep
8  300   2    5 shallow
16 300   2    5    deep
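order can also sort in decreasing order; for a numeric key the simplest way is to negate it. A sketch on the same generated data, sorting by stn descending and then by time ascending:

```r
data <- expand.grid(stn = c(140, 300), pos = c(1, 2), time = c(2, 5),
                    depth = c("shallow", "deep"))

# Negating a numeric column reverses its sort direction
i <- order(-data$stn, data$time)
sorted <- data[i, ]
head(sorted, 4)
```

For non-numeric keys, order(..., decreasing = TRUE) reverses all keys at once.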
5 Interpolation
Unlike curve fitting, interpolation only serves to estimate values at points that were not measured; it does not estimate parameters of interest. Curve fitting typically fits one or a few curves to the whole data domain, whereas interpolation fits lines or curves between each pair of adjacent data points. Smoothing functions can also be applied; these are then based on more than two data points.
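As an example of interpolating with curves rather than straight lines, splinefun from the stats package builds a cubic spline that passes exactly through every data point. A sketch with hypothetical sampling days and values:

```r
x <- c(1, 53, 105, 157, 209, 261, 313, 365)  # hypothetical sampling days
y <- c(10, 15, 60, 40, 50, 30, 15, 10)       # hypothetical measurements

# Natural cubic spline through all points; returns a function of x
f <- splinefun(x, y, method = "natural")

y_daily <- f(1:365)  # interpolated daily values
```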
5.1 Linear Interpolation
The function approx performs linear interpolation. In this example, sediment trap data measured at eight times during a year are interpolated to daily values. With method = "constant", approx instead holds each measured value until the next measurement (a step function).
> x = seq(1, 365, length = 8)
> y = c(10, 15, 60, 40, 50, 30, 15, 10)
> xout = seq(1, 365, 1)
> yout = approx(x, y, xout)
> plot(y ~ x, pch = 16, cex = 1.5)
> lines(yout)
> youtc = approx(x, y, xout, method = "constant")
> lines(youtc, lty = 2)
> legend(max(x), max(y), lty = 1:2, legend = c("linear",
+     "constant"), xjust = 1)
[Figure: sediment trap data y against x (points) with linear (solid line) and constant (dashed line) interpolation]
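approx returns a list with components x and y. By default (rule = 1) it gives NA outside the range of the data, while rule = 2 repeats the nearest end value. A small check with made-up numbers:

```r
x <- c(1, 3)
y <- c(10, 30)

inside  <- approx(x, y, xout = 2)$y                      # halfway between 10 and 30
outside <- approx(x, y, xout = c(0, 4))$y                # rule = 1: NA outside [1, 3]
clamped <- approx(x, y, xout = c(0, 4), rule = 2)$y      # rule = 2: use the end values
step    <- approx(x, y, xout = 2, method = "constant")$y # constant: keeps the left value
```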
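6 A graph with two y-axes
A figure with two y-axes is useful for showing two different quantities against the same x, for instance the concentration and the isotopic composition of a chemical species.
> x = 1:10    # assumed example data (not given in the original)
> y1 = x
> y2 = 1/x
> par(mar = c(5.1, 4.1, 4.1, 4.1))  # widen the right margin for the second axis
> plot(y1 ~ x, bty = "u", ylab = "y=0+1*x")
> par(new = TRUE)                   # draw the next plot on top of the current one
> plot(y2 ~ x, bty = "u", ylab = "", yaxt = "n", pch = 16)
> axis(4)                           # y-axis for y2 on the right side
> mtext("y=1/x", side = 4, line = 3)
> legend(5, 0.8, c("y=x", "y=1/x"), pch = c(1, 16))
> title(main = "A plot with two y-axes")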
[Figure: A plot with two y-axes; y=0+1*x (open circles, left axis) and y=1/x (filled circles, right axis) against x]
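Why this works: par(new = TRUE) makes the second plot call draw into the same figure region but with its own user coordinates, so axis(4) labels the right axis in units of the second variable. A sketch that records those coordinates on an off-screen pdf(NULL) device, assuming x = 1:10, y1 = x and y2 = 1/x (the original example does not show its data):

```r
pdf(NULL)                      # off-screen graphics device, no file is written
x  <- 1:10                     # assumed example data
y1 <- x
y2 <- 1/x

par(mar = c(5.1, 4.1, 4.1, 4.1))
plot(y1 ~ x, bty = "u")
usr_left <- par("usr")         # c(x1, x2, y1, y2) of the first plot

par(new = TRUE)
plot(y2 ~ x, bty = "u", ylab = "", yaxt = "n", pch = 16)
usr_right <- par("usr")        # now the coordinates of the second plot
axis(4)                        # therefore labelled in y2 units

dev.off()
```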
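7 Nonlinear curve fitting
Nonlinear curve fitting is done with nls (nonlinear least squares). The example fits the length l as a function of a with the model

$$ l = L_\infty - (L_\infty - L_0)\, e^{-r a} \qquad (1) $$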
The data are:
> a = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
> l = c(10, 21, 50, 70, 83, 90, 92, 95, 99, 100, 105)
> plot(l ~ a)
[Figure: scatter plot of l against a]
> length.nl = nls(l ~ Linf - (Linf - L0) * exp(-r * a),
+     start = list(Linf = 100, L0 = 10, r = 5))
> summary(length.nl)

Formula: l ~ Linf - (Linf - L0) * exp(-r * a)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Linf 109.4566     5.0720  21.581 2.24e-08 ***
L0     4.1270     4.5993   0.897  0.39575
r      3.0251     0.4504   6.717  0.00015 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.127 on 8 degrees of freedom

Correlation of Parameter Estimates:
   Linf    L0
L0  0.3393
r  -0.8872 -0.5621

> plot(l ~ a, pch = 16, cex = 1.5)
> aa = seq(min(a), max(a), length = 100)
[Figure: l plotted against a with filled symbols]
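The step that draws the fitted curve is not shown above. One common way, sketched here as an assumption rather than the author's exact code, is to evaluate the fitted model on the fine grid aa with predict and add it with lines:

```r
a <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
l <- c(10, 21, 50, 70, 83, 90, 92, 95, 99, 100, 105)

length.nl <- nls(l ~ Linf - (Linf - L0) * exp(-r * a),
                 start = list(Linf = 100, L0 = 10, r = 5))

aa <- seq(min(a), max(a), length.out = 100)
# newdata must use the same variable name (a) as the model formula
ll <- predict(length.nl, newdata = data.frame(a = aa))

pdf(NULL)                      # off-screen device so the sketch runs anywhere
plot(l ~ a, pch = 16, cex = 1.5)
lines(aa, ll)                  # fitted curve
dev.off()
```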
8 Multiple graphs
Graphs are often grouped together in one window or figure file, and there are two ways to do this.
8.1 Subplots
Use this method to combine different kinds of graphs in a single file. If the graphs only differ by the level of a factor, use the method below.
> par(mfrow = c(1, 2))      # one row, two columns of plots
> plot(nh4, z, type = "b")  # nh4, z, heffa and f are the author's own data
> boxplot(heffa ~ f)
[Figure: two panels side by side, z against nh4 (left) and a box plot of heffa by f (right)]
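A self-contained sketch of par(mfrow) with hypothetical random data, since the variables in the example above are not defined in this text. par returns the previous settings, which makes it easy to restore the layout afterwards:

```r
pdf(NULL)                            # off-screen device
set.seed(1)                          # reproducible hypothetical data
z   <- 1:20
nh4 <- cumsum(runif(20, 0, 30))      # hypothetical profile values
f   <- factor(rep(c("a", "b"), each = 10))
val <- rnorm(20, mean = ifelse(f == "a", 10, 12))

old <- par(mfrow = c(1, 2))          # 1 row, 2 columns; returns the old setting
plot(nh4, z, type = "b")             # left panel
boxplot(val ~ f)                     # right panel
layout_used <- par("mfrow")
par(old)                             # restore the previous layout
dev.off()
```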
9 Using colors
If you make a graph for a poster or a presentation, colors can make it clearer and more attractive. You can specify colors by name, e.g. "green", or in hexadecimal notation, e.g. "#00FF00". Hexadecimal colors are best generated by palette functions such as rainbow.
> par(mfrow = c(2, 2))
> barplot(rep(c(1, 2), 5), col = rainbow(2))
> barplot(rnorm(5), col = heat.colors(5))
> barplot(1:10, col = rainbow(10))
> pie(1:10, col = terrain.colors(10))
[Figure: four panels; bar plots colored with rainbow(2), heat.colors(5) and rainbow(10), and a pie chart colored with terrain.colors(10)]
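To see how the two notations relate: col2rgb converts a color name or hex string to its red, green and blue components (0-255), and rgb builds a hex string from components. Palette functions such as rainbow simply return a vector of such hex strings. A short sketch:

```r
col2rgb("green")                      # red 0, green 255, blue 0
rgb(0, 1, 0)                          # "#00FF00", components on a 0-1 scale
rgb(0, 255, 0, maxColorValue = 255)   # the same color on a 0-255 scale

pal <- rainbow(5)                     # five hex color strings
print(pal)
```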
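10 Mixing of watermasses - Linear algebra
The contributions x of three watermasses to a water sample can be found by solving a linear system

$$ A\mathbf{x} = \mathbf{b} \qquad (3) $$

with solve: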
> (A <- rbind(c(20, 30, 25), c(4, 4, 5), c(1, 1, 1)))
     [,1] [,2] [,3]
[1,]   20   30   25
[2,]    4    4    5
[3,]    1    1    1
> (x <- solve(A, b))
[1] 0.7 0.1 0.2
This means that the sample consists of 70% water from the first, 10% from the second and 20% from the third watermass.
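The right-hand side b is not shown in this section. Purely for illustration, the sketch below back-computes it from A and the reported solution (b = A x with x = (0.7, 0.1, 0.2)); with b[3] = 1, the last row of A (all ones) makes the three fractions sum to one:

```r
A <- rbind(c(20, 30, 25),
           c(4, 4, 5),
           c(1, 1, 1))              # last row: the fractions sum to b[3]
b <- c(22, 4.2, 1)                  # assumed: equals A %*% c(0.7, 0.1, 0.2)

x <- solve(A, b)                    # solves A x = b
print(x)

# Check: the solution reproduces b and the fractions add up to 1
stopifnot(isTRUE(all.equal(drop(A %*% x), b)), isTRUE(all.equal(sum(x), 1)))
```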
Index
approx (section 5.1)
c (section 2.1)
nls (section 7)
order (section 4.1)
scan (section 2.1)