---
layout: page
title: Brief Introduction to `dplyr`
---
## Brief Introduction to `dplyr`
R syntax takes time to learn. One of the harder aspects to get used to is subsetting data tables. The `dplyr` package brings these tasks closer to English, so we introduce two of its simple functions: `filter`, which subsets rows, and `select`, which selects columns.
Take a look at the dataset we read in:
```{r}
filename <- "femaleMiceWeights.csv"
dat <- read.csv(filename)
head(dat) # In RStudio, use View(dat)
```
There are two types of diets, which are denoted in the first column. If we want just the weights, we only need the second column. So if we want the weights for mice on the `chow` diet, we first filter the rows like this:
```{r,message=FALSE}
library(dplyr)
chow <- filter(dat, Diet=="chow") #keep only the ones with chow diet
head(chow)
```
And now we can select only the column with the values:
```{r}
chowVals <- select(chow,Bodyweight)
head(chowVals)
```
A nice feature of the `dplyr` package is that you can perform consecutive tasks by using what is called a "pipe". In `dplyr` we use `%>%` to denote a pipe (the operator comes from the `magrittr` package and is made available when `dplyr` is loaded). This symbol tells the program to first do one thing and then do something else to the result of the first. Hence, we can chain several data manipulations in a single expression. For example:
```{r}
chowVals <- filter(dat, Diet=="chow") %>% select(Bodyweight)
```
In the second task, we no longer have to specify the object we are editing since it is whatever comes from the previous call.
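As a minimal sketch of what the pipe does, the chunk below compares a nested call with its piped form. The `toy` data frame and its values are invented for illustration and only mimic the layout of `dat`; the names `nested` and `piped` are likewise hypothetical.

```{r}
library(dplyr)

# Toy data frame invented for illustration; it only mimics the layout of `dat`.
toy <- data.frame(Diet = c("chow", "hf", "chow", "hf"),
                  Bodyweight = c(21.5, 25.7, 28.1, 30.4))

# Nested form: the result of filter() is passed explicitly to select().
nested <- select(filter(toy, Diet == "chow"), Bodyweight)

# Piped form: `%>%` passes the left-hand side as the first argument of the call on the right.
piped <- toy %>% filter(Diet == "chow") %>% select(Bodyweight)

# Both forms perform the same two calls, so the results should match.
identical(nested, piped)
```

The piped form reads left to right in the order the steps are performed, which is why it tends to be easier to follow than the nested form as the chain grows.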
Also, note that if `dplyr` receives a `data.frame` it will return a `data.frame`.
```{r}
class(dat)
class(chowVals)
```
For pedagogical reasons, we will often want the final result to be a simple `numeric` vector. To obtain such a vector with `dplyr`, we can apply the `unlist` function, which flattens a list, such as a `data.frame`, into a vector. Because the `Bodyweight` column is numeric, the result is a `numeric` vector:
```{r}
chowVals <- filter(dat, Diet=="chow") %>% select(Bodyweight) %>% unlist
class( chowVals )
```
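One detail worth knowing is that `unlist` attaches names to the elements of the vector it returns. The sketch below, again on an invented `toy` data frame, shows this and contrasts it with `pull`, a `dplyr` function that extracts a single column as a vector. Using `pull` here is an alternative offered for comparison, not the approach taken in the text above.

```{r}
library(dplyr)

# Toy data frame invented for illustration; it only mimics the layout of `dat`.
toy <- data.frame(Diet = c("chow", "hf", "chow", "hf"),
                  Bodyweight = c(21.5, 25.7, 28.1, 30.4))

# unlist() flattens the one-column data.frame and attaches element names.
viaUnlist <- toy %>% filter(Diet == "chow") %>% select(Bodyweight) %>% unlist
names(viaUnlist)

# pull() extracts the column directly as an unnamed vector.
viaPull <- toy %>% filter(Diet == "chow") %>% pull(Bodyweight)
names(viaPull)

# Apart from the names, the two vectors hold the same values.
all.equal(unname(viaUnlist), viaPull)
```

For computations such as `mean` or `sd` the names make no difference, so either form serves the purpose here.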
To do this in R without `dplyr` the code is the following:
```{r}
chowVals <- dat[ dat$Diet=="chow", colnames(dat)=="Bodyweight"]
```
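Note that the base R version returns a `numeric` vector directly, with no need for `unlist`. This is because `[` drops a single selected column to a vector by default. The sketch below illustrates the behavior on an invented `toy` data frame; the names `asVector` and `asFrame` are hypothetical.

```{r}
# Toy data frame invented for illustration; it only mimics the layout of `dat`.
toy <- data.frame(Diet = c("chow", "hf", "chow", "hf"),
                  Bodyweight = c(21.5, 25.7, 28.1, 30.4))

# Selecting a single column with `[` drops the result to a vector by default.
asVector <- toy[toy$Diet == "chow", "Bodyweight"]
class(asVector)

# drop = FALSE keeps a one-column data.frame, similar to what select() returns.
asFrame <- toy[toy$Diet == "chow", "Bodyweight", drop = FALSE]
class(asFrame)
```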