Using lag() in dplyr doesn't work as expected

Question

I have the following data frame:

library(dplyr)

col1<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
col2<-c(1,2,3,44,1,1,2,3,44,44,1,2,44,1,44)
df<-data.frame(col1,col2)

I am trying to group by col1 and, within each col1 group, find the col2 values that equal 44 and are immediately followed by a smaller entry (< 44), and FLAG such entries in a new column.

However, this code doesn't seem to work:

df %>% group_by(col1)  %>% mutate(FLAG=(col2==44 & lead(col2,1)<44))

    col1  col2  FLAG
   <dbl> <dbl> <lgl>
1      1     1 FALSE
2      1     2 FALSE
3      1     3 FALSE
4      1    44  TRUE
5      1     1 FALSE
6      2     1 FALSE
7      2     2 FALSE
8      2     3 FALSE
9      2    44 FALSE
10     2    44  TRUE
11     3     1 FALSE
12     3     2 FALSE
13     3    44  TRUE
14     3     1 FALSE
15     3    44    NA

Specifically, entry 10 should be FALSE, since no entry < 44 directly follows it within the same group. Any suggestions on how to write code that does this correctly in general?
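For reference, a group-aware lead() has no successor for the last row of each group, so it returns NA there. Whether that NA reaches FLAG depends on R's three-valued logic: `FALSE & NA` is FALSE, but `TRUE & NA` is NA. So with a correctly grouped lead(), a 44 at the end of a group (rows 10 and 15) comes out NA, while a non-44 in that position (rows 5 and 14) comes out FALSE. A minimal sketch on the col2 values of group col1 == 2, calling dplyr's lead() explicitly:

```r
# col2 values of the group col1 == 2
g2 <- c(1, 2, 3, 44, 44)

# lead() shifts values up by one and pads the end of the vector with NA
nxt <- dplyr::lead(g2, 1)   # 2 3 44 44 NA

# Three-valued logic: TRUE & FALSE is FALSE, but TRUE & NA is NA
flag <- g2 == 44 & nxt < 44
print(flag)                 # FALSE FALSE FALSE FALSE NA
```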

I get NA in row 10 when I run your code (which is the expected behavior). — eipi10, Commented Feb 8, 2017 at 18:31
I don't know why we're getting different results. I'm also wondering why you're getting NA in row 15 but not in row 10. What happens when you run your code in a clean session with just dplyr loaded? — eipi10, Commented Feb 8, 2017 at 18:47
Other packages have lead and lag functions that behave differently than dplyr versions. My guess is you had masked the dplyr versions with those from another package. — Gregor Thomas, Commented Feb 8, 2017 at 19:24
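Gregor's guess can be checked directly. In the sketch below, `utils::find("lead")` lists every attached package that defines `lead`, and namespace-qualifying the call as `dplyr::lead()` sidesteps masking. The second pipeline shows that a lead() which ignores the col1 grouping reproduces the question's output exactly (TRUE in row 10 because it looks ahead into group 3, NA in row 15). This is consistent with the masking explanation but does not prove it.

```r
library(dplyr)

col1 <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
col2 <- c(1,2,3,44,1,1,2,3,44,44,1,2,44,1,44)
df <- data.frame(col1, col2)

# Every attached package that defines `lead`; anything listed
# before "package:dplyr" masks dplyr's version.
print(utils::find("lead"))

# Namespace-qualified call: always dplyr's group-aware lead()
grouped <- df %>%
  group_by(col1) %>%
  mutate(FLAG = col2 == 44 & dplyr::lead(col2, 1) < 44) %>%
  ungroup()

# Without grouping, row 10 compares against row 11 (value 1) and
# becomes TRUE, matching the output shown in the question.
ungrouped <- df %>%
  mutate(FLAG = col2 == 44 & dplyr::lead(col2, 1) < 44)
```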

user2100721 · Accepted Answer · 2017-02-08 18:35:45Z


Another way, using dplyr's if_else() function with missing = FALSE:

df %>%
  group_by(col1) %>%
  mutate(FLAG = if_else(col2 == 44 & lead(col2, 1) < 44, TRUE, FALSE, missing = FALSE))
# Source: local data frame [15 x 3]
# Groups: col1 [3]
# 
# col1  col2  FLAG
# <dbl> <dbl> <lgl>
# 1      1     1 FALSE
# 2      1     2 FALSE
# 3      1     3 FALSE
# 4      1    44  TRUE
# 5      1     1 FALSE
# 6      2     1 FALSE
# 7      2     2 FALSE
# 8      2     3 FALSE
# 9      2    44 FALSE
# 10     2    44 FALSE
# 11     3     1 FALSE
# 12     3     2 FALSE
# 13     3    44  TRUE
# 14     3     1 FALSE
# 15     3    44 FALSE
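Two other ways to turn the NA at the end of each group into FALSE, sketched assuming a reasonably current dplyr: give lead() a padding value through its `default` argument, or wrap the NA-producing comparison in dplyr's coalesce(). Inf is an arbitrary choice of padding; any value that can never be < 44 works.

```r
library(dplyr)

col1 <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
col2 <- c(1,2,3,44,1,1,2,3,44,44,1,2,44,1,44)
df <- data.frame(col1, col2)

# Option 1: pad with Inf, so the last row of each group compares
# Inf < 44 (FALSE) instead of NA < 44 (NA).
with_default <- df %>%
  group_by(col1) %>%
  mutate(FLAG = col2 == 44 & dplyr::lead(col2, 1, default = Inf) < 44) %>%
  ungroup()

# Option 2: keep the original comparison and replace any NA with FALSE.
with_coalesce <- df %>%
  group_by(col1) %>%
  mutate(FLAG = coalesce(col2 == 44 & dplyr::lead(col2, 1) < 44, FALSE)) %>%
  ungroup()
```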


When I run this I still get entry 10 as TRUE. Not sure what is going on.
– user85727
Commented Feb 8, 2017 at 18:43


erc · Accepted Answer · 2017-02-08 18:28:51Z


You can add the condition that lead(col2) must not be NA.

df %>% 
  group_by(col1)  %>% 
  mutate(FLAG = (col2 == 44 & lead(col2, 1) < 44 & !is.na(lead(col2, 1))))

Source: local data frame [15 x 3]
Groups: col1 [3]

    col1  col2  FLAG
   <dbl> <dbl> <lgl>
1      1     1 FALSE
2      1     2 FALSE
3      1     3 FALSE
4      1    44  TRUE
5      1     1 FALSE
6      2     1 FALSE
7      2     2 FALSE
8      2     3 FALSE
9      2    44 FALSE
10     2    44 FALSE
11     3     1 FALSE
12     3     2 FALSE
13     3    44  TRUE
14     3     1 FALSE
15     3    44 FALSE
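If masking or package versions are in doubt, a base R cross-check avoids lead() altogether. A minimal sketch that applies the same rule, using ave() to compute each row's successor within its col1 group (the last row of each group gets NA):

```r
col1 <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
col2 <- c(1,2,3,44,1,1,2,3,44,44,1,2,44,1,44)
df <- data.frame(col1, col2)

# Next col2 value within the same col1 group; NA for each group's last row
next_in_group <- ave(df$col2, df$col1, FUN = function(x) c(x[-1], NA))

# A 44 followed by a smaller value; "no following value" counts as FALSE
df$FLAG <- df$col2 == 44 & !is.na(next_in_group) & next_in_group < 44
```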


Not sure why both your solutions work for you but not on my computer...
– user85727
Commented Feb 8, 2017 at 18:44

@user85727 maybe try updating R/dplyr?
– erc
Commented Feb 8, 2017 at 18:46
I think this may be the issue. Weird, it said I had the latest version installed...
– user85727
Commented Feb 8, 2017 at 18:47
How can I change this code to flag all consecutive 44s leading up to an entry less than 44, so that col2 <- c(1,1,1,1,44,44,44,1,44,44,1,2,44,3,44); col1 <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3); df <- data.frame(col1, col2) gives me TRUE for entries 5, 6, and 7 as well?
– user85727
Commented Feb 8, 2017 at 19:25
@user85727 Sorry, I don't understand how col1 in rows 5 and 6 fulfills the conditions such that "all leading 44s prior to an entry less than 44"
– erc
Commented Feb 8, 2017 at 19:44
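One reading of the follow-up is: flag every 44 that belongs to an unbroken run of 44s ending right before a value below 44. The sketch below assumes that reading and no NA values; flag_runs_before_smaller() is a hypothetical helper, not a dplyr function. It scans the vector from the end and carries along whether the run it is currently in is followed by a smaller value. Applied per col1 group, it flags rows 6, 7 and 13 but not row 5, because row 5 is the last row of group 1 and nothing follows it there. Applied to the whole col2 column without grouping, it flags rows 5, 6, 7, 9, 10 and 13.

```r
library(dplyr)

# Hypothetical helper: TRUE for each element equal to `target` that sits in a
# consecutive run of `target` values immediately followed by a value < target.
# Assumes x contains no NA values.
flag_runs_before_smaller <- function(x, target = 44) {
  out <- logical(length(x))        # all FALSE to start
  followed_by_smaller <- FALSE     # nothing follows the last element
  for (i in rev(seq_along(x))) {
    if (x[i] == target) {
      # inside a run: inherit whatever follows the run
      out[i] <- followed_by_smaller
    } else {
      # a run ending just before position i would be followed by x[i]
      followed_by_smaller <- x[i] < target
    }
  }
  out
}

col2 <- c(1,1,1,1,44,44,44,1,44,44,1,2,44,3,44)
col1 <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
df <- data.frame(col1, col2)

# Runs are only considered within each col1 group
grouped_runs <- df %>%
  group_by(col1) %>%
  mutate(FLAG = flag_runs_before_smaller(col2)) %>%
  ungroup()

# Runs may cross group boundaries
ungrouped_runs <- flag_runs_before_smaller(df$col2)
```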

