Sum a variable based on another variable

Question

I have a dataset consisting of two variables, Contents and Time like so:

Time          Contents
2017M01       123
2017M02       456
2017M03       789
.             .
.             .
.             .
2018M12       789

Now I want to create a numeric vector that aggregates Contents over six-month periods; that is, I want to sum 2017M01 to 2017M06 into one number, 2017M07 to 2017M12 into another number, and so on.

I'm able to do this by indexing but I want to be able to write: "From 2017M01 to 2017M06 sum contents corresponding to that sequence" in my code.

I would really appreciate some help!

Just create a grouping variable for every 6 rows. Something like rep(seq(nrow(df)%/%6), each = 6) — Sotos, Commented Feb 27, 2019 at 10:35
@Sotos But I want to be able to specifically write something like "2017M01":"2017M06". — Emil Nyboe Blicher, Commented Feb 27, 2019 at 10:41
As @Sotos suggested, you should create another grouping variable and then use group_by and summarise from dplyr package — Sonny, Commented Feb 27, 2019 at 10:46

Sotos · Accepted Answer · 2019-02-27 13:44:57Z

You can create a grouping variable based on the number of rows and the number of elements per group. In your case, you want to group every 6 rows, so the number of rows in your data frame should be divisible by 6. Using iris to demonstrate (it has 150 rows, so 150 / 6 = 25 groups):

rep(seq(nrow(iris)%/%6), each = 6)
  #[1]  1  1  1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4  4  4  4  5  5  5  5  5  5  6  6  6  6  6  6  7  7  7  7  7  7  8  8  8  8  8  8  9  9  9  9  9  9 10 10 10 10
 #[59] 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20
#[117] 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25
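The grouping vector can then be used to compute the sums the question asks for. The following is a minimal sketch using base R's tapply(); the data frame df and its values are made up for illustration and only mimic the Time/Contents layout from the question.

```r
# Hypothetical data: 24 monthly observations, 2017M01 to 2018M12
df <- data.frame(
  Time = sprintf("%dM%02d", rep(2017:2018, each = 12), rep(1:12, times = 2)),
  Contents = 1:24,
  stringsAsFactors = FALSE
)

# One group id per block of 6 consecutive rows
grp <- rep(seq(nrow(df) %/% 6), each = 6)

# Sum Contents within each 6-row block
sums <- tapply(df$Contents, grp, sum)
sums
```

With this sample data, sums holds one total per six-month block, in row order.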

There are plenty of ways to handle how you want to call it. Here is a custom function that takes a range string such as "2017M01:2017M06" and creates the grouping variable from it:

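f1 <- function(x, df) {
    # Extract the start and end month numbers from "YYYYMmm:YYYYMmm"
    # (only the month parts are used; the years are ignored)
    v1 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\1', x))
    v2 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\2', x))
    # Group size, e.g. M01 to M06 gives 6
    i1 <- (v2 - v1) + 1
    return(rep(seq(nrow(df) %/% i1), each = i1))
}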

f1("2017M01:2017M06", iris)
  #[1]  1  1  1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4  4  4  4  5  5  5  5  5  5  6  6  6  6  6  6  7  7  7  7  7  7  8  8  8  8  8  8  9  9  9  9  9  9 10 10 10 10
 #[59] 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20
#[117] 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25
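If the goal is to name the start and end labels directly, as requested in the comments, one option is to look the labels up in Time and sum the rows between them. This is only a sketch and not part of the original answer: sum_range() is a hypothetical helper, the sample data frame is made up, and the code assumes the rows are sorted by Time and that each label occurs exactly once.

```r
# Hypothetical helper: sum Contents between two Time labels (inclusive).
# Assumes df is sorted by Time and both labels occur exactly once.
sum_range <- function(df, from, to) {
  i <- match(from, df$Time)  # row position of the start label
  j <- match(to, df$Time)    # row position of the end label
  stopifnot(!is.na(i), !is.na(j), i <= j)
  sum(df$Contents[i:j])
}

# Hypothetical data: 24 monthly observations, 2017M01 to 2018M12
df <- data.frame(
  Time = sprintf("%dM%02d", rep(2017:2018, each = 12), rep(1:12, times = 2)),
  Contents = 1:24,
  stringsAsFactors = FALSE
)

first_half  <- sum_range(df, "2017M01", "2017M06")
second_half <- sum_range(df, "2017M07", "2017M12")
```

Because the lookup uses the full label, a range that crosses a year boundary (for example "2017M10" to "2018M03") works the same way.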

EDIT: We can easily make the function work when the number of rows is not divisible by the group size. The leftover rows (the remainder) are assigned to one extra group, numbered max + 1, which is appended to the result, i.e.

f1 <- function(x, df) {
    # Start and end month numbers (the years are ignored)
    v1 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\1', x))
    v2 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\2', x))
    i1 <- (v2 - v1) + 1
    # Group ids for the complete groups
    final_v <- rep(seq(nrow(df) %/% i1), each = i1)
    if (nrow(df) %% i1 == 0) {
        return(final_v)
    } else {
        # Leftover rows go into one extra, shorter group
        remainder <- nrow(df) %% i1
        final_v1 <- c(final_v, rep((max(final_v) + 1), remainder))
        return(final_v1)
    }
}

So for a data frame with 20 rows, doing groups of 6, the above function will yield the result:

f1("2017M01:2017M06", df)
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4
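The same group ids can also be produced without special-casing the remainder, by applying integer division to the row index. This is an alternative sketch rather than the method used above; the 20-row data frame is made up to match the row count in the example.

```r
# Hypothetical 20-row data frame
df <- data.frame(Contents = 1:20)

# Rows 1-6 -> 1, rows 7-12 -> 2, ...; leftover rows form a final, shorter group
grp <- (seq_len(nrow(df)) - 1) %/% 6 + 1

# Sum Contents per group; the last group covers only the 2 leftover rows
sums <- tapply(df$Contents, grp, sum)
```

Note that the last total then covers fewer than six observations, so it is not directly comparable with the others.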

Thank you for your answer! I have made a reply to your first comment. I think this is the next best option if it's not possible to do it the way I want. — Emil Nyboe Blicher, Commented Feb 27, 2019 at 10:49
Yes, I am trying to make it work the way you want. Give me a sec — Sotos, Commented Feb 27, 2019 at 10:50
Ok have a look now. This should give you ideas on how to fit it to your needs. — Sotos, Commented Feb 27, 2019 at 10:57
Yes, it did. Thanks a lot for your help - I have ticked the mark. One last question: Can I use a variant of the code if, for example, I am missing one observation (or there are too many) for the row count to be divisible by 6? — Emil Nyboe Blicher, Commented Feb 27, 2019 at 13:29
