
I have a dataset consisting of two variables, Contents and Time like so:

Time          Contents
2017M01       123
2017M02       456
2017M03       789
.             .
.             .
.             .
2018M12       789

Now I want to create a numeric vector that aggregates Contents over six-month periods. That is, I want to sum 2017M01 to 2017M06 into one number, 2017M07 to 2017M12 into another, and so on.

I'm able to do this by indexing, but I want to be able to write something like "from 2017M01 to 2017M06, sum the Contents corresponding to that sequence" in my code.

I would really appreciate some help!

  • Just create a grouping variable for every 6 rows. Something like rep(seq(nrow(df)%/%6), each = 6)
    – Sotos
    Commented Feb 27, 2019 at 10:35
  • @Sotos But I want to be able to specifically write something like "2017M01":"2017M06". Commented Feb 27, 2019 at 10:41
  • As @Sotos suggested, you should create another grouping variable and then use group_by and summarise from dplyr package
    – Sonny
    Commented Feb 27, 2019 at 10:46

1 Answer

You can create a grouping variable based on the number of rows and the number of elements per group. In your case you want to group every 6 rows, so the number of rows in your data frame should be divisible by 6. Using iris to demonstrate (it has 150 rows, so 150 / 6 = 25 groups):

rep(seq(nrow(iris)%/%6), each = 6)
  #[1]  1  1  1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4  4  4  4  5  5  5  5  5  5  6  6  6  6  6  6  7  7  7  7  7  7  8  8  8  8  8  8  9  9  9  9  9  9 10 10 10 10
 #[59] 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20
#[117] 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25
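
To get the six-month sums themselves, the grouping vector can be handed to tapply. This is only a minimal sketch: the data frame name df and its Contents values are made up for illustration, laid out like the data in the question.

```r
# Hypothetical data: 24 months, 2017M01 to 2018M12, with made-up Contents
df <- data.frame(
  Time = sprintf("%dM%02d", rep(2017:2018, each = 12), rep(1:12, times = 2)),
  Contents = 1:24
)

# One group id per block of 6 consecutive rows
grp <- rep(seq(nrow(df) %/% 6), each = 6)

# Sum Contents within each block; as.vector() drops the group labels
half_year_sums <- as.vector(tapply(df$Contents, grp, sum))
half_year_sums
#[1]  21  57  93 129
```

This assumes the rows are sorted by Time and that no month is missing, since the groups are formed purely by row position.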

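There are plenty of ways to handle how you want to call it. Here is a custom function that creates the grouping variable from a range written as "2017M01:2017M06". Note that it only reads the month part of each label, so the range should stay within one year: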

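f1 <- function(x, df) {
    # Extract the month number of the start and end label (the year is ignored)
    v1 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\1', x))
    v2 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\2', x))
    # Group size: number of months in the range, both ends included
    i1 <- (v2 - v1) + 1
    # One group id per block of i1 consecutive rows
    return(rep(seq(nrow(df)%/%i1), each = i1))
}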

f1("2017M01:2017M06", iris)
  #[1]  1  1  1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4  4  4  4  5  5  5  5  5  5  6  6  6  6  6  6  7  7  7  7  7  7  8  8  8  8  8  8  9  9  9  9  9  9 10 10 10 10
 #[59] 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20
#[117] 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25
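
If the goal is literally to write "sum Contents from 2017M01 to 2017M06", another option is to select rows by their Time label instead of building groups. The following is a sketch, not part of the function above: month_index and sum_between are hypothetical helper names, and the data is made up. It assumes every Time value has the form YYYYMmm.

```r
# Hypothetical helper: turn "2017M01" into a running month number
month_index <- function(x) {
  as.integer(substr(x, 1, 4)) * 12L + as.integer(substr(x, 6, 7))
}

# Hypothetical helper: sum Contents for all rows with Time in [from, to]
sum_between <- function(df, from, to) {
  idx <- month_index(as.character(df$Time))
  keep <- idx >= month_index(from) & idx <= month_index(to)
  sum(df$Contents[keep])
}

# Made-up example data: 24 months with Contents 1..24
df <- data.frame(
  Time = sprintf("%dM%02d", rep(2017:2018, each = 12), rep(1:12, times = 2)),
  Contents = 1:24
)

sum_between(df, "2017M01", "2017M06")
#[1] 21
sum_between(df, "2017M07", "2018M06")
#[1] 150
```

Because the year is part of the month index, a range may span a year boundary, and the rows do not need to be sorted.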

EDIT: We can easily make the function work when the number of rows is not a multiple of the group size. The leftover rows (the remainder of the division) get one extra group whose id is the current maximum plus 1, which is appended to the result, i.e.

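f1 <- function(x, df) {
    # Extract the month number of the start and end label (the year is ignored)
    v1 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\1', x))
    v2 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\2', x))
    # Group size: number of months in the range, both ends included
    i1 <- (v2 - v1) + 1
    # Group ids for all complete blocks of i1 rows
    final_v <- rep(seq(nrow(df) %/% i1), each = i1)
    if (nrow(df) %% i1 == 0) {
        return(final_v)
    } else {
        # Leftover rows form one extra, shorter group
        remainder <- nrow(df) %% i1
        final_v1 <- c(final_v, rep((max(final_v) + 1), remainder))
        return(final_v1)
    }
}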

So for a data frame with 20 rows, doing groups of 6, the above function will yield the result:

f1("2017M01:2017M06", df)
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4
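
Grouping by row position shifts every later group if a month is missing from the data. A sketch of an alternative that derives the group from the Time label itself, so gaps do not matter. The helper name half_year and the data are made up for illustration, and the groups are fixed to calendar half-years (M01 to M06 and M07 to M12).

```r
# Hypothetical helper: label each "YYYYMmm" value with its calendar half-year
half_year <- function(x) {
  x <- as.character(x)
  year <- substr(x, 1, 4)
  month <- as.integer(substr(x, 6, 7))
  paste0(year, "H", (month - 1L) %/% 6L + 1L)
}

# Made-up data with one observation (2017M03) missing
df <- data.frame(
  Time = sprintf("%dM%02d", 2017L, c(1:2, 4:12)),
  Contents = c(1:2, 4:12)
)

# Sum Contents per half-year label
sums <- tapply(df$Contents, half_year(df$Time), sum)
sums[["2017H1"]]
#[1] 18
sums[["2017H2"]]
#[1] 57
```

The first half-year simply sums the five months that are present, and the second half-year is unaffected by the gap.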
  • Thank you for your answer! I have made a reply to your first comment. I think this is the next best option if it's not possible to do it the way I want. Commented Feb 27, 2019 at 10:49
  • Yes, I am trying to make it work the way you want. Give me a sec
    – Sotos
    Commented Feb 27, 2019 at 10:50
  • Ok have a look now. This should give you ideas on how to fit it to your needs.
    – Sotos
    Commented Feb 27, 2019 at 10:57
  • That is awesome! Thank you so much! Commented Feb 27, 2019 at 11:02
  • Yes, it did. Thanks a lot for your help - I have ticked the mark. One last question: can I use a variant of the code if, for example, I am missing one observation (or there are too many) for the number of rows to be divisible by 6? Commented Feb 27, 2019 at 13:29
