
I would like to sum the values of one column based on the values of one or more other columns, as efficiently as possible. I am not sure whether there is a way to do this with the summarize command. Here is an example data set:

Cancer1   Cancer2   Cancer3   Disease1
1         0         1         1
0         1         0         0
1         0         0         1 

In this case I want to sum Disease1 based on whether the person has a given cancer. I am looking for output that says the total number of people with Cancer1 and Disease1 is 2, the total with Cancer2 and Disease1 is 0, and the total with Cancer3 and Disease1 is 1.
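To make the example reproducible, the table above can be read straight into a data frame. This is a sketch; the name df1 is an assumption, chosen to match the first answer.

```r
# Read the whitespace-separated table from the question into a data frame.
# The header row supplies the column names; blank lines are skipped by default.
df1 <- read.table(header = TRUE, text = "
Cancer1   Cancer2   Cancer3   Disease1
1         0         1         1
0         1         0         0
1         0         0         1
")

df1
```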

3 Answers


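We can create a new column by applying rowSums to the 'Cancer' columns and then multiplying by the binary 'Disease1' column. Note that this gives one value per row (how many cancers each person with Disease1 has), not one total per cancer; with this example data the two happen to coincide as 2 0 1.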

# per-row count of cancers, set to 0 when Disease1 is 0
df1$newCol <- rowSums(df1[1:3] > 0) * df1$Disease1
df1$newCol
#[1] 2 0 1
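Because the rowSums() version returns one value per row, a sketch of the per-cancer totals the question asks for could instead multiply each Cancer column by Disease1 and take colSums(). This assumes all columns are coded 0/1 and that the first three columns are the cancer columns, as in the example. With several Disease columns, a matrix cross-product gives every cancer-by-disease count at once.

```r
df1 <- data.frame(
  Cancer1  = c(1, 0, 1),
  Cancer2  = c(0, 1, 0),
  Cancer3  = c(1, 0, 0),
  Disease1 = c(1, 0, 1)
)

# Multiply every Cancer column by Disease1 (the vector is recycled down each column),
# so a cell is 1 only when the person has both that cancer and Disease1.
both <- df1[1:3] * df1$Disease1

# Column totals give one count per cancer.
totals <- colSums(both)
totals
#> Cancer1 Cancer2 Cancer3
#>       2       0       1

# Generalisation (assumes 0/1 coding): t(cancers) %*% diseases counts every
# cancer/disease pair; here there is only one disease column.
pair_counts <- crossprod(as.matrix(df1[1:3]), as.matrix(df1["Disease1"]))
pair_counts
```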

You may want to have a look at dplyr::count().

library(dplyr)

# count the people in each combination of Cancer1 and Disease1:
foo <- ds %>% count(Cancer1, Disease1)

# extract the count for people who have both Cancer1 and Disease1:
foo %>% filter(Cancer1 == 1, Disease1 == 1) %>% pull(n)
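Since the question asks about summarize, a sketch that produces all three totals in one call might keep only the people with Disease1 and then sum every Cancer column with summarise(across()). This assumes dplyr 1.0.0 or later (where across() was introduced) and that every cancer column name starts with "Cancer".

```r
library(dplyr)

ds <- data.frame(
  Cancer1  = c(1, 0, 1),
  Cancer2  = c(0, 1, 0),
  Cancer3  = c(1, 0, 0),
  Disease1 = c(1, 0, 1)
)

# Keep only people with Disease1, then sum each Cancer column in one pass.
totals <- ds %>%
  filter(Disease1 == 1) %>%
  summarise(across(starts_with("Cancer"), sum))

totals
```

Unlike the count() approach, this returns one row with a column per cancer, including cancers whose total is 0.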

Rather than going straight to a code answer, I'd like to offer some (unsolicited) advice about how the data are formatted:

It seems to me that you could benefit a lot from a long table instead of the wide one you have (you may have many more cancer types, such as "cancer_n", and many more diseases, such as "disease_n"). To build a long table, you may need to define some sort of id for each record. Also, for completeness, here is a data.table solution:

library(data.table) # loads the package

a <- data.table(id = 1:3,
                Cancer1 = c(1, 0, 1),
                Cancer2 = c(0, 1, 0),
                Cancer3 = c(1, 0, 0),
                Disease1 = c(1, 0, 1)) # create a data.table with an additional id

# melt the data.table (make it long-form), keeping Disease1 and id as identifiers,
# then count the rows that have both the cancer and Disease1, by cancer type:
melt(a, id.vars = c("Disease1", "id"))[Disease1 == 1 & value == 1, .N, by = variable]

   variable N
1:  Cancer1 2
2:  Cancer3 1
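Note that Cancer2 does not appear in the output above: the filter in `i` removes rows before grouping, so a cancer with no matching person is dropped. A sketch that keeps the zero counts instead sums the condition within each group; it rebuilds the same example table so it runs on its own.

```r
library(data.table)

a <- data.table(id = 1:3,
                Cancer1 = c(1, 0, 1),
                Cancer2 = c(0, 1, 0),
                Cancer3 = c(1, 0, 0),
                Disease1 = c(1, 0, 1))

# Long form: one row per (id, cancer type), with the 0/1 flag in `value`.
long <- melt(a, id.vars = c("Disease1", "id"))

# Count matches inside each group rather than filtering rows out first,
# so cancers with no matching person (Cancer2 here) still appear with N = 0.
res <- long[, .(N = sum(Disease1 == 1 & value == 1)), by = variable]
res
```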
