Sum based on the names of variables in R

Question

Let's say I have a data frame df that contains ID, gender, and several numeric variables (x1 to x5), plus MAX1, MAX2, and MAX3, where

MAX1 = the name of the variable holding the largest value among x1,x2,x3,x4,x5

MAX2 = the name of the variable holding the second-largest value among x1,x2,x3,x4,x5

MAX3 = the name of the variable holding the third-largest value among x1,x2,x3,x4,x5

### Generate data
set.seed(123)
ID <- c(1,2,3,4,5,6,7,8,9,10)
gender <- c("m", "m", "m", "f", "f", "m", "m", "f", "f", "m")
x1 <- rnorm(10, 0, 1)
x2 <- rnorm(10, 0, 1)
x3 <- rnorm(10, 0, 1)
x4 <- rnorm(10, 0, 1)
x5 <- rnorm(10, 0, 1)
df <- data.frame(ID, gender, x1, x2, x3, x4, x5)

# For each row, get the names of the three largest values among x1..x5.
# unlist() turns the one-row data frame into a named numeric vector for sort().
maxes <- t(sapply(seq_len(nrow(df)), function(i) {
    names(sort(unlist(df[i, 3:7]), decreasing = TRUE)[1:3])
}))
colnames(maxes) <- c("MAX1", "MAX2", "MAX3")
df <- cbind(df, maxes)

Now I need to create a new column (call it m_sum) that holds the sum of the values in the variables named by MAX1 and MAX2.

For example, for ID=1, MAX1 = x2 and MAX2 = x4, so m_sum should equal 1.2240818 + 0.42646422 = 1.650546.

Is there a reason to sum the two top values of each row by identifying the MAXi row names first and then trying to use the row names? I would rather transpose your data and sum the top 2 values, but maybe you have simplified your example so much that my approach breaks other requirements...
– R Yoda
Commented Apr 7, 2017 at 23:04
Yes, I know it's not the most efficient... but later I need to use MAX1 and MAX2...
– user9292
Commented Apr 7, 2017 at 23:26

Mike H. · Accepted Answer · 2017-04-07 23:26:10Z

How about using apply to do it all in one call?

df$m_sum <- apply(df, 1, function(x) as.double(x[x[ "MAX1" ]]) + as.double(x[x[ "MAX2" ]]))
 #[1] 1.65054602 0.15189652 2.45383397 3.04708946 2.02954308 3.50197809 1.39170465 0.09146139 1.48132102
#[10] 1.17044583
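
For readers with many rows, here is a minimal sketch of a vectorized alternative, not part of the original answer. It assumes the value columns are x1 to x5 as in the question; `num_cols`, `top3`, `m`, and `rows` are names made up for this illustration. Instead of looping with apply(), it indexes a numeric matrix with a two-column (row, column) matrix, so no values pass through a character matrix. apply() on the mixed data frame coerces everything to character first, which is why the answer needs as.double(), and which may explain the small last-digit differences between the outputs posted in the two answers.

```r
# Rebuild the question's data so the sketch runs on its own.
set.seed(123)
ID <- 1:10
gender <- c("m", "m", "m", "f", "f", "m", "m", "f", "f", "m")
x1 <- rnorm(10); x2 <- rnorm(10); x3 <- rnorm(10)
x4 <- rnorm(10); x5 <- rnorm(10)
df <- data.frame(ID, gender, x1, x2, x3, x4, x5)

num_cols <- paste0("x", 1:5)  # assumed names of the value columns

# Names of the three largest values per row (same idea as in the question).
top3 <- t(apply(df[num_cols], 1, function(v) names(sort(v, decreasing = TRUE))[1:3]))
df$MAX1 <- top3[, 1]
df$MAX2 <- top3[, 2]
df$MAX3 <- top3[, 3]

# Numeric matrix of the value columns, plus one row index per record.
m <- as.matrix(df[num_cols])
rows <- seq_len(nrow(df))

# cbind(row, column) picks exactly one element per row;
# as.character() keeps the lookup by name even if MAX1/MAX2 are factors.
df$m_sum <- m[cbind(rows, match(as.character(df$MAX1), num_cols))] +
            m[cbind(rows, match(as.character(df$MAX2), num_cols))]
```

Because MAX1 and MAX2 stay in df, they remain available for later steps, which was the reason given in the comments for computing the names first.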

edited Apr 7, 2017 at 23:26

answered Apr 7, 2017 at 23:20


Elegant code! Just to make it clear for other readers who want to reuse this answer: The solution loops over all rows of the data.frame (parameter margin is 1) which is quite slow in case of many rows (but performance was no requirement of the OP so absolutely OK here).
– R Yoda
Commented Apr 8, 2017 at 7:55
Thank you. I chose this answer because it was faster than others.
– user9292
Commented Apr 8, 2017 at 18:20


Henry · Answer · 2017-04-07 23:16:05Z

This is complicated by df$MAX1 etc. being factors, but a simple loop such as

sumMAX1MAX2 <- numeric(nrow(df))
for (r in seq_len(nrow(df))) {
    # as.character() makes the lookup go by column name, not by factor code
    sumMAX1MAX2[r] <- df[r, as.character(df$MAX1)[r]] +
                      df[r, as.character(df$MAX2)[r]]
}

seems to produce

> sumMAX1MAX2
 [1] 1.65054602 0.15189655 2.45383398 3.04708945 2.02954308 3.50197812
 [7] 1.39170470 0.09146141 1.48132102 1.17044585
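
To illustrate why the as.character() calls matter, here is a small sketch on a toy named vector (the names `v`, `f`, `by_code`, and `by_label` are invented for this example). Indexing with a factor uses its integer codes, not its labels, so the lookup can silently pick the wrong element. Whether MAX1 and MAX2 are factors at all depends on the R version: before R 4.0.0 data.frame() defaulted to stringsAsFactors = TRUE, so treat this as a precaution rather than a guaranteed problem.

```r
# Toy named vector standing in for one row of x1..x3.
v <- c(x1 = 10, x2 = 20, x3 = 30)

# Factor levels are sorted alphabetically: "x2" is code 1, "x3" is code 2.
f <- factor(c("x3", "x2"))

# Indexing by the factor uses the integer codes 2 and 1,
# so this returns the 2nd and 1st elements (x2 and x1).
by_code <- v[f]

# Converting to character first looks the elements up by name (x3 and x2).
by_label <- v[as.character(f)]
```

Here by_code holds the values for x2 and x1, while by_label holds the intended values for x3 and x2.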

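Another way would be to do something similar to your maxes and sum the two largest values directly, such as

altsumMAX1MAX2 <- sapply(seq_len(nrow(df)), function(i) {
    # unlist() gives sort() a plain named numeric vector for row i
    sum(sort(unlist(df[i, 3:7]), decreasing = TRUE)[1:2])
})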
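
A compact variant of the same idea, as a sketch only: if the value columns are held in a numeric matrix, apply() can sort each row without subsetting the data frame row by row. The matrix `x` below is filled column by column from the same seed, so it is assumed to reproduce x1 to x5 from the question; `x` and `top2_sum` are names made up here.

```r
# Same random draws as the question: matrix() fills column-wise,
# so column 1 holds the first 10 draws (x1), column 2 the next 10 (x2), etc.
set.seed(123)
x <- matrix(rnorm(50), nrow = 10, dimnames = list(NULL, paste0("x", 1:5)))

# For each row, sort in decreasing order and add the two largest values.
top2_sum <- apply(x, 1, function(v) sum(sort(v, decreasing = TRUE)[1:2]))
```

This gives the sum without needing MAX1 and MAX2, so it only fits when the names themselves are not required afterwards.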
