
Let's say I have a data frame df that contains ID, gender, several numeric variables (x1 to x5), and the columns MAX1, MAX2, and MAX3, where

MAX1 = the name of the variable holding the largest value among x1, x2, x3, x4, x5

MAX2 = the name of the variable holding the second-largest value among x1, x2, x3, x4, x5

MAX3 = the name of the variable holding the third-largest value among x1, x2, x3, x4, x5

### Generate data
set.seed(123)
ID <- c(1,2,3,4,5,6,7,8,9,10)
gender <- c("m", "m", "m", "f", "f", "m", "m", "f", "f", "m")
x1 <- rnorm(10, 0, 1)
x2 <- rnorm(10, 0, 1)
x3 <- rnorm(10, 0, 1)
x4 <- rnorm(10, 0, 1)
x5 <- rnorm(10, 0, 1)
df <- data.frame(ID, gender, x1, x2, x3, x4, x5)

maxes <- t(sapply(seq_len(nrow(df)), function(i) {
    # unlist() turns the one-row data frame into a named numeric vector
    names(sort(unlist(df[i, 3:7]), decreasing = TRUE)[1:3])
}))
colnames(maxes) <- c("MAX1","MAX2", "MAX3")
df <- cbind(df, maxes)

Now I need to create a new column (call it m_sum) that holds, for each row, the sum of the values of the variables named in MAX1 and MAX2.

For example, for ID = 1, MAX1 = x2 and MAX2 = x4, so m_sum should equal 1.2240818 + 0.42646422 = 1.650546.

2
  • Is there a reason to sum the top two values of each row by first identifying the MAXi variable names and then looking the values up by name? I would rather transpose your data and sum the top 2 values, but maybe you have simplified your example so much that my approach would break other requirements...
    – R Yoda
    Commented Apr 7, 2017 at 23:04
  • Yes, I know it's not the most efficient... but later I need to use MAX1 and MAX2...
    – user9292
    Commented Apr 7, 2017 at 23:26

2 Answers

3

How about using apply to do it all in one call?

df$m_sum <- apply(df, 1, function(x) as.double(x[x[ "MAX1" ]]) + as.double(x[x[ "MAX2" ]]))
#[1] 1.65054602 0.15189652 2.45383397 3.04708946 2.02954308 3.50197809 1.39170465 0.09146139 1.48132102
#[10] 1.17044583
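
The as.double() calls are needed because apply() first converts the data frame to a matrix, and since df contains non-numeric columns (gender, MAX1, MAX2, MAX3), every cell becomes a string. Numeric cells are formatted as text along the way, which is the likely reason the last printed digits here differ slightly from a purely numeric computation. A minimal sketch of that coercion, using a small made-up data frame (toy and its values are illustrative assumptions, not the question's data):

```r
# Hypothetical toy data, not the df from the question
toy <- data.frame(
  gender = c("m", "f"),
  x1 = c(1.5, -0.25),
  x2 = c(0.75, 2),
  MAX1 = c("x1", "x2"),
  MAX2 = c("x2", "x1"),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(toy), which is a character matrix here
m <- as.matrix(toy)
is.character(m)   # TRUE

# Inside the function each row x is a named character vector:
# x["MAX1"] is a column name, and x[x["MAX1"]] is that column's value as text
toy$m_sum <- apply(toy, 1, function(x) {
  as.double(x[x["MAX1"]]) + as.double(x[x["MAX2"]])
})
toy$m_sum
```

If full numeric precision matters, indexing the numeric columns directly avoids the round trip through text.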
2
  • Elegant code! Just to make it clear for other readers who want to reuse this answer: the solution loops over all rows of the data.frame (the MARGIN argument is 1), which can be quite slow with many rows (but performance was not a requirement of the OP, so absolutely OK here).
    – R Yoda
    Commented Apr 8, 2017 at 7:55
  • Thank you. I chose this answer because it was faster than others.
    – user9292
    Commented Apr 8, 2017 at 18:20
0

This is complicated by df$MAX1 etc. being factors, so the names have to be converted with as.character() first,

but a simple loop, something like

sumMAX1MAX2 <- numeric(nrow(df))
for (r in seq_len(nrow(df))) {
    # as.character() so that a factor is used by label, not by integer code
    sumMAX1MAX2[r] <- df[r, as.character(df$MAX1)[r]] +
                      df[r, as.character(df$MAX2)[r]]
}

seems to produce

> sumMAX1MAX2
 [1] 1.65054602 0.15189655 2.45383398 3.04708945 2.02954308 3.50197812
 [7] 1.39170470 0.09146141 1.48132102 1.17044585
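
The factor issue arises when the data frame is built with stringsAsFactors = TRUE, which was the default before R 4.0.0. Indexing with a factor uses its integer codes rather than its labels, so the wrong column can be picked silently. A minimal sketch with made-up values (toy and best are hypothetical names, not part of the question's data):

```r
# Hypothetical toy data to show factor vs. character indexing
toy <- data.frame(x1 = c(10, 1), x2 = c(20, 2), x3 = c(30, 3))

# Levels are sorted alphabetically: "x2" has code 1, "x3" has code 2
best <- factor(c("x3", "x2"))
as.integer(best)                # 2 1

# Factor index: code 2 selects the second column (x2), not x3
toy[1, best[1]]                 # 20

# Character index: selects the column named "x3"
toy[1, as.character(best[1])]   # 30
```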

Another way would be to do something similar to your maxes, such as

altsumMAX1MAX2 <- sapply(seq_len(nrow(df)), function(i) {
    # sum the two largest of x1..x5 in row i, without using MAX1/MAX2
    sum(sort(unlist(df[i, 3:7]), decreasing = TRUE)[1:2])
})
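
For larger data frames, the per-row lookup can be replaced by matrix indexing, which still uses the names stored in MAX1 and MAX2. The following is only a sketch under the question's setup; vars, X, ord, and rows are names introduced here for illustration, and the data are regenerated so the block runs on its own:

```r
# Recreate the question's data
set.seed(123)
ID <- 1:10
gender <- c("m", "m", "m", "f", "f", "m", "m", "f", "f", "m")
x1 <- rnorm(10, 0, 1)
x2 <- rnorm(10, 0, 1)
x3 <- rnorm(10, 0, 1)
x4 <- rnorm(10, 0, 1)
x5 <- rnorm(10, 0, 1)
df <- data.frame(ID, gender, x1, x2, x3, x4, x5)

vars <- paste0("x", 1:5)
X <- as.matrix(df[vars])              # purely numeric 10 x 5 matrix

# ord[k, i] is the column position of the k-th largest value in row i
ord <- apply(X, 1, order, decreasing = TRUE)
df$MAX1 <- vars[ord[1, ]]
df$MAX2 <- vars[ord[2, ]]
df$MAX3 <- vars[ord[3, ]]

# A two-column (row, column) index matrix picks one cell per row
rows <- seq_len(nrow(X))
df$m_sum <- X[cbind(rows, match(df$MAX1, vars))] +
            X[cbind(rows, match(df$MAX2, vars))]
df$m_sum
```

match() also accepts factors, so the last step should work whether MAX1 and MAX2 are stored as character or factor columns.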
