combining columns in R

Question

I have a set of columns that I want to combine into a vector. Each element of a column is either a name or the string "0". For each row, I would like to collect the elements that are names (not "0") into a character vector stored in df$keywords. I have pasted an example data frame below. I would like the result to be:

df$keywords[1,] would be an empty vector

df$keywords[2,] would be (ACT Science, study skills, MCAT)

Any help would be appreciated

    structure(list(V31 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L), .Label = "0", class = "factor"), V32 = structure(c(1L, 
    2L, 4L, 5L, 7L, 8L, 6L, 5L, 3L, 3L), .Label = c("0", "ACT Science", 
    "English", "Microsoft PowerPoint", "physics", "proofreading", 
    "reading", "writing"), class = "factor"), V33 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"), 
    V34 = structure(c(1L, 7L, 5L, 5L, 8L, 2L, 6L, 5L, 3L, 4L), .Label = c("0", 
    "geography", "Italian", "literature", "prealgebra", "SAT reading", 
    "study skills", "trigonometry"), class = "factor"), V35 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"), 
    V36 = structure(c(1L, 3L, 4L, 4L, 7L, 2L, 6L, 4L, 5L, 5L), .Label = c("0", 
    "English", "MCAT", "precalculus", "proofreading", "SAT writing", 
    "writing"), class = "factor"), V37 = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"), 
    V38 = structure(c(1L, 1L, 5L, 5L, 2L, 1L, 4L, 5L, 3L, 6L), .Label = c("0", 
    "English", "GED", "physical science", "reading", "spelling"
    ), class = "factor")), .Names = c("V31", "V32", "V33", "V34", 
    "V35", "V36", "V37", "V38"), class = "data.frame", row.names = c(NA, 
    -10L))

hi, sorry, but this isn't making too much sense. What you describe as output does not match up with the sample data. Also, do you want to combine them into a vector or into a data.frame? Please consider revising your question for clarity. — Ricardo Saporta, Commented Apr 28, 2013 at 4:55

seancarmody · Accepted Answer · 2013-04-28 05:40:28Z


Assuming your data is assigned to x, the following achieves what I think you are after:

    apply(x, 1, function(r) {tmp <- unique(r); tmp[tmp != 0]})

apply with a margin of 1 calls the function on each row of your data frame; the function takes the unique elements of the row and drops the "0" entries. For this data the result is a list of character vectors of varying lengths, one per row, holding that row's unique non-zero elements.
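One caveat worth illustrating: apply simplifies its result to a matrix when every row happens to return a vector of the same length, so the list shape is not guaranteed. The sketch below uses a small made-up data frame (x here and the helper name f are illustrative, not the asker's data) and shows an lapply variant that should return a list regardless.

```r
# Hypothetical stand-in data: every row keeps exactly two non-"0" values
x <- data.frame(
  a = c("p", "q"),
  b = c("r", "0"),
  c = c("0", "s"),
  stringsAsFactors = TRUE
)

# Same row function as in the answer above
f <- function(r) {
  tmp <- unique(r)
  tmp[tmp != 0]
}

# All rows return length 2, so apply() simplifies to a 2 x 2 matrix
res_apply <- apply(x, 1, f)

# Looping over row indices with lapply() keeps one list element per row
m <- as.matrix(x)  # character matrix, as apply() would build internally
res_list <- lapply(seq_len(nrow(m)), function(i) f(unname(m[i, ])))
```

With the asker's data the first row is empty, so apply already returns a list there; the lapply form only matters if the output shape needs to be predictable for other inputs.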


Small typo: apply(x should be apply(tmp I think? – SlowLearner, Commented Apr 28, 2013 at 6:17

The more compact way of writing this would be apply(df, 1, function(r) unique(r[r != 0])) – Simon O'Hanlon, Commented Apr 28, 2013 at 6:33

@SlowLearner no typo!! x is the object to apply the function over. To be applicable to the OP's data it should be apply(df...) but I think sean is giving the general case. – Simon O'Hanlon, Commented Apr 28, 2013 at 6:34

@SimonO101 Ah, yes, I had actually named my data tmp rather than x when I looked at it so of course that worked, but I was getting the wrong end of the stick. Apologies for muddying the waters and thanks for the clarification. – SlowLearner, Commented Apr 28, 2013 at 13:25


Simon O'Hanlon · Accepted Answer · 2013-04-28 06:29:37Z

In my first post I did not correctly understand the required output. A slightly different approach is to use the %in% operator on each row, like this:

    df$keywords <- apply(df,1, function(x) c( x[! x %in% "0"]))
    df$keywords
    #                                                                                                            keywords
    #1
    #2                                                    ACT Science, study skills, MCAT, ACT Science, study skills, MCAT
    #3      Microsoft PowerPoint, prealgebra, precalculus, reading, Microsoft PowerPoint, prealgebra, precalculus, reading
    #4                                physics, prealgebra, precalculus, reading, physics, prealgebra, precalculus, reading
    #5                                    reading, trigonometry, writing, English, reading, trigonometry, writing, English
    #6                                                            writing, geography, English, writing, geography, English
    #7  proofreading, SAT reading, SAT writing, physical science, proofreading, SAT reading, SAT writing, physical science
    #8                                physics, prealgebra, precalculus, reading, physics, prealgebra, precalculus, reading
    #9                                            English, Italian, proofreading, GED, English, Italian, proofreading, GED
    #10                           English, literature, proofreading, spelling, English, literature, proofreading, spelling

And if you want the unique set of skills per row, just add in the command unique like so:

    df$keywords <- apply(df,1, function(x) c( unique(x[ ! x %in% "0" ] ) ) )
    df["keywords"]
    #                                                  keywords
    #1
    #2                           ACT Science, study skills, MCAT
    #3    Microsoft PowerPoint, prealgebra, precalculus, reading
    #4                 physics, prealgebra, precalculus, reading
    #5                   reading, trigonometry, writing, English
    #6                               writing, geography, English
    #7  proofreading, SAT reading, SAT writing, physical science
    #8                 physics, prealgebra, precalculus, reading
    #9                       English, Italian, proofreading, GED
    #10              English, literature, proofreading, spelling
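Because apply(df, 1, ...) processes every column of df, running the assignment a second time would also feed the existing keywords column back into the function. A minimal sketch of one way to guard against that, assuming a small made-up data frame and a hypothetical src_cols vector naming the original columns:

```r
# Hypothetical stand-in for the asker's df (three columns, three rows)
df <- data.frame(
  V31 = c("0", "0", "0"),
  V32 = c("0", "ACT Science", "physics"),
  V34 = c("0", "study skills", "physics"),
  stringsAsFactors = TRUE
)

# Name the source columns explicitly so a re-run never includes keywords
src_cols <- c("V31", "V32", "V34")

# Character matrix of just the source columns
m <- as.matrix(df[src_cols])

# List column: one character vector of unique non-"0" values per row
df$keywords <- lapply(seq_len(nrow(m)), function(i) {
  r <- unname(m[i, ])
  unique(r[!r %in% "0"])
})

# Optional: one comma-separated string per row instead of a list column
df$keywords_chr <- vapply(df$keywords, paste, character(1), collapse = ", ")
```

The list column keeps each row's keywords as a separate character vector, which seems closest to what the question asks for; the collapsed string column is only a convenience for printing or export.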
