
I have a list of columns that I want to combine into a vector. Each element of these columns is either a name or the string "0". For each row, I would like to collect the elements that are names into a character vector, stored in a column called df$keywords. I have pasted an example data frame below. I would like it to become:

df$keywords[[1]] would be an empty vector

df$keywords[[2]] would be c("ACT Science", "study skills", "MCAT")

Any help would be appreciated.

    df <- structure(list(V31 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L), .Label = "0", class = "factor"), V32 = structure(c(1L, 
    2L, 4L, 5L, 7L, 8L, 6L, 5L, 3L, 3L), .Label = c("0", "ACT Science", 
    "English", "Microsoft PowerPoint", "physics", "proofreading", 
    "reading", "writing"), class = "factor"), V33 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"), 
    V34 = structure(c(1L, 7L, 5L, 5L, 8L, 2L, 6L, 5L, 3L, 4L), .Label = c("0", 
    "geography", "Italian", "literature", "prealgebra", "SAT reading", 
    "study skills", "trigonometry"), class = "factor"), V35 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"), 
    V36 = structure(c(1L, 3L, 4L, 4L, 7L, 2L, 6L, 4L, 5L, 5L), .Label = c("0", 
    "English", "MCAT", "precalculus", "proofreading", "SAT writing", 
    "writing"), class = "factor"), V37 = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"), 
    V38 = structure(c(1L, 1L, 5L, 5L, 2L, 1L, 4L, 5L, 3L, 6L), .Label = c("0", 
    "English", "GED", "physical science", "reading", "spelling"
    ), class = "factor")), .Names = c("V31", "V32", "V33", "V34", 
    "V35", "V36", "V37", "V38"), class = "data.frame", row.names = c(NA, 
    -10L))
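A data frame can hold a character vector of a different length in each row by using a list column, which is one way to get the structure described above. A minimal sketch with a hypothetical two-row frame `target` (not part of the question's data), assuming base R only:

```r
# Hypothetical two-row data frame, used only to show the target structure
target <- data.frame(id = 1:2)

# Assigning a list with one element per row via $<- creates a list column
target$keywords <- list(character(0), c("ACT Science", "study skills", "MCAT"))

# Each row's keywords are retrieved with [[ ]] rather than [ , ]
target$keywords[[1]]  # character(0)
target$keywords[[2]]  # the three keywords of row 2
```
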
  • hi, sorry, but this isn't making too much sense. What you describe as output does not match up with the sample data. Also, do you want to combine them into a vector or into a data.frame? Please consider revising your question for clarity. Commented Apr 28, 2013 at 4:55

2 Answers


Assuming your data is assigned to x, the following achieves what I think you are after:

apply(x, 1, function(r) {tmp <- unique(r); tmp[tmp != 0]})

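apply works across each row of your data frame (after converting it to a character matrix), takes the unique elements in each row and drops the "0" entries. Because the rows contain different numbers of keywords, the result is a list of character vectors of varying lengths, one per row, holding the unique non-"0" elements of that row.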
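To see the coercion and the list result at work, here is a small sketch on a hypothetical three-row data frame `toy` that mimics the question's factor columns (the data are made up for illustration):

```r
# Hypothetical toy data (not the OP's), using "0" for empty cells
toy <- data.frame(
  V1 = factor(c("0", "ACT Science", "physics")),
  V2 = factor(c("0", "study skills", "prealgebra")),
  V3 = factor(c("0", "MCAT", "0"))
)

# apply() converts the data frame to a character matrix first, so each
# row r is a character vector and r != 0 compares against "0"
kw <- apply(toy, 1, function(r) unique(r[r != 0]))

# Rows yield 0, 3 and 2 keywords, so the result is a list
str(kw)
```
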

  • Small typo: apply(x should be apply(tmp I think? Commented Apr 28, 2013 at 6:17
  • The more compact way of writing this would be apply(df, 1, function(r) unique(r[r != 0])) Commented Apr 28, 2013 at 6:33
  • @SlowLearner no typo!! x is the object to apply the function over. To be applicable to the OP's data it should be apply(df...) but I think sean is giving the general case. Commented Apr 28, 2013 at 6:34
  • @SimonO101 Ah, yes, I had actually named my data tmp rather than x when I looked at it, so of course that worked, but I was getting the wrong end of the stick. Apologies for muddying the waters and thanks for the clarification. Commented Apr 28, 2013 at 13:25
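One caveat with the apply approach: apply returns a list only because the rows produce different numbers of keywords. If every row had the same count, apply would simplify the result to a matrix with one column per row. A sketch with a hypothetical two-row frame `toy2`, plus a row-wise lapply (an alternative not taken from the answers) that always returns a list:

```r
# Hypothetical data: both rows have exactly two non-"0" keywords
toy2 <- data.frame(
  V1 = factor(c("physics", "English")),
  V2 = factor(c("0", "GED")),
  V3 = factor(c("reading", "0"))
)

# Equal-length results, so apply() simplifies to a 2 x 2 matrix
m <- apply(toy2, 1, function(r) unique(r[r != "0"]))

# lapply() over row indices always returns a list, one element per row
mat <- as.matrix(toy2)
kw2 <- lapply(seq_len(nrow(mat)), function(i) {
  r <- mat[i, ]
  unique(r[r != "0"])
})
```

The lapply version trades a little brevity for a predictable return type, which matters when the result is assigned to df$keywords.
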

In my first post I did not correctly understand the required output. A slightly different approach would be to use the %in% operator across rows, like this:

df$keywords <- apply(df, 1, function(x) x[!x %in% "0"])
df$keywords
#                                                                                                            keywords
#1                                                                                                                    
#2                                                    ACT Science, study skills, MCAT, ACT Science, study skills, MCAT
#3      Microsoft PowerPoint, prealgebra, precalculus, reading, Microsoft PowerPoint, prealgebra, precalculus, reading
#4                                physics, prealgebra, precalculus, reading, physics, prealgebra, precalculus, reading
#5                                    reading, trigonometry, writing, English, reading, trigonometry, writing, English
#6                                                            writing, geography, English, writing, geography, English
#7  proofreading, SAT reading, SAT writing, physical science, proofreading, SAT reading, SAT writing, physical science
#8                                physics, prealgebra, precalculus, reading, physics, prealgebra, precalculus, reading
#9                                            English, Italian, proofreading, GED, English, Italian, proofreading, GED
#10                           English, literature, proofreading, spelling, English, literature, proofreading, spelling

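No row of the sample data repeats a keyword, so the doubled entries above do not come from the data itself; they most likely appear because df already contained a keywords column when the command was run. If you want the unique set of skills per row, just add unique to the command, like so: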

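# apply over the original columns only, so an existing keywords column is not re-read
df$keywords <- apply(df[names(df) != "keywords"], 1, function(x) unique(x[!x %in% "0"]))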
df["keywords"]
#                                                  keywords
#1                                                          
#2                           ACT Science, study skills, MCAT
#3    Microsoft PowerPoint, prealgebra, precalculus, reading
#4                 physics, prealgebra, precalculus, reading
#5                   reading, trigonometry, writing, English
#6                               writing, geography, English
#7  proofreading, SAT reading, SAT writing, physical science
#8                 physics, prealgebra, precalculus, reading
#9                       English, Italian, proofreading, GED
#10              English, literature, proofreading, spelling
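One practical advantage of %in% over != here is that it never returns NA and accepts several values at once, so a single call can drop every kind of placeholder. A sketch on a hypothetical row `r` that contains "0", an empty string and an NA (values made up for illustration):

```r
# Hypothetical row with several kinds of "empty" placeholder
r <- c("physics", "0", "", NA, "reading")

# != excludes only "0", and NA != "0" is NA, so the NA survives the subset
neq <- r[r != "0"]

# %in% matches NA to NA and returns only TRUE/FALSE, so one call
# drops "0", "" and NA together
kept <- r[!r %in% c("0", "", NA)]
```
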
