Let's say I've got the following data (about 50k rows in the real example):

A
B
C
D
E
X
A
B
C
D
E
F
G
H
X

And I want it to look like this:

A,B,C,D,E,X

A,B,C,D,E,F,G,H,X

So technically I want to transpose the data, but cut it at a particular row.

How can this be achieved in Excel, R, SQL, or Python?

  • Excel has a paste-transposed feature built in. You could perhaps build a macro on top of that. Commented Jan 20, 2014 at 3:51
  • This is extremely broad. What have you tried? This is a site where we are more than happy to help you when you get stuck on something, but this presumes you've tried something. Commented Jan 20, 2014 at 20:33
  • I've tried with Wrangler, but couldn't complete the whole dataset on it superuser.com/questions/695336/…
    – Velletti
    Commented Jan 21, 2014 at 2:04
  • What format is this in? Text file? XLS?
    – Excellll
    Commented Jan 21, 2014 at 15:10
  • It's CSV, so it can be anything; it's one column and many rows.
    – Velletti
    Commented Jan 21, 2014 at 16:03

1 Answer

Using R, here are a few possible approaches with slight differences, since I'm not sure exactly what you want.

# Just a step to read in an extended version of your sample data

dat <- as.matrix(read.table(text=
"A
B
C
D
E
A
B
C
D
E
F
A
B
C
D
E
F
G
H
A
B
C
D
E
F"))

Here is one way to do the splitting: build a group index that increases by one at every "A" (the first value of each group), then split the rows by that index.

splitgrp <- cumsum(dat == "A")    # group index: increases by 1 at every "A"
splitlist <- split(dat, splitgrp) # if you want a list

You can then make that list into different kinds of objects if you want, like so:

# collapse="" glues the values together ("ABCDE"); use collapse="," for comma-separated values
vecofstrings <- sapply(splitlist, paste0, collapse="") # if you want a vector
df <- data.frame(vecofstrings) # if you want a data frame
mat <- matrix(vecofstrings) # if you want a matrix
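The sample in the question ends each record with "X" rather than starting it with a fixed value. A minimal sketch of the same cumulative-sum idea for that case, assuming "X" always marks the last row of a record (the vector `x` and the names `starts`, `grp`, and `rows` are only illustrative):

```r
# Sample from the question: one value per row, each record ending with "X"
x <- c("A", "B", "C", "D", "E", "X",
       "A", "B", "C", "D", "E", "F", "G", "H", "X")

# A new group starts on the first row and on every row that follows an "X"
starts <- c(TRUE, head(x == "X", -1))
grp <- cumsum(starts)

# One comma-separated string per group
rows <- vapply(split(x, grp), paste, character(1), collapse = ",")
print(unname(rows))
# [1] "A,B,C,D,E,X"       "A,B,C,D,E,F,G,H,X"
```

Because the index is derived from the marker rather than from a fixed width, groups of different lengths need no special handling.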

Finally, here are a few ways to save those objects:

# Here are a few ways to save a matrix.
write.table(mat, "mat.csv")
write.table(mat, "mat.csv", quote=FALSE, row.names=FALSE)  # no quotes or row names in the saved file

# Here are a few ways to save a data frame.
write.table(df, "df.txt")
write.table(df, "df.txt", quote=FALSE)  # no quotes in the saved file
write.table(df, "df.txt", row.names=FALSE)  # no row names in the saved file
write.table(df, "df.txt", row.names=FALSE, col.names=FALSE)  # no row or column names in the saved file
write.table(df, "df.txt", row.names=FALSE, col.names=FALSE, quote=FALSE)  # no row or column names and no quotes in the saved file
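For a one-column file, the whole round trip can also be done with `readLines()` and `writeLines()`, which skip quotes, row names, and column names by default. This is only a sketch: the temporary files stand in for your real input and output paths, and it assumes every record starts with "A", as in the sample above.

```r
# Hypothetical input and output files; tempfile() keeps the sketch self-contained
infile  <- tempfile(fileext = ".csv")
outfile <- tempfile(fileext = ".csv")
writeLines(c("A", "B", "C", "A", "B", "C", "D", "E"), infile)

x <- readLines(infile)      # one value per line
grp <- cumsum(x == "A")     # assumption: every record starts with "A"
rows <- vapply(split(x, grp), paste, character(1), collapse = ",")
writeLines(rows, outfile)   # one comma-separated record per line

print(readLines(outfile))
# [1] "A,B,C"     "A,B,C,D,E"
```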
  • Wow, Frank, thank you so much. It's very helpful; however, the only problem is that you've fixed ncol at 5. Is there any way to make it dynamic and based on a string/number? The data does not follow an exact pattern.
    – Velletti
    Commented Feb 16, 2014 at 11:13
  • @Velletti you can give ncol an argument that detects the length of your pattern (i.e. how long each row should be). How you do that is going to depend on your situation. In this situation, I could have put length(unique(dat)) in place of the 5.
    – Jota
    Commented Feb 16, 2014 at 16:41
  • Frank, thank you for the effort. I'll try to dynamically figure out the length and then pass it to the function. If you happen to figure out how to do it based on a specific string rather than a number, I would be most thankful.
    – Velletti
    Commented Feb 18, 2014 at 9:03
  • @Velletti You're welcome. That is what I meant to show in my comment, but I have edited my answer to include the dynamic part (i.e. matrix(dat,ncol= length(unique(dat)) ,byrow=T)). If this answer was helpful, consider clicking the check mark and/or voting on the answer. Cheers
    – Jota
    Commented Feb 18, 2014 at 16:08
  • Frank, I think I am not following. The first solution is ideal, but if you add F to the list of letters at the end (meaning a dynamic range), the code breaks. And this is the kind of data I have: first 4, then 7, then 9, then 4, then 6, and so on. I've edited the question to make it clearer. Sorry if I wasn't.
    – Velletti
    Commented Feb 18, 2014 at 21:45
