I have a CSV table with a column that contains the text of a chat log. Every row follows the same format: the name of the person and the time of the message (padded with extra spaces before and after), followed by the message content. An example of a single row of the text column:

'  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.'

I would like to transform this single string column into multiple columns (the number of columns would depend on the number of messages), with one column for each individual message, like below:

  • Siri (3:15pm) Hello how can I help you
  • John Wayne (3:17pm) what day of the week is today
  • Siri (3:18pm) it is Monday

How can I parse this text in a pandas dataframe column to separate the chat logs into individual message columns?

2 Answers

0

If you have this dataframe:

                                                                                                                     Messages
0  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.

then you can do:

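# Trim the outer padding, then split on runs of 2+ whitespace characters (one piece per row)
x = df["Messages"].str.strip().str.split(r"\s{2,}").explode()

# Pieces alternate between "Name (time)" and text; join each pair into one message
out = (x[::2] + " " + x[1::2]).to_frame()
print(out)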

Prints:

                                            Messages
0            Siri (3:15pm) Hello how can I help you?
0  John Wayne (3:17pm) what day of the week is today
0                        Siri (3:18pm) it is Monday.

Note: This only works if there are two or more spaces between the name and the text.
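The output above has one row per message, whereas the question asks for one column per message. A minimal sketch of that wide layout, assuming the same double-space separator; the helper `split_messages`, the second sample row, and the `message_N` column names are made up for illustration:

```python
import re

import pandas as pd

df = pd.DataFrame({"Messages": [
    "  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.",
    "  Siri (9:00am)  Good morning.  John Wayne (9:01am)  hi",  # hypothetical second chat
]})


def split_messages(text):
    """Return a list of 'Name (time) message' strings for one chat log."""
    # Pieces alternate: name/time, text, name/time, text, ...
    parts = re.split(r"\s{2,}", text.strip())
    return [f"{name} {msg}" for name, msg in zip(parts[::2], parts[1::2])]


# One list per row; shorter chats are padded with missing values
wide = pd.DataFrame(df["Messages"].map(split_messages).tolist(), index=df.index)
wide.columns = [f"message_{i + 1}" for i in range(wide.shape[1])]
print(wide)
```

Rows with fewer messages than the longest chat end up with missing values in the trailing columns.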

0

This is how I did it, took me a while but we got to it!

import pandas as pd

s = pd.Series(['  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.'])
# Split on the double-space separator; the leading padding leaves an empty first column
parts = s.str.split("  ", expand=True)
parts = parts.drop(labels=[0], axis=1)

# Flatten the remaining cells of the first (and only) row into a list
list_1 = list(parts.iloc[0])

# Even positions hold "Name (time)", odd positions hold the message text
names = list_1[::2]
messages = list_1[1::2]

d = {'Name': names, 'Message': messages}
df = pd.DataFrame(data=d)
df

Output:
                   Name                               Message
0         Siri (3:15pm)             Hello how can I help you?
1   John Wayne (3:17pm)         what day of the week is today
2         Siri (3:18pm)                         it is Monday.
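If the speaker and the time are also needed separately, the Name column can be split further. This is an optional sketch and not part of the answer above; the `Speaker` and `Time` column names are assumptions, and the pattern assumes the time is always the last thing in the cell, wrapped in parentheses:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Siri (3:15pm)", "John Wayne (3:17pm)", "Siri (3:18pm)"],
    "Message": ["Hello how can I help you?", "what day of the week is today", "it is Monday."],
})

# Named groups in the pattern become the column names of the extracted frame
speaker_time = df["Name"].str.extract(r"^(?P<Speaker>.+?)\s*\((?P<Time>[^)]+)\)$")

result = pd.concat([speaker_time, df["Message"]], axis=1)
print(result)
```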
