1

The columns in my dataframe contain tuples and empty cells. The number of columns in the dataframe is dynamic and the columns have no lables.

          **0                1               2**
**0**    name,score,ID    name,score,ID      None
**1**    name,score,ID       None            None
**2**        None            None         name,score,ID

I want to split the tuples of all columns in separate columns, i.e:

      **Name0    Score0    ID0    Name1   Score1    ID1    Name2   Score2   ID2**
**0**    name    score     ID     name    score     ID      None    None    None
**1**    name    score     ID     None    None     None     None    None    None
**2**    None    None     None    None    None     None     name    score     ID

I found the following:

df1[['Name', 'Score', "ID"]] = pd.DataFrame(df1[0].tolist(), index=df1.index)

which works basically, however it just splits the first columns tuples into separate columns (--> df1[0]). I cant find how to apply this to all columns tuples.

Any help appreciated!

1 Answer 1

1

Considering the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        0: {
            0: None,
            1: None,
            2: None,
            3: ("bartenbach gmbh rinner strasse 14 aldrans", 96, 1050),
            4: (
                "ait austrian institute of technology gmbh giefinggasse 4 wien",
                70,
                537,
            ),
        },
        1: {0: None, 1: None, 2: None, 3: None, 4: None},
        2: {0: None, 1: None, 2: None, 3: None, 4: None},
    }
)

print(df)
# Outputs
                                                   0     1     2
0                                               None  None  None
1                                               None  None  None
2                                               None  None  None
3  (bartenbach gmbh rinner strasse 14 aldrans, 96...  None  None
4  (ait austrian institute of technology gmbh gie...  None  None

You could iterate on each column, then each value, split the string and populate a new dataframe, like this:

new_df = pd.DataFrame()

for col_num, series in df.iteritems():
    for i, value in enumerate(series.values):
        try:
            name, score, id_num = value
            new_df.loc[i, f"Name{col_num}"] = name
            new_df.loc[i, f"Score{col_num}"] = score
            new_df.loc[i, f"ID{col_num}"] = id_num
        except TypeError:
            continue
new_df = new_df.reset_index(drop=True)

print(new_df)
# Outputs
                                               Name0  Score0     ID0
0          bartenbach gmbh rinner strasse 14 aldrans    96.0  1050.0
1  ait austrian institute of technology gmbh gief...    70.0   537.0
4
  • With me this results in an empty new_df. Without try-block it results in AttributeError: 'NoneType' object has no attribute 'split'.
    – konstantin
    Commented Oct 4, 2021 at 8:36
  • You can run my code as it is and see that it works. You need the try/except statement to deal with empty cells. So, I suspect your real datraframe differs from the one you gave as an example in your question. What is the output of df.head().to_dict() ?
    – Laurent
    Commented Oct 4, 2021 at 12:49
  • {0: {0: None, 1: None, 2: None, 3: ('bartenbach gmbh rinner strasse 14 aldrans', 96, 1050), 4: ('ait austrian institute of technology gmbh giefinggasse 4 wien', 70, 537)}, 1: {0: None, 1: None, 2: None, 3: None, 4: None}, 2: {0: None, 1: None, 2: None, 3: None, 4: None}, 3: {0: None, 1: None, 2: None, 3: None, 4: None}, 4: {0: None, 1: None, 2: None, 3: None, 4: None}}
    – konstantin
    Commented Oct 5, 2021 at 13:59
  • Based on these new specs, I've updated my answer, everything should run just fine now.
    – Laurent
    Commented Oct 5, 2021 at 15:39

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.