flattening pandas columns in a non-trivial way

Question

I have a pandas dataframe which looks like the following:

          site   pay        delta  over  under
phase        a     a     b
ID
D01     London  12.3  10.3   -2.0   0.0   -2.0
D02    Bristol   7.3  13.2    5.9   5.9    0.0
D03    Bristol  17.3  19.2    1.9   1.9    0.0

I'd like to flatten the column MultiIndex so that the columns are

ID      site     a     b  delta  over  under
D01   London  12.3  10.3   -2.0   0.0   -2.0
D02  Bristol   7.3  13.2    5.9   5.9    0.0
D03  Bristol  17.3  19.2    1.9   1.9    0.0

I'm struggling with the online documentation and tutorials to work out how to do this.

I'd welcome advice, ideally to do this in a robust way which doesn't hardcode column positions.

UPDATE: the output of df.to_dict('tight') is

{'index': ['D01', 'D02', 'D03'],
 'columns': [('site', 'a'),
  ('pay', 'a'),
  ('pay', 'b'),
  ('delta', ''),
  ('over', ''),
  ('under', '')],
 'data': [['London', 12.3, 10.3, -2.0, 0.0, -2.0],
  ['Bristol', 7.3, 13.2, 5.8999999999999995, 5.8999999999999995, 0.0],
  ['Bristol', 17.3, 19.2, 1.8999999999999986, 1.8999999999999986, 0.0]],
 'index_names': ['ID'],
 'column_names': [None, 'phase']}

please provide a reproducible format of your input (for example the output of df.to_dict('tight')) — mozway, Commented Nov 28 at 15:26
Thanks, now what is the logic? Why do you keep site and not pay for example? — mozway, Commented Nov 28 at 15:30

samhita · Accepted Answer · 2024-11-28 15:47:05Z

As the other answer notes, the requirement is not entirely clear, so I have only flattened the columns by joining the two levels. Let us know if you need something different.

import pandas as pd

data = {
    'index': ['D01', 'D02', 'D03'],
    'columns': [('site', 'a'),
                ('pay', 'a'),
                ('pay', 'b'),
                ('delta', ''),
                ('over', ''),
                ('under', '')],
    'data': [['London', 12.3, 10.3, -2.0, 0.0, -2.0],
             ['Bristol', 7.3, 13.2, 5.9, 5.9, 0.0],
             ['Bristol', 17.3, 19.2, 1.9, 1.9, 0.0]],
    'index_names': ['ID'],
    'column_names': [None, 'phase']
}

# Construct the MultiIndex columns and DataFrame
multiindex_columns = pd.MultiIndex.from_tuples(data['columns'], names=data['column_names'])
df = pd.DataFrame(data['data'], index=data['index'], columns=multiindex_columns)


# Flattening the MultiIndex columns
df.columns = [' '.join(filter(None, col)) for col in df.columns]

print(df)

Flattened DataFrame:

      site a  pay a  pay b  delta  over  under
D01   London  12.30  10.30  -2.00  0.00  -2.00
D02  Bristol   7.30  13.20   5.90  5.90   0.00
D03  Bristol  17.30  19.20   1.90  1.90   0.00
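Since the question's data is already in the `to_dict('tight')` layout, the frame can probably be rebuilt directly with `pd.DataFrame.from_dict(..., orient='tight')` (available from pandas 1.4) instead of constructing the MultiIndex by hand. A minimal sketch; the variable names `tight` and `flat` are my own, and the values are rounded as in the answer above:

```python
import pandas as pd

# Same content as the question's to_dict('tight') output, with rounded values
tight = {
    'index': ['D01', 'D02', 'D03'],
    'columns': [('site', 'a'), ('pay', 'a'), ('pay', 'b'),
                ('delta', ''), ('over', ''), ('under', '')],
    'data': [['London', 12.3, 10.3, -2.0, 0.0, -2.0],
             ['Bristol', 7.3, 13.2, 5.9, 5.9, 0.0],
             ['Bristol', 17.3, 19.2, 1.9, 1.9, 0.0]],
    'index_names': ['ID'],
    'column_names': [None, 'phase'],
}

# Rebuild the two-level frame in one call
df = pd.DataFrame.from_dict(tight, orient='tight')

# Join the non-empty parts of each column tuple on a copy,
# so the original MultiIndex frame stays available
flat = df.copy()
flat.columns = [' '.join(filter(None, col)) for col in flat.columns]

print(list(flat.columns))
```

Working on a copy keeps `df` intact in case a different flattening rule turns out to be the one you want.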

mozway · Accepted Answer · 2024-11-28 15:38:58Z

The expected logic is unclear, but assuming you want to keep the top level if the value is not "pay":

import numpy as np

# get column levels
lvl0 = df.columns.get_level_values(0)
lvl1 = df.columns.get_level_values('phase')

# keep lvl0 if the value is not pay
df.columns = np.where(lvl0 != 'pay', lvl0, lvl1)

# optionally reset the index
df.reset_index(inplace=True)
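If more than one top-level label should give way to its `phase` value, the same `np.where` pattern can presumably be driven by `Index.isin`. This is only a sketch on the question's data; the list name `collapse` is hypothetical, and which labels belong in it is an assumption about the intended logic:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (values rounded)
cols = pd.MultiIndex.from_tuples(
    [('site', 'a'), ('pay', 'a'), ('pay', 'b'),
     ('delta', ''), ('over', ''), ('under', '')],
    names=[None, 'phase'])
df = pd.DataFrame(
    [['London', 12.3, 10.3, -2.0, 0.0, -2.0],
     ['Bristol', 7.3, 13.2, 5.9, 5.9, 0.0],
     ['Bristol', 17.3, 19.2, 1.9, 1.9, 0.0]],
    index=pd.Index(['D01', 'D02', 'D03'], name='ID'),
    columns=cols)

# Hypothetical: top-level labels to replace with their 'phase' value
collapse = ['pay']

lvl0 = df.columns.get_level_values(0)
lvl1 = df.columns.get_level_values('phase')

# Use the second level where the top level is in `collapse`, else the top level
df.columns = np.where(lvl0.isin(collapse), lvl1, lvl0)
df = df.reset_index()

print(list(df.columns))
```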

Another example, assuming you want to keep the top level if its value is not duplicated or if the second level is empty:

# get column levels
lvl0 = df.columns.get_level_values(0)
lvl1 = df.columns.get_level_values('phase')

# is the second level ''?
m1 = lvl1==''
# is the first level not duplicated
m2 = ~lvl0.duplicated(keep=False)

# keep lvl0 if either m1/m2 is True, else lvl1
df.columns = np.where(m1|m2, lvl0, lvl1)

df.reset_index(inplace=True)

Output:

    ID     site     a     b  delta  over  under
0  D01   London 12.30 10.30  -2.00  0.00  -2.00
1  D02  Bristol  7.30 13.20   5.90  5.90   0.00
2  D03  Bristol 17.30 19.20   1.90  1.90   0.00
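The second rule can be wrapped in a small helper so that no column positions are hardcoded. This is a sketch, not part of pandas: the function name `flatten_columns` and its parameters `top` and `sub` are hypothetical, and the uniqueness check is an extra safeguard I have assumed is wanted, since this rule can yield duplicate names on other data.

```python
import numpy as np
import pandas as pd

def flatten_columns(df, top=0, sub='phase'):
    """Return a copy of df with its two-level columns flattened.

    The top-level label is kept when the sub-level is '' or when the
    top-level label occurs only once; otherwise the sub-level is used.
    """
    lvl0 = df.columns.get_level_values(top)
    lvl1 = df.columns.get_level_values(sub)
    keep_top = (lvl1 == '') | ~lvl0.duplicated(keep=False)
    out = df.copy()
    out.columns = np.where(keep_top, lvl0, lvl1)
    # Guard against collisions, e.g. two top-level groups sharing 'a' and 'b'
    if not out.columns.is_unique:
        raise ValueError('flattened column names are not unique')
    return out

# Usage on the question's data (values rounded)
cols = pd.MultiIndex.from_tuples(
    [('site', 'a'), ('pay', 'a'), ('pay', 'b'),
     ('delta', ''), ('over', ''), ('under', '')],
    names=[None, 'phase'])
df = pd.DataFrame(
    [['London', 12.3, 10.3, -2.0, 0.0, -2.0],
     ['Bristol', 7.3, 13.2, 5.9, 5.9, 0.0],
     ['Bristol', 17.3, 19.2, 1.9, 1.9, 0.0]],
    index=pd.Index(['D01', 'D02', 'D03'], name='ID'),
    columns=cols)

flat = flatten_columns(df).reset_index()
print(list(flat.columns))
```

Returning a copy rather than assigning to `df.columns` in place leaves the original MultiIndex available if a different rule is needed later.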
