I have a Pandas dataframe which I am grouping by one of the columns in the dataframe. Then I have a function which needs to be applied once to each group. The function modifies the group. So what I want to do is something like this:

    import pandas as pd

    df = pd.DataFrame(
        [[1, 'coupe', 2004], [2, 'sedan', 2004], [1, 'sedan', 2015], [3, 'coupe', 2010]],
        columns=['group_id', 'model', 'year_manufactured'],
    )

| | group_id | model | year_manufactured |
|---|---|---|---|
| 0 | 1 | coupe | 2004 |
| 1 | 2 | sedan | 2004 |
| 2 | 1 | sedan | 2015 |
| 3 | 3 | coupe | 2010 |
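
    def my_function(group):
        # `in` on a Series tests the index, so compare against the values instead
        if 'coupe' in group['model'].values:
            group['model'] = group['model'].replace('coupe', 'hatchback')
            group['year_manufactured'] = group['year_manufactured'].replace(2004, 2005)
        else:
            # Append the model of the newest car in the group to every model name
            newest_car = group['year_manufactured'].idxmax()
            newest_car_type = group.loc[newest_car, 'model']
            group['model'] = group['model'].map(lambda x: x + newest_car_type)
        return group

    # Raises AttributeError: a groupby object has no `map` method
    df.groupby('group_id').map(my_function)
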
However, `map` can't be applied to a groupby. Is there any way I can work around this without compromising speed? Also, the function can't just be mapped to each row of the dataframe, since the way it acts on one row depends on previous rows.
Use `apply` on the groupby rather than `map` with a function. Depending on what `my_function` does, there's probably an alternative.
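
As a rough sketch of the `apply` route, something like the following should call the function once per group. A few things here are my own assumptions rather than part of the question: the function works on a copy and returns it, the "contains a coupe" check compares values instead of using `in`, and the non-key columns are selected explicitly and `group_id` is joined back afterwards so the result does not depend on how a given pandas version treats the grouping column inside `apply`.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'coupe', 2004], [2, 'sedan', 2004], [1, 'sedan', 2015], [3, 'coupe', 2010]],
    columns=['group_id', 'model', 'year_manufactured'],
)

def my_function(group):
    # Work on a copy so the original dataframe is left untouched
    group = group.copy()
    if (group['model'] == 'coupe').any():
        group['model'] = group['model'].replace('coupe', 'hatchback')
        group['year_manufactured'] = group['year_manufactured'].replace(2004, 2005)
    else:
        # Append the model of the newest car in the group to every model name
        newest_car = group['year_manufactured'].idxmax()
        newest_car_type = group.loc[newest_car, 'model']
        group['model'] = group['model'] + newest_car_type
    return group

# Apply once per group to the non-key columns, keeping the original row labels
modified = (
    df.groupby('group_id', group_keys=False)[['model', 'year_manufactured']]
      .apply(my_function)
)

# Re-attach the key column; join aligns on the index, so row order follows df
result = df[['group_id']].join(modified)
print(result)
```

Whether this is fast enough depends on the number of groups, since `apply` still runs a Python function per group. If the real function can be expressed with `transform` or plain column operations, that is likely to be quicker.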