I have a problem in Python working with a pandas DataFrame. I'm trying to build a machine learning model that predicts the surface. The surface column exists in the train DataFrame but not in the test DataFrame, so I would like to create some features based on the surface in train, like this:
train['error_cat1'] = abs(train.groupby(train['cat1'])['surface'].transform('mean') - train.surface.mean())
Here, each row gets the absolute difference between the mean surface of its "cat1" group and the overall mean surface. Cool.
Now I must add it to test too, so I use this method to map the value computed in train for each group onto the matching test rows:
mp = {k: g['error_cat1'].tolist()[0] for k,g in train.groupby('cat1')}
test['error_cat1'] = test['cat1'].map(mp)
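A slightly shorter way to build the same mapping, as a sketch: since error_cat1 is constant within each cat1 group, taking the first value per group gives a Series indexed by cat1, which map accepts directly. The small frames below are made-up toy data (an assumption, not the real dataset), and cat1 == 4 is added only to show what happens to an unseen category.

```python
import pandas as pd

# Toy data (hypothetical), only to make the sketch runnable.
train = pd.DataFrame({"cat1": [1, 2, 3, 1, 2, 3],
                      "surface": [10, 12, 12, 5, 10, 13]})
test = pd.DataFrame({"cat1": [1, 2, 3, 4]})

# Same feature as above: |group mean - overall mean| for each row.
group_mean = train.groupby("cat1")["surface"].transform("mean")
train["error_cat1"] = (group_mean - train["surface"].mean()).abs()

# One value per cat1 group; the result is a Series indexed by cat1.
mp = train.groupby("cat1")["error_cat1"].first()

# Series.map looks each test value up in the index of mp.
# Categories never seen in train (here cat1 == 4) become NaN.
test["error_cat1"] = test["cat1"].map(mp)
print(test)
```

Unseen categories end up as NaN, so they probably need a fill value (for example 0 or the overall mean error) before training.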
So far there is no problem. Now I would like to use two columns in the groupby:
train['error_cat1_cat2'] = abs(train.groupby(['cat1','cat2'])['surface'].transform('mean') - train.surface.mean())
But I don't know how to map it onto the test DataFrame. Could you please help me with this problem, or suggest another method to do it?
Thanks
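One possible direction, as a sketch only: the dictionary approach above seems to extend to two keys if the dictionary is keyed by (cat1, cat2) tuples. The frames below are toy data (an assumption), and the NaN default for unseen pairs is my own choice, not a requirement.

```python
import pandas as pd

# Toy data (hypothetical), only to make the sketch runnable.
train = pd.DataFrame({"cat1": [1, 2, 3, 1, 2, 3],
                      "cat2": [3, 2, 1, 3, 2, 2],
                      "surface": [10, 12, 12, 5, 10, 13]})
test = pd.DataFrame({"cat1": [1, 2, 3, 1], "cat2": [2, 1, 1, 3]})

keys = ["cat1", "cat2"]
group_mean = train.groupby(keys)["surface"].transform("mean")
train["error_cat1_cat2"] = (group_mean - train["surface"].mean()).abs()

# With a list of keys, iterating the groupby yields a (cat1, cat2) tuple
# for each group, so the dictionary ends up keyed by tuples.
mp = {k: g["error_cat1_cat2"].iloc[0] for k, g in train.groupby(keys)}

# Build the same tuples from the test rows and look them up.
# Pairs that never occur in train fall back to NaN.
pairs = zip(test["cat1"], test["cat2"])
test["error_cat1_cat2"] = [mp.get(p, float("nan")) for p in pairs]
print(test)
```

The list comprehension is plain Python, so it may be slow on a large test set; a merge on both columns would be the vectorized alternative.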
For example, my train is:
+------+------+---------+
| Cat1 | Cat2 | surface |
+------+------+---------+
| 1    | 3    | 10      |
+------+------+---------+
| 2    | 2    | 12      |
+------+------+---------+
| 3    | 1    | 12      |
+------+------+---------+
| 1    | 3    | 5       |
+------+------+---------+
| 2    | 2    | 10      |
+------+------+---------+
| 3    | 2    | 13      |
+------+------+---------+
My test is:
+------+------+
| Cat1 | Cat2 |
+------+------+
| 1    | 2    |
+------+------+
| 2    | 1    |
+------+------+
| 3    | 1    |
+------+------+
| 1    | 3    |
+------+------+
| 2    | 3    |
+------+------+
| 3    | 1    |
+------+------+
Now I would like to compute the mean surface grouped by cat1 and cat2. For example, the mean surface for (cat1, cat2) = (1, 3) is (10 + 5) / 2 = 7.5.
Then I must go to test and map this value onto the rows where (cat1, cat2) = (1, 3).
I hope that makes sense.
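To make the goal concrete, here is a sketch of what I am after, using the example tables above and a left merge on both columns. The lowercase column names (to match the code) and the helper column mean_cat1_cat2 are my own assumptions for illustration.

```python
import pandas as pd

# The example tables above, with lowercase column names to match the code.
train = pd.DataFrame({"cat1": [1, 2, 3, 1, 2, 3],
                      "cat2": [3, 2, 1, 3, 2, 2],
                      "surface": [10, 12, 12, 5, 10, 13]})
test = pd.DataFrame({"cat1": [1, 2, 3, 1, 2, 3],
                     "cat2": [2, 1, 1, 3, 3, 1]})

keys = ["cat1", "cat2"]

# One row per (cat1, cat2) pair seen in train, with its mean surface.
# "mean_cat1_cat2" is a hypothetical column name.
means = (train.groupby(keys, as_index=False)["surface"].mean()
              .rename(columns={"surface": "mean_cat1_cat2"}))

# A left merge keeps every test row, in order, and attaches the pair's mean.
test = test.merge(means, on=keys, how="left")

# Same feature as in train: distance of the pair's mean from the overall mean.
overall_mean = train["surface"].mean()
test["error_cat1_cat2"] = (test["mean_cat1_cat2"] - overall_mean).abs()
print(test)
```

With this data, the test row with (cat1, cat2) = (1, 3) gets a mean of 7.5 and the two (3, 1) rows get 12.0. The pairs (1, 2), (2, 1) and (2, 3) never occur in train, so they come out as NaN and would still need some fallback value.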