I'm trying to calculate MAE (Mean absolute error). In my original DataFrame, I have 1826 rows and 3 columns. I'm using columns 2 and 3 to calculate MAE. But, in column 2, I have some NaN values. When I used:

from sklearn.metrics import mean_absolute_error

and selected these columns, it gave me an error: "Input contains NaN".

As an example, I'm trying to do something like this:

import numpy as np
from sklearn.metrics import mean_absolute_error
y_true = [3, -0.5, 2, 7, 10]
y_pred = [2.5, np.nan, 2, 8, np.nan]
mean_absolute_error(y_true, y_pred)  # raises ValueError: Input contains NaN

Is it possible to skip or ignore the rows with NaN?

UPDATE

After discussing this with my advisor, we decided that the best option is to drop all the rows with NaN values.

  • Can't you just drop NaN's? pandas.pydata.org/docs/reference/api/…
    Commented Nov 12 at 19:25
  • I can't do it. The first column represents date, and columns 2 and 3 have specific values for each date.
    – Daniel M M
    Commented Nov 12 at 19:30
  • you can just fill the NaN with any value using fillna()
    – iBeMeltin
    Commented Nov 12 at 19:45
  • If I use fillna, it'll change the MAE value
    – Daniel M M
    Commented Nov 12 at 19:48
  • If you can't drop NaNs, you can try to interpolate them. However, you should be careful about which interpolation method you use - if you have many NaNs, it can impact the MAE significantly.
    – yellow_dot
    Commented Nov 12 at 20:03

1 Answer

If you want to ignore the NaNs, build a mask and perform boolean indexing:

from sklearn.metrics import mean_absolute_error
import numpy as np

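y_true = np.array([3, -0.5, 2, 7, 10])
y_pred = np.array([2.5, np.nan, 2, 8, np.nan])  # np.NaN was removed in NumPy 2.0

# True where y_pred holds a real number, False where it is NaN
m = ~np.isnan(y_pred)

# Compare only the positions where a prediction exists
mean_absolute_error(y_true[m], y_pred[m])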

Output: 0.5
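
Since the original data lives in a DataFrame, the same idea can probably be expressed with pandas directly. This is only a sketch: the column names (`date`, `predicted`, `observed`) and the values are made up for illustration. `dropna(subset=...)` removes just the rows where one of the compared columns is NaN, so the remaining values stay paired with their dates. The MAE is computed by hand here, which should match `mean_absolute_error` on the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column names and values are illustrative only.
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "predicted": [2.5, np.nan, 2, 8, np.nan],
    "observed": [3, -0.5, 2, 7, 10],
})

# Keep only the rows where both compared columns hold a value.
valid = df.dropna(subset=["predicted", "observed"])

# Mean of the absolute differences over the remaining rows.
mae = (valid["observed"] - valid["predicted"]).abs().mean()
print(mae)  # 0.5
```

The original `df` is left untouched, so the dates that had a missing value are still available if they are needed later.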
