Imputing data means replacing missing data with substituted values. In pandas, missing data is typically represented by the value `nan` (not a number). Keep in mind that, depending on the dataset, missing values may be represented differently.
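If a dataset marks missing entries some other way, convert them to `nan` first so that pandas recognizes them as missing. The sketch below assumes a hypothetical dataset that records an unknown age as `-1`. When reading a file, `pd.read_csv` also accepts an `na_values` argument for the same purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset in which an unknown age was recorded as -1
raw = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Bobby'],
                    'age': [30, 56, 8, -1]})

# Convert the sentinel value to NaN so pandas treats it as missing;
# the column becomes float64 because NaN is a float
raw['age'] = raw['age'].replace(-1, np.nan)

print(raw['age'].isna().sum())  # 1
```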
You can replace `nan` values using the `fillna` function.
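Here is a minimal illustration of `fillna` on a single Series. It fills first with a constant and then with a value computed from the data. The choice of median here is only for illustration.

```python
import numpy as np
import pandas as pd

ages = pd.Series([30, 56, 8, np.nan])

# fillna returns a new Series; the original `ages` is left unchanged
filled_zero = ages.fillna(0)
print(filled_zero.tolist())  # [30.0, 56.0, 8.0, 0.0]

# The fill value can also be computed from the data (median skips NaN)
filled_median = ages.fillna(ages.median())
print(filled_median.tolist())  # [30.0, 56.0, 8.0, 30.0]
```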
Consider the following DataFrame:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Bobby'], 'age': [30, 56, 8, np.nan]})

|   | name  | age |
|---|---|---|
| 0 | Jeff  | 30  |
| 1 | Esha  | 56  |
| 2 | Jia   | 8   |
| 3 | Bobby | nan |

Write a function `fillna_age_with_mean(df)` that takes in the DataFrame and updates the `age` column so that rows containing `nan` are set to the mean age of the rows that do have an age (the `nan` values themselves are excluded from the mean).
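One possible solution is sketched below. It relies on `Series.mean()` skipping `nan` by default, so the fill value is computed only from the known ages. The function assigns the filled column back to `df`, which updates the caller's DataFrame. Returning `df` as well is a convenience choice and not something the exercise requires.

```python
import numpy as np
import pandas as pd


def fillna_age_with_mean(df):
    """Replace missing values in df['age'] with the mean of the known ages.

    Series.mean() ignores NaN by default, so the mean is taken only over
    rows that actually have an age. The column is reassigned on the
    DataFrame that was passed in, so the caller's object is updated.
    """
    df['age'] = df['age'].fillna(df['age'].mean())
    return df


df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Bobby'],
                   'age': [30, 56, 8, np.nan]})
fillna_age_with_mean(df)
print(df['age'].round(4).tolist())  # [30.0, 56.0, 8.0, 31.3333]
```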
Example input:

    df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Bobby'], 'age': [30, 56, 8, np.nan]})

|   | name  | age |
|---|---|---|
| 0 | Jeff  | 30  |
| 1 | Esha  | 56  |
| 2 | Jia   | 8   |
| 3 | Bobby | nan |

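Expected `age` column after calling `fillna_age_with_mean(df)`. The missing value becomes the mean of 30, 56 and 8, i.e. 94 / 3:

|   | age     |
|---|---|
| 0 | 30      |
| 1 | 56      |
| 2 | 8       |
| 3 | 31.3333 |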