Imputing data means replacing missing values with substituted values. In pandas, missing data is typically represented by the value `NaN` (not a number). Keep in mind that, depending on the dataset, missing values can be represented differently.
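If a dataset marks missing entries with a placeholder instead of `NaN`, one option is to convert the placeholder to `NaN` first so that pandas treats it as missing. The sketch below assumes a hypothetical dataset that uses `-1` to mean "unknown age"; the placeholder value and the `raw`/`clean` names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: -1 is used as a placeholder for an unknown age
raw = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Bobby'], 'age': [30, 56, 8, -1]})

# Copy, then swap the placeholder for NaN so pandas recognizes it as missing
clean = raw.copy()
clean['age'] = clean['age'].replace(-1, np.nan)

print(clean['age'].isna().tolist())  # [False, False, False, True]
```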
You can replace `NaN` values using the `fillna` method.
For example, consider the following DataFrame:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Bobby'], 'age': [30, 56, 8, np.nan]})
```
| | name | age |
|---|---|---|
| 0 | Jeff | 30 |
| 1 | Esha | 56 |
| 2 | Jia | 8 |
| 3 | Bobby | NaN |
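As a quick illustration of `fillna` on this DataFrame, the sketch below fills the missing age with a constant. The value `0` is chosen arbitrarily for the demonstration. Note that `fillna` returns a new Series by default, so the original `df` is not modified unless you assign the result back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Bobby'], 'age': [30, 56, 8, np.nan]})

# fillna returns a new Series; df itself still contains the NaN
filled = df['age'].fillna(0)
print(filled.tolist())  # [30.0, 56.0, 8.0, 0.0]
```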
Write a function, `fillna_age_with_mean(df)`, that takes in the DataFrame and updates the `age` column so that `NaN` entries are set to the mean age of the rows that have a value.
Example input:

```python
df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Bobby'], 'age': [30, 56, 8, np.nan]})
```
| | name | age |
|---|---|---|
| 0 | Jeff | 30 |
| 1 | Esha | 56 |
| 2 | Jia | 8 |
| 3 | Bobby | NaN |
Expected `age` column after calling `fillna_age_with_mean(df)`:

| | age |
|---|---|
| 0 | 30 |
| 1 | 56 |
| 2 | 8 |
| 3 | 31.3333 |
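One possible solution is sketched below; it is not the only valid approach. `Series.mean()` skips `NaN` by default, so the fill value is (30 + 56 + 8) / 3 ≈ 31.3333. Returning `df` is an extra convenience not required by the exercise.

```python
import numpy as np
import pandas as pd

def fillna_age_with_mean(df):
    # Series.mean() ignores NaN by default, so this is the mean of the known ages
    mean_age = df['age'].mean()
    # Assign the filled column back so the caller's DataFrame is updated
    df['age'] = df['age'].fillna(mean_age)
    return df

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Bobby'], 'age': [30, 56, 8, np.nan]})
fillna_age_with_mean(df)
print(df['age'].round(4).tolist())  # [30.0, 56.0, 8.0, 31.3333]
```

Assigning the result back to `df['age']`, rather than calling `fillna(..., inplace=True)` on the selected column, avoids chained-assignment pitfalls in recent pandas versions.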