Impute missing values with past mean

Exercise

Suppose we have the following dataset of the price of a bottle of wine in a store each month. Unfortunately, some months we are missing the price data.

import numpy as np
import pandas as pd

# Monthly wine prices; np.nan marks a month with no recorded price
df = pd.DataFrame({'time': pd.to_datetime(['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01']),
                   'price': [30, np.nan, 35, 32]})

	time	price
0	2011-01-01	30
1	2011-02-01	nan
2	2011-03-01	35
3	2011-04-01	32

Simply imputing with the mean over the whole dataset may not be what you want, because the overall mean gives rows early in the dataframe information from the future! Depending on your data and the application, this lookahead may be harmless or critically problematic.
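To make the lookahead concrete, here is a minimal sketch that fills the missing price with the overall mean. The names overall_mean and filled are only for illustration and are not part of the exercise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'time': pd.to_datetime(['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01']),
                   'price': [30, np.nan, 35, 32]})

# The overall mean uses every observed price, including later months:
# (30 + 35 + 32) / 3, roughly 32.33
overall_mean = df['price'].mean()

# February is filled with a value that depends on the March and April prices
filled = df['price'].fillna(overall_mean)
print(filled)
```

The only price known before February is 30, yet the filled value is pulled upward by prices that were not available at that point in time.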

For this exercise, let's assume it is problematic. Instead, we can fill missing price rows with the mean of all previous rows. Filling with the mean of all previous rows ensures the imputed value doesn't look into the future.

Task: Write a function, fillna_with_past_mean(df), which takes in the DataFrame and updates the price column so that nan rows are set to the mean price of all previous rows.

Note: One important detail is how to compute the mean over all previous rows when some of those rows are themselves missing. You can skip missing rows when computing the mean, fill them with some constant value, or fill them with the mean that was computed for that row. For this exercise, simply skip missing rows when computing the mean.

Example Input

Code to generate input

df = pd.DataFrame({'time': pd.to_datetime(['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01', '2011-05-01']), 
                   'price': [30, np.nan, 35, np.nan, 32]})

Table generated

	time	price
0	2011-01-01 00:00:00	30
1	2011-02-01 00:00:00	nan
2	2011-03-01 00:00:00	35
3	2011-04-01 00:00:00	nan
4	2011-05-01 00:00:00	32

Example Output

	price
0	30
1	30
2	35
3	32.5
4	32
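One possible approach, sketched here rather than given as the reference solution, combines an expanding mean with a one-row shift. It assumes the rows are already sorted by time, returns a copy instead of modifying df in place, and leaves a missing first row as nan because there is no past to average over.

```python
import numpy as np
import pandas as pd

def fillna_with_past_mean(df):
    # Running mean of all rows up to and including each row;
    # expanding().mean() skips nan values by default.
    running_mean = df['price'].expanding().mean()
    # Shift by one row so each row only sees strictly earlier rows.
    past_mean = running_mean.shift(1)
    out = df.copy()
    # Only nan rows are replaced; observed prices are left untouched.
    out['price'] = out['price'].fillna(past_mean)
    return out

df = pd.DataFrame({'time': pd.to_datetime(['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01', '2011-05-01']),
                   'price': [30, np.nan, 35, np.nan, 32]})

result = fillna_with_past_mean(df)
print(result[['price']])
```

Row 1 is filled with 30, the only earlier price, and row 3 with 32.5, the mean of 30 and 35. The nan in row 1 is skipped when computing the mean for row 3, as the note in the exercise asks.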
