Customer Retention Over Time

Building off the previous retention exercise, now we want to calculate customer retention for each month instead of just the last month. Looking at retention over time helps to understand how the business is progressing.

Suppose we have a table with four columns, ['date', 'customer_id', 'order_num', 'order_amount_dollars'].

date customer_id order_num order_amount_dollars
1 2019-08-01 1 1 30
2 2019-08-05 2 2 40
3 2019-09-01 1 3 30
4 2019-09-07 3 4 50

We are interested in calculating the monthly customer retention between the most recent month and the month prior.

Task: Write a function customer_retention(df) which takes a dataframe as above and returns a pandas Series with the retention for each month starting with the second month in the DataFrame.

As a reminder, customer retention for a subscription service or website is the % of customers that purchased or logged on the previous time period that also purchased or logged on in the current time period. Depending on the application, the time period might be measured in days, weeks or months.

customer_retention = num_customers[t] / num_customers[t-1]

When doing this calculation, you need to be careful to filter out new customers in the current time period.

Example Input

Code to generate input

df = pd.DataFrame({'date': pd.to_datetime(['2019-01-01', '2019-01-05', '2019-02-01', '2019-02-07', '2019-02-09', '2019-03-01']),
                   'customer_id': [1, 2, 1, 3, 4, 1],
                   'order_number': [709871, 709872, 709874, 709875, 709877, 709876],
                   'order_amount_dollars': [30, 20, 30, 80, 50, 30]})


Table generated

date customer_id order_number order_amount_dollars
0 2019-01-01 00:00:00 1 709871 30
1 2019-01-05 00:00:00 2 709872 20
2 2019-02-01 00:00:00 1 709874 30
3 2019-02-07 00:00:00 3 709875 80
4 2019-02-09 00:00:00 4 709877 50
5 2019-03-01 00:00:00 1 709876 30

Example Output

yearmonth retention
201902 0.5
201903 0.333333