Building off the previous retention exercise, now we want to calculate customer retention for each month instead of just the last month. Looking at retention over time helps to understand how the business is progressing.
Suppose we have a table with four columns, ['date', 'customer_id', 'order_num', 'order_amount_dollars']
.
date | customer_id | order_num | order_amount_dollars | |
---|---|---|---|---|
1 | 2019-08-01 | 1 | 1 | 30 |
2 | 2019-08-05 | 2 | 2 | 40 |
3 | 2019-09-01 | 1 | 3 | 30 |
4 | 2019-09-07 | 3 | 4 | 50 |
We are interested in calculating the monthly customer retention between the most recent month and the month prior.
Task: Write a function customer_retention(df)
which takes a dataframe as above and returns a pandas Series with the retention for each month starting with the second month in the DataFrame.
As a reminder, customer retention for a subscription service or website is the % of customers that purchased or logged on the previous time period that also purchased or logged on in the current time period. Depending on the application, the time period might be measured in days, weeks or months.
customer_retention = num_customers[t] / num_customers[t-1]
When doing this calculation, you need to be careful to filter out new customers in the current time period.
df = pd.DataFrame({'date': pd.to_datetime(['2019-01-01', '2019-01-05', '2019-02-01', '2019-02-07', '2019-02-09', '2019-03-01']), 'customer_id': [1, 2, 1, 3, 4, 1], 'order_number': [709871, 709872, 709874, 709875, 709877, 709876], 'order_amount_dollars': [30, 20, 30, 80, 50, 30]})
date | customer_id | order_number | order_amount_dollars | |
---|---|---|---|---|
0 | 2019-01-01 00:00:00 | 1 | 709871 | 30 |
1 | 2019-01-05 00:00:00 | 2 | 709872 | 20 |
2 | 2019-02-01 00:00:00 | 1 | 709874 | 30 |
3 | 2019-02-07 00:00:00 | 3 | 709875 | 80 |
4 | 2019-02-09 00:00:00 | 4 | 709877 | 50 |
5 | 2019-03-01 00:00:00 | 1 | 709876 | 30 |
yearmonth | retention |
---|---|
201902 | 0.5 |
201903 | 0.333333 |