Customer retention for a subscription service or website is the % of customers that purchased or logged on the previous time period that also purchased or logged on in the current time period. Depending on the application, the time period might be measured in days, weeks or months.
customer_retention = num_customers[t] / num_customers[t-1]
When doing this calculation, you need to be careful to filter out new customers in the current time period.
Suppose we have a table with four columns, ['date', 'customer_id', 'order_num', 'order_amount_dollars']
.
date | customer_id | order_num | order_amount_dollars | |
---|---|---|---|---|
1 | 2019-08-01 | 1 | 1 | 30 |
2 | 2019-08-05 | 2 | 2 | 40 |
3 | 2019-09-01 | 1 | 3 | 30 |
4 | 2019-09-07 | 3 | 4 | 50 |
We are interested in calculating the monthly customer retention between the most recent month and the month prior.
Write a function customer_retention(df)
which takes a dataframe as above and returns a single number representing the most recent monthly customer retention.
df = pd.DataFrame({'date': pd.to_datetime(['2019-01-01', '2019-01-05', '2019-02-01', '2019-02-07']), 'customer_id': [1, 2, 1, 3], 'order_num': [709871, 709872, 709874, 709875], 'order_amount_dollars': [30, 20, 30, 80]})
date | customer_id | order_num | order_amount_dollars | |
---|---|---|---|---|
0 | 2019-01-01 00:00:00 | 1 | 709871 | 30 |
1 | 2019-01-05 00:00:00 | 2 | 709872 | 20 |
2 | 2019-02-01 00:00:00 | 1 | 709874 | 30 |
3 | 2019-02-07 00:00:00 | 3 | 709875 | 80 |
0.5