Customer Retention Over Time

Exercise
Example Input and Output

Building off the previous retention exercise, now we want to calculate customer retention for each month instead of just the last month. Looking at retention over time helps to understand how the business is progressing.

Suppose we have a table with four columns, ['date', 'customer_id', 'order_num', 'order_amount_dollars'].

	date	customer_id	order_num	order_amount_dollars
1	2019-08-01	1	1	30
2	2019-08-05	2	2	40
3	2019-09-01	1	3	30
4	2019-09-07	3	4	50

We are interested in calculating the monthly customer retention between the most recent month and the month prior.

Task: Write a function customer_retention(df) which takes a dataframe as above and returns a pandas Series with the retention for each month starting with the second month in the DataFrame.

As a reminder, customer retention for a subscription service or website is the % of customers that purchased or logged on the previous time period that also purchased or logged on in the current time period. Depending on the application, the time period might be measured in days, weeks or months.

customer_retention = num_customers[t] / num_customers[t-1]

When doing this calculation, you need to be careful to filter out new customers in the current time period.

Example Input

Code to generate input

df = pd.DataFrame({'date': pd.to_datetime(['2019-01-01', '2019-01-05', '2019-02-01', '2019-02-07', '2019-02-09', '2019-03-01']),
                   'customer_id': [1, 2, 1, 3, 4, 1],
                   'order_number': [709871, 709872, 709874, 709875, 709877, 709876],
                   'order_amount_dollars': [30, 20, 30, 80, 50, 30]})

Table generated

	date	customer_id	order_number	order_amount_dollars
0	2019-01-01 00:00:00	1	709871	30
1	2019-01-05 00:00:00	2	709872	20
2	2019-02-01 00:00:00	1	709874	30
3	2019-02-07 00:00:00	3	709875	80
4	2019-02-09 00:00:00	4	709877	50
5	2019-03-01 00:00:00	1	709876	30

Example Output

yearmonth	retention
201902	0.5
201903	0.333333

Next Exercise