Conditional column update


Sometimes you want to set the value of a column only in rows that meet some condition. For example, if our DataFrame has two columns, accuracy and predicted_text, we may want to set predicted_text to the empty string ('') wherever accuracy is less than 50. We can do this with .loc[]: df.loc[df.accuracy < 50, 'predicted_text'] = ''. The first argument to .loc[] (df.accuracy < 50) produces a column of boolean (True/False) values, one per row, and the rows where it is True are the ones that get updated. The second argument is the column we want to update. .fillna() is a convenient special-case method for conditional updates where the condition is that the value is NaN.
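As a minimal sketch of the two techniques just described, the snippet below builds a small DataFrame with the accuracy and predicted_text columns from the example. The sample values and the 'unknown' fill value are made up for illustration and are not part of the exercise.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the text above
df = pd.DataFrame({'accuracy': [42.0, 50.0, 91.5],
                   'predicted_text': ['helo', 'world', None]})

# Boolean mask: True for every row whose accuracy is below 50
low_accuracy = df['accuracy'] < 50

# Update predicted_text only in the rows where the mask is True
df.loc[low_accuracy, 'predicted_text'] = ''

# fillna handles the "value is missing" condition in a single call
df['predicted_text'] = df['predicted_text'].fillna('unknown')

print(df)
```

Only the first row has its text cleared, because 50 is not less than 50. The missing value in the last row is then replaced by the fill value.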

Suppose you construct a DataFrame with

import pandas as pd

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'], 
                   'age': [30, 56, 8],
                   'city': ['New York', 'Atlanta', 'Shanghai']})

This gives you the DataFrame

	name	age	city
0	Jeff	30	New York
1	Esha	56	Atlanta
2	Jia	8	Shanghai

Suppose we realize, after collecting a lot of data, that our process recorded the age of people in New York and Atlanta as one year less than it was supposed to be. Complete the function correct_age_in_error_cities(df) so that it increments the age of everyone living in New York or Atlanta by one year.

Example Input

Code to generate input

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley'], 
                   'age': [30, 56, 8, 38, 20],
                   'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York']})

Table generated

	name	age	city
0	Jeff	30	New York
1	Esha	56	Atlanta
2	Jia	8	Shanghai
3	Hatori	38	Tokyo
4	Ashley	20	New York

Example Output

	name	age	city
0	Jeff	31	New York
1	Esha	57	Atlanta
2	Jia	8	Shanghai
3	Hatori	38	Tokyo
4	Ashley	21	New York
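One possible way to complete the exercise is sketched below. It is not necessarily the intended solution. It assumes the function should return a corrected copy rather than modify its argument in place, which the exercise does not specify, and it uses Series.isin to build the row mask.

```python
import pandas as pd

def correct_age_in_error_cities(df):
    # Assumption: work on a copy so the caller's DataFrame is left untouched
    corrected = df.copy()
    # True for rows whose city is one of the two affected cities
    in_error_city = corrected['city'].isin(['New York', 'Atlanta'])
    # Add one year to age only in the selected rows
    corrected.loc[in_error_city, 'age'] += 1
    return corrected

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley'],
                   'age': [30, 56, 8, 38, 20],
                   'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York']})

result = correct_age_in_error_cities(df)
print(result)
```

An equivalent mask can be written with the | operator, as (df.city == 'New York') | (df.city == 'Atlanta'), but isin stays readable as the list of cities grows.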
