Conditional column update

Sometimes you want to set the value of a column only for rows that meet some condition. For example, if our DataFrame has two columns, accuracy and predicted_text, we may want to set predicted_text to the empty string ('') wherever accuracy is less than 50. We can do this with .loc[]: df.loc[df.accuracy < 50, 'predicted_text'] = ''. The first argument to .loc[] (df.accuracy < 50) produces a Series of boolean (True/False) values, one per row, which selects the rows to update. The second argument is the column we want to update. .fillna() is a convenient special-case method for conditional updates where the condition is that the value is NaN.
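The update described above can be sketched as follows. The sample values and the 'unknown' fill value are made up for illustration, and the variable name low_accuracy is only a label for the boolean mask.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for this illustration.
df = pd.DataFrame({'accuracy': [92, 35, 50, 71],
                   'predicted_text': ['hello', 'wrld', 'foo', None]})

# Boolean mask: one True/False value per row.
low_accuracy = df['accuracy'] < 50

# Blank out predicted_text only on the rows where the mask is True.
df.loc[low_accuracy, 'predicted_text'] = ''

# fillna() handles the special case where the condition is "the value is missing".
# The fill value 'unknown' is an arbitrary choice for this sketch.
df['predicted_text'] = df['predicted_text'].fillna('unknown')

print(df)
```

Only the row with accuracy 35 is blanked. The row with accuracy 50 is left alone because the comparison is strict, and the missing value in the last row is replaced by fillna().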

Suppose you constructed a DataFrame with:

import pandas as pd

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'], 
                   'age': [30, 56, 8],
                   'city': ['New York', 'Atlanta', 'Shanghai']})

This gives you the DataFrame:

   name  age      city
0  Jeff   30  New York
1  Esha   56   Atlanta
2   Jia    8  Shanghai

Suppose we realize, after collecting a bunch of data, that our process recorded the age of people in New York and Atlanta as one year less than it was supposed to. Complete the function correct_age_in_error_cities(df) so that it increments the age of people living in New York or Atlanta by one year.

Example Input

Code to generate input

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley'], 
                   'age': [30, 56, 8, 38, 20],
                   'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York']})


Table generated

     name  age      city
0    Jeff   30  New York
1    Esha   56   Atlanta
2     Jia    8  Shanghai
3  Hatori   38     Tokyo
4  Ashley   20  New York

Example Output

     name  age      city
0    Jeff   31  New York
1    Esha   57   Atlanta
2     Jia    8  Shanghai
3  Hatori   38     Tokyo
4  Ashley   21  New York
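
One possible way to complete the exercise is sketched below. It is not the only valid solution. It assumes the function should return a corrected copy rather than modify its argument in place, which the exercise does not specify. It also uses .isin() to build the row mask, where combining two equality checks with | would work equally well.

```python
import pandas as pd


def correct_age_in_error_cities(df):
    # Assumption: return a corrected copy and leave the caller's DataFrame untouched.
    corrected = df.copy()

    # Boolean mask: True for rows whose city is one of the affected cities.
    in_error_city = corrected['city'].isin(['New York', 'Atlanta'])

    # Increment age only on the selected rows.
    corrected.loc[in_error_city, 'age'] += 1
    return corrected


df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley'],
                   'age': [30, 56, 8, 38, 20],
                   'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York']})

result = correct_age_in_error_cities(df)
print(result)
```

Working on a copy keeps the original data available for comparison. If the exercise expects an in-place update, the same .loc[] assignment can be applied directly to df.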