Sometimes you want to set the value of a column only in rows that meet some condition. For example, if our DataFrame has two columns, `accuracy` and `predicted_text`, we may want to set `predicted_text` to the empty string (`''`) if the accuracy is 50 or less. We can do this with `.loc[]`: `df.loc[df.accuracy <= 50, 'predicted_text'] = ''`. The first argument to `loc` (`df.accuracy <= 50`) creates a column of boolean (True/False) values, one for each row, which selects the rows to update. The second argument is the column we want to update. `.fillna()` is a convenient special-case method for conditional updates when the condition is that the column value is NaN.
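As a minimal sketch of the two ideas above, the snippet below builds a small DataFrame, applies the `.loc[]` update, and then uses `.fillna()` on a separate Series. The data values and the names `low_accuracy`, `scores`, and `filled` are made up for illustration; only the `accuracy` and `predicted_text` column names come from the example in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the text above.
df = pd.DataFrame({
    'accuracy': [92.0, 50.0, 31.5, 75.0],
    'predicted_text': ['cat', 'dog', 'brd', 'fish'],
})

# The condition produces one True/False value per row.
low_accuracy = df.accuracy <= 50
print(low_accuracy.tolist())  # [False, True, True, False]

# Only rows where the mask is True have predicted_text overwritten.
df.loc[low_accuracy, 'predicted_text'] = ''
print(df['predicted_text'].tolist())  # ['cat', '', '', 'fish']

# fillna handles the special case where the condition is "the value is NaN".
scores = pd.Series([0.9, np.nan, 0.4])
filled = scores.fillna(0.0)
print(filled.tolist())  # [0.9, 0.0, 0.4]
```

Storing the mask in a variable first is optional, but it makes it easy to inspect which rows will be changed before you update them.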
Suppose you constructed a DataFrame by

    import pandas as pd
    df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'], 'age': [30, 56, 8], 'city': ['New York', 'Atlanta', 'Shanghai']})

giving you the DataFrame
| | name | age | city |
|---|---|---|---|
| 0 | Jeff | 30 | New York |
| 1 | Esha | 56 | Atlanta |
| 2 | Jia | 8 | Shanghai |
Suppose we realize, after collecting a bunch of data, that our process incorrectly recorded the age of people in New York and Atlanta as one year less than it was supposed to be.
Complete the function `correct_age_in_error_cities(df)` so that it increments the age of people living in New York or Atlanta by one year.
For example, given the DataFrame

    df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley'], 'age': [30, 56, 8, 38, 20], 'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York']})

| | name | age | city |
|---|---|---|---|
| 0 | Jeff | 30 | New York |
| 1 | Esha | 56 | Atlanta |
| 2 | Jia | 8 | Shanghai |
| 3 | Hatori | 38 | Tokyo |
| 4 | Ashley | 20 | New York |

After the correction, the DataFrame should look like this:

| | name | age | city |
|---|---|---|---|
| 0 | Jeff | 31 | New York |
| 1 | Esha | 57 | Atlanta |
| 2 | Jia | 8 | Shanghai |
| 3 | Hatori | 38 | Tokyo |
| 4 | Ashley | 21 | New York |
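
One possible way to complete the exercise is sketched below. It is not the only solution. It assumes the function may modify `df` in place and return it, which the exercise does not specify, and it uses `isin` to build the row mask, a choice made here rather than something the text requires. The names `in_error_city` and `corrected` are illustrative.

```python
import pandas as pd

def correct_age_in_error_cities(df):
    """Add one year to the age of rows whose city is New York or Atlanta."""
    # isin returns a boolean Series: True where city is one of the listed values.
    in_error_city = df['city'].isin(['New York', 'Atlanta'])
    # Update only the 'age' column of the selected rows.
    # Assumption: modifying df in place is acceptable for this exercise.
    df.loc[in_error_city, 'age'] += 1
    return df

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley'],
                   'age': [30, 56, 8, 38, 20],
                   'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York']})

corrected = correct_age_in_error_cities(df)
print(corrected['age'].tolist())  # [31, 57, 8, 38, 21]
```

An equivalent mask could be written as `(df.city == 'New York') | (df.city == 'Atlanta')`; `isin` is simply shorter when there are several values to match.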