How to build a new column in Pandas from a Conditional (New Column should output strings)
I'm trying to create a column in pandas using a conditional to create a qualitative observation.
For example, if the data frame looks like this:
   Distance
1         1
2         5
3        40
4        15
I want to create a new column (let's call it df['length']) which is an observation on the distances.
For example:
if df[Distance] = 1:
    print('Short')
I want 'Short' to be input into the new column for each row that fits the conditional.
Or for example:
if df[Distance] > 10:
    print('Long')
I want each row that fits the conditional in the new column to be 'Long'.
How would I go about doing this?
I'm trying to write it into a function. This is what I have now:
def trip_distance(row):
    df = pd.read_csv('taxi_january_standard_rate.csv')
    if df['trip_distance'] > 50:
        return "Long"
and then I try to use that to populate a new column:
df['trip_length'] = df.apply(trip_distance , axis=1)
but it doesn't seem to work. It's giving me an error:
('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')
Basically, I'm trying to assign one of five qualitative descriptions to each trip in a taxicab data set: for example, a distance greater than a certain value is described as 'Long', one close to the mean as 'Average', and so on.
3 answers

>>> import numpy as np
>>> import pandas as pd
>>> l = [1, 5, 40, 15]
>>> df = pd.DataFrame(l, columns=['Distance'])
>>> df
   Distance
0         1
1         5
2        40
3        15
>>> df['length'] = np.nan
>>> df['length'][df['Distance'] > 10] = 'Long'
>>> df
   Distance length
0         1    NaN
1         5    NaN
2        40   Long
3        15   Long
>>> df['length'][df['Distance'] == 1] = 'Short'
>>> df
   Distance length
0         1  Short
1         5    NaN
2        40   Long
3        15   Long
>>>
Let me know if this helps, and please mark it as the answer if it works for you.
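As for the error in the question: with `axis=1`, `apply` passes one row at a time to the function, but the function there re-reads the CSV and compares the whole `trip_distance` column to 50, which yields a Series that `if` cannot evaluate. A minimal sketch of a row-wise version follows. The small frame stands in for the taxi data, and the `'Other'` fallback label is an assumption added for illustration, not something from the question.

```python
import pandas as pd

# Stand-in for the taxi data; the real CSV is not loaded here.
df = pd.DataFrame({'trip_distance': [1, 5, 40, 15, 60]})

def trip_length(row):
    # `row` is a single row, so this comparison yields one True/False value.
    if row['trip_distance'] > 50:   # threshold taken from the question
        return 'Long'
    return 'Other'                  # hypothetical fallback label

# The frame is created once, outside the function, and passed in row by row.
df['trip_length'] = df.apply(trip_length, axis=1)
print(df)
```

This works, but it calls a Python function once per row, so the vectorized approaches in the other answers are usually faster on large data.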

You need np.where:
import numpy as np
df['Length'] = np.where(df['Distance'] > 10, 'Long', 'Short')
If you want multiple conditions, go with @sacul's solution and use np.select:
df['length'] = np.select([df.Distance < 2, df.Distance > 10], ['short', 'long'], 'average')
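Since the question asks for five descriptions, the same `np.select` call can be extended with more conditions. The sketch below is only an illustration: the cut-offs (1, 30, 10, and "within 3 of the mean") and the labels are assumptions, not values from the question. Conditions are evaluated in order and the first match wins, so the more specific ones go first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Distance': [1, 5, 40, 15, 25]})
mean = df['Distance'].mean()

# Checked top to bottom; a row takes the label of its first True condition.
conditions = [
    df['Distance'] <= 1,                  # assumed cut-off
    df['Distance'] > 30,                  # assumed cut-off
    (df['Distance'] - mean).abs() <= 3,   # "close to the mean" (assumed width)
    df['Distance'] > 10,                  # threshold from the question
]
labels = ['Very short', 'Very long', 'Average', 'Long']

# Rows matching none of the conditions get the default label.
df['length'] = np.select(conditions, labels, default='Short')
print(df)
```

Swapping the order of the last two conditions would label 15 as 'Long' instead of 'Average', which is why the ordering deserves some thought.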

Alternatively you could do:
df.loc[df['Distance'] > 10, 'length'] = 'Long'
df.loc[df['Distance'] == 1, 'length'] = 'Short'
Output:
   Distance length
0         1  Short
1         5    NaN
2        40   Long
3        15   Long
You can fill the remaining NaN values with whatever value you want using fillna.
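For completeness, here is a short sketch of the `df.loc` approach followed by `fillna`. The `'Average'` fill value is just an assumed label for the rows that matched neither condition.

```python
import pandas as pd

df = pd.DataFrame({'Distance': [1, 5, 40, 15]})

# Assigning through .loc creates the column; unmatched rows start as NaN.
df.loc[df['Distance'] > 10, 'length'] = 'Long'
df.loc[df['Distance'] == 1, 'length'] = 'Short'

# Replace the leftover NaN values with a default label (assumed here).
df['length'] = df['length'].fillna('Average')
print(df)
```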