How to build a new column in Pandas from a Conditional (New Column should output strings)
I'm trying to create a column in pandas using a conditional to create a qualitative observation.
For example, if the data frame looks like this:
       Distance
    1         1
    2         5
    3        40
    4        15

I want to create a new column (let's call it df['length']) which is an observation on the distances.
    if df['Distance'] == 1: print('Short')
I want 'Short' to be input into the new column for each row that fits the conditional.
Or for example:
    if df['Distance'] > 10: print('Long')
I want each row that fits the conditional in the new column to be 'Long'.
How would I go about doing this?
I'm trying to write it into a function. This is what I have now:
    def trip_distance(row):
        df = pd.read_csv('taxi_january_standard_rate.csv')
        if df['trip_distance'] > 50 :
            return "Long"
and then I try and use that to populate a new column:
df['trip_length'] = df.apply(trip_distance , axis=1)
but it doesn't seem to work. It's giving me an error:
('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')
Basically, I'm trying to assign five qualitative descriptions to a column in a taxicab data set: each distance greater than a certain value is described as 'Long', each distance close to the mean as 'Average', and so on.
    >>> l = [1, 5, 40, 15]
    >>> df = pd.DataFrame(l, columns=['Distance'])
    >>> df
       Distance
    0         1
    1         5
    2        40
    3        15
    >>> df['length'] = np.nan
    >>> df['length'][df['Distance'] > 10] = 'Long'
    >>> df
       Distance  length
    0         1     NaN
    1         5     NaN
    2        40    Long
    3        15    Long
    >>> df['length'][df['Distance'] == 1] = 'Short'
    >>> df
       Distance  length
    0         1   Short
    1         5     NaN
    2        40    Long
    3        15    Long
Let me know if it helps, also please mark as answer if it works for you.
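As for the error in the question: inside the function, df['trip_distance'] > 50 compares the whole column and returns a Series, which an if statement cannot reduce to a single True or False. With axis=1, apply passes one row at a time, so the function can test the value in that row instead. A minimal sketch, assuming a small made-up frame in place of the taxi CSV and illustrative cut-offs (the 50 comes from the question; the 1 and the 'Average' fallback are assumptions):

```python
import pandas as pd

# Made-up sample standing in for the taxi CSV (assumption for illustration).
df = pd.DataFrame({'trip_distance': [1, 5, 40, 75]})

def trip_length(row):
    # `row` is a single row here, so each comparison is a plain True/False.
    if row['trip_distance'] > 50:
        return 'Long'
    if row['trip_distance'] <= 1:
        return 'Short'
    return 'Average'

# axis=1 calls trip_length once per row.
df['trip_length'] = df.apply(trip_length, axis=1)
print(df)
```

Note that the CSV is read once, outside the function, rather than on every call. On a large frame a row-wise apply tends to be slower than the vectorized approaches, so it is mainly worth it when the logic is hard to express as column conditions.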
You need np.where:

    import numpy as np
    df['Length'] = np.where(df['Distance'] > 10, 'Long', 'Short')
If you want multiple conditions, go with @sacul's solution and use np.select:
df['length'] = np.select([df.Distance < 2, df.Distance > 10], ['short', 'long'], 'average')
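The same call could be stretched to the five labels the question asks for. A sketch only: the cut-offs and label names below are hypothetical and would need to be chosen from the real data (for example relative to the mean).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Distance': [1, 5, 40, 15, 100]})

# Hypothetical cut-offs; pick values that suit the real data.
conditions = [
    df['Distance'] < 2,
    df['Distance'] < 10,
    df['Distance'] < 30,
    df['Distance'] < 60,
]
labels = ['Very short', 'Short', 'Average', 'Long']

# Conditions are checked in order and the first match wins;
# rows matching none of them get the default.
df['length'] = np.select(conditions, labels, default='Very long')
print(df)
```

Because the first matching condition wins, the conditions can be written as increasing upper bounds without spelling out both ends of every range.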
Alternatively you could do:
    df.loc[df['Distance'] > 10, 'length'] = 'Long'
    df.loc[df['Distance'] == 1, 'length'] = 'Short'

       Distance  length
    0         1   Short
    1         5     NaN
    2        40    Long
    3        15    Long
You can then fill the remaining NaN values with whatever value you want using fillna.
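For instance, a short sketch of that last step, where 'Average' is just an assumed label for the rows that neither condition matched:

```python
import pandas as pd

df = pd.DataFrame({'Distance': [1, 5, 40, 15]})
df.loc[df['Distance'] > 10, 'length'] = 'Long'
df.loc[df['Distance'] == 1, 'length'] = 'Short'

# Rows matched by neither condition are still NaN; give them a label.
df['length'] = df['length'].fillna('Average')
print(df)
```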