How to build a new column in pandas from a conditional (the new column should contain strings)

I'm trying to create a column in pandas that uses a conditional to record a qualitative observation.

For example, if the data frame looks like this:

   Distance
1         1
2         5
3        40
4        15

I want to create a new column (let's call it df['length']) that holds a qualitative description of each distance.

For example:

if df['Distance'] == 1:
    print('Short')

I want 'Short' to be written into the new column for each row that meets the condition.

Or for example:

if df['Distance'] > 10:
    print('Long')

I want the new column to contain 'Long' for each row that meets the condition.

How would I go about doing this?

I'm trying to write this as a function. This is what I have now:

def trip_distance(row):    

    df = pd.read_csv('taxi_january_standard_rate.csv')

    if df['trip_distance'] > 50 :
        return "Long"

and then I try to use that to populate a new column:

df['trip_length'] = df.apply(trip_distance , axis=1)

but it doesn't seem to work. It's giving me an error:

('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

Basically, I'm trying to assign one of five qualitative descriptions to each row of a taxicab data set: a distance above a certain value is described as 'Long', one close to the mean as 'Average', and so on.
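The error comes from the function body. With df.apply(..., axis=1), pandas calls the function once per row and passes that row in as a Series, but the body ignores row and compares the entire df['trip_distance'] column to 50; an if statement cannot turn that Series of True/False values into a single boolean. (Reading the CSV inside the function also reloads the whole file for every row.) Below is a minimal row-wise sketch, assuming a trip_distance column as in the question; the five labels and their thresholds are hypothetical placeholders, not values derived from the taxi data.

```python
import pandas as pd

def trip_length(row):
    # row is one row of the DataFrame (a Series), so row['trip_distance']
    # is a single number and each comparison is a plain True/False.
    d = row['trip_distance']
    # Hypothetical thresholds; tune them to your data.
    if d > 50:
        return 'Long'
    elif d > 20:
        return 'Above average'
    elif d > 5:
        return 'Average'
    elif d > 1:
        return 'Below average'
    return 'Short'

# In practice, read the file once, outside the function:
# df = pd.read_csv('taxi_january_standard_rate.csv')
df = pd.DataFrame({'trip_distance': [0.5, 3.0, 12.0, 30.0, 80.0]})  # toy data
df['trip_length'] = df.apply(trip_length, axis=1)
print(df)
```

On a large data set, the vectorized approaches in the answers (np.select, .loc) are typically much faster than apply with axis=1.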

3 answers

  • answered 2018-07-11 04:08 Mr. J

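    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({'Distance': [1, 5, 40, 15]})
    >>> df
       Distance
    0         1
    1         5
    2        40
    3        15
    
    >>> # object dtype so the column can hold both NaN and strings
    >>> df['length'] = pd.Series(np.nan, index=df.index, dtype=object)
    >>> # .loc avoids chained assignment (df['length'][mask] = ...), which may not write back
    >>> df.loc[df['Distance'] > 10, 'length'] = 'Long'
    >>> df
       Distance length
    0         1    NaN
    1         5    NaN
    2        40   Long
    3        15   Long
    >>> df.loc[df['Distance'] == 1, 'length'] = 'Short'
    >>> df
       Distance length
    0         1  Short
    1         5    NaN
    2        40   Long
    3        15   Long
    >>> 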
    

    Let me know if it helps; if it works for you, please mark it as the answer.

  • answered 2018-07-11 04:26 pyd

    You need np.where:

    import numpy as np

    # 'Long' where Distance > 10, 'Short' everywhere else
    df['length'] = np.where(df['Distance'] > 10, 'Long', 'Short')
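On the sample data from the question, this labels each row with one of exactly two values. Note that a distance of 5, which meets neither of the question's conditions, also becomes 'Short', because np.where has no third option.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Distance': [1, 5, 40, 15]})
# np.where evaluates the condition for every row at once and picks
# 'Long' where it is True and 'Short' where it is False.
df['length'] = np.where(df['Distance'] > 10, 'Long', 'Short')
print(df)
```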
    

    If you need multiple conditions, go with @sacul's solution and use np.select:

    df['length'] = np.select([df.Distance < 2, df.Distance > 10], ['short', 'long'], 'average')
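For the five-way split described in the question, pd.cut is another option when each label corresponds to a consecutive range of distances. This is a swapped-in technique rather than part of this answer, and the bin edges and label names below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Distance': [1, 5, 40, 15]})

# Hypothetical bin edges: each label covers the interval (left, right].
bins = [0, 2, 8, 12, 30, float('inf')]
labels = ['Very short', 'Short', 'Average', 'Long', 'Very long']
df['length'] = pd.cut(df['Distance'], bins=bins, labels=labels)
print(df)
```

Bins are closed on the right, so a distance of exactly 0 would fall outside (0, 2] and become NaN unless you pass include_lowest=True. To centre the 'Average' bin on the mean, the edges could be computed from df['Distance'].mean() instead of being hard-coded.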
    

  • answered 2018-07-11 05:22 min2bro

    Alternatively, you could use .loc:

    df.loc[df['Distance'] > 10, 'length'] = 'Long'
    df.loc[df['Distance'] == 1, 'length'] = 'Short'
    

    Output:

       Distance length
    0         1  Short
    1         5    NaN
    2        40   Long
    3        15   Long
    

    You can then fill the remaining NaN values with whatever label you want using fillna.
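Combined with the two .loc lines above, a minimal sketch of that fill step looks like this; 'Average' is just an example label for rows that match neither condition.

```python
import pandas as pd

df = pd.DataFrame({'Distance': [1, 5, 40, 15]})
# .loc creates the 'length' column on first use; unmatched rows stay NaN.
df.loc[df['Distance'] > 10, 'length'] = 'Long'
df.loc[df['Distance'] == 1, 'length'] = 'Short'
# Replace the leftover NaN values with a catch-all label.
df['length'] = df['length'].fillna('Average')
print(df)
```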