Return dataframe rows where the values in a column are not of type date

I have a dataframe df that looks like:

 Name   Date of birth
  Bob   
Steve      22/07/1963
   Jo          pencil
Karen      03/02/1953
Frank      29/09/1994

Is there a way to return rows where Date of birth is not a date?

In the above example I would expect to get back:

 Name   Date of birth
  Bob   
   Jo          pencil

Where Date of birth is not a date.

I can identify where there is a blank value for Date of birth using:

missingDoBError = df.loc[df['Date of birth'].isnull()]

I have tried to find Date of birth values that are not in a date format (and so are set to NaT) by using:

if pd.to_datetime(df['Date of birth'], format='%d-%b-%Y', errors='coerce').notnull().all():

But I can't get this to work.

1 answer

  • answered 2018-10-17 07:59 jezrael

    I believe you need to change the format to %d/%m/%Y and test for missing values:

    # values that do not parse as day/month/year become NaT, so isnull() flags them
    m2 = pd.to_datetime(df['Date of birth'], format='%d/%m/%Y', errors='coerce').isnull()
    # or omit the format parameter if performance is not important
    # m2 = pd.to_datetime(df['Date of birth'], errors='coerce').isnull()
    
    df = df[m2]
    print(df)
      Name Date of birth
    0  Bob           NaN
    2   Jo        pencil
    

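As a small illustration of why the format string matters, the sketch below parses the same values with the format from the question and with the corrected one. The Series `s` and the names `wrong` and `right` are made up for this example, and the blank value is assumed to be a real missing value (None/NaN) rather than an empty string.

```python
import pandas as pd

# Hypothetical sample values in the same layout as the question
s = pd.Series(["22/07/1963", "03/02/1953", "pencil", None])

# '%d-%b-%Y' expects strings such as '22-Jul-1963', so nothing here matches
wrong = pd.to_datetime(s, format="%d-%b-%Y", errors="coerce")

# '%d/%m/%Y' matches the day/month/year layout actually used in the column
right = pd.to_datetime(s, format="%d/%m/%Y", errors="coerce")

print(wrong.isnull().tolist())  # every value is coerced to NaT
print(right.isnull().tolist())  # only 'pencil' and the missing value are NaT
```

With the mismatched format every row looks invalid, so a check such as `.notnull().all()` can never succeed and tells you nothing about individual rows.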
    If you want to omit the NaN rows, chain another boolean mask that tests for non-missing values, combined with bitwise AND (&):

    # m1: the value is present; m2: the value does not parse as a date
    m1 = df['Date of birth'].notnull()
    m2 = pd.to_datetime(df['Date of birth'], format='%d/%m/%Y', errors='coerce').isnull()
    
    df = df[m1 & m2]
    print(df)
      Name Date of birth
    2   Jo        pencil
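
The two masks above could be wrapped in a small helper so both variants are easy to reuse. This is only a sketch: the function name `non_date_rows` and its `include_missing` parameter are invented for illustration, and Bob's blank date of birth is assumed to be stored as NaN. If the blank were an empty string instead, `notnull()` would not treat it as missing.

```python
import numpy as np
import pandas as pd


def non_date_rows(df, column, fmt="%d/%m/%Y", include_missing=True):
    """Return the rows of df whose `column` value does not parse with `fmt`."""
    # Unparseable and missing values both become NaT with errors='coerce'
    parsed = pd.to_datetime(df[column], format=fmt, errors="coerce")
    mask = parsed.isnull()
    if not include_missing:
        # Keep only rows that hold a value which failed to parse
        mask = mask & df[column].notnull()
    return df[mask]


# Sample data rebuilt from the question (blank value assumed to be NaN)
df = pd.DataFrame({
    "Name": ["Bob", "Steve", "Jo", "Karen", "Frank"],
    "Date of birth": [np.nan, "22/07/1963", "pencil", "03/02/1953", "29/09/1994"],
})

invalid = non_date_rows(df, "Date of birth")
invalid_non_missing = non_date_rows(df, "Date of birth", include_missing=False)

print(invalid)
print(invalid_non_missing)
```

Unlike the snippets above, the helper returns a new DataFrame instead of overwriting `df`, so the original data stays available for further checks.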