numpy array item from type object cannot be converted to numpy.datetime64

I am quite new to python and trying some features. I searched a lot in here for a possible solution with no success. This is my issue:

Goal:

  • read a column with date and time from an .xlsx file --> works

  • get this column in a numpy array --> works

  • The array item is read from the .xlsx as an Object

The array output looks like this:

['2020-11-09 20:30:59' '2020-11-08 09:22:54' '2020-11-04 02:24:17' ...
 '1900-01-27 02:30:00' '1900-01-24 03:00:00' '1900-01-18 15:30:00']
  • convert the date and time to a float --> does not work

I tried the following:

x[:, 0] = [np.datetime64(x[:, 0])]

--> this doesn't work; so I tried to convert to String before.

x[:, 0] = [np.datetime64(str(x[:, 0]))] 

--> This does also not work with this error message:

ValueError: Error parsing datetime string "['2020-11-09 20:30:59' '2020-11-08 09:22:54' '2020-11-04 02:24:17' ...
 '1900-01-27 02:30:00' '1900-01-24 03:00:00' '1900-01-18 15:30:00']" at position 0

So where is the issue?

Thanks a lot for the answer! It seems to work with the datetime64 conversion. I did change .

np.array(alist, 'datetime64[ms]')

to

np.array(alist, 'datetime64[ns]')

By that I get the datetime64 as an integer. Nevertheless, how comes that I get negative values for the early datesas seen in the output?

-----------------------------------------------------
--> Before conversion: ['2020-11-09 20:30:59' '2020-11-08 09:22:54' '2020-11-04 02:24:17' ...
 '1900-01-27 02:30:00' '1900-01-24 03:00:00' '1900-01-18 15:30:00']
-----------------------------------------------------
--> After conversion to dt64: [1604953859000000000 1604827374000000000 1604456657000000000 ...      
 -2206733400000000000 -2206990800000000000 -2207464200000000000]
-----------------------------------------------------

1 answer

  • answered 2021-06-19 20:05 hpaulj

    I was going to say that datatime64 can be picky when converting strings. It can't handle every kind of delimiter. But

    In [219]: alist = ['2020-11-09 20:30:59', '2020-11-08 09:22:54', '2020-11-04 02:24:17',
         ...:  '1900-01-27 02:30:00', '1900-01-24 03:00:00', '1900-01-18 15:30:00']
    In [220]: alist
    Out[220]: 
    ['2020-11-09 20:30:59',
     '2020-11-08 09:22:54',
     '2020-11-04 02:24:17',
     '1900-01-27 02:30:00',
     '1900-01-24 03:00:00',
     '1900-01-18 15:30:00']
    

    it handles this format just fine:

    In [221]: np.array(alist, 'datetime64[ms]')
    Out[221]: 
    array(['2020-11-09T20:30:59.000', '2020-11-08T09:22:54.000',
           '2020-11-04T02:24:17.000', '1900-01-27T02:30:00.000',
           '1900-01-24T03:00:00.000', '1900-01-18T15:30:00.000'],
          dtype='datetime64[ms]')
    

    This display has a 'T' between date and time, but its lack is not a problem.

    converted back to a list, it produces datatime objects:

    In [222]: _.tolist()
    Out[222]: 
    [datetime.datetime(2020, 11, 9, 20, 30, 59),
     datetime.datetime(2020, 11, 8, 9, 22, 54),
     datetime.datetime(2020, 11, 4, 2, 24, 17),
     datetime.datetime(1900, 1, 27, 2, 30),
     datetime.datetime(1900, 1, 24, 3, 0),
     datetime.datetime(1900, 1, 18, 15, 30)]
    

    Or just displaying seconds

    In [223]: np.array(alist, 'datetime64[s]')
    Out[223]: 
    array(['2020-11-09T20:30:59', '2020-11-08T09:22:54',
           '2020-11-04T02:24:17', '1900-01-27T02:30:00',
           '1900-01-24T03:00:00', '1900-01-18T15:30:00'],
          dtype='datetime64[s]')
    

    Parsing a single datatime string

    In [235]: alist[0]
    Out[235]: '2020-11-09 20:30:59'
    In [236]: np.datetime64(alist[0])
    Out[236]: numpy.datetime64('2020-11-09T20:30:59')
    

    If it's an object dtype array (such as one might get from a dataframe):

    In [254]: arr = np.array(alist, object)
    In [255]: arr
    Out[255]: 
    array(['2020-11-09 20:30:59', '2020-11-08 09:22:54',
           '2020-11-04 02:24:17', '1900-01-27 02:30:00',
           '1900-01-24 03:00:00', '1900-01-18 15:30:00'], dtype=object)
    In [256]: arr1 = arr.astype('datetime64[s]')
    In [257]: arr1
    Out[257]: 
    array(['2020-11-09T20:30:59', '2020-11-08T09:22:54',
           '2020-11-04T02:24:17', '1900-01-27T02:30:00',
           '1900-01-24T03:00:00', '1900-01-18T15:30:00'],
          dtype='datetime64[s]')