Populating pandas dataframe efficiently using a 2-D numpy array

I have a 2-D numpy array, each row of which consists of three elements: ['dataframe_column_name', 'dataframe_index', 'value']. I tried populating the pandas DataFrame by assigning the values one at a time in a for loop, but it is quite slow. Is there a faster way of doing this? I am a bit new to pandas, so apologies in case this is something very basic. Here is the code snippet:

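import numpy as np
import pandas as pd

my_nparray = np.array([['a', 1, 123], ['b', 1, 230], ['a', 2, 321]], dtype=object)
df = pd.DataFrame(index=[1, 2], columns=['a', 'b'])  # the all-NaN frame shown below
for r in range(my_nparray.shape[0]):
    col, ind, value = my_nparray[r]
    df.at[ind, col] = value  # label-based; iloc only accepts integer positions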

This takes a lot of time when my_nparray is large. Is there a faster, loop-free way of doing this?

Assume that I can initially create this DataFrame:

  'a' 'b'
1 NaN NaN
2 NaN NaN

I want the output to be:

  'a' 'b'
1 123 230
2 321 NaN
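If the pre-built all-NaN frame really has to be kept and filled, rather than rebuilt by a reshape as in the answers below, one loop-free option is to translate the labels to integer positions with `Index.get_indexer` and write every value in a single NumPy fancy-indexing assignment. This is a sketch, not taken from the answers; it assumes every label in `my_nparray` already exists in `df`, that the values are numeric, and that no (index, column) pair repeats, since NumPy does not guarantee which of several writes to the same cell wins.

```python
import numpy as np
import pandas as pd

my_nparray = np.array([['a', 1, 123], ['b', 1, 230], ['a', 2, 321]], dtype=object)

# The pre-built, all-NaN frame from the question
df = pd.DataFrame(np.nan, index=[1, 2], columns=['a', 'b'])

# Translate every (index label, column label) pair into integer positions at once
rows = df.index.get_indexer(my_nparray[:, 1].astype(int))
cols = df.columns.get_indexer(my_nparray[:, 0].astype(str))
if (rows == -1).any() or (cols == -1).any():
    # get_indexer returns -1 for unknown labels, which NumPy would read as "last position"
    raise KeyError("some labels in my_nparray are not in df")

# Write all values in one vectorized assignment on a float copy, then rebuild the frame
values = df.to_numpy(dtype=float, copy=True)
values[rows, cols] = my_nparray[:, 2].astype(float)
df = pd.DataFrame(values, index=df.index, columns=df.columns)
print(df)
```

The explicit `-1` check matters because a missing label would otherwise silently overwrite the last row or column instead of raising.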

2 answers

  • answered 2019-02-10 14:11 Alex

    You can use DataFrame.from_records and then pivot:

    df = pd.DataFrame.from_records(my_nparray, index=1).pivot(columns=0)
    
           2
    0      a      b
    1
    1  123.0  230.0
    2  321.0    NaN
    

    This uses field 1 of each row as the index, and pivot uses column 0 for the new columns; the remaining field (2) supplies the values.

    Then we can drop the outer level of the column MultiIndex and clear the leftover axis names:

    df.columns = df.columns.droplevel(0)  # drop the outer level holding the values label 2
    df.columns.name = None
    df.index.name = None
    
           a      b
    1  123.0  230.0
    2  321.0    NaN
    

  • answered 2019-02-10 14:23 jezrael

    Use the DataFrame constructor with DataFrame.pivot and DataFrame.rename_axis:

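    # keyword arguments: newer pandas versions reject positional arguments to pivot
    df = pd.DataFrame(my_nparray).pivot(index=1, columns=0, values=2).rename_axis(index=None, columns=None)
    print(df)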
           a      b
    1  123.0  230.0
    2  321.0    NaN
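The question describes my_nparray as a 2-D NumPy array, but its snippet defines it as a plain Python list. If the data really is a NumPy array created without dtype=object, NumPy stores every element as a string, so the pivoted index labels and values would be strings such as '1' and '123'. A minimal sketch, assuming that string-dtype input, converts the index and value columns back to numbers before pivoting:

```python
import numpy as np
import pandas as pd

# np.array on mixed str/int rows coerces every element to a Unicode string dtype
my_nparray = np.array([['a', 1, 123], ['b', 1, 230], ['a', 2, 321]])
print(my_nparray.dtype)  # a '<U...' string dtype, not int

df = (
    pd.DataFrame(my_nparray)
    .astype({1: int, 2: float})  # restore numeric index labels and numeric values
    .pivot(index=1, columns=0, values=2)
    .rename_axis(index=None, columns=None)
)
print(df)
```

Building the array with dtype=object, or keeping the list, avoids the string coercion in the first place.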