Deleting a row from an array

I'm working on an assignment in which an array called lifeforms is created with 4 columns. The first three hold the x values (x), y values (y) and colours (c) respectively, and the fourth is used elsewhere in the original program.

If the x and y values of two rows coincide, I want one of them to be deleted from the main lifeforms array based on their colour codes (c): a "0" c value beats "1", a "1" c value beats "2" and a "2" c value beats "0".

The original array looks like:

[[12 15  2  0]
 [65 23  0  0]
 [24 66  2  0]
 [65 23  1  0]
 [24 66  0  0]]

The problem is that when I run the following program, I do not get the required array at the end. The expected output array would look like:

[[12 15  2  0]
 [65 23  0  0]
 [24 66  2  0]]

I have given an extract from the main program below. Please note that the given array is just an example: in the main program the array could have 500+ rows, each column holding random values, so please answer with that in mind.

import numpy as np

#Array
lifeforms = np.array([[12,15,2,0],[65,23,0,0],[24,66,2,0],[65,23,1,0],[24,66,0,0]])

#Original Array
print(lifeforms)

#Lists to store x, y and c values
xvalues = []
yvalues = []
colours = []

#Any removed row is added into this list
removed = []

#Code to delete a row
for l1 in lifeforms:
    for l2 in lifeforms:
        if l1[0] == l2[0]:
            if l2[1] == l2[1]:
                if l1[2] == 1 and l2[2] == 0:    
                    removed.append(l1)
                if l1[2] == 0 and l2[2] == 2:    
                    removed.append(l1)
                if l1[2] == 2 and l2[2] == 1:    
                    removed.append(l1)

for i in removed:
    lifeforms = np.delete(lifeforms,i,axis=0)

for l in lifeforms:                        
    xvalues.append(l[0])
    yvalues.append(l[1])
    colours.append(l[2])

#Update the original Array
for i in removed:
    print(removed)

print()
print("x\n", xvalues)
print("y\n", yvalues)
print("colours\n", colours)
print()
#Updated Array
print(lifeforms)

2 answers

  • answered 2020-10-23 12:41 David S

    If you can use pandas, you can do the following:

    import numpy as np
    import pandas as pd
    
    x = np.array([[12,15,2,0],[65,23,0,1],[24,66,2,0],[65,23,1,0],[24,66,0,0]])
    df = pd.DataFrame(x)
    new_df = df.iloc[df.loc[:,(0,1)].drop_duplicates().index]
    print(new_df)
    
        0   1  2  3
    0  12  15  2  0
    1  65  23  0  1
    2  24  66  2  0
    

    What it does is the following:

    1. Transform the array into a pandas DataFrame.
    2. df.loc[:,(0,1)].drop_duplicates().index returns the indices of the rows to keep, based on the first and second columns only. By default the first occurrence of each (x, y) pair is kept, so the colour column is not taken into account here.
    3. df.iloc then returns the DataFrame sliced to those rows.
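The steps above can also be sketched with the subset argument of drop_duplicates, which compares only the listed columns while keeping all of them in the result. This is a minimal sketch on the same example array; the names first_kept and last_kept are only illustrative, and neither variant applies the colour rule from the question.

```python
import numpy as np
import pandas as pd

x = np.array([[12, 15, 2, 0], [65, 23, 0, 1], [24, 66, 2, 0], [65, 23, 1, 0], [24, 66, 0, 0]])
df = pd.DataFrame(x)

# Keep the first row for every distinct (column 0, column 1) pair.
first_kept = df.drop_duplicates(subset=[0, 1], keep="first")

# keep="last" retains the final occurrence of each pair instead.
last_kept = df.drop_duplicates(subset=[0, 1], keep="last")

print(first_kept.to_numpy())
print(last_kept.to_numpy())
```

Which occurrence survives depends only on row order, so this is suitable when any one row per position will do.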

    Edit, based on the OP's questions in the comments and @wwii's remarks:

    1. You can convert back to a NumPy array using .to_numpy(), so just do arr = new_df.to_numpy()

    2. To keep the row with the smallest colour value in each (x, y) group, you can try the following:

      xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
      df = pd.DataFrame(xx)
      df_new = df.groupby([0,1], group_keys=False).apply(lambda x: x.loc[x[2].idxmin()])
      df_new.reset_index(drop=True, inplace=True)
      print(df_new)
      
          0   1  2  3
      0  12  15  2  0
      1  24  66  0  0
      2  65  23  0  0
      

    When there is a special heuristic to consider, such as the cyclic colour rule from the question, one can do the following:

    import pandas as pd
    import numpy as np
    
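    def f_(x):
        # Return the winning colour of a group: 0 beats 1, 1 beats 2, 2 beats 0.
        vals = x[2].tolist()
        if len(vals) == 2:
            if vals[0] == 0 and vals[1] == 1:
                return vals[0]
            elif vals[0] == 1 and vals[1] == 0:
                return vals[1]
            elif vals[0] == 1 and vals[1] == 2:
                return vals[0]
            elif vals[0] == 2 and vals[1] == 1:
                return vals[1]
            elif vals[0] == 2 and vals[1] == 0:
                return vals[0]
            elif vals[0] == 0 and vals[1] == 2:
                return vals[1]
            # Any other pair (e.g. two equal colours) returns None, so no row matches.
        elif len(vals) > 2:
            # More than two rows at one position: -1 matches no colour, so the group is dropped.
            return -1
        else:
            return x[2]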
    
    xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
    df = pd.DataFrame(xx)
    df_new = df.groupby([0,1], group_keys=False).apply(lambda x: x.loc[x[2] == f_(x)])
    df_new.reset_index(drop=True, inplace=True)
    print(df_new)
    
        0   1  2  3
    0  12  15  2  0
    1  24  66  2  0
    2  65  23  0  0
    
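The pairwise rule above can also be sketched with modular arithmetic instead of an if/elif chain, since "0 beats 1, 1 beats 2, 2 beats 0" means colour a beats colour b exactly when (b - a) % 3 == 1. The helpers beats and winner_rows are hypothetical names, not part of pandas. The sketch assumes colours are only 0, 1 or 2, keeps every row of a group whose rows all share one colour, and drops a group in which all three colours meet.

```python
import numpy as np
import pandas as pd

def beats(a, b):
    """Return True if colour a beats colour b (0 beats 1, 1 beats 2, 2 beats 0)."""
    return (b - a) % 3 == 1

def winner_rows(group):
    """Keep the rows of one (x, y) group that carry the winning colour."""
    colours = group[2].unique()
    if len(colours) == 1:
        # No conflict: every row at this position has the same colour.
        return group
    if len(colours) == 2:
        a, b = colours
        win = a if beats(a, b) else b
        return group[group[2] == win]
    # All three colours present: the cycle has no winner, so drop the group.
    return group.iloc[0:0]

xx = np.array([[12, 15, 2, 0], [65, 23, 1, 0], [24, 66, 2, 0], [65, 23, 0, 0], [24, 66, 0, 0]])
df = pd.DataFrame(xx)

# Iterate over the groups explicitly and stitch the surviving rows back together.
parts = [winner_rows(g) for _, g in df.groupby([0, 1])]
df_new = pd.concat(parts).sort_index().reset_index(drop=True)
print(df_new.to_numpy())
```

Sorting by the original index before resetting it keeps the surviving rows in the order they had in the input array.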

  • answered 2020-10-23 15:04 wwii

    Test array

    import numpy as np
    import pandas as pd
    
    a = lifeforms = np.array([[12,15,2,0],
                              [13,13,0,0],
                              [13,13,1,0],
                              [13,13,2,0],
                              [65,23,1,0],
                              [24,66,2,0],
                              [14,14,1,0],
                              [14,14,1,1],
                              [14,14,1,2],
                              [14,14,2,0],
                              [15,15,3,2],
                              [15,15,2,0],
                              [65,23,0,0],
                              [24,66,0,0]])
    

    Function that implements color selection.

    test_one = np.array([[0,1],[1,0],[1,2],[2,1]])
    test_two = np.array([[0,2],[2,0]])
    
    def f(g):
        a = g.loc[:,2].unique()
        if np.any(np.all(a == test_one, axis=1)):
            idx = (g[2] == g[2].min()).idxmax()
        elif np.any(np.all(a == test_two, axis=1)):
            idx = (g[2] == g[2].max()).idxmax()
        else:
            raise ValueError('group colors outside bounds')
        return idx
    

    Group by the first two columns, iterate over the groups, save the indices of the desired rows, then use those indices to select rows from the DataFrame.

    df = pd.DataFrame(a)
    gb = df.groupby([0,1])
    
    # sep was not defined in the original snippet; any separator string works here
    sep = '-' * 20
    indices = []
    for k,g in gb:
        if g.loc[:,2].unique().shape[0] > 2:
            #print(f'(0,1,2) - dropping indices {g.index}')
            continue
        if g.shape[0] == 1:
            indices.extend(g.index.to_list())
            #print(f'unique - keeping index {g.index.values}')
            continue
        #print(g.loc[:,2])
        try:
            idx = f(g)
        except ValueError as e:
            print(sep)
            print(e)
            print(g)
            print(sep)
            continue 
        #print(f'keeping index {idx}')
        indices.append(idx)
        #print(sep)
    
    print(df.loc[indices,:])
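For comparison, a similar selection can be sketched without pandas by walking the array once and remembering, per (x, y) position, which row currently holds it. This is only a sketch under stated assumptions: holder and keep are illustrative names, colours are assumed to be 0, 1 or 2, and rows are resolved in array order, so a position where all three colours meet is decided sequentially rather than dropped as in the loop above.

```python
import numpy as np

lifeforms = np.array([[12, 15, 2, 0], [65, 23, 0, 0], [24, 66, 2, 0], [65, 23, 1, 0], [24, 66, 0, 0]])

# Map each (x, y) position to the index of the row currently holding it.
holder = {}
for i, (x, y, c, _) in enumerate(lifeforms):
    key = (int(x), int(y))
    if key not in holder:
        holder[key] = i
        continue
    current = lifeforms[holder[key], 2]
    # The new row takes over only if its colour beats the current one:
    # 0 beats 1, 1 beats 2, 2 beats 0, i.e. (current - c) % 3 == 1.
    if (current - c) % 3 == 1:
        holder[key] = i

# Select the surviving rows in their original order.
keep = sorted(holder.values())
result = lifeforms[keep]
print(result)
```

On the example array from the question this keeps rows 0, 1 and 2, and a single pass with a dictionary lookup should remain cheap for arrays of a few hundred rows.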