Deleting a row from an array
I'm working on an assignment in which an array called lifeforms is created with 4 columns: xvalues (x), yvalues (y) and colours (c) respectively; the fourth is used elsewhere in the original program.
If the x and y values of two rows coincide, I want one of them to be deleted from the main lifeforms array based on their colour code (c): a "0" c value beats "1", a "1" c value beats "2" and a "2" c value beats "0".
The original array looks like:
[[12 15 2 0]
[65 23 0 0]
[24 66 2 0]
[65 23 1 0]
[24 66 0 0]]
The problem is that when I try to run the following program I do not get the required array at the end. The expected output array would look like:
[[12 15 2 0]
[65 23 0 0]
[24 66 2 0]]
I have given an extract from the main program below (please note that the given array is just an example array. In the main program the array size could include up to 500+ rows, each column with random values! So please answer in accordance with that!)
import numpy as np
#Array
lifeforms = np.array([[12,15,2,0],[65,23,0,0],[24,66,2,0],[65,23,1,0],[24,66,0,0]])
#Original Array
print(lifeforms)
#Lists to store x, y and c values
xvalues = []
yvalues = []
colours = []
#Any removed row is added into this list
removed = []
#Code to delete a row
for l1 in lifeforms:
    for l2 in lifeforms:
        if l1[0] == l2[0]:
            if l2[1] == l2[1]:
                if l1[2] == 1 and l2[2] == 0:
                    removed.append(l1)
                if l1[2] == 0 and l2[2] == 2:
                    removed.append(l1)
                if l1[2] == 2 and l2[2] == 1:
                    removed.append(l1)
for i in removed:
    lifeforms = np.delete(lifeforms,i,axis=0)
for l in lifeforms:
    xvalues.append(l[0])
    yvalues.append(l[1])
    colours.append(l[2])
#Update the original Array
for i in removed:
    print(removed)
print()
print("x\n", xvalues)
print("y\n", yvalues)
print("colours\n", colours)
print()
#Updated Array
print(lifeforms)
2 answers

If you can use pandas, you can do the following:
import numpy as np
import pandas as pd

x = np.array([[12,15,2,0],[65,23,0,1],[24,66,2,0],[65,23,1,0],[24,66,0,0]])
df = pd.DataFrame(x)
new_df = df.iloc[df.loc[:,(0,1)].drop_duplicates().index]
print(new_df)

    0   1  2  3
0  12  15  2  0
1  65  23  0  1
2  24  66  2  0
What it does is the following:
pd.DataFrame(x) transforms the array into a pandas DataFrame.
df.loc[:,(0,1)].drop_duplicates().index returns the indices of the rows you wish to keep (based on the first and second columns).
df.iloc returns the sliced DataFrame.
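The steps above can also be sketched with the subset argument of drop_duplicates, which avoids the intermediate index lookup. This is a minimal sketch, not the answer's original code; note that it keeps the first row seen for each (x, y) pair and does not look at the colour column at all.

```python
import numpy as np
import pandas as pd

x = np.array([[12,15,2,0],[65,23,0,1],[24,66,2,0],[65,23,1,0],[24,66,0,0]])
df = pd.DataFrame(x)

# Keep the first row seen for each (column 0, column 1) pair.
new_df = df.drop_duplicates(subset=[0, 1], keep="first")

# Convert back to a NumPy array.
arr = new_df.to_numpy()
print(arr)
```

Because the surviving row depends only on row order, this matches the colour rule only when the winning row happens to come first.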
Edit based on OP questions in the comments and @wwii remarks:
You can return to a numpy array using .to_numpy(), so just do arr = new_df.to_numpy()
You can try the following:
xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
df = pd.DataFrame(xx)
# For each (x, y) group keep the row with the smallest colour value
df_new = df.groupby([0,1], group_keys=False).apply(lambda x: x.loc[x[2].idxmin()])
df_new.reset_index(drop=True, inplace=True)
print(df_new)

    0   1  2  3
0  12  15  2  0
1  24  66  0  0
2  65  23  0  0
When there is a special heuristic to consider one can do the following:
import pandas as pd
import numpy as np

def f_(x):
    # Return the colour value that should survive within one (x, y) group
    vals = x[2].tolist()
    if len(vals)==2:
        # print(vals)
        if vals[0] == 0 and vals[1] == 1:
            return vals[0]
        elif vals[0] == 1 and vals[1] == 0:
            return vals[1]
        elif vals[0] == 1 and vals[1] == 2:
            return vals[0]
        elif vals[0] == 2 and vals[1] == 1:
            # mirrored order of the (1, 2) case
            return vals[1]
        elif vals[0] == 2 and vals[1] == 0:
            return vals[0]
        elif vals[0] == 0 and vals[1] == 2:
            # mirrored order of the (2, 0) case
            return vals[1]
    elif len(vals) > 2:
        return 1
    else:
        return x[2]

xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
df = pd.DataFrame(xx)
df_new = df.groupby([0,1], group_keys=False).apply(lambda x: x.loc[x[2] == f_(x)])
df_new.reset_index(drop=True, inplace=True)
print(df_new)

    0   1  2  3
0  12  15  2  0
1  24  66  2  0
2  65  23  0  0
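The chain of comparisons in f_ can be condensed, since the rule from the question is cyclic (0 beats 1, 1 beats 2, 2 beats 0). The following is a minimal sketch under that assumption; beats and winning_colour are hypothetical helper names, not part of the answer above, and a group in which no colour beats all the others (for example all three colours present) is simply dropped here, which is a choice the question does not specify.

```python
import numpy as np
import pandas as pd

def beats(a, b):
    """True when colour a beats colour b under the cycle 0 > 1 > 2 > 0."""
    return (b - a) % 3 == 1

def winning_colour(colours):
    """Return the colour that beats every other colour present, else None."""
    present = set(int(c) for c in colours)
    for a in present:
        if all(beats(a, b) for b in present if b != a):
            return a
    return None

xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
df = pd.DataFrame(xx)

keep = []
for _, g in df.groupby([0, 1]):
    w = winning_colour(g[2])
    if w is not None:
        # Keep every row of the group that carries the winning colour.
        keep.extend(g[g[2] == w].index)

df_new = df.loc[sorted(keep)].reset_index(drop=True)
print(df_new)
```

Iterating over the groups instead of calling apply keeps the rows in their original order, so the result here lists the (24, 66) row before the (65, 23) row.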

Test array
import numpy as np
import pandas as pd

a = lifeforms = np.array([[12,15,2,0],
                          [13,13,0,0],
                          [13,13,1,0],
                          [13,13,2,0],
                          [65,23,1,0],
                          [24,66,2,0],
                          [14,14,1,0],
                          [14,14,1,1],
                          [14,14,1,2],
                          [14,14,2,0],
                          [15,15,3,2],
                          [15,15,2,0],
                          [65,23,0,0],
                          [24,66,0,0]])
Function that implements color selection.
test_one = np.array([[0,1],[1,0],[1,2],[2,1]])
test_two = np.array([[0,2],[2,0]])

def f(g):
    a = g.loc[:,2].unique()
    if np.any(np.all(a == test_one, axis=1)):
        idx = (g[2] == g[2].min()).idxmax()
    elif np.any(np.all(a == test_two, axis=1)):
        idx = (g[2] == g[2].max()).idxmax()
    else:
        raise ValueError('group colors outside bounds')
    return idx
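The membership test inside f relies on NumPy broadcasting: a pair of unique colours with shape (2,) is compared against a table with shape (n, 2), and the test asks whether any full row matches. A small sketch of just that check, where matches_any_row is a hypothetical name introduced for illustration:

```python
import numpy as np

test_one = np.array([[0, 1], [1, 0], [1, 2], [2, 1]])
test_two = np.array([[0, 2], [2, 0]])

def matches_any_row(pair, table):
    # pair has shape (2,), table has shape (n, 2); the comparison broadcasts
    # to shape (n, 2), and all(axis=1) asks whether a whole row equals pair.
    return bool(np.any(np.all(pair == table, axis=1)))

print(matches_any_row(np.array([1, 0]), test_one))  # True
print(matches_any_row(np.array([0, 2]), test_one))  # False
print(matches_any_row(np.array([0, 2]), test_two))  # True
```

Pairs found in test_one are settled by the smaller colour value, pairs found in test_two by the larger one, which is how f encodes "2 beats 0".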
Groupby first two columns; iterate over groups; save indices of desired rows; use those indices to select rows from the DataFrame.
sep = '-' * 40  # separator for the diagnostic printout

df = pd.DataFrame(a)
gb = df.groupby([0,1])
indices = []
for k,g in gb:
    # Groups containing all three colours have no winner: drop them
    if g.loc[:,2].unique().shape[0] > 2:
        #print(f'(0,1,2)  dropping indices {g.index}')
        continue
    # A position occupied by a single row is kept as is
    if g.shape[0] == 1:
        indices.extend(g.index.to_list())
        #print(f'unique  keeping index {g.index.values}')
        continue
    #print(g.loc[:,2])
    try:
        idx = f(g)
    except ValueError as e:
        print(sep)
        print(e)
        print(g)
        print(sep)
        continue
    #print(f'keeping index {idx}')
    indices.append(idx)
    #print(sep)
print(df.loc[indices,:])
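If pandas is not an option, the same selection can be sketched with NumPy and a plain dictionary. This is only an illustration under stated assumptions: resolve_collisions is a hypothetical name, colours are assumed to be 0, 1 or 2, rows with equal colours keep the earlier row, and a cell with more than two rows is resolved in row order, which the question does not define.

```python
import numpy as np

def resolve_collisions(lifeforms):
    """Keep one row per (x, y) cell using the cyclic colour rule."""
    winner = {}  # (x, y) -> index of the row currently holding that cell
    for i, (x, y, c) in enumerate(lifeforms[:, :3]):
        key = (int(x), int(y))
        if key not in winner:
            winner[key] = i
            continue
        held = int(lifeforms[winner[key], 2])
        # c beats held when held is the next colour in the cycle 0 -> 1 -> 2 -> 0
        if (held - int(c)) % 3 == 1:
            winner[key] = i
    # Sorting the surviving indices preserves the original row order.
    idx = np.array(sorted(winner.values()), dtype=int)
    return lifeforms[idx]

lifeforms = np.array([[12,15,2,0],[65,23,0,0],[24,66,2,0],[65,23,1,0],[24,66,0,0]])
print(resolve_collisions(lifeforms))
# [[12 15  2  0]
#  [65 23  0  0]
#  [24 66  2  0]]
```

The loop visits each row once, so it should scale comfortably to the 500+ rows mentioned in the question.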