# How to filter a numpy array of points by another array

How can I filter a numpy array `a` by the elements of a numpy array `b`, so that I get all the points in `a` that are not in `b`?

``````import numpy as np

a = np.array([[1,2],[1,3],[1,4]])
b = np.array([[1,2],[1,3]])
c = np.array([ d for d in a if d not in b])
print(c)

# actual outcome
# []
# desired outcome
# np.array([[1,4]])
``````

This probably will not be the most efficient approach (though it turns out to be faster than the other approaches presented here for this input; see below), but one thing you can do is convert the rows of `a` and `b` to tuples and then take their set difference:

``````# Method 1
tmp_1 = [tuple(i) for i in a]    # -> [(1, 2), (1, 3), (1, 4)]
tmp_2 = [tuple(i) for i in b]    # -> [(1, 2), (1, 3)]

c = np.array(list(set(tmp_1).difference(tmp_2)))
``````
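One caveat with Method 1 is that a Python set does not preserve the order of its elements, so the rows of `c` may come back in a different order than in `a`. If order matters, a minimal sketch of an order-preserving variant could look like this (the names `b_rows` and `keep` are only illustrative):

```python
import numpy as np

a = np.array([[1, 2], [1, 3], [1, 4]])
b = np.array([[1, 2], [1, 3]])

# Build the set of rows to exclude once; tuples are hashable, arrays are not.
b_rows = {tuple(row) for row in b}

# One boolean per row of `a`: True if that row does not occur in `b`.
keep = [tuple(row) not in b_rows for row in a]

# Boolean indexing keeps the surviving rows in their original order.
c = a[keep]
print(c)  # [[1 4]]
```

The set is still used for fast lookups, but the result is built by indexing `a`, so the original row order is retained.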

As noted by @EmiOB, this post offers some insights into why `[ d for d in a if d not in b ]` in your question does not work. Drawing from that post, you can use

``````# Method 2
c = np.array([d for d in a if all(any(d != i) for i in b)])
``````

Remarks

In NumPy's C source, the implementation of `array_contains(PyArrayObject *self, PyObject *el)` says that calling `array_contains(self, el)` is equivalent to

``````(self == el).any()
``````

in Python, where `self` is a pointer to an array and `el` is a pointer to a Python object.

In other words:

1. if `arr` is a numpy array and `obj` is some arbitrary Python object, then
``````obj in arr
``````

is the same as

``````(arr == obj).any()
``````
2. if `arr` is a typical Python container such as a list, tuple, dictionary, and so on, then
``````obj in arr
``````

is the same as

``````any(obj is _ or obj == _ for _ in arr)
``````

All of which is to say, the meaning of `obj in arr` is different depending on the type of `arr`.

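This explains why the list comprehension that you proposed, `[d for d in a if d not in b]`, does not have the desired effect. Every row `d` of `a` matches some row of `b` in at least one position (the leading `1`), so `d in b` is `True` for every `d` and nothing survives the filter.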

This can be confusing because it is tempting to reason that, since a numpy array is a sequence (though not a standard Python one), membership test semantics should be the same. This is not the case.

Example:

``````a = np.array([[1,2],[1,3],[1,4]])
print((a == [1,2]).any())          # same as [1, 2] in a
# outputs True
``````
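To make the difference more visible, here is a small sketch that tests a row which is not in `a` at all. The row `[1, 5]` is just an illustrative value, and the variable names are mine:

```python
import numpy as np

a = np.array([[1, 2], [1, 3], [1, 4]])

# Element-wise membership: True because the value 1 matches in the first
# column, even though [1, 5] is not a row of `a`.
elementwise = [1, 5] in a

# Row-wise membership: require every element of some row to match.
rowwise = bool((a == [1, 5]).all(axis=1).any())

# With a plain list of tuples, `in` compares whole items.
as_list = (1, 5) in [tuple(row) for row in a]

print(elementwise, rowwise, as_list)  # True False False
```

Only the last two checks answer the question "is this point one of the rows?".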

Timings

For your input, I found Method 1 to be the fastest, followed by Method 2 (obtained from the post @EmiOB suggested), followed by @DanielF's approach. I would not be surprised if changing the input size changes the ordering of the timings, so take them with a grain of salt.

``````# Method 1
5.96 µs ± 8.92 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# Method 2
6.45 µs ± 27.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# @DanielF's approach
16.5 µs ± 276 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
``````
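If you want to repeat the comparison on your own data, a rough sketch using the standard-library `timeit` module might look like the following. The wrapper functions `method_1` and `method_2` and the repeat count are my own assumptions, so the absolute numbers will differ from the ones above:

```python
import timeit
import numpy as np

a = np.array([[1, 2], [1, 3], [1, 4]])
b = np.array([[1, 2], [1, 3]])

def method_1(a, b):
    # Set difference on tuples (row order is not guaranteed).
    tmp_1 = [tuple(i) for i in a]
    tmp_2 = [tuple(i) for i in b]
    return np.array(list(set(tmp_1).difference(tmp_2)))

def method_2(a, b):
    # Keep the rows of `a` that differ from every row of `b`.
    return np.array([d for d in a if all(any(d != i) for i in b)])

n_calls = 10_000  # arbitrary repeat count for this sketch
for fn in (method_1, method_2):
    seconds = timeit.timeit(lambda: fn(a, b), number=n_calls)
    print(f"{fn.__name__}: {seconds / n_calls * 1e6:.2f} µs per call")
```

Swapping in larger `a` and `b` is the quickest way to see whether the ordering holds for your input sizes.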

Use this:

``````c = np.array([a_elem for a_elem in a if all(any(a_elem != b_elem) for b_elem in b)])
``````

Output:

``````array([[1, 4]])
``````

Explanation:

We loop over each row `a_elem` of `a` and compare it against every row `b_elem` of `b`. `any(a_elem != b_elem)` returns `True` if at least one element of `a_elem` differs from the corresponding element of `b_elem`, i.e. if the two rows are not identical. `all(any(a_elem != b_elem) for b_elem in b)` returns `True` if `a_elem` differs from every row of `b`.

Example:

We take `[1,2]` from `a` and check whether any of its elements differ from `[1,2]` and `[1,3]` from `b`, one by one. The result is `False` for `[1,2]` and `True` for `[1,3]`. This creates a list `[False, True]`.

Next, we take `[1,3]` from `a`. It'll return `True` for `[1,2]` and `False` for `[1,3]`. This creates another list `[True, False]`.

Lastly, we take `[1,4]` from `a`. It'll return `True` for both `[1,2]` and `[1,3]`. This creates a list `[True, True]`.

Now, when we run `all()` on each of the above lists, it returns `True` only when every value in the list is `True`. That holds only for the last list, so `[1,4]` is the only row we add to our array.
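The walk-through above can be reproduced with a short sketch that prints the intermediate list for each row of `a` (the names `differs` and `trace` are only illustrative):

```python
import numpy as np

a = np.array([[1, 2], [1, 3], [1, 4]])
b = np.array([[1, 2], [1, 3]])

trace = []
for a_elem in a:
    # One boolean per row of `b`: does `a_elem` differ from that row anywhere?
    differs = [bool(any(a_elem != b_elem)) for b_elem in b]
    trace.append((a_elem.tolist(), differs, all(differs)))
    print(*trace[-1])
# [1, 2] [False, True] False
# [1, 3] [True, False] False
# [1, 4] [True, True] True
```

Only the row whose last column is `True` is kept.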

``````# View each row of a 2-D array as a single opaque (void) element
vview = lambda a: np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
``````

This avoids the slow `for` loops of the other answers and doesn't create any intermediate data structures.
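For comparison, here is a different loop-free sketch based on plain broadcasting. It is not the void-view technique above, and unlike it, this version does build an intermediate boolean array of shape `(len(a), len(b), n_cols)`, so it is only reasonable for moderately sized inputs. The names `equal` and `in_b` are illustrative:

```python
import numpy as np

a = np.array([[1, 2], [1, 3], [1, 4]])
b = np.array([[1, 2], [1, 3]])

# Compare every row of `a` with every row of `b`;
# the result has shape (len(a), len(b), n_cols).
equal = a[:, None, :] == b[None, :, :]

# A row of `a` is "in b" if it matches some row of `b` in every column.
in_b = equal.all(axis=2).any(axis=1)

# Keep the rows of `a` that match no row of `b`.
c = a[~in_b]
print(c)  # [[1 4]]
```

The row order of `a` is preserved, since the result is obtained by boolean indexing.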