Number of elements of numpy arrays inside specific bins
I have an ensemble of sorted (onedimensional) arrays of unequal lengths (say M0
, M1
and M2
). I want to find how many elements of each of these arrays is inside specific number ranges (where the number ranges are specified by neighboring elements from another sorted array, say zbin
). I want to know what is the fastest way to achieve this.
Here, I am giving a small example of the task that I want to do (and also the method that I am following presently to achieve the desired functionality):
""" Function to do search query """
def search(numrange, lst):
arr = np.zeros(len(lst))
for i in range(len(lst)):
probe = lst[i]
count = 0
for j in range(len(probe)):
if (probe[j]>numrange[1]): break
if (probe[j]>=numrange[0]) and (probe[j]<=numrange[1]): count = count + 1
arr[i] = count
return arr
""" Some example of sorted onedimensional arrays of unequal lengths """
M0 = np.array([5.1, 5.4, 6.4, 6.8, 7.9])
M1 = np.array([5.2, 5.7, 8.8, 8.9, 9.1, 9.2])
M2 = np.array([6.1, 6.2, 6.5, 7.2])
""" Implementation and output """
lst = [M0, M1, M2]
zbin = np.array([5.0, 5.5, 6.0, 6.5])
zarr = np.zeros( (len(zbin)1, len(lst)) )
for i in range(len(zbin)1):
numrange = [zbin[i], zbin[i+1]]
zarr[i,:] = search(numrange, lst)
print zarr
Output:
[[ 2. 1. 0.]
[ 0. 1. 0.]
[ 1. 0. 3.]]
Here, the final output zarr
gives me the number of elements of each of the arrays (M0
, M1
and M2
) inside each of the bins possible from zbin
(viz. [5.0, 5.5]
, [5.5, 6.0]
and [6.0, 6.5]
.) For example consider the bin [5.0, 5.5]
. The array M0
has 2 elements inside that bin (5.1
and 5.4
), M1
has 1 element (5.2
) and M2
has 0 elements in that bin. This gives the first row of zarr
i.e. [2,1,0]
. One can get the other rows of zarr
in a similar manner.
In my actual task, I will be dealing with zbin
of lengths much larger than what I have given in this example, and also bigger and many more arrays like M0
, M1
, ...
Mn
. All M
s and the array zbin
would be sorted always. I am wondering if the function that I have designed (search()
), and the method that I am following are the most optimum and the fastest ways to achieve the desired functionality. I will really appreciate any help.
2 answers

I would guess it would be difficult to beat the performance of simply looping over each array and calling numpy.histogram. I’m guessing you haven’t tried this or you’d have mentioned it!
It’s certainly possible that you could exploit the sorted nature to come up with a faster solution, but I’d start by comparing the timing of that.

We could make use of the sorted nature and hence use
np.searchsorted
for this task, like so out = np.empty((len(zbin)1, len(lst)),dtype=int) for i,l in enumerate(lst): left_idx = np.searchsorted(l, zbin[:1], 'left') right_idx = np.searchsorted(l, zbin[1:], 'right') out[:,i] = right_idx  left_idx