# Number of elements of numpy arrays inside specific bins

I have an ensemble of sorted (one-dimensional) arrays of unequal lengths (say `M0`, `M1` and `M2`). I want to find how many elements of each of these arrays is inside specific number ranges (where the number ranges are specified by neighboring elements from another sorted array, say `zbin`). I want to know what is the fastest way to achieve this.

Here, I am giving a small example of the task that I want to do (and also the method that I am following presently to achieve the desired functionality):

``````""" Function to do search query """
def search(numrange, lst):
arr = np.zeros(len(lst))
for i in range(len(lst)):
probe = lst[i]
count = 0
for j in range(len(probe)):
if (probe[j]>numrange[1]): break
if (probe[j]>=numrange[0]) and (probe[j]<=numrange[1]): count = count + 1

arr[i] = count
return arr

""" Some example of sorted one-dimensional arrays of unequal lengths """
M0 = np.array([5.1, 5.4, 6.4, 6.8, 7.9])
M1 = np.array([5.2, 5.7, 8.8, 8.9, 9.1, 9.2])
M2 = np.array([6.1, 6.2, 6.5, 7.2])

""" Implementation and output """
lst = [M0, M1, M2]
zbin = np.array([5.0, 5.5, 6.0, 6.5])
zarr = np.zeros( (len(zbin)-1, len(lst)) )
for i in range(len(zbin)-1):
numrange = [zbin[i], zbin[i+1]]
zarr[i,:] = search(numrange, lst)

print zarr
``````

Output:

``````[[ 2.  1.  0.]
[ 0.  1.  0.]
[ 1.  0.  3.]]
``````

Here, the final output `zarr` gives me the number of elements of each of the arrays (`M0`, `M1` and `M2`) inside each of the bins possible from `zbin` (viz. `[5.0, 5.5]`, `[5.5, 6.0]` and `[6.0, 6.5]`.) For example consider the bin `[5.0, 5.5]`. The array `M0` has 2 elements inside that bin (`5.1` and `5.4`), `M1` has 1 element (`5.2`) and `M2` has 0 elements in that bin. This gives the first row of `zarr` i.e. `[2,1,0]`. One can get the other rows of `zarr` in a similar manner.

In my actual task, I will be dealing with `zbin` of lengths much larger than what I have given in this example, and also bigger and many more arrays like `M0`, `M1`, `...` `Mn`. All `M`s and the array `zbin` would be sorted always. I am wondering if the function that I have designed (`search()`), and the method that I am following are the most optimum and the fastest ways to achieve the desired functionality. I will really appreciate any help.

I would guess it would be difficult to beat the performance of simply looping over each array and calling numpy.histogram. I’m guessing you haven’t tried this or you’d have mentioned it!

It’s certainly possible that you could exploit the sorted nature to come up with a faster solution, but I’d start by comparing the timing of that.

We could make use of the sorted nature and hence use `np.searchsorted` for this task, like so -
``````out = np.empty((len(zbin)-1, len(lst)),dtype=int)