Would like to vectorize while loop for performance (updated)

  1. Set values for a window of size n of an array based on the current value of another array
  2. Ignore values that the window overrides
  3. Need to be able to change the window size (n) for different runs

This code works but it is very slow.

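    import pandas as pd

    n = 3

    def signal(arr):
        # arr is a pd.Series; the result starts as all zeros on the same index
        out = pd.Series(data=0, index=arr.index)
        i = 0
        while i < len(arr):
            s = arr.iloc[i]
            if s in (-1, 1):
                # fill the window of size n, then jump past it so that
                # values inside the window are ignored
                out.iloc[i:i + n] = s
                i += n
            else:
                i += 1
        return out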

Example input and expected output for n = 3:

    arr    = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0]
    signal = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0]

1 answer

  • answered 2019-09-10 08:53 user2874583

    Don't make arr a pandas Series; use a plain NumPy array instead. Try this:

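    import numpy as np

    def signal(arr, n):
        size = len(arr)
        out = np.zeros(size, dtype=int)
        i = 0
        # while loop, because reassigning i inside "for i in range(size)"
        # has no effect on the next iteration
        while i < size:
            s = arr[i]
            if s in (-1, 1):
                out[i:i + n] = s  # fill the window of size n
                i += n            # skip values the window overrides
            else:
                i += 1
        return out

    arr = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0])
    n = 3

    signal(arr, n)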
    

    I benchmarked the two solutions and this one is roughly 77 times faster:

    • Original: 738 µs ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    • New: 9.56 µs ± 778 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
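
The question asks for vectorization, but each window depends on where the previous one ended, so the loop is hard to remove entirely. One possible middle ground, shown here as a minimal sketch rather than a benchmarked solution, is to let NumPy locate the non-zero entries and run the Python loop only over those positions. The name signal_sparse and the next_free variable are hypothetical, and the sketch assumes arr contains only -1, 0 and 1.

```python
import numpy as np

def signal_sparse(arr, n):
    """Hypothetical variant: visit only the positions of arr holding -1 or 1."""
    arr = np.asarray(arr)
    out = np.zeros(len(arr), dtype=arr.dtype)
    # Candidate trigger positions, found in one vectorized pass.
    triggers = np.flatnonzero((arr == 1) | (arr == -1))
    next_free = 0  # first index not covered by an earlier window
    for i in triggers:
        if i < next_free:
            continue  # this value is overridden by an earlier window
        out[i:i + n] = arr[i]  # slicing clips at the end of the array
        next_free = i + n
    return out

arr = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0]
expected = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0]
result = signal_sparse(arr, 3)
assert result.tolist() == expected
```

How much this helps depends on how sparse the non-zero entries are: the loop runs once per trigger rather than once per element, so an array that is mostly -1 and 1 would see little benefit.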