Faster way to patchify a picture into overlapping blocks

I'm looking for a FAST (and, if possible, memory-efficient) way to rewrite a function I created as part of a visual bag-of-words algorithm:

def get_pic_patches(pic, l, s):  # "l" is the patch side length, "s" stands for stride

    r, c = pic.shape
    x_range = range(0, r, s)
    y_range = range(0, c, s)
    patches = []
    patches_location = []
    for x in x_range:
        for y in y_range:
            if x + l <= r and y + l <= c:  # skip patches that would exceed the image
                patch = pic[x:x + l, y:y + l]  # a view into pic, not a copy
                patches_location.append([x, y])  # patch location is the upper-left pixel
                patches.append(patch)

    return patches, patches_location

It takes a grayscale image (NOT RGB!), the desired patch side length and a stride value, and returns all patches as a list of NumPy arrays, together with the upper-left location of each patch.
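For reference, the number of patches the loop keeps can be computed up front: for an r × c image, patch side l and stride s, each axis contributes (r - l) // s + 1 start positions when the patch fits. A small sketch (all helper names here are hypothetical) that checks this formula against the same filtering the loop uses, and shows that stopping the ranges at r - l and c - l visits exactly the same corners without the wasted iterations:

```python
def loop_corners(r, c, l, s):
    """Upper-left corners kept by get_pic_patches: scan every stride step and
    keep only patches that fit (x + l <= r and y + l <= c)."""
    return [[x, y] for x in range(0, r, s) for y in range(0, c, s)
            if x + l <= r and y + l <= c]


def direct_corners(r, c, l, s):
    """Same corners, but only visiting start positions that fit."""
    return [[x, y] for x in range(0, r - l + 1, s) for y in range(0, c - l + 1, s)]


def expected_patch_count(r, c, l, s):
    """Closed-form count: ((r - l) // s + 1) * ((c - l) // s + 1) when l fits."""
    if l > r or l > c:
        return 0
    return ((r - l) // s + 1) * ((c - l) // s + 1)


# A 10 x 7 image with 3 x 3 patches and stride 2: 4 row starts times 3 column starts.
print(expected_patch_count(10, 7, 3, 2))                          # 12
print(loop_corners(10, 7, 3, 2) == direct_corners(10, 7, 3, 2))  # True
```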

In other questions, I found this:

import numpy as np

def patchify(img, patch_shape):
    img = np.ascontiguousarray(img)  # won't make a copy if not needed
    X, Y = img.shape
    x, y = patch_shape
    shape = ((X-x+1), (Y-y+1), x, y)  # grid of patch positions, then patch shape
    strides = img.itemsize*np.array([Y, 1, Y, 1])
    return np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)
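The function above hard-codes stride 1 through the first two entries of its strides array (moving to the next patch moves one pixel). A minimal sketch of the same `as_strided` trick with an arbitrary step `s` (the name `patchify_strided` is hypothetical): the two patch-grid axes advance `s` pixels at a time while the two in-patch axes still advance one pixel, so no pixel data is copied. It reads the byte strides from `img.strides`, so it does not assume a C-contiguous input. `as_strided` does no bounds checking, so the shape has to be computed carefully, and `writeable=False` guards against writing through overlapping patches by accident:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def patchify_strided(img, l, s):
    """Read-only view of shape (n_rows, n_cols, l, l) holding every l x l patch
    whose upper-left corner lies on a grid with step s (hypothetical helper)."""
    img = np.asarray(img)
    if img.ndim != 2:
        raise ValueError("expected a 2-D (grayscale) image")
    X, Y = img.shape
    row_step, col_step = img.strides        # bytes between neighbouring pixels
    n_rows = max(0, (X - l) // s + 1)       # patch start rows 0, s, 2s, ... <= X - l
    n_cols = max(0, (Y - l) // s + 1)
    shape = (n_rows, n_cols, l, l)
    # next patch: jump s pixels; next pixel inside a patch: jump 1 pixel
    strides = (s * row_step, s * col_step, row_step, col_step)
    return as_strided(img, shape=shape, strides=strides, writeable=False)


img = np.arange(36).reshape(6, 6)
p = patchify_strided(img, 3, 2)
print(p.shape)  # (2, 2, 3, 3)
```

Here `p[i, j]` is the patch whose upper-left pixel is `(i * s, j * s)`.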

To get it to return a list, I used it like this:

import numpy as np

def patchify(img, patch_shape):
    img = np.ascontiguousarray(img)  # won't make a copy if not needed
    X, Y = img.shape
    x, y = patch_shape
    shape = ((X-x+1), (Y-y+1), x, y)  # grid of patch positions, then patch shape
    strides = img.itemsize*np.array([Y, 1, Y, 1])
    patches = np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)
    a, b, c, d = patches.shape
    patches = patches.reshape(((a*b), c, d))  # flatten the patch grid to one axis
    patches = patches.tolist()  # nested Python lists, one Python number per pixel
    return patches
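One likely reason for the slowdown is the final `.tolist()`, which turns every pixel into a separate Python number inside nested lists, whereas the original loop only collects lightweight array views. A sketch (the name `patchify_array` is hypothetical) that keeps the stride-1 result as a single NumPy array of shape (N, x, y) instead. Note that `reshape` generally has to copy here, because overlapping windows cannot be flattened into one axis as a view, so memory grows to N * x * y elements:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def patchify_array(img, patch_shape):
    """Stride-1 patches as one (N, x, y) ndarray instead of nested Python lists.

    Assumes a 2-D image at least as large as patch_shape.
    """
    img = np.ascontiguousarray(img)          # won't make a copy if not needed
    X, Y = img.shape
    x, y = patch_shape
    item = img.itemsize
    shape = (X - x + 1, Y - y + 1, x, y)     # grid of patch positions, then patch shape
    strides = (Y * item, item, Y * item, item)
    windows = as_strided(img, shape=shape, strides=strides, writeable=False)
    # Overlapping windows generally cannot be merged into one axis as a view,
    # so this reshape makes a single contiguous copy of all patches.
    return windows.reshape(-1, x, y)


img = np.arange(20, dtype=np.float32).reshape(4, 5)
patches = patchify_array(img, (2, 3))
print(patches.shape)  # (9, 2, 3)
```

If a list is really required, `list(patches)` yields N two-dimensional arrays (views into that one copy) without creating a Python object per pixel.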

But this was actually much slower than my original function! Another problem is that it only works with stride = 1, and I want to be able to use arbitrary stride values.
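A sketch addressing both points, returning the same two things as `get_pic_patches` (patches plus upper-left locations, in the same row-major order) for any stride. It assumes NumPy 1.20 or newer for `numpy.lib.stride_tricks.sliding_window_view` and keeps every s-th window with `[::s, ::s]`; the function name `get_pic_patches_fast` is hypothetical, and it returns arrays rather than lists:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def get_pic_patches_fast(pic, l, s):
    """Hypothetical vectorized variant of get_pic_patches for any stride s.

    Returns (patches, locations): an (N, l, l) array and an (N, 2) array of
    upper-left corners, in the same row-major order as the original loop.
    """
    pic = np.asarray(pic)
    r, c = pic.shape
    if l > r or l > c:  # sliding_window_view rejects windows larger than the image
        return np.empty((0, l, l), dtype=pic.dtype), np.empty((0, 2), dtype=int)
    # Every l x l window (stride 1) as a read-only view, then keep every s-th one.
    windows = sliding_window_view(pic, (l, l))[::s, ::s]
    n_rows, n_cols = windows.shape[:2]
    patches = np.array(windows).reshape(-1, l, l)  # np.array copies only the kept patches
    xs = np.arange(n_rows) * s
    ys = np.arange(n_cols) * s
    locations = np.stack(np.meshgrid(xs, ys, indexing="ij"), axis=-1).reshape(-1, 2)
    return patches, locations


img = np.arange(70).reshape(10, 7)
p, loc = get_pic_patches_fast(img, 3, 2)
print(p.shape, loc.shape)  # (12, 3, 3) (12, 2)
```

The `[::s, ::s]` selection happens on the view, so only the patches that are kept get copied. Whether this beats the loop in practice depends on image size, patch size and stride, so it is worth timing on real data.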