# need an algorithm for continuity check: select a list of integer number to have best "coverage"

this is an hardcore algorithm problem that i'm facing right now:

let's say there is a sorted list of integers L1:
1. the total length of the list is known which is N (e.g. N could be 1e7)
2. all the elements are between two known boundaries, A and B , ( A <<< B )
e.g. L1 = [ 2,5,10,15,18,19,21...]

Right now, I need to select a subset of the elements from the list L1 to form a new list L2 with the total length of M (M < N)
(e.g. M could equal N /10 )

to satisfy a condition: the new list L2 needs to have best "coverage";
by "coverage", it means that all the elements integers in the L2 need to be distributed in the L1's range, [A,B], as equally as possible. (a.k.a an unbiased sub-sampling method )

any help is deeply appreciated.

thanks for everyone's help and idea. I try to simplify the problem so that everyone (without the background knowledge can understand the problem). To define a rule for goodness of the coverage:

the ultimate goal is to achieve:

1. in the list L2, for any two neighbor element J and K, there are | J - K | , and the sum of this difference needs to be minimized

2. apply a given window with the total length of Q ( Q < M ) to list L2, and the number of elements within the window needs to be either equal (ideal situation) or almost equal

thanks

My idea is to utilize bucket sort of which bucket size is (A - B) / M. After mapping each element in l1 to its corresponding bucket, pick element randomly from each bucket to from the new list. If the new list is shorter than m, then I repeat the process. The following is my implementation in Python:

``````import bisect
import random
import collections

def form_new_list(l1, m, a, b, res):
if m <= 0:
return

bucket_size = int((b - a) / m)
bucket_list = collections.defaultdict(list)
for i, num in enumerate(l1):
bucket_num = int(num / bucket_size)
bucket_list[bucket_num].append(num)

for _, nums in bucket_list.items():
selected = random.choice(nums)
position = bisect.bisect(res, selected)
bisect.insort(res, selected)
l1.remove(selected)

form_new_list(l1, m - len(res), a, b, res)

return res
``````