Generating a list of random numbers, using custom bounds and summing to a desired value

I want to do something very similar to what is described in this answer: create a list of random numbers that sum to a given target value. If I did not care about the bounds, I could use what the answer suggests:

>>> print(np.random.dirichlet(np.ones(10), size=1))
[[ 0.01779975  0.14165316  0.01029262  0.168136  0.03061161  0.09046587  0.19987289  0.13398581  0.03119906 0.17598322]]

However, I want to control both the range of each individual parameter and the target sum. For instance, I would pass a list of three tuples, each tuple specifying the lower and upper boundary of a uniform distribution. The target keyword argument would specify what the values should add up to.

get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9)

The output could for example look like this:

[0.2, 0.2, 0.5]

How could that be achieved?

Update:

  1. Normalising, i.e. dividing by the sum of all random numbers, is not acceptable as it would distort the distribution.
  2. The solution should work with an arbitrary number of parameters / tuples.
  3. As mentioned in the comments, this question is very similar, but it concerns another programming language.

2 answers

  • answered 2018-07-20 19:27 Bayko

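    from random import uniform
    
    # Rejection sampling: draw a and b, derive c from the target,
    # and retry until c falls inside its own bounds.
    while True:
        a = uniform(0.0, 1.0)
        b = uniform(0.2, 0.5)
        c = 0.9 - a - b
        if 0.3 < c < 0.8:
            break
    
    print(a, b, c)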
    

    Just draw two random numbers first, then subtract them from the target to get the third 'random number'. Finally, check that it satisfies its boundary conditions, and redraw if it does not.
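
    Since the question asks for an arbitrary number of tuples, here is a minimal sketch of how the same rejection idea might be generalised. The function name `sample_by_rejection` and the `max_tries` parameter are hypothetical, not part of the original answer. All values except the last are drawn uniformly, and the last one is derived from the target, so it is not itself uniformly distributed.

```python
import random

def sample_by_rejection(bounds, target, max_tries=100_000):
    """Draw all but the last value uniformly; derive the last from the target."""
    *free, (last_lo, last_hi) = bounds
    for _ in range(max_tries):
        values = [random.uniform(lo, hi) for lo, hi in free]
        last = target - sum(values)
        # Accept only if the derived value respects its own bounds.
        if last_lo <= last <= last_hi:
            return values + [last]
    raise RuntimeError("no valid sample found; check that the bounds can reach the target")

sample = sample_by_rejection([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9)
print(sample, sum(sample))
```

    The acceptance rate can drop sharply as the number of parameters grows or the bounds get tight, which is why the sketch gives up after `max_tries` attempts instead of looping forever.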

  • answered 2018-07-20 21:35 Severin Pappadeux

    OK, here are some ideas and code to play with.

    We will sample from a Dirichlet distribution, so the sum objective is automatically fulfilled.

    Then, to each xi sampled from the Dirichlet, we apply a linear transformation with its own lower boundary li but the same scaling parameter s.

    vi = li + s*xi

    From the summation objective (Si means summation over i) and the fact that Dirichlet-sampled values always sum to 1,

    Si vi = target

    we could compute s:

    s = target - Si li

    Let's put the mean value of each vi right in the middle of its interval [li, hi], where hi is the upper boundary.

    E[vi] = li + s*E[xi] = (li + hi) / 2

    E[xi] = (hi - li) / 2 / s

    And let's introduce a knob that is roughly proportional to the inverse variance of the Dirichlet, so the bigger the knob, the tighter the sampled values cluster around the mean.

    So, for the Dirichlet distribution's alpha parameter array,

    alphai = E[xi] * vscale

    where vscale is a user-defined variance scale factor. We then check whether the sampled values violate the lower or upper boundary conditions, and reject the sample if they do.
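
    To make the formulas above concrete, here is a small worked example using the bounds and target from the question and vscale = 3.0. It is only a sketch in plain Python; the variable names `mean_x`, `implied_mean_x` and `implied_mean_v` are chosen for illustration.

```python
bounds = [(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)]
target = 0.9
vscale = 3.0

s = target - sum(lo for lo, _ in bounds)            # about 0.4
mean_x = [(hi - lo) / 2 / s for lo, hi in bounds]   # about [1.25, 0.375, 0.625]
alpha = [m * vscale for m in mean_x]                # about [3.75, 1.125, 1.875]

# A Dirichlet's actual mean is alpha_i / sum(alpha), which always sums to 1.
total = sum(alpha)
implied_mean_x = [a / total for a in alpha]
implied_mean_v = [lo + s * m for (lo, _), m in zip(bounds, implied_mean_x)]

print(s, mean_x, alpha)
print(implied_mean_x, implied_mean_v)
```

    In this example the requested E[xi] values add up to about 2.25 rather than 1, because the interval midpoints (0.5, 0.35, 0.55) sum to more than the target. The Dirichlet therefore realises them only up to normalisation, and the means of vi end up below the midpoints while still summing to the target.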

    Code, Python 3.6, Anaconda 5.2

    import numpy as np
    
    boundaries = np.array([[0.0, 1.0], [0.2, 0.5], [0.3, 0.8]])
    target = 0.9
    
    def get_rnd_numbers(boundaries, target, vscale):
        lo = boundaries[:, 0]
        hi = boundaries[:, 1]
        s = target - np.sum(lo)                      # common scale so that sum(v_i) == target
        alpha_i = (0.5 * (hi - lo) / s) * vscale     # Dirichlet parameters: E[x_i] * vscale
        print(np.sum(alpha_i))                       # debug output: sum of the alphas
    
        x_i = np.random.dirichlet(alpha_i, size=1)   # shape (1, n), each row sums to 1
        v_i = lo + s * x_i                           # shift and scale towards the target sum
    
        good_lo = not np.any(v_i < lo)
        good_hi = not np.any(v_i > hi)
    
        return (good_lo, good_hi, v_i)
    
    vscale = 3.0
    # Draw three samples and report which ones satisfy the bounds.
    for _ in range(3):
        gl, gh, v = get_rnd_numbers(boundaries, target, vscale)
        print((gl, gh, v, np.sum(v)))
        if gl and gh:
            print("Good sample, use it")
    

    You could play with different transformation ideas, or perhaps remove the mean condition or replace it with something more sensible. I would advise keeping the idea of the knob, so you can tighten your sampling spread.
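
    The reject-and-retry step described above could be wrapped in a loop along these lines. This is only a sketch: the function name `sample_until_valid` and the `max_tries` and `seed` parameters are hypothetical, and it uses NumPy's `default_rng` generator instead of the legacy `np.random.dirichlet` call. It assumes every upper bound is strictly greater than its lower bound, so that all alpha values are positive.

```python
import numpy as np

def sample_until_valid(boundaries, target, vscale, max_tries=10_000, seed=None):
    """Repeat the shifted-and-scaled Dirichlet draw until all upper bounds hold."""
    rng = np.random.default_rng(seed)
    boundaries = np.asarray(boundaries, dtype=float)
    lo, hi = boundaries[:, 0], boundaries[:, 1]
    s = target - lo.sum()
    if s <= 0 or hi.sum() < target:
        raise ValueError("this sketch needs sum(lo) < target <= sum(hi)")
    alpha = 0.5 * (hi - lo) / s * vscale
    for _ in range(max_tries):
        v = lo + s * rng.dirichlet(alpha)
        # v >= lo holds by construction (s > 0 and Dirichlet values are >= 0),
        # so only the upper bounds need checking.
        if np.all(v <= hi):
            return v
    raise RuntimeError("no valid sample found within max_tries")

bounds = [(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)]
v = sample_until_valid(bounds, target=0.9, vscale=3.0, seed=0)
print(v, v.sum())
```

    With a larger vscale the draws concentrate around their mean, which may raise or lower the acceptance rate depending on how close that mean sits to an upper bound.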