Generating a list of random numbers, using custom bounds and summing to a desired value
I want to do something very similar to what is described in this answer. I want to create a list of random numbers that sum up to a given target value. If I did not care about the bounds, I could use what the answer suggests:
```python
>>> print np.random.dirichlet(np.ones(10), size=1)
[[ 0.01779975  0.14165316  0.01029262  0.168136    0.03061161  0.09046587
   0.19987289  0.13398581  0.03119906  0.17598322]]
```
However, I want to be able to control the range of each individual parameter as well as the target. I want to provide the bounds of each parameter: for instance, I would pass a list of three tuples, each tuple specifying the lower and upper boundary of a uniform distribution. The target keyword argument would describe what the sum should add up to.
```python
get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9)
```
The output could for example look like this:
[0.2, 0.2, 0.5]
How could that be achieved?
```python
from random import uniform

while True:
    a = uniform(0.0, 1.0)
    b = uniform(0.2, 0.5)
    c = 0.9 - a - b          # the third value is fixed by the target sum
    if 0.3 < c < 0.8:        # accept only if c lies inside its own bounds
        break

print(a, b, c)
```
Just draw the first two random numbers, then subtract them from the target to get the third 'random number', and check that it satisfies its boundary conditions.
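The same rejection idea generalises to any number of bounds: draw all but the last value uniformly and derive the last from the target sum. A minimal sketch (the function body and the `max_tries` safety cap are my own, not part of the answer above):

```python
import random

def get_rnd_numbers(bounds, target, max_tries=100_000):
    """Rejection sampling: draw all but the last value uniformly,
    compute the last value from the target sum, and retry until it
    falls inside its own bounds."""
    *head, (last_lo, last_hi) = bounds
    for _ in range(max_tries):
        values = [random.uniform(lo, hi) for lo, hi in head]
        last = target - sum(values)
        if last_lo <= last <= last_hi:
            return values + [last]
    raise RuntimeError("no valid sample found within max_tries")

print(get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9))
```

Note that the acceptance rate drops as the bounds get tighter relative to the target, which is where the Dirichlet-based answer below becomes attractive.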
Ok, here is some idea/code to play with.
We will sample from a Dirichlet distribution, so the sum objective is automatically fulfilled. Then to each x_i sampled from the Dirichlet we apply a linear transformation with a different lower boundary l_i but the same scaling parameter s:
v_i = l_i + s * x_i
From the summation objective (Σ_i means summation over i)

Σ_i v_i = target

and the fact that Dirichlet-sampled values always sum to 1, we can compute

s = target - Σ_i l_i
Let's put the mean value of each v_i right in the middle of its interval:

E[v_i] = l_i + s * E[x_i] = (l_i + h_i) / 2

which gives

E[x_i] = (h_i - l_i) / (2 s)
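Plugging the bounds from the question into these formulas as a quick sanity check (my own arithmetic, not part of the answer):

```python
import numpy as np

boundaries = np.array([[0.0, 1.0], [0.2, 0.5], [0.3, 0.8]])
target = 0.9

lo, hi = boundaries[:, 0], boundaries[:, 1]
s = target - lo.sum()          # s = 0.9 - 0.5 = 0.4
e_x = (hi - lo) / (2.0 * s)    # desired E[x_i] = [1.25, 0.375, 0.625]
print(s, e_x)

# Caveat: a Dirichlet mean is alpha_i / sum(alpha), so unless these
# targets sum to 1 the realised means are only proportional to them:
print(e_x / e_x.sum())
```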
And let's introduce a knob that is basically proportional to the inverse variance of the Dirichlet: the bigger the knob, the tighter the sampled values are around their means.
So the Dirichlet distribution's alpha parameter array is set to

alpha_i = E[x_i] * vscale

where vscale is a user-defined variance scale factor. We check whether the sampled values violate the lower or upper boundary conditions and reject the sample if they do.
Code, Python 3.6, Anaconda 5.2
```python
import numpy as np

boundaries = np.array([[0.0, 1.0], [0.2, 0.5], [0.3, 0.8]])
target = 0.9

def get_rnd_numbers(boundaries, target, vscale):
    lo = boundaries[:, 0]
    hi = boundaries[:, 1]

    s = target - np.sum(lo)
    alpha_i = (0.5 * (hi - lo) / s) * vscale
    print(np.sum(alpha_i))

    x_i = np.random.dirichlet(alpha_i, size=1)
    v_i = lo + s * x_i

    good_lo = not np.any(v_i < lo)
    good_hi = not np.any(v_i > hi)
    return (good_lo, good_hi, v_i)

vscale = 3.0

for _ in range(3):
    gl, gh, v = get_rnd_numbers(boundaries, target, vscale)
    print((gl, gh, v, np.sum(v)))
    if gl and gh:
        print("Good sample, use it")
You could play with different transformation ideas, maybe remove the mean condition or replace it with something more sensible. I would advise keeping the idea of the knob, so you can tighten the spread of the sampled values.
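To actually use the sampler you still need an accept/reject loop around it. Here is a minimal sketch; the `sample_until_good` wrapper and its `max_tries` cap are my additions, and the inner function is a compact restatement of the answer's code:

```python
import numpy as np

def get_rnd_numbers(boundaries, target, vscale):
    # Dirichlet sample, linearly transformed to the requested intervals.
    lo, hi = boundaries[:, 0], boundaries[:, 1]
    s = target - lo.sum()
    alpha = 0.5 * (hi - lo) / s * vscale
    v = lo + s * np.random.dirichlet(alpha)
    return (not np.any(v < lo), not np.any(v > hi), v)

def sample_until_good(boundaries, target, vscale, max_tries=10_000):
    # Keep drawing until a sample satisfies both boundary checks.
    for _ in range(max_tries):
        gl, gh, v = get_rnd_numbers(boundaries, target, vscale)
        if gl and gh:
            return v
    raise RuntimeError("no accepted sample within max_tries")

boundaries = np.array([[0.0, 1.0], [0.2, 0.5], [0.3, 0.8]])
v = sample_until_good(boundaries, target=0.9, vscale=3.0)
print(v, v.sum())
```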