Clean way to generate random numbers from 0 to 50 of size 1000 in Python, with no similar number of occurrences

What would be the cleanest way to generate 1000 random numbers from 0 to 50 with Python and NumPy, under the condition that no number occurs the same number of times as any other number?

Example for size 10: [0, 0, 0, 1, 1, 3, 3, 3, 3, 2] --> no two numbers occur the same number of times

CodePudding user response：

Drawing from a Dirichlet distribution (rng.dirichlet) and rejecting invalid samples guarantees that the requirement is met, but the number of unique elements has low entropy. You have to set the range for the number of unique elements yourself with np.ones(rng.integers(min, max)). If max approaches the maximum number of unique elements (here 50), rejection may take a long time or there may be no solution at all, causing an infinite loop. The code below produces an array of size 100.

import numpy as np

times = np.array([])
rng = np.random.default_rng()

#rejection sampling
while times.sum() != 100 or len(times) != len(np.unique(times)): 
    times = np.around(rng.dirichlet(np.ones(rng.integers(5,10)))*100)

nr = rng.permutation(np.arange(51))[:len(times)]
np.repeat(nr, times.astype(int))

Random output

array([ 7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,
        7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7, 33, 33, 33,
       33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
       21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
       21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22,
       22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 25,  5,  5,  5])
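
The infinite-loop risk mentioned above can be contained by capping the number of attempts. The following is only a sketch of that idea, not part of the original answer: the function name distinct_count_sample and the parameters min_unique, max_unique, max_tries and seed are made up for illustration, and it additionally rejects draws that contain a zero count (an assumption the original loop does not make).

```python
import numpy as np

def distinct_count_sample(size=100, low=0, high=50, min_unique=5,
                          max_unique=10, max_tries=10_000, seed=None):
    """Rejection sampling with a retry cap (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        # number of unique values; the upper bound of rng.integers is exclusive
        k = rng.integers(min_unique, max_unique)
        times = np.around(rng.dirichlet(np.ones(k)) * size).astype(int)
        # accept only counts that sum to size, are positive and all different
        if times.sum() == size and times.min() > 0 and len(np.unique(times)) == k:
            nr = rng.permutation(np.arange(low, high + 1))[:k]
            return np.repeat(nr, times)
    raise RuntimeError("no valid sample found within max_tries")

sample = distinct_count_sample(seed=0)
```

Raising an error after max_tries attempts trades a possible hang for an explicit failure, which is usually easier to diagnose.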

CodePudding user response：

To avoid the unpredictable running time of generating random partitions in a potentially long trial-and-error loop, you could use a function that directly produces a random partition of a number in which all parts are distinct (increasing). From that, you simply need to map shuffled numbers over the chunks provided by the partition function:

import random

def randPart(N,size=0):                       # O(√N)
    if not size:
        maxSize = int((N*2+0.25)**0.5-0.5)    # ∑1..maxSize <= N
        size    = random.randrange(1,maxSize) # select random size
    if size == 1: return (N,)                 # one part --> all of N
    s = size*(size-1)//2                      # min sum of deltas for rest
    a = random.randrange(1,(N-s)//size)       # base value
    p = randPart(N-a*size,size-1)             # deltas on other parts
    return (a,*(n+a for n in p))              # combine to distinct parts

usage:

size = 30
n    = 10

chunks  = randPart(size)
numbers = random.sample(range(n),len(chunks))
result  = [n for count,n in zip(chunks,numbers) for _ in range(count)]

print(result)
[9, 9, 9, 0, 0, 0, 0, 7, 7, 7, 7, 7, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6,
 6, 6, 6, 6, 6, 6, 6]

# resulting frequency counts
from collections import Counter
print(sorted(Counter(result).values()))
[3, 4, 5, 6, 12]

Note that if your range of random numbers is smaller than the maximum number of distinct parts (for example, fewer than 44 numbers for an output of 1000 values), you would need to modify the randPart function so that its calculation of maxSize takes the limit into account:

def randPart(N,sizeLimit=0,size=0):
    if not size:
        maxSize = int((N*2+0.25)**0.5-0.5)    # ∑1..maxSize <= N
        maxSize = min(maxSize,sizeLimit or maxSize)
    ...

You could also change it to enforce a minimum number of parts.
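
One way both limits might be combined is sketched below. This is an illustrative variant rather than the answer's own code: the name rand_part and the parameters size_limit and min_size are assumptions, and it uses random.randint (inclusive bounds) where the original uses random.randrange.

```python
import random

def rand_part(N, size_limit=0, min_size=1, size=0):
    """Random partition of N into distinct positive parts (sketch)."""
    if N < 1:
        raise ValueError("N must be a positive integer")
    if not size:
        max_size = int((N * 2 + 0.25) ** 0.5 - 0.5)  # largest k with 1+2+...+k <= N
        if size_limit:
            max_size = min(max_size, size_limit)     # cap by available numbers
        min_size = min(min_size, max_size)           # keep the range non-empty
        size = random.randint(min_size, max_size)    # inclusive on both ends
    if size == 1:
        return (N,)                                  # one part takes all of N
    s = size * (size - 1) // 2                       # minimum sum of the deltas
    a = random.randint(1, (N - s) // size)           # base value, leaves >= s
    p = rand_part(N - a * size, size=size - 1)       # deltas for the other parts
    return (a, *(n + a for n in p))                  # shift deltas by the base

chunks = rand_part(1000, size_limit=51, min_size=5)
```

Here size_limit would be the count of available numbers (51 for the range 0 to 50) and min_size the smallest acceptable number of distinct values.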

CodePudding user response：

This solves your problem in the way @MYousefi suggested.

import random

seq = list(range(50))
random.shuffle(seq)
values = []
for n,v in enumerate(seq):
    values.extend( [v]*(n+1) )
    if len(values) > 1000:
        break
print(values)

Note that this approach can't give you exactly 1,000 numbers. At first, I generated the entire sequence and took the first 1,000, but then the run that gets truncated has the same length as one of the earlier runs. Stopping after the first complete run that pushes the total past 1,000 leaves you with 1,035 numbers.
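
If exactly 1,000 values are required, one possible variation (not part of the answer above) is to stop before the total would be exceeded and add the shortfall to the longest run, which keeps all run lengths distinct. The function name exact_runs and its parameters are hypothetical.

```python
import random
from collections import Counter

def exact_runs(total=1000, pool=range(50)):
    """Run lengths 1, 2, 3, ... with the shortfall added to the last run (sketch)."""
    if total < 1:
        raise ValueError("total must be positive")
    seq = list(pool)
    random.shuffle(seq)
    counts = []
    # keep adding the next run length while it still fits and values remain
    while len(counts) < len(seq) and sum(counts) + len(counts) + 1 <= total:
        counts.append(len(counts) + 1)
    counts[-1] += total - sum(counts)   # the longest run only grows, so it stays unique
    values = []
    for v, c in zip(seq, counts):
        values.extend([v] * c)
    return values

values = exact_runs()
print(len(values), sorted(Counter(values).values())[-3:])
```

For the defaults this gives run lengths 1 through 43 plus one run of 54, since 1 + 2 + ... + 44 = 990 and the remaining 10 are added to the run of 44.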

CodePudding user response：

Here's a recursive and possibly very slow implementation that produces the output desired.

import numpy as np


def get_sequence_lengths(values, total):
    if total == 0:
        return [[]], True
    if total < 0:
        return [], False
    if len(values) == 0:
        return [], False
    sequences = []
    result = False
    for i in range(len(values)):
        ls, suc = get_sequence_lengths(values[:i] + values[i + 1:], total - values[i])
        result |= suc
        if suc:
            sequences.extend([[values[i]] + s for s in ls])
    return sequences, result


def gen_numbers(rand_min, rand_max, count):
    values = list(range(rand_min, rand_max + 1))
    sequences, success = get_sequence_lengths(list(range(1, count + 1)), count)
    sequences = list(filter(lambda x: len(x) <= 1 + rand_max - rand_min, sequences))
    if not success or not len(sequences):
        raise ValueError('Cannot generate with given parameters.')

    sequence = sequences[np.random.randint(len(sequences))]
    values = np.random.choice(values, len(sequence), replace=False)
    result = []
    for v, s in zip(values, sequence):
        result.extend([v] * s)
    return result

get_sequence_lengths generates all permutations of distinct positive integers that sum to the given total. The sequences are then filtered by the number of available values. Finally, pairing randomly chosen values with the counts from one sequence produces the output.

As mentioned above, get_sequence_lengths is recursive and will be quite slow for larger input values.
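
Part of the cost comes from enumerating every ordering of the same set of counts. A possible way to reduce it, shown here only as a sketch with a hypothetical function name, is to generate each set once in increasing order. Picking uniformly from these sets is not the same distribution as picking uniformly from all permutations, and the enumeration remains impractical for totals as large as 1000.

```python
def distinct_partitions(total, smallest=1):
    """Yield each list of distinct positive integers, in increasing order, summing to total."""
    if total == 0:
        yield []
        return
    for first in range(smallest, total + 1):
        # every later part must be strictly larger than `first`
        for rest in distinct_partitions(total - first, first + 1):
            yield [first] + rest

parts = list(distinct_partitions(10))
# keep only partitions that fit a hypothetical limit of three available values
usable = [p for p in parts if len(p) <= 3]
```

For a total of 10 this yields ten partitions, from [1, 2, 3, 4] to [10].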