How to create a sequence of sequences of numbers in NumPy?

Time:01-06

Inspired by the post "How to create a sequence of sequences of numbers in R?".


Question:

I would like to make the following sequence in NumPy.

[1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]

I have tried the following:

  • Non-generic and hard coding using np.r_
    np.r_[1:6, 2:6, 3:6, 4:6, 5:6]
    # array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
    
  • Pure Python to generate the desired array.
    n = 5
    a = np.r_[1:n + 1]
    [i for idx in range(a.shape[0]) for i in a[idx:]]
    # [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
    
  • Create a 2D array and take the upper triangle from it.
    n = 5
    a = np.r_[1:n + 1]
    arr = np.tile(a, (n, 1))
    print(arr)
    # [[1 2 3 4 5]
    #  [1 2 3 4 5]
    #  [1 2 3 4 5]
    #  [1 2 3 4 5]
    #  [1 2 3 4 5]]
    
    o = np.triu(arr).flatten()
    # array([1, 2, 3, 4, 5, 
    #        0, 2, 3, 4, 5, 
    #        0, 0, 3, 4, 5, # This is 1D array
    #        0, 0, 0, 4, 5, 
    #        0, 0, 0, 0, 5])
    
    out = o[o > 0]
    # array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
    

The above solution is generic but I want to know if there's a more efficient way to do it in NumPy.
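
One vectorized possibility, offered only as a sketch: `np.triu_indices(n)` returns the row and column indices of the upper triangle in row-major order, and the column indices plus one already form the desired sequence. This skips building the full 2D array and the `o > 0` filter. The helper name `seq_of_seqs` is made up for illustration.

```python
import numpy as np

def seq_of_seqs(n):
    """Return [1..n, 2..n, ..., n] as a 1D integer array (hypothetical helper)."""
    # triu_indices lists the (row, col) positions of the upper triangle
    # in row-major order; column indices are 0-based, hence the + 1.
    _, cols = np.triu_indices(n)
    return cols + 1

result = seq_of_seqs(5)
print(result)
```

Whether this is faster than the other approaches for a given n would need to be measured; it does allocate two index arrays of length n*(n+1)/2.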

CodePudding user response:

I'm not sure if this is a good idea, but I timed it against your pure Python method and it seems to be faster.

np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])

Here is the full code:

import numpy as np
from time import time

n = 5000

t = time()
c = np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])
print(time() - t)
# 0.039876699447631836

t = time()
a = np.r_[1:n + 1]
b = np.array([i for idx in range(a.shape[0]) for i in a[idx:]])
print(time() - t)
# 2.0875167846679688

print(all(b == c))
# True
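
A single `time()` measurement can be noisy, so a comparison along these lines may be more stable with the standard-library `timeit` module. This is only a sketch: the function names and the smaller `n` are assumptions chosen so it runs quickly, and no particular timings are implied.

```python
import timeit
import numpy as np

n = 500  # assumed smaller size than the 5000 above, to keep the sketch quick

def with_concatenate():
    # One arange per starting value, joined into a single array.
    return np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])

def with_list_comprehension():
    # Element-by-element Python loop over slices of a NumPy array.
    a = np.r_[1:n + 1]
    return np.array([i for idx in range(a.shape[0]) for i in a[idx:]])

# Both approaches should produce identical arrays.
same = np.array_equal(with_concatenate(), with_list_comprehension())

# Take the best of a few repeats to reduce noise from other processes.
t_concat = min(timeit.repeat(with_concatenate, number=3, repeat=3))
t_listcomp = min(timeit.repeat(with_list_comprehension, number=3, repeat=3))
print(same, t_concat, t_listcomp)
```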

CodePudding user response:

A really plain Python (no numpy) way is:

n = 5
a = [r for start in range(1, n + 1) for r in range(start, n + 1)]

This will be faster for small n (~150) but slower than @tangolin's solution for larger n. It is still faster than the OP's "pure python" way.

A faster implementation prepares the data in advance and slices it, which avoids creating a new range on each iteration:

source = np.arange(1, n + 1)
d = np.concatenate([source[i:n + 1] for i in range(0, n)])

NOTE

My original implementation both preallocated space for the return value and prepared the data in advance, but it was not Pythonic. After reading @tangolin's answer, I noticed that concatenate does the same, so I changed my code to use it.

Original implementation:

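# Preallocate the output: n + (n - 1) + ... + 1 = n*(n + 1)/2 elements.
e = np.empty((n * (n + 1) // 2,), dtype='int64')
source = np.arange(1, n + 1)
for i in range(n):
    # Segment i starts after the i earlier segments of lengths n, n-1, ...
    init = n * i - i * (i - 1) // 2
    end = n - i + init
    e[init:end] = source[i:n]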
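
The `init` formula follows from the segment lengths: segment i holds n - i values, so its start offset is the sum of the lengths of the i earlier segments, which is n*i - i*(i-1)/2. A minimal sketch that checks the closed form against a running sum (the variable names here are illustrative, not from the answer):

```python
import numpy as np

n = 5
# Segment lengths: n, n-1, ..., 1
lengths = np.arange(n, 0, -1)
# Start offset of each segment = total length of all earlier segments.
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))

i = np.arange(n)
# The same closed form used for `init` in the loop above.
closed_form = n * i - i * (i - 1) // 2

print(starts)
print(closed_form)
```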