How to create a sequence of sequences of numbers in NumPy?

Inspired by the post "How to create a sequence of sequences of numbers in R?".

Question:

I would like to make the following sequence in NumPy.

[1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]

I have tried the following:

A non-generic, hard-coded approach using np.r_:

np.r_[1:6, 2:6, 3:6, 4:6, 5:6]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])

Pure Python to generate the desired array.

n = 5
a = np.r_[1:n + 1]
[i for idx in range(a.shape[0]) for i in a[idx:]]
# [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]

Create a 2D array and take the upper triangle from it.

n = 5
a = np.r_[1:n + 1]
arr = np.tile(a, (n, 1))
print(arr)
# [[1 2 3 4 5]
#  [1 2 3 4 5]
#  [1 2 3 4 5]
#  [1 2 3 4 5]
#  [1 2 3 4 5]]

o = np.triu(arr).flatten()
# array([1, 2, 3, 4, 5, 
#        0, 2, 3, 4, 5, 
#        0, 0, 3, 4, 5,  # this is a 1D array
#        0, 0, 0, 4, 5, 
#        0, 0, 0, 0, 5])

out = o[o > 0]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])

The above solution is generic, but I want to know whether there is a more efficient way to do it in NumPy.
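For reference, the upper-triangle idea can also be written without the tiled 2D array and without the `o > 0` mask (which only works because every value is positive). This is a minimal sketch, assuming that `np.triu_indices` yields its indices in row-major order; it is one possible variant, not necessarily the fastest one.

```python
import numpy as np

n = 5

# np.triu_indices(n) returns (row_indices, col_indices) for the upper
# triangle of an n x n array, walking the rows from top to bottom.
_, cols = np.triu_indices(n)

# Column indices run 0..n-1, 1..n-1, 2..n-1, ...; shift them to start at 1.
out = cols + 1
print(out)
# [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]
```

Like the tiled version, this still does work proportional to n*n internally, so it mainly saves the explicit tile and filter steps.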

CodePudding user response:

I'm not sure if this is a good idea, but I timed it against your pure Python method and it seems to be faster.

np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])

Here is the full code:

import numpy as np
from time import time

n = 5000

t = time()
c = np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])
print(time() - t)
# 0.039876699447631836

t = time()
a = np.r_[1:n + 1]
b = np.array([i for idx in range(a.shape[0]) for i in a[idx:]])
print(time() - t)
# 2.0875167846679688

print(all(b == c))
# True
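A single `time()` measurement can be noisy, so the comparison may be easier to reproduce with `timeit`. The sketch below is one way to do that; the function names `concat_ranges` and `python_loop` are made up for this example, and the value of n is kept small so that it runs quickly. No timings are quoted because they depend on the machine.

```python
import timeit

import numpy as np


def concat_ranges(n):
    # One np.arange per starting value, joined in a single concatenate call.
    return np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])


def python_loop(n):
    # The "pure Python" approach from the question, wrapped in np.array.
    a = np.arange(1, n + 1)
    return np.array([i for idx in range(a.shape[0]) for i in a[idx:]])


n = 300

# Both approaches must produce the same array before timing means anything.
assert np.array_equal(concat_ranges(n), python_loop(n))

for fn in (concat_ranges, python_loop):
    # Best of 3 repeats, 5 calls per repeat, reported per call.
    best = min(timeit.repeat(lambda: fn(n), number=5, repeat=3))
    print(f"{fn.__name__}: {best / 5:.6f} s per call")
```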

CodePudding user response:

A really plain Python (no numpy) way is:

n = 5
a = [r for start in range(1, n + 1) for r in range(start, n + 1)]

This will be faster for small n (up to roughly 150) but slower than @tangolin's solution for larger n. It is still faster than the OP's "pure Python" way.

A faster implementation prepares the data in advance, which avoids creating a new range on each iteration:

source = np.arange(1, n + 1)
d = np.concatenate([source[i:n + 1] for i in range(0, n)])
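One likely reason this helps is that basic slicing of a NumPy array returns a view rather than a copy, so each `source[i:]` reuses the existing buffer until `np.concatenate` copies everything once. The sketch below checks this with `np.shares_memory`; whether it fully explains the speedup is an assumption, not something measured here.

```python
import numpy as np

n = 5
source = np.arange(1, n + 1)

# Each tail slice is a view onto source, so no new data is allocated here.
tails = [source[i:] for i in range(n)]
assert all(np.shares_memory(t, source) for t in tails)

# concatenate performs the only copy, into one freshly allocated output array.
d = np.concatenate(tails)
print(d)
# [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]
```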

NOTE

My original implementation both allocated space for the return value and prepared the data in advance, but it was not Pythonic. I changed it to use concatenate after reading @tangolin's answer and noticing that concatenate does the same thing.

Original implementation:

e = np.empty((n*(n + 1)//2,), dtype='int64')
source = np.arange(1, n + 1)
for i in range(n):
    init = n * i - i*(i-1)//2
    end = n - i + init
    e[init:end] = source[i:n]
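The offset `init = n*i - i*(i-1)//2` is the total length of the blocks written before block i, whose lengths are n, n-1, ..., n-i+1. A small sketch, using the illustrative names `lengths` and `starts`, checks the closed form against a running sum and then fills the preallocated array the same way as the loop above:

```python
import numpy as np

n = 5

# Block i holds the values i+1..n, so its length is n - i.
lengths = np.arange(n, 0, -1)

# Start of each block = sum of the lengths of all earlier blocks.
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))

# The closed form from the loop above gives the same offsets.
closed_form = np.array([n * i - i * (i - 1) // 2 for i in range(n)])
assert np.array_equal(starts, closed_form)

e = np.empty(n * (n + 1) // 2, dtype='int64')
source = np.arange(1, n + 1)
for i, init in enumerate(starts):
    e[init:init + lengths[i]] = source[i:]
print(e)
# [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]
```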