Inspired by the post How to create a sequence of sequences of numbers in R?.
Question:
I would like to make the following sequence in NumPy.
[1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
I have tried the following:
- Non-generic, hard-coded version using np.r_:
np.r_[1:6, 2:6, 3:6, 4:6, 5:6]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
- Pure Python to generate the desired array:
n = 5
a = np.r_[1:n + 1]
[i for idx in range(a.shape[0]) for i in a[idx:]]
# [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
- Create a 2D array and take the upper triangle from it:
n = 5
a = np.r_[1:n + 1]
arr = np.tile(a, (n, 1))
print(arr)
# [[1 2 3 4 5]
#  [1 2 3 4 5]
#  [1 2 3 4 5]
#  [1 2 3 4 5]
#  [1 2 3 4 5]]
o = np.triu(arr).flatten()
# array([1, 2, 3, 4, 5,
#        0, 2, 3, 4, 5,
#        0, 0, 3, 4, 5,   # This is a 1D array
#        0, 0, 0, 4, 5,
#        0, 0, 0, 0, 5])
out = o[o > 0]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
The above solution is generic but I want to know if there's a more efficient way to do it in NumPy.
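One possible variant of the upper-triangle idea, offered only as a sketch: np.triu_indices returns the row and column indices of the upper triangle in row-major order, so the column indices plus one already form the desired sequence. This skips building the tiled 2D array and the o > 0 filter. The variable names here are illustrative, and no timing claim is made.

```python
import numpy as np

n = 5

# np.triu_indices(n) yields (row_indices, col_indices) of the upper
# triangle of an n x n array, ordered row by row.
_, cols = np.triu_indices(n)

# Column indices are 0-based, so shift by one to get values 1..n.
out = cols + 1
print(out.tolist())
# [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
```

Because no zero-filtering is involved, the same index arrays could also be used to pick values from a source array that itself contains zeros or negative numbers.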
CodePudding user response:
I'm not sure if this is a good idea, but I tried running it against your pure Python method and it seems to be faster.
np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])
Here is the full code:
import numpy as np
from time import time
n = 5000
t = time()
c = np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])
print(time() - t)
# 0.039876699447631836
t = time()
a = np.r_[1:n + 1]
b = np.array([i for idx in range(a.shape[0]) for i in a[idx:]])
print(time() - t)
# 2.0875167846679688
print(all(b == c))
# True
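A single time() measurement can be noisy, so here is a minimal sketch of the same comparison using the standard timeit module. The function names with_concatenate and with_comprehension are made up for this example, and n is reduced to 500 only so the sketch runs quickly. Actual timings depend on the machine.

```python
import timeit
import numpy as np

n = 500  # smaller than the 5000 used above, only to keep the sketch fast


def with_concatenate(n):
    # Build one arange per starting value and join them.
    return np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])


def with_comprehension(n):
    # The pure Python approach from the question, wrapped in an array.
    a = np.r_[1:n + 1]
    return np.array([i for idx in range(a.shape[0]) for i in a[idx:]])


# Both approaches must agree before comparing their speed.
assert np.array_equal(with_concatenate(n), with_comprehension(n))

for fn in (with_concatenate, with_comprehension):
    # Taking the best of several repeats reduces the effect of background noise.
    best = min(timeit.repeat(lambda: fn(n), number=5, repeat=3))
    print(f"{fn.__name__}: {best / 5:.6f} s per call")
```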
CodePudding user response:
A really plain Python (no numpy) way is:
n = 5
a = [r for start in range(1, n + 1) for r in range(start, n + 1)]
This is faster for small n (up to roughly 150) but slower than @tangolin's solution for larger n. It is still faster than the OP's "pure Python" approach.
A faster implementation prepares the data in advance, so that no new range is created on each iteration:
source = np.arange(1, n + 1)
d = np.concatenate([source[i:n + 1] for i in range(0, n)])
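As a small illustration of why slicing a prepared array is cheap: basic slicing of a NumPy array returns a view onto the same memory rather than a copy. The name tail below is just for this example.

```python
import numpy as np

n = 5
source = np.arange(1, n + 1)

# A basic slice does not copy the data; it is a view onto source.
tail = source[2:]
print(np.shares_memory(source, tail))  # True
print(tail.tolist())                   # [3, 4, 5]
```

Only the final np.concatenate call copies the values into the result array.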
NOTE
My original implementation both allocated space for the return value and prepared the data in advance, but it was not Pythonic. After reading @tangolin's answer I noticed that concatenate does the same, so I changed it to use concatenate.
Original implementation:
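e = np.empty((n * (n + 1) // 2,), dtype='int64')  # total length is n + (n-1) + ... + 1
source = np.arange(1, n + 1)
for i in range(n):
    init = n * i - i * (i - 1) // 2  # offset where the i-th run starts
    end = n - i + init               # the i-th run holds n - i values
    e[init:end] = source[i:n]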
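The offset formula can be checked with a short sketch: run i contributes n - i values, so run i starts at (n - 0) + (n - 1) + ... + (n - (i - 1)) = n*i - i*(i-1)/2. The names lengths, starts_closed_form and starts_cumsum are introduced here only for illustration.

```python
import numpy as np

n = 5
i = np.arange(n)

# Run i contributes n - i values.
lengths = n - i

# Closed-form start offsets, as used in the loop above.
starts_closed_form = n * i - i * (i - 1) // 2

# The same offsets from a running total of the preceding run lengths.
starts_cumsum = np.concatenate(([0], np.cumsum(lengths)[:-1]))

assert np.array_equal(starts_closed_form, starts_cumsum)
print(starts_closed_form.tolist())
# [0, 5, 9, 12, 14]
```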
