Why is "print (*sum (_2DArray, []))" slower than "print(' '.join(j for i in-CodePudding

Today, while I was practicing Problem solving on HackerRank The Full Counting Sort Problem, I found out that flattening a 2D array, unpacking it and printing as below:

print (*sum (_2DArray, []))

was causing a "Time Limit Exceeded" Error on submission but using regular nested loops was fine.

print(' '.join(j for i in _2DArray for j in i))

Why is this flattening and unpacking slower than O(n^2) of nested loops ?

Thanks in advance

Edit:- Full Solution for problem

def countSort(arr):

result = [[] for x in range(len(arr))]    
for i in range(len(arr)):
    result [int (arr[i][0])].append ('-' if i < len(arr)//2 else arr[i][1]) 
    
print(' '.join(j for i in result for j in i))     
# print (*sum (result, []))

CodePudding user response：

sum (_2DArray, []) is essentially equivalent to

l = []
for item in _2DArray:
    l = l   item

The computational cost of concatenating two lists of sizes n1 and n2 is O(n1 n2)

Let us assume that our 2D array is a square matrix of size n * n. We can denote the size of l at each iteration to be n1 and the size of item to be n2. The number of steps of each loop can be determined as follows:

t = 1, n1 = 0, n2 = n ; The number of steps required to perform the concatenation is n
t = 2, n1 = n, n2 = n; The number of steps is n n = 2n
t = 3, n1 = 2n, n2 = n; The number of steps is 2n n = 3n

...

t = n, n1 = (n - 1)*n, n2 = n; The number of steps is n(n - 1) n = n^2

The effective number of steps required to perform concatenation using sum is n 2n 3n ... n^2 = n * n * (n 1) / 2 which yields a time complexity of O(n^3).

The concatenation approach is effectively slower than quadratic time. We also do not factor in the overhead for creation of n new lists which take at least O(n) and at most O(n^2) space. There's also the overhead of the print method having to treat each element of the array as an argument and iterate to print it.

Using ' '.join(j for i in _2DArray for j in i) essentially creates the resulting string in O(n^2) time by using an iterator over the list elements and also yields a single argument to pass to print.

CodePudding user response：

If you denote n the number of atoms of your array, printing them with a loop is O(n), whereas flattening them is in O(n²). This is because adding Python lists is linear in the size of both lists.

Besides that, when you flatten the list, you actually store the intermediate results in the RAM, whereas the loop way is lazy-evaluated, meaning you only have O(1) memory consumption. Writing to memory is a slow operation, which might also contribute to the poor performances.

EDIT: Some details (that are lacking in my original answer).

When you print something, that something has to be stored somewhere (at least in a buffer). So when I talk about memory complexity, I do not take into account that aspect. To put it another way, the complexity I am talking about is the extra space required.
With atom, I meant the bottom elements. [["a", "b"], ["c", "d"]] has four atoms. Flattening a list of lists using sum has a complexity that is quadratic with respect to the number of atoms (in the case of a list of singletons, that high bound is reached). However, simply traversing the array has a complexity that is linear with respect to the number of atoms.
Don't let the join fool you into thinking you're doing something more than if you did not put it. Your last list is equivalent to

print(*(j for i in _2DArray for j in i))

Note that this last line really shows that the difference between the two ways to print the elements of the array is in the non-lazy flattening vs lazy traversal.

A note about merging python lists.

The theoretical problem

When you add two lists in python, you actually move (at least) on of the two lists, which simply means copying all of its elements. This operation requires writing to memory. The main problem of flattening (done this way) is that you keep copying some elements again and again. There is a way to make it linear, which is the following

result = []
for line in _2DArray:
  result.extend(line)

however, since it Python you have a very thin control over how memory is managed, you can't be sure (unless you deep into the specs, which you usually want to avoid) that it's how it is done. The other way that it could be done would be

result = []
for line in _2DArray:
  temp = result
  result = line[:]
  result.extend(temp)

This is clearly much worse. Well, when you simply sum a list of lists, you can't really tell which one is going to happen.

The practical problem

In any case, even if what was actually done was the linear-time solution, you still copy arrays a lot, which implies you write to memory more than if you simply did it with generators.

CodePudding user response：

The way you wrote it, in the worst case your 2D list has a million strings in the first row and then 999,999 empty rows. The sum then copies the million string references in every addition, i e., a million times. That's a trillion string reference copies. And all the time, the strings' reference counters get increased and decreased. A looooot of work. Your other method doesn't have that issue. In short: They're O(n²) vs O(n), where n, as defined in the problem, can be as large as a million.

And as discussed, the outer length never needs to be n but at most 100 (more precisely the maximum x 1), and with that, the sum method gets accepted. Still does up to 100 million reference copies, but such copying is low-level and relatively fast compared to Python code.