Optimizing function in Python to handle a big chunk od data-CodePudding

I tried solving a problem on HackerRank, called 'Apple and orange', here's the code:

def countApplesAndOranges(s, t, a, b, apples, oranges):
    count_apples = 0
    count_oranges = 0
    x = [x for x in range(s, t 1)]
    pos_apple = [apple   a for apple in apples]
    pos_orange = [orange   b for orange in oranges]
    for i in x:
        for j in pos_apple:
            if j == i:
                count_apples  =1
        for l in pos_orange:
            if l == i:
                count_oranges  = 1
    print(count_apples)
    print(count_oranges)

The code works. However, when I try submitting it, it passes the first 3 tests and fails the rest with the exception 'Terminated due to timeout'. I checked out the input from one of the tests and it's a huge amount of data to process, you can take a look at the data here: https://hr-testcases-us-east-1.s3.amazonaws.com/25220/input03.txt?AWSAccessKeyId=AKIAR6O7GJNX5DNFO3PV&Expires=1642016820&Signature=J4ypdP0YzRxcOWp+y5XaD5ITeMw=&response-content-type=text/plain

It fails cause it takes like 2 minutes to process the code with the same input via my IDE, but the HackerRank tests are limited to 10 seconds.

I need your help to optimize the code and get it to run faster.

Nested loops seem to be the biggest problem here, but I have no idea what should I replace the with.

CodePudding user response：

Checking each apple/orange against each coordinate in range turns your code's runtime complexity into O(n * a n * o) where n is the length of the house, a is the number of the apples and o is the number of oranges. Ideally, your code must run in O(a o).

Here's a refactored version of your solution:

def countApplesAndOranges(s, t, a, b, apples, oranges):
    count_apples = 0
    count_oranges = 0
    for apple in pos_apple:
        if s <= apple   a <= t:
            count_apples  =1
    for orange in pos_orange:
        if s <= orange   b <= t:
            count_oranges  = 1
    print(count_apples)
    print(count_oranges)

CodePudding user response：

Do not build intermediate lists if you can, if this is really needed (this is not the case for tis problem) use a generator instead

And try to do not repeat yourself, factorize with a function when it is possible

def countApplesAndOranges(home_start, home_end, tree_apple, tree_orange, apples, oranges):
    def count_fruits(home_start, home_end, tree, fruits): 
        count = 0
        for fruit in fruits:
            if home_start <= tree   fruit <= home_end:
                count  =1
        return count
    
    print(count_fruits(home_start, home_end, tree_apple, apples))
    print(count_fruits(home_start, home_end, tree_orange, oranges))

CodePudding user response：

I think I would sum() a couple of list comprehensions as a starting point:

apple_hits = sum(
    1 for apple in apples
    if house_min <= apple   apple_tree_origin <= house_max
)

One thing that points out to me though is that subtracting apple_tree_origin from the sides of that test should not change it:

apple_hits = sum(
    1 for apple in apples
    if house_min - apple_tree_origin <= apple <= house_max - apple_tree_origin
)

Now we might observe that house_min - apple_tree_origin can be precomputed and the rest my answer falls out as:

def countApplesAndOranges1(s, t, a, b, apples, oranges):
    house_min = s
    house_max = t

    apple_tree_origin = a
    orange_tree_origin = b

    house_min_apples = house_min - apple_tree_origin
    house_max_apples = house_max - apple_tree_origin

    house_min_oranges = house_min - orange_tree_origin
    house_max_oranges = house_max - orange_tree_origin

    apple_hits = sum(1 for apple in apples if house_min_apples <= apple <= house_max_apples)
    orange_hits = sum(1 for orange in oranges if house_min_oranges <= orange <= house_max_oranges)
    return apple_hits, orange_hits

With the test data you provided I get (18409, 19582). Hopefully that is correct.

Feel free to timeit against other solutions:

import timeit

setup = """
with open("apples_oranges.txt") as file_in:
    s,t = list(map(int, file_in.readline().split()))
    a,b = list(map(int, file_in.readline().split()))
    m,n = list(map(int, file_in.readline().split()))
    apples = list(map(int, file_in.readline().split()))
    oranges = list(map(int, file_in.readline().split()))

def countApplesAndOranges_jonsg(s, t, a, b, apples, oranges):
    house_min = s
    house_max = t

    apple_tree_origin = a
    orange_tree_origin = b

    house_min_apples = house_min - apple_tree_origin
    house_max_apples = house_max - apple_tree_origin

    house_min_oranges = house_min - orange_tree_origin
    house_max_oranges = house_max - orange_tree_origin

    apple_hits = sum(1 for apple in apples if house_min_apples <= apple <= house_max_apples)
    orange_hits = sum(1 for orange in oranges if house_min_oranges <= orange <= house_max_oranges)
    return apple_hits, orange_hits
"""

print(timeit.timeit("countApplesAndOranges_jonsg(s, t, a, b, apples, oranges)", setup=setup, number=1000))

CodePudding user response：

Thank you guys!

I figured it all out, and used

if s <= apple   a <= t:

in my code as it's the most simple solution to get rid of the nested loop. It works so good it makes me wonder how haven't I thought of it.

I really like all of your solutions and appreciate the help!