how to iterate over a list with condition-CodePudding

I know this question might be a bit inappropriate for this platform but I have nowhere else to ask for help.

I'm new to python and I'm trying to learn how to iterate over a list with some conditions. I have the next problem - for each unique link from Where to Where, I want to choose one of the most profitable suppliers. A profitable supplier is a supplier that most of the days of the week turned out to be cheaper (that is, had a lower cost) than other suppliers. The dataset is the following where columns are: 1st-From, 2nd-To, 3rd-Day in a week, 4th-supplier's number, 5th-Cost.

To solve this task, I've decided firstly to create a new column and list with unique routes.

df_routes['route'] = df_routes['From']   '-'   df_routes['Where']
routes = df_routes['route'].unique()
len(routes)

And then iterate over it but I do not fully understand how the structure should look like. My guess is that it should be something like this:

for i, route in enumerate(routes):
    x = df_routes[df_routes['route'] == route]
    if x['supplier'].nunique() == 1:
        print(route, supplier)
    else:
...

I don't know how to structure it further and whether this is the right structure. So how it should look like?

I will really appreciate any help (tips, hints, snippets of code) on this question.

CodePudding user response：

This is more efficiently solved with pandas functions rather than looping

Let df be a portion of your dataframe for the first two routes. First we sort by cost and group by the route and the 'Day'. This will tell us for each day and each route which supplier is the cheapest:

df1 = df.sort_values('Cost', ascending = True).groupby(['From','To', 'Day']).first()

df1 looks like this:

            Supplier    Cost
From To Day     
BGW MOW 1   3   75910
        2   3   75990
        3   3   27340
        4   3   75990
        5   11  19880
        6   3   75440
        7   11  24740
OSS UUS 1   47  65650
        2   47  47365
        3   47  70635
        4   47  47365
        5   47  62030
        6   47  62030
        7   47  71010

Next we count the number of mentions for each supplier for each route:

df2 = df1.groupby(['From','To'])['Supplier'].value_counts().rename('days').reset_index(level=2)

df2 looks like this:


        Supplier    days
From To     
BGW MOW 3           5
    MOW 11          2
OSS UUS 47          7

eg for the first route, supplier 3 was the cheapest for 5 days and supplier 11 for 2 days

Now we just pick the first (most-mentioned) supplier for each route:

df3 = df2.groupby(['From','To']).first()

df3 is the final output and looks like this:


        Supplier    days
From To     
BGW MOW 3           5
OSS UUS 47          7

CodePudding user response：

Groupby dataframe based on columns (['From','To', 'Day']) and use aggregate min ('Cost')

function to get result

df.groupby(['From','To', 'Day']).min('Cost').reset_index()