Iterate through Python for loop more quickly

I have a pandas data frame (called "ud_flex" below) with over 27 million observations in it. I'm trying to iterate through it to do a calculation for each row. Below is the calculation that I'm using:

def set_fpts(pos, rank, curr_fpts):
    if pos == "RB" and rank >= 3.0:
        return 0
    elif pos == "WR" and rank >= 4.0:
        return 0
    elif (pos == "TE" or pos == "QB") and rank >= 2.0:
        return 0
    else:
        return curr_fpts

Here is the for loop that I've created:

players = ud_flex.shape[0]

for i in range(0,players):
    new_fpts = set_fpts(ud_flex.iloc[i]['position_name'], ud_flex.iloc[i]['wk_rank_orig'], ud_flex.iloc[i]['fpts'])
    ud_flex.at[i, 'fpts_orig'] = new_fpts

Does anyone have any suggestions for how to speed up this loop? It's currently taking nearly an hour! Thanks!

CodePudding user response:

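You could start by reordering the checks so the function exits sooner:

def set_fpts(pos, rank, curr_fpts):
    # Assumes pos is always one of RB, WR, TE, QB.
    if rank >= 4:
        # Every position's cutoff is at or below 4.
        return 0
    if rank < 2:
        # No position's cutoff is below 2.
        return curr_fpts
    # From here on, 2 <= rank < 4.
    if pos in ["TE", "QB"]:
        return 0
    if rank >= 3:
        if pos == "RB":
            return 0
    return curr_fpts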
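
The rule from the question can also be written as a lookup table, which makes the cutoffs easier to check at a glance. This is only a minimal sketch: the names RANK_CUTOFF and set_fpts_lookup are hypothetical (not from the question), and it assumes any position outside the table should keep its points, as the original else branch does.

```python
# Hypothetical lookup table: the rank at or above which a position scores 0.
# Cutoffs are taken from the original set_fpts in the question.
RANK_CUTOFF = {"RB": 3.0, "WR": 4.0, "TE": 2.0, "QB": 2.0}


def set_fpts_lookup(pos, rank, curr_fpts):
    """Return 0 when rank reaches the position's cutoff, else curr_fpts."""
    # Positions missing from the table never hit a cutoff (assumption).
    cutoff = RANK_CUTOFF.get(pos, float("inf"))
    return 0 if rank >= cutoff else curr_fpts
```

On its own this is unlikely to change the runtime much, since most of the time in the loop goes to the per-row data frame access rather than to the comparisons.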

CodePudding user response:

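In general, iterating through a pandas data frame row by row is slow, so it's not surprising that your for-loop approach is taking a while. Each ud_flex.iloc[i] call builds a new Series for that row, and your loop does that three times per iteration.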

I suspect that the following alternative should work more quickly for a data frame of your size.

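# True for every row whose rank has reached its position's cutoff.
mask = (((ud_flex['position_name'] == "RB") & (ud_flex['wk_rank_orig'] >= 3))
        | ((ud_flex['position_name'] == "WR") & (ud_flex['wk_rank_orig'] >= 4))
        | ((ud_flex['position_name'].isin(["TE", "QB"])) & (ud_flex['wk_rank_orig'] >= 2)))
# Copy fpts first, then zero the masked rows; .loc avoids chained assignment.
ud_flex['fpts_orig'] = ud_flex['fpts']
ud_flex.loc[mask, 'fpts_orig'] = 0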
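
To try the vectorized idea without the full 27-million-row frame, here is a self-contained sketch on a toy data frame. The column names come from the question, but the values are made up, and the variable names cutoffs and over_cutoff are my own. It swaps the three-part boolean mask for a per-position cutoff built with Series.map, which should give the same result under the rule in set_fpts.

```python
import pandas as pd

# Toy stand-in for ud_flex; the values are invented for illustration.
ud_flex = pd.DataFrame({
    "position_name": ["RB", "RB", "WR", "WR", "TE", "QB"],
    "wk_rank_orig": [2.0, 3.0, 3.0, 4.0, 2.0, 1.0],
    "fpts": [10.0, 8.0, 12.0, 6.0, 5.0, 20.0],
})

# Map each position to its cutoff rank (same numbers as set_fpts).
cutoffs = ud_flex["position_name"].map(
    {"RB": 3.0, "WR": 4.0, "TE": 2.0, "QB": 2.0}
)

# Rows at or past their cutoff get 0. Unknown positions map to NaN,
# and comparisons against NaN are False, so those rows keep their fpts.
over_cutoff = ud_flex["wk_rank_orig"] >= cutoffs
ud_flex["fpts_orig"] = ud_flex["fpts"].mask(over_cutoff, 0)
```

Both this and the boolean-mask version work on whole columns at once, so neither needs a Python-level loop over the rows.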