Set a value based on groupby

Time:01-16

I have a DataFrame like this:

   customer ID     failreason
0   1            
1   2                NaN
3   1             "Not valid Name"
4   3                 NaN
5   2             "Not valid Contact No."

and I want this result:

   customer ID     failreason
0   1              "Not valid Due to same Customer id available"
1   2              "Not valid Due to same Customer id available"
3   1              "Not valid Name, Not valid Due to same Customer id available"
4   3              NaN
5   2              "Not valid Contact No., Not valid Due to same Customer id available"

I tried the code below:

import numpy as np

gtemp = df.groupby('Customer ID')

for ind, i in enumerate(gtemp.groups):
    tmp = gtemp.get_group(i)

    # number of distinct fail reasons in this customer's rows
    tempValue = tmp['failreason'].agg('nunique')
    if tempValue > 1:
        # append the message to every row of this customer
        df['failreason'] = np.where(df['Customer ID'] == i,
                                    df['failreason'] + ',Not valid Due to same Customer id available',
                                    df['failreason'])

  

But it doesn't work when failreason contains NaN.
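The NaN problem in the attempt above has two sources, shown in this small sketch (the two-element Series is a hypothetical stand-in for customer 2's rows). First, nunique() ignores NaN by default, so a group holding one NaN and one real reason counts as a single unique value and never passes the > 1 check. Second, adding a string to NaN in pandas yields NaN, so the message would be lost on those rows even if the check passed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for customer 2's rows: one NaN, one real reason
s = pd.Series([np.nan, "Not valid Contact No."])

# nunique() skips NaN unless dropna=False, so this group looks like 1 unique value
n_default = s.nunique()
n_with_nan = s.nunique(dropna=False)
print(n_default, n_with_nan)

# String concatenation leaves NaN entries as NaN, so the appended text disappears there
out = s + ",dup"
print(out.tolist())
```

Counting rows per group (size) instead of distinct values, and filling NaN before concatenating, avoids both issues.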

Answer:

You can replace the empty strings with NaN, use groupby with transform('size') to count the rows per ID, and build a mask that is True for IDs that appear more than once. Then fillna on the masked rows, append the message, and strip the leading comma.

NB. I used a slightly different dataset.

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 1, 3, 2],
                   'reason': ['', float('nan'), 'A', float('nan'), 'B']})
#    ID reason
# 0   1
# 1   2    NaN
# 2   1      A
# 3   3    NaN
# 4   2      B

# replace empty strings with NaN
df['reason'] = df['reason'].replace('', float('nan'))

# mask: True for rows whose ID occurs more than once
mask = df['reason'].groupby(df['ID']).transform('size').gt(1)

# fillna + concatenate + strip the leading comma
df.loc[mask, 'reason'] = (df.loc[mask, 'reason'].fillna('') + ',dup').str.lstrip(',')

Output:

   ID reason
0   1    dup
1   2    dup
2   1  A,dup
3   3    NaN
4   2  B,dup
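Applied to the question's own data, the same mask-based idea might look like the sketch below. It assumes the column names shown in the question's printout (customer ID, failreason), reconstructs the sample rows from that table (treating row 0 as an empty string), and uses ", " as the separator to match the desired output. Unlike the answer above, it appends the separator to existing reasons before filling NaN, so no stripping step is needed.

```python
import numpy as np
import pandas as pd

MSG = "Not valid Due to same Customer id available"

# Sample rows reconstructed from the question's table (row 0 assumed to be an empty string)
df = pd.DataFrame(
    {"customer ID": [1, 2, 1, 3, 2],
     "failreason": ["", np.nan, "Not valid Name", np.nan, "Not valid Contact No."]},
    index=[0, 1, 3, 4, 5],
)

# Treat empty strings the same as missing reasons
df["failreason"] = df["failreason"].replace("", np.nan)

# True for every row whose customer ID occurs more than once
mask = df.groupby("customer ID")["failreason"].transform("size").gt(1)

# Existing reasons get ", " appended; NaN stays NaN and becomes "" after fillna
reasons = df.loc[mask, "failreason"]
df.loc[mask, "failreason"] = (reasons + ", ").fillna("") + MSG

print(df)
```

Customer 3 appears only once, so its NaN is left untouched.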

Answer:

df = df.fillna('')
df

Output:

    customer ID     failreason
0   1   
1   2   
2   1   Not valid Name
3   3   
4   2   Not valid Contact No.

group_data = df.groupby('customer ID')


def str_add(cell_text):
    # append the duplicate message; strip() drops the leading space when cell_text is empty
    message = "Not valid Due to same Customer id available"
    return f"{cell_text} {message}".strip()


for customer_id, index in group_data.groups.items():
    if len(index) > 1:
        # .loc with the group's index labels avoids chained assignment
        df.loc[index, 'failreason'] = df.loc[index, 'failreason'].apply(str_add)
df

Output:


    customer ID     failreason
0   1   Not valid Due to same Customer id available
1   2   Not valid Due to same Customer id available
2   1   Not valid Name Not valid Due to same Customer ...
3   3   
4   2   Not valid Contact No. Not valid Due to same Cu...
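A different technique, not used in either answer, is Series.duplicated(keep=False), which flags every row whose ID occurs more than once and therefore gives the same mask as the size > 1 check without a groupby or a loop. A minimal sketch, assuming the frame as it looks after this answer's df.fillna('') and a ", " separator to match the question's desired output:

```python
import numpy as np
import pandas as pd

MSG = "Not valid Due to same Customer id available"

# Frame as it looks after df.fillna(''): missing reasons are empty strings
df = pd.DataFrame({"customer ID": [1, 2, 1, 3, 2],
                   "failreason": ["", "", "Not valid Name", "", "Not valid Contact No."]})

# keep=False marks *all* occurrences of a repeated ID, not only the later ones
dup = df["customer ID"].duplicated(keep=False)

# Rows with an existing reason get "reason, MSG"; empty rows get just MSG
has_reason = df["failreason"].ne("")
with_msg = np.where(has_reason, df["failreason"] + ", " + MSG, MSG)

# Only rows of repeated IDs are changed
df["failreason"] = np.where(dup, with_msg, df["failreason"])
print(df)
```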