Home > Software design >  How to create modified copy of dataframe rows based on conditions in Pandas?
How to create modified copy of dataframe rows based on conditions in Pandas?

Time:01-08

I have a Pandas dataframe which contains few columns and few rows.

    ColXYZ. Interval.  Temperature.  Pressure.  ColCDE.
A.    121       0            25           60        0.195
B.    246       4            40           50        0.350
C.    241       0            40           50        0.133
    ....

I would like to create duplicate values for each row whenever the Interval value is 0, and create this for all the Temperature and Pressure conditions in the dataset - e.g. create copy of row A with all value same except Temperature and Pressure changed, create duplicate for Row C with all values same except Temperature and Pressure changed. This is to reflect that at an Interval of 0, regardless of Temperature and Pressure, all other entries in the dataset rows remain same. The desired output would be like:-

ColXYZ. Interval.  Temperature.  Pressure.  ColCDE.
121       0            25           60        0.195 #A
121       0            40           50        0.195 #Modified A copy
246       4            40           50        0.350 #B
241       0            40           50        0.133 #C
241       0            25           60        0.133 #Modified C copy

Any help and suggestions in this regard would be much appreciated.

CodePudding user response:

Most ideal way out is cross tab, see code below

  m=df['Interval.']==0



 new=(
    
        df[['Temperature.','Pressure.']]#Subset all temps
     
        .merge# cross merge to subset of dfs with interval=0

        (df[m].reset_index(),how='cross',suffixes = ('','_y')).drop_duplicates().drop(columns=['Temperature._y','Pressure._y'])# append back to subset of original df whose Interval was not 0

        .append(df[~m].reset_index())# sort values by index

        .sort_values(by=['index'])
    )

Outcome

   Temperature.  Pressure. index  ColXYZ.  Interval.  ColCDE.
0          25.0       60.0    A.    121.0        0.0    0.195
2          40.0       50.0    A.    121.0        0.0    0.195
0          40.0       50.0    B.    246.0        4.0    0.350
1          25.0       60.0    C.    241.0        0.0    0.133
3          40.0       50.0    C.    241.0        0.0    0.133

Alternative is to create repeat rows and insert as detailed below Original df

    ColXYZ  Interval  Temperature  Pressure  ColCDE
A.     121         0           25        60   0.195
B.     246         4           40        50   0.350
C.     241         0           40        50   0.133

#Generate list of Temps and Pressures

df=df.reset_index()#to preserve index
m=df['Interval']==0#selection of soert criteria
s=df['Temperature'].agg(list)
s1=df['Pressure'].agg(list)

#Duplicate rows in df

df1 = pd.DataFrame(np.repeat(df[m].values, len(df), axis=0), columns=df.columns)

#distribute values of Temp and Pressure to ensure each unique value in original df is represented in each unique ColXYZ

df1['Temperature']= np.tile(s, int(len(df1)/(len(s))))
df1['Pressure']= np.tile(s1, int(len(df1)/(len(s1))))

#Drop duplicates

   df1= df[~m].append(df1.assign(Temperature=np.tile(s, int(len(df1)/(len(s)))),Pressure= np.tile(s1, int(len(df1)/(len(s1))))).drop_duplicates()).sort_values(by=['index'])

Outcome

   index ColXYZ Interval  Temperature  Pressure ColCDE
0    A.    121        0           25        60  0.195
1    A.    121        0           40        50  0.195
1    B.    246        4           40        50   0.35
3    C.    241        0           25        60  0.133
4    C.    241        0           40        50  0.133
  •  Tags:  
  • Related