I have a Pandas dataframe which contains few columns and few rows.
ColXYZ. Interval. Temperature. Pressure. ColCDE.
A. 121 0 25 60 0.195
B. 246 4 40 50 0.350
C. 241 0 40 50 0.133
....
I would like to create duplicate values for each row whenever the Interval value is 0, and create this for all the Temperature and Pressure conditions in the dataset - e.g. create copy of row A with all value same except Temperature and Pressure changed, create duplicate for Row C with all values same except Temperature and Pressure changed. This is to reflect that at an Interval of 0, regardless of Temperature and Pressure, all other entries in the dataset rows remain same.
The desired output would be like:-
ColXYZ. Interval. Temperature. Pressure. ColCDE.
121 0 25 60 0.195 #A
121 0 40 50 0.195 #Modified A copy
246 4 40 50 0.350 #B
241 0 40 50 0.133 #C
241 0 25 60 0.133 #Modified C copy
Any help and suggestions in this regard would be much appreciated.
CodePudding user response:
Most ideal way out is cross tab, see code below
m=df['Interval.']==0
new=(
df[['Temperature.','Pressure.']]#Subset all temps
.merge# cross merge to subset of dfs with interval=0
(df[m].reset_index(),how='cross',suffixes = ('','_y')).drop_duplicates().drop(columns=['Temperature._y','Pressure._y'])# append back to subset of original df whose Interval was not 0
.append(df[~m].reset_index())# sort values by index
.sort_values(by=['index'])
)
Outcome
Temperature. Pressure. index ColXYZ. Interval. ColCDE.
0 25.0 60.0 A. 121.0 0.0 0.195
2 40.0 50.0 A. 121.0 0.0 0.195
0 40.0 50.0 B. 246.0 4.0 0.350
1 25.0 60.0 C. 241.0 0.0 0.133
3 40.0 50.0 C. 241.0 0.0 0.133
Alternative is to create repeat rows and insert as detailed below Original df
ColXYZ Interval Temperature Pressure ColCDE
A. 121 0 25 60 0.195
B. 246 4 40 50 0.350
C. 241 0 40 50 0.133
#Generate list of Temps and Pressures
df=df.reset_index()#to preserve index
m=df['Interval']==0#selection of soert criteria
s=df['Temperature'].agg(list)
s1=df['Pressure'].agg(list)
#Duplicate rows in df
df1 = pd.DataFrame(np.repeat(df[m].values, len(df), axis=0), columns=df.columns)
#distribute values of Temp and Pressure to ensure each unique value in original df is represented in each unique ColXYZ
df1['Temperature']= np.tile(s, int(len(df1)/(len(s))))
df1['Pressure']= np.tile(s1, int(len(df1)/(len(s1))))
#Drop duplicates
df1= df[~m].append(df1.assign(Temperature=np.tile(s, int(len(df1)/(len(s)))),Pressure= np.tile(s1, int(len(df1)/(len(s1))))).drop_duplicates()).sort_values(by=['index'])
Outcome
index ColXYZ Interval Temperature Pressure ColCDE
0 A. 121 0 25 60 0.195
1 A. 121 0 40 50 0.195
1 B. 246 4 40 50 0.35
3 C. 241 0 25 60 0.133
4 C. 241 0 40 50 0.133
