I have a pandas DataFrame that looks like this:
data1 data2
0 overall_phase1_b3 overall_phase1_b5
1 overall_phase2_b3 overall_phase5_b5
2 overall_phase3_b3 overall_phase3_b5
How can I get the rows where the phase number matches in both columns? For example, if data1 contains phase1, then data2 must also contain phase1.
Desired output:
data1 data2
0 overall_phase1_b3 overall_phase1_b5
1 overall_phase3_b3 overall_phase3_b5
CodePudding user response:
You do not need regex to achieve this. You can use something like this instead:
df[df.data1.str.split("_", expand=True)[1] == df.data2.str.split("_", expand=True)[1]]
------------------------------------------
data1 data2
0 overall_phase1_b3 overall_phase1_b5
2 overall_phase3_b3 overall_phase3_b5
------------------------------------------
This splits the values in data1 and data2 on '_' and, with expand=True, spreads the pieces into separate columns. It then compares the second piece (the one containing 'phaseX') from each column. The comparison yields a boolean mask that is used to filter the original DataFrame.
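The same idea can be written out step by step. This is only a sketch that rebuilds the question's sample data; the names phase1, phase2, mask and matched are introduced here for illustration, and the final reset_index(drop=True) is an optional extra for renumbering the rows 0, 1 as in the desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "data1": ["overall_phase1_b3", "overall_phase2_b3", "overall_phase3_b3"],
    "data2": ["overall_phase1_b5", "overall_phase5_b5", "overall_phase3_b5"],
})

# Second "_"-separated token of each value, e.g. "phase1".
phase1 = df["data1"].str.split("_").str[1]
phase2 = df["data2"].str.split("_").str[1]

# Boolean mask: True where both columns name the same phase.
mask = phase1 == phase2

# Keep the matching rows and renumber them from 0.
matched = df[mask].reset_index(drop=True)
print(matched)
```

Comparing the whole token (rather than a single character) keeps multi-digit phases such as phase10 and phase20 distinct.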
CodePudding user response:
Since we are dealing with pandas, here is a simple answer.
import pandas as pd
df = pd.DataFrame(columns=["data1","data2"])
data1 = ['overall_phase1_b3','overall_phase2_b3','overall_phase3_b3']
data2 = ['overall_phase1_b5','overall_phase5_b5','overall_phase3_b5']
df['data1'] = data1
df['data2'] = data2
df
The code above builds the pandas DataFrame for the given data.
result = pd.DataFrame(columns=["data1","data2"])
result_d1 = []
result_d2 = []
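for i, j in df.iterrows():
    # Compare the whole phase token (e.g. 'phase1'), not just its last character,
    # so that phase10 and phase20 are not treated as equal.
    if j.data1.split('_')[1] == j.data2.split('_')[1]:
        result_d1.append(j.data1)
        result_d2.append(j.data2)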
result['data1'] = result_d1
result['data2'] = result_d2
result
Looking at your data, we can use the string split method to compare the phase token of the two columns in each row, which tells you which rows have matching phases. If you don't need the result in a DataFrame, you can print the matching rows instead of collecting them into one.
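For the print-only variant, a minimal sketch might look like the following. It assumes the same sample data as the question; matching_lines is a name made up for this example, and itertuples is swapped in for iterrows simply because it gives attribute access to each row without building a Series.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "data1": ["overall_phase1_b3", "overall_phase2_b3", "overall_phase3_b3"],
    "data2": ["overall_phase1_b5", "overall_phase5_b5", "overall_phase3_b5"],
})

# Collect a printable line for every row whose phase tokens match.
matching_lines = []
for row in df.itertuples(index=False):
    if row.data1.split("_")[1] == row.data2.split("_")[1]:
        matching_lines.append(f"{row.data1} {row.data2}")

# Print the matches instead of storing them in a DataFrame.
for line in matching_lines:
    print(line)
```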
Nice question, though. Happy coding!
