I'm trying to match a list of column names with possible combinations for the columns to dataframe
Here's a sample dataframe
sample_df = pd.DataFrame({'Column1': [2, 4, 8, 0],
'Column_2': [2, 0, 0, 0],
'Column3_': [10, 2, 1, 8],
'4thColumn': [22,32,11,12],
'Col5':[1,2,3,4],
'col_6':[23,11,56,98]})
and the list is something like this
l1 = ['column1;Column1;Column1_;Column_1',
'column2;Column2;Column2_;Column_2',
'column3;Column3;Column3_;Column_3',
'column4;Column4;Column4_;Column_4;4thColumn']
now I'm using this below logic to get this but if I've a lot of columns, it might not be a feasible solution. Does anybody know any other interesting approach?
l1_split = [list(each.split(';')) for each in l1]
df_cols = sample_df.columns.to_list()
l2 = []
for x in df_cols:
for y_l1 in l1_split:
if x in y_l1:
l2.append(x)
print(sample_df[l2])
CodePudding user response:
You can create fllatened list by all splitted values and then use Index.intersection:
L = [x for each in l1 for x in each.split(';')]
print (L)
['column1', 'Column1', 'Column1_', 'Column_1',
'column2', 'Column2', 'Column2_', 'Column_2',
'column3', 'Column3', 'Column3_', 'Column_3',
'column4', 'Column4', 'Column4_', 'Column_4', '4thColumn']
print(sample_df.columns.intersection(L, sort=False))
Index(['Column1', 'Column_2', 'Column3_', '4thColumn'], dtype='object')
print(sample_df[sample_df.columns.intersection(L, sort=False)])
Column1 Column_2 Column3_ 4thColumn
0 2 2 10 22
1 4 0 2 32
2 8 0 1 11
3 0 0 8 12
