Home > Back-end >  Dataframe match columns to a list of possible names
Dataframe match columns to a list of possible names

Time:02-04

I'm trying to match a list of column names with possible combinations for the columns to dataframe

Here's a sample dataframe

sample_df = pd.DataFrame({'Column1': [2, 4, 8, 0],
                      'Column_2': [2, 0, 0, 0],
                      'Column3_': [10, 2, 1, 8],
                      '4thColumn': [22,32,11,12],
                      'Col5':[1,2,3,4],
                      'col_6':[23,11,56,98]})

and the list is something like this

l1 = ['column1;Column1;Column1_;Column_1',
  'column2;Column2;Column2_;Column_2',
  'column3;Column3;Column3_;Column_3',
  'column4;Column4;Column4_;Column_4;4thColumn']

now I'm using this below logic to get this but if I've a lot of columns, it might not be a feasible solution. Does anybody know any other interesting approach?

l1_split = [list(each.split(';')) for each in l1]
df_cols = sample_df.columns.to_list()
l2 = []
for x in df_cols:
    for y_l1 in l1_split:
        if x in y_l1:
            l2.append(x)
            
print(sample_df[l2])

CodePudding user response:

You can create fllatened list by all splitted values and then use Index.intersection:

L = [x for each in l1 for x in each.split(';')]
print (L)
['column1', 'Column1', 'Column1_', 'Column_1',
 'column2', 'Column2', 'Column2_', 'Column_2', 
 'column3', 'Column3', 'Column3_', 'Column_3',
 'column4', 'Column4', 'Column4_', 'Column_4', '4thColumn']


print(sample_df.columns.intersection(L, sort=False))
Index(['Column1', 'Column_2', 'Column3_', '4thColumn'], dtype='object')
      
print(sample_df[sample_df.columns.intersection(L, sort=False)])
   Column1  Column_2  Column3_  4thColumn
0        2         2        10         22
1        4         0         2         32
2        8         0         1         11
3        0         0         8         12
  •  Tags:  
  • Related