How to extract all repeating patterns from a string into a dataframe

I have a dataframe with the equipment codes of certain trucks. This list of lists is similar to the cells:

x = [['A0B','A1C','A1Z','A2E','A5C','B1B','B1F','B1H','B2A'],
 ['A0A','A0B','A1C','A1Z','A2I','A5L','B1B','B1F','B1H','B2A','B2X','B3H','B4L','B5E','B5J','C0G','C1W','C5B','C5D'],
 ['A0B','A1C','A1Z','A2E','A5C','B1B','B1F','B1H','B2A','B2X','B4L','B5C','B5I','C0A','C1J','C5B','C5D','C6C','C6J','C6Q']]

I want to extract all the values that match "B", for example ("B1B,B1F,B1H"); ("B1B,B1F,B1H,B2A,B2X,B3H"); ("B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I"). I tried this code, but each row has a different length:

sublista = ['B1B','B1F','B1H','B2A','B2X','B4L','B5C','B5I']

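import pandas as pd

df3 = pd.DataFrame(columns=['FIN', 'Equipmentcodes', 'AQUATARDER', 'CAJA'])
for elemento in sublista:
    # rows of df2 whose code string contains this code (case-insensitive)
    df_aux = df2[df2['Equipmentcodes'].str.contains(elemento, case=False)].copy()
    df_aux['CAJA'] = elemento
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df3 = pd.concat([df3, df_aux], ignore_index=True)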


CodePudding user response:

Assuming your column contains strings, you could use a regex:

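# extractall yields one row per match (index: original row, match number);
# grouping on level 0 joins each row's matches back into a single string
df['selected'] = (df['code']
                  .str.extractall(r'\b(B[^,]*)\b')[0]
                  .groupby(level=0).apply(','.join)
                 )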
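
To see why the `groupby(level=0)` step is needed, here is a minimal sketch of the intermediate results. The two-row Series `s` and the names `matches` and `joined` are hypothetical and used only for illustration; they are not part of the example below.

```python
import pandas as pd

# Hypothetical two-row Series, just to show the intermediate shapes.
s = pd.Series(['A0B,B1B,B1F', 'B2A,C0G'])

# One row per match; the index has two levels: (original row, match number).
# 'A0B' is not matched because \b requires the B to start a word.
matches = s.str.extractall(r'\b(B[^,]*)\b')[0]
print(matches)

# Grouping on level 0 (the original row) joins the matches back per row.
joined = matches.groupby(level=0).apply(','.join)
print(joined)
```

Because `joined` keeps the original row labels, assigning it to a new column should align it with the source rows.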

example input:

import pandas as pd

x = ['A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A',
     'A0A,A0B,A1C,A1Z,A2I,A5L,B1B,B1F,B1H,B2A,B2X,B3H,B4L,B5E,B5J,C0G,C1W,C5B,C5D',
     'A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I,C0A,C1J,C5B,C5D,C6C,C6J,C6Q']

df = pd.DataFrame({'code': x})

output:

                              selected                                                                             code
0                      B1B,B1F,B1H,B2A                                              A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A
1  B1B,B1F,B1H,B2A,B2X,B3H,B4L,B5E,B5J      A0A,A0B,A1C,A1Z,A2I,A5L,B1B,B1F,B1H,B2A,B2X,B3H,B4L,B5E,B5J,C0G,C1W,C5B,C5D
2      B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I  A0B,A1C,A1Z,A2E,A5C,B1B,B1F,B1H,B2A,B2X,B4L,B5C,B5I,C0A,C1J,C5B,C5D,C6C,C6J,C6Q
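
If the cells hold Python lists rather than strings (as in the list of lists shown in the question), the `.str` regex approach would not apply directly. A possible alternative, sketched here without a regex, is to filter each cell with `str.startswith`. The helper name `codes_with_prefix` and the shortened sample data are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data: a few codes in the same format, shortened for brevity.
df = pd.DataFrame({'code': ['A0B,A1C,B1B,B1F', 'A0A,B2X,B3H,C0G', 'A1Z,C5B']})

def codes_with_prefix(cell, prefix='B'):
    """Return the comma-joined codes in `cell` that start with `prefix`.

    `cell` may be a comma-separated string or a list of codes.
    """
    items = cell.split(',') if isinstance(cell, str) else cell
    return ','.join(code for code in items if code.startswith(prefix))

df['selected'] = df['code'].apply(codes_with_prefix)
print(df)
```

One difference to be aware of: a row with no matching code gets an empty string here, whereas the `extractall` approach leaves `NaN` for such rows, because they produce no match rows to group.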