How do I use the length of another column in Pandas as a slice argument-CodePudding

I am trying to remove rows from a dataframe where the first sequence of letters in the Ref column are equal to the Product column.

For example, for the input:

 --------- --------------- 
| Product | Provision Ref |
 --------- --------------- 
| DVX     | DVX9251       |
 --------- --------------- 
| CDV     | 22CDV95       |
 --------- --------------- 
| TV      | TV12369       |
 --------- --------------- 
| TV      | 992TV15       |
 --------- ---------------

Desired output:

 --------- --------------- 
| Product | Provision Ref |
 --------- --------------- 
| CDV     | 22CDV95       |
 --------- --------------- 
| TV      | 992TV15       |
 --------- ---------------

I have tried both of the following pieces of code but they are not working

df = df.loc[df['Provision Ref'].str[0:df['Product'].map(len)] != df['Product']]
df = df.loc[df['Provision Ref'].str[0:int(df['Product'].map(len))] != df['Product']]

CodePudding user response：

Try this:

filtered = df[df.groupby('Product', sort=False).apply(lambda g: g['Provision Ref'].str.startswith(g['Product'].iloc[0])).tolist()]

Output:

>>> filtered
  Product Provision Ref
0     DVX       DVX9251
2      TV       TV12369

More readable but less efficient:

filtered = df[df.apply(lambda x: x['Provision Ref'].startswith(x['Product']), axis=1)]

Another method, probably more efficient if the items of Product have few unique lengths (e.g. most are either 2, 3 or 4 chars long, etc.):

filtered = df[df.groupby(df['Product'].str.len(), sort=False).apply(lambda x: x['Provision Ref'].str[:len(x['Product'].iloc[0])] == x['Product']).tolist()]

CodePudding user response：

We can use a row-wise .apply(), because df['Provision Ref'].str.startswith(df['Product']) isn't vectorized like that (as @anarchy wrote).

df[~df.apply(lambda row: row['Provision Ref'].startswith(row['Product']), axis=1)]

  Product Provision Ref
1     CDV       22CDV95
3      TV       992TV15