I'm very new to python. I think it's very simple thing but I can't. What I have to do is removing some strings of one column's each value from specific strings.
| available_list |
|---|
| AE,SG,MO |
| KR,CN |
| SG |
| MO,MY |
all_list = 'AE,SG,MO,MY,KR,CN,US,HK,YS'
I want to remove available_list values from all_list. What I tried is followed code.
col1 = df['available_list']
all_ori = 'AE,SG,MO,MY,KR,CN,US,HK,YS'.split(',')
all_c = all_ori.copy()
result=[]
for i in col1:
for s in i:
all_c.remove(s)
result.append(all_c)
all_c = all_main.copy()
result_df = pd.DataFrame({'Non-Priviliges' : result})
But the result was,
|Non-Priviliges|
|[MY, KR, CN, US, HK, YS]|
|[SG, MO, US, HK, YS]|
|[AE, SG, KR, CN, US, HK, YS]|
The problems are "[", "]". How I remove them?
And after replacing them, I want to paste this series to existing excel file, next-to the column named "Priviliges".
Could you give me some advice? thanks!
CodePudding user response:
Use Series.str.split, Series.explode and set difference operation:
# Create a list from 'available_list' by exploding the column on ','
In [379]: x = df['available_list'].str.split(',').explode().values
# Convert your 'all_list' variable into Python's list after splitting on ','
In [383]: l = all_list.split(',')
# Use 'set' minus operation to find elements present in 'l' but not in 'x'
In [386]: res = list(set(l) - set(x))
In [387]: res
Out[387]: ['YS', 'HK', 'US']
In [390]: new_list = ','.join(res)
In [391]: new_list
Out[391]: 'YS,HK,US'
CodePudding user response:
Assuming your filename is "hello.xlsx", Following is my answer:
import pandas as pd
df = pd.read_excel('hello.xlsx')
all_list_str = 'AE,SG,MO,MY,KR,CN,US,HK,YS'
all_list = all_list_str.split(',')
def find_non_priv(row):
#convert row item string value to list
row_list = row.split(',')
return ','.join(list(set(all_list) - set(row_list)))
# pandas apply is used to call function to each row items.
df['Non-Priviliges'] = df['available_list'].apply(find_non_priv)
df.to_excel('output.xlsx')
