Here's my code:
import pathlib
from pathlib import Path
import os
import pandas as pd
df=pd.DataFrame(
{
'col1': ['red','blue','red','red','green','red','yellow', 'red', 'cyan', 'purple', 'pink', 'black', 'orange'],
'col2': ['red', 'green','orange'],
}
)
The desired output would be to eliminate every value that appears on col1 and also appears on col2, then drop col2 and then export the document to an excel spread sheet.
The desired df would look like this:
df=pd.DataFrame(
{
'col1': ['blue', 'yellow', 'cyan', 'purple', 'pink', 'black',]
}
)
As of now my code looks this this, but it's not giving me the intended result as it does not drop the duplicate values.
df = df.drop(df[(df['col1'] == df['col2'])].index)
df = df.drop(['col2'], inplace=True)
os.startfile(File_path, index=False)
Here is the updated code:
import pathlib
from pathlib import Path
import os
import pandas as pd
df_output = Path.home().joinpath("Desktop", "Remaining colors.xlsx")
df=pd.DataFrame(
{
'col1': ['red','blue','red','red','green','red','yellow', 'red', 'cyan', 'purple', 'pink', 'black', 'orange'],
'col2': ['red', 'green','orange'],
}
)
(df.loc[~df['col1'].isin(df["col2"]), ['col1']])
df = df.drop(columns='col2')
df.to_excel(df_output, index=False)
os.startfile(df_output)
CodePudding user response:
You need to use isin and boolean indexing:
(df.loc[~df['col1'].isin(df['col2']), ['col1']]
.to_excel('filename.xlsx', index=False)
)
Output:
col1
1 blue
6 yellow
8 cyan
9 purple
10 pink
11 black
Used input:
col1 col2
0 red red
1 blue green
2 red orange
3 red NaN
4 green NaN
5 red NaN
6 yellow NaN
7 red NaN
8 cyan NaN
9 purple NaN
10 pink NaN
11 black NaN
12 orange NaN
