Reading two columns for matching values-CodePudding

i have two colums in a csv and trying to use pandas to read it and tell my program to remove these players that have matching letters in "ftps" and "value" columns mostly i want to remove E:E, C:C, D:E, C:E matches from both columns

im trying to setup something like this but im so new to python

import pandas as pd
csv_filepath = '/home/joe/Downloads/NBA FD Rankings.csv'
cdf = pd.read_csv(csv_filepath)
for i in range(len(cdf)):
    if cdf[(cdf["Name"].isin(cdf.loc[(cdf.Fpts == "C"), "Value"])) & (cdf.Value == "C")]:
        optimizer.remove_player(player)

The csv looks like this

|Name        |Position  |Salary    |Fpts    |Value  |Team   |Matchup    |Team total
   
|Georges Niang     |PF/SF   |3700   |D      |C  |PHI    |LAL    |111
|Andre Drummond    |C       |4400   |D      |C  |PHI    |LAL    |111
|Karl-Anthony Towns |C      |9300   |A      |C  |MIN    |GSW    |112
|Avery Bradley      |SG |3700       |D      |C  |LAL    |PHI    |106
|Carmelo Anthony    |SF/PF  |5200   |C      |C  |LAL    |PHI    |106
|Anthony Davis      |PF/C   |8900   |B      |C  |LAL    |PHI    |106
|Jordan Poole       |PG/SG  |5300   |C      |C  |GSW    |MIN    |119
|Otto Porter Jr.    |SF/PF  |5700   |C      |C  |GSW    |MIN    |119
|Malik Beasley      |SF/SG  |3800   |D      |D  |MIN    |GSW    |112
|Jaden McDaniels    |PF     |3900   |D      |D  |MIN    |GSW    |112
|Taurean Prince     |SF/PF  |3500   |E      |D  |MIN    |GSW    |112
|Klay Thompson      |SG     |6200   |C      |D  |GSW    |MIN    |119
|Damion Lee         |SG     |3700   |E      |D  |GSW    |MIN    |119
|Nemanja Bjelica    |PF     |4000   |D      |D  |GSW    |MIN    |119
|Isaiah Joe         |PG     |3500   |E      |E  |PHI    |LAL    |111
|Danny Green        |SG/SF  |3600   |E      |E  |PHI    |LAL    |111

CodePudding user response：

To remove the rows that has identical values in the two columns.

import pandas as pd
csv_filepath = '/home/joe/Downloads/NBA FD Rankings.csv'
df = pd.read_csv(csv_filepath)
df = df[df["fpts"] != df["values"]]

CodePudding user response：

There's a few ways you can handle this. pandas has multiple ways of dealing with logic and selecting rows based on that. Basically here I pulled out the index values that match the logic and dropped those rows from the original dataframe.

Note the conditional operators & is AND, | is OR

Given this dataset:

import pandas as pd
from pandas.api.types import CategoricalDtype

cdf = pd.DataFrame( 
[['Georges Niang',     'PF/SF',   '3700',   'D',      'C',  'PHI',    'LAL',    '111'],
['Andre Drummond',   'C',       '4400',   'D',      'C',  'PHI',    'LAL',    '111'],
['Karl-Anthony Towns', 'C',      '9300',   'A',      'C',  'MIN',    'GSW',    '112'],
['Avery Bradley',      'SG', '3700',       'D',      'C',  'LAL',    'PHI',    '106'],
['Carmelo Anthony',    'SF/PF',  '5200',   'C',      'C',  'LAL',    'PHI',    '106'],
['Anthony Davis',      'PF/C',   '8900',   'B',      'C',  'LAL ',   'PHI',    '106'],
['Jordan Poole',       'PG/SG',  '5300',   'C',      'C',  'GSW',    'MIN',    '119'],
['Otto Porter Jr.',    'SF/PF',  '5700',   'C',      'C',  'GSW',    'MIN',    '119'],
['Malik Beasley',      'SF/SG',  '3800',   'D',      'D',  'MIN',    'GSW',    '112'],
['Jaden McDaniels',    'PF',     '3900',   'D',      'D',  'MIN',    'GSW',    '112'],
['Taurean Prince',     'SF/PF',  '3500',   'E',      'D',  'MIN',    'GSW',    '112'],
['Klay Thompson',      'SG',     '6200',   'C',      'D',  'GSW',    'MIN',    '119'],
['Damion Lee',         'SG',    '3700',   'E',      'D',  'GSW',    'MIN',    '119'],
['Nemanja Bjelica',    'PF',     '4000',   'D',      'D',  'GSW',    'MIN',    '119'],
['Isaiah Joe',       'PG',     '3500',   'E',      'E',  'PHI',    'LAL',    '111'],
['Danny Green',        'SG/SF',  '3600',   'E',      'E',  'PHI',    'LAL',    '111']],
columns = ['Name','Position','Salary','Fpts','Value','Team','Matchup','Team total'])

Option A: Hard Code the combinations

remove_index = cdf[   (cdf['Fpts'] == 'C') & (cdf['Value'] == 'D')
                   |  (cdf['Fpts'] == 'D') & (cdf['Value'] == 'C')
                   |  (cdf['Fpts'] == 'D') & (cdf['Value'] == 'D')
                   |  (cdf['Fpts'] == 'D') & (cdf['Value'] == 'E')
                   |  (cdf['Fpts'] == 'E') & (cdf['Value'] == 'D')
                   |  (cdf['Fpts'] == 'E') & (cdf['Value'] == 'E')].index

filtered_cdf = cdf.drop(remove_index)

Option B: Convert your "grades" as an ordered categorical column and putting a little logic into the filtering

# List the grades, then reverse it so that "A" is considered "bigger"/better than "B"
grades = ["A", "B", "C", "D", "E"]
grades.reverse()
cat_type = CategoricalDtype(categories=grades, ordered=True)
cdf[['Fpts', 'Value']] = cdf[['Fpts', 'Value']].astype(cat_type)

# Find index rows that match the logic
remove_index = cdf[(cdf['Fpts'] <= 'C') & (cdf['Value'] <= 'D') |
                   (cdf['Value'] <= 'C') & (cdf['Fpts'] <= 'D')].index
filtered_cdf = cdf.drop(remove_index)

Output:

print(filtered_cdf)
                 Name Position Salary Fpts Value  Team Matchup Team total
2  Karl-Anthony Towns        C   9300    A     C   MIN     GSW        112
4     Carmelo Anthony    SF/PF   5200    C     C   LAL     PHI        106
5       Anthony Davis     PF/C   8900    B     C  LAL      PHI        106
6        Jordan Poole    PG/SG   5300    C     C   GSW     MIN        119
7     Otto Porter Jr.    SF/PF   5700    C     C   GSW     MIN        119