i have two colums in a csv and trying to use pandas to read it and tell my program to remove these players that have matching letters in "ftps" and "value" columns mostly i want to remove E:E, C:C, D:E, C:E matches from both columns
im trying to setup something like this but im so new to python
import pandas as pd
csv_filepath = '/home/joe/Downloads/NBA FD Rankings.csv'
cdf = pd.read_csv(csv_filepath)
for i in range(len(cdf)):
if cdf[(cdf["Name"].isin(cdf.loc[(cdf.Fpts == "C"), "Value"])) & (cdf.Value == "C")]:
optimizer.remove_player(player)
The csv looks like this
|Name |Position |Salary |Fpts |Value |Team |Matchup |Team total
|Georges Niang |PF/SF |3700 |D |C |PHI |LAL |111
|Andre Drummond |C |4400 |D |C |PHI |LAL |111
|Karl-Anthony Towns |C |9300 |A |C |MIN |GSW |112
|Avery Bradley |SG |3700 |D |C |LAL |PHI |106
|Carmelo Anthony |SF/PF |5200 |C |C |LAL |PHI |106
|Anthony Davis |PF/C |8900 |B |C |LAL |PHI |106
|Jordan Poole |PG/SG |5300 |C |C |GSW |MIN |119
|Otto Porter Jr. |SF/PF |5700 |C |C |GSW |MIN |119
|Malik Beasley |SF/SG |3800 |D |D |MIN |GSW |112
|Jaden McDaniels |PF |3900 |D |D |MIN |GSW |112
|Taurean Prince |SF/PF |3500 |E |D |MIN |GSW |112
|Klay Thompson |SG |6200 |C |D |GSW |MIN |119
|Damion Lee |SG |3700 |E |D |GSW |MIN |119
|Nemanja Bjelica |PF |4000 |D |D |GSW |MIN |119
|Isaiah Joe |PG |3500 |E |E |PHI |LAL |111
|Danny Green |SG/SF |3600 |E |E |PHI |LAL |111
CodePudding user response:
To remove the rows that has identical values in the two columns.
import pandas as pd
csv_filepath = '/home/joe/Downloads/NBA FD Rankings.csv'
df = pd.read_csv(csv_filepath)
df = df[df["fpts"] != df["values"]]
CodePudding user response:
There's a few ways you can handle this. pandas has multiple ways of dealing with logic and selecting rows based on that. Basically here I pulled out the index values that match the logic and dropped those rows from the original dataframe.
Note the conditional operators & is AND, | is OR
Given this dataset:
import pandas as pd
from pandas.api.types import CategoricalDtype
cdf = pd.DataFrame(
[['Georges Niang', 'PF/SF', '3700', 'D', 'C', 'PHI', 'LAL', '111'],
['Andre Drummond', 'C', '4400', 'D', 'C', 'PHI', 'LAL', '111'],
['Karl-Anthony Towns', 'C', '9300', 'A', 'C', 'MIN', 'GSW', '112'],
['Avery Bradley', 'SG', '3700', 'D', 'C', 'LAL', 'PHI', '106'],
['Carmelo Anthony', 'SF/PF', '5200', 'C', 'C', 'LAL', 'PHI', '106'],
['Anthony Davis', 'PF/C', '8900', 'B', 'C', 'LAL ', 'PHI', '106'],
['Jordan Poole', 'PG/SG', '5300', 'C', 'C', 'GSW', 'MIN', '119'],
['Otto Porter Jr.', 'SF/PF', '5700', 'C', 'C', 'GSW', 'MIN', '119'],
['Malik Beasley', 'SF/SG', '3800', 'D', 'D', 'MIN', 'GSW', '112'],
['Jaden McDaniels', 'PF', '3900', 'D', 'D', 'MIN', 'GSW', '112'],
['Taurean Prince', 'SF/PF', '3500', 'E', 'D', 'MIN', 'GSW', '112'],
['Klay Thompson', 'SG', '6200', 'C', 'D', 'GSW', 'MIN', '119'],
['Damion Lee', 'SG', '3700', 'E', 'D', 'GSW', 'MIN', '119'],
['Nemanja Bjelica', 'PF', '4000', 'D', 'D', 'GSW', 'MIN', '119'],
['Isaiah Joe', 'PG', '3500', 'E', 'E', 'PHI', 'LAL', '111'],
['Danny Green', 'SG/SF', '3600', 'E', 'E', 'PHI', 'LAL', '111']],
columns = ['Name','Position','Salary','Fpts','Value','Team','Matchup','Team total'])
Option A: Hard Code the combinations
remove_index = cdf[ (cdf['Fpts'] == 'C') & (cdf['Value'] == 'D')
| (cdf['Fpts'] == 'D') & (cdf['Value'] == 'C')
| (cdf['Fpts'] == 'D') & (cdf['Value'] == 'D')
| (cdf['Fpts'] == 'D') & (cdf['Value'] == 'E')
| (cdf['Fpts'] == 'E') & (cdf['Value'] == 'D')
| (cdf['Fpts'] == 'E') & (cdf['Value'] == 'E')].index
filtered_cdf = cdf.drop(remove_index)
Option B: Convert your "grades" as an ordered categorical column and putting a little logic into the filtering
# List the grades, then reverse it so that "A" is considered "bigger"/better than "B"
grades = ["A", "B", "C", "D", "E"]
grades.reverse()
cat_type = CategoricalDtype(categories=grades, ordered=True)
cdf[['Fpts', 'Value']] = cdf[['Fpts', 'Value']].astype(cat_type)
# Find index rows that match the logic
remove_index = cdf[(cdf['Fpts'] <= 'C') & (cdf['Value'] <= 'D') |
(cdf['Value'] <= 'C') & (cdf['Fpts'] <= 'D')].index
filtered_cdf = cdf.drop(remove_index)
Output:
print(filtered_cdf)
Name Position Salary Fpts Value Team Matchup Team total
2 Karl-Anthony Towns C 9300 A C MIN GSW 112
4 Carmelo Anthony SF/PF 5200 C C LAL PHI 106
5 Anthony Davis PF/C 8900 B C LAL PHI 106
6 Jordan Poole PG/SG 5300 C C GSW MIN 119
7 Otto Porter Jr. SF/PF 5700 C C GSW MIN 119
