Removing multiple rows of a dataframe with the same column value when bad data is found in another

Time:02-04

I have a pandas dataframe, "tracks", that I'm filtering for erroneous altitude information. When the altitude is at or below a certain threshold, I want to throw out all rows that share the same track_key. In the example, N123P, on track_key 4xuut, has an erroneous altitude, so I want to remove ALL rows whose track_key is "4xuut", but NOT the rows below them that have the same callsign.

track_key   callsign    aircraft_type   speed   altitude
4xuut       N123P       C550            300     -1
4xuut       N123P       C550            297     15
4yt06       N123P       C550            305     1022
4yt06       N123P       C550            301     1028
4xx21       N348U       GALX            350     1025

I've tried this:

tracks = tracks[tracks.track_key != tracks.loc[tracks['altitude'].astype('float') <= field_elev, 'track_key'].iloc[0]]

but it only works on the first match (there can be several), and if there are no matches I get an "out-of-bounds" error.

CodePudding user response:

Try this.

tracks[tracks.groupby('track_key')['altitude'].transform('min') > 0]

output

    track_key   callsign    aircraft_type   speed   altitude
2   4yt06       N123P       C550            305     1022
3   4yt06       N123P       C550            301     1028
4   4xx21       N348U       GALX            350     1025

Thanks to @bkeesey for this solution.
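
The one-liner above can be unpacked as follows. This is only a sketch: the threshold value of field_elev, the extra track "4zz99", and the variable names min_alt and clean are assumptions added for illustration, not part of the original data. It computes the minimum altitude of each track, broadcasts it back onto every row of that track with transform, and keeps only the rows whose track never dropped to or below the threshold.

```python
import pandas as pd

tracks = pd.DataFrame({
    'track_key': ['4xuut', '4xuut', '4yt06', '4yt06', '4xx21', '4zz99'],
    'callsign': ['N123P', 'N123P', 'N123P', 'N123P', 'N348U', 'N777Z'],
    'altitude': [-1, 15, 1022, 1028, 1025, 3],  # '4zz99' is a made-up second bad track
})

field_elev = 10  # assumed threshold; the question does not give its value

# Minimum altitude per track, repeated on every row of that track
min_alt = tracks.groupby('track_key')['altitude'].transform('min')

# Keep a row only if its whole track stayed above the threshold
clean = tracks[min_alt > field_elev]
print(clean)
```

Because the comparison is done per track rather than per row, this handles several bad tracks at once, and it returns the dataframe unchanged when no track is bad.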

CodePudding user response:

You get the out-of-bounds error because, when no altitude is erroneous, the selection is empty, so there is no value at position 0 for .iloc[0] to return.
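
A minimal sketch that reproduces the error, assuming a small dataframe with no bad altitudes and a hypothetical threshold field_elev = 0 (the names bad_keys and raised are illustrative only):

```python
import pandas as pd

# Every altitude here is valid, so nothing matches the filter
tracks = pd.DataFrame({
    'track_key': ['4yt06', '4xx21'],
    'altitude': [1022, 1025],
})
field_elev = 0  # assumed threshold

bad_keys = tracks.loc[tracks['altitude'] <= field_elev, 'track_key']
print(len(bad_keys))  # the selection is empty

try:
    bad_keys.iloc[0]  # positional access on an empty Series
    raised = False
except IndexError as exc:
    raised = True
    print('IndexError:', exc)
```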

To solve the issue, I used an if condition, as follows:

import pandas as pd

tracks = pd.DataFrame({
    'track_key': ['4xuut', '4xuut', '4yt06', '4yt06', '4xx21'],
    'callsign': ['N123P', 'N123P', 'N123P', 'N123P', 'N348U'],
    'aircraft_type': ['C550', 'C550', 'C550', 'C550', 'GALX'],
    'speed': [300, 297, 305, 301, 350],
    'altitude': [-1, 15, 1022, 1028, 1025],
})
#  track_key callsign aircraft_type  speed  altitude
#0     4xuut    N123P          C550    300        -1
#1     4xuut    N123P          C550    297        15
#2     4yt06    N123P          C550    305      1022
#3     4yt06    N123P          C550    301      1028
#4     4xx21    N348U          GALX    350      1025

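erroneous = -1

# Collect every track_key that has an erroneous altitude, not just the first one
keys_to_delete = tracks.loc[tracks['altitude'] == erroneous, 'track_key'].unique()
if len(keys_to_delete) > 0:
    # Drop all rows that belong to any of those tracks
    tracks = tracks[~tracks['track_key'].isin(keys_to_delete)]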

print(tracks)
#  track_key callsign aircraft_type  speed  altitude
#2     4yt06    N123P          C550    305      1022
#3     4yt06    N123P          C550    301      1028
#4     4xx21    N348U          GALX    350      1025
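
For the threshold comparison from the question (altitude <= field_elev), a variant based on isin might look like the sketch below. The helper name drop_bad_tracks, the threshold value, and the extra "4zz99" rows are assumptions for illustration. Since isin accepts an empty list of keys, no if check is needed when nothing matches.

```python
import pandas as pd

def drop_bad_tracks(tracks, field_elev):
    """Drop every row whose track_key has at least one altitude <= field_elev."""
    # astype('float') mirrors the question, where altitude may be stored as text
    is_bad = tracks['altitude'].astype('float') <= field_elev
    bad_keys = tracks.loc[is_bad, 'track_key'].unique()
    # With no bad keys, isin() is all False and every row is kept
    return tracks[~tracks['track_key'].isin(bad_keys)]

tracks = pd.DataFrame({
    'track_key': ['4xuut', '4xuut', '4yt06', '4yt06', '4xx21', '4zz99', '4zz99'],
    'callsign': ['N123P', 'N123P', 'N123P', 'N123P', 'N348U', 'N348U', 'N348U'],
    'altitude': [-1, 15, 1022, 1028, 1025, 900, 2],  # '4zz99' is a made-up second bad track
})

several_bad = drop_bad_tracks(tracks, field_elev=10)    # removes '4xuut' and '4zz99'
nothing_bad = drop_bad_tracks(tracks, field_elev=-100)  # no matches, no error
print(several_bad)
```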