how to delete all the rows of a csv that have the word "class" in the first column, except-CodePudding

import pandas as pd
df = pd.read_csv('coords.csv',sep=',',header=1)

In this case, the header row that has the word "class" in its first column is repeated a few rows below, and what I need is to leave the csv file with only the first row that contains the word "class" in its first column and the rest remove them. When I mean to remove them, I do not mean that they are left blank because that would affect the data, but simply remove them

CodePudding user response：

Here's a little script to filter out those rows. It doesn't load the whole file into memory, instead doing a read-write for every row, except rows that start with 'class':

import csv

with open('coords_filtered.csv', 'w', newline='') as out_f:
    writer = csv.writer(out_f)

    with open('coords.csv', newline='') as in_f:
        reader = csv.reader(in_f)
        # Transfer header
        writer.writerow(next(reader))

        for row in reader:
            if row[0] == 'class':
                continue  # skip row / don't write

            writer.writerow(row)

CodePudding user response：

If I understood right you need to purge all the repeated headers appearing within data. If that's the case and the file is not that huge you can filter the dataframe after the read_csv using

import pandas as pd
df = pd.read_csv('coords.csv',sep=',',header=0)
df = df[df['class'] != 'class']

Edit: To make it working you must treat the first row, which has index 0 as an header so the dataframe can be filtered