Iterate through a number of txt files and find patterns

Time:01-25

I am writing a simple script that iterates through a number of text files in a pre-defined folder and tries to find patterns using a pandas DataFrame. However, I get the error below on the bolded line. Could anyone have a look? Thanks a lot!

OSError: Initializing from file failed

for files in os.listdir(outPath):
    if files.endswith(".txt"):
           for f in os.listdir(outPath):
               **df1=pd.read_csv(f,header=None)**
               for line in df1:
                   df1=df1[~df1[0].str.contains(pattern1)]
                   df2=df1[~df1[0].str.contains(pattern2)]

CodePudding user response:

You should call read_csv only on the files that pass the check in the if block. Your code does this:

for files in os.listdir(outPath):
    if files.endswith(".txt"):
           for f in os.listdir(outPath):

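In this version the if statement has no effect, because the inner loop iterates over every entry in outPath again, regardless of extension. Instead, just do: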

for f in os.listdir(outPath):
    if f.endswith(".txt"):
        # os.listdir returns bare file names, so join them with the folder
        df1 = pd.read_csv(os.path.join(outPath, f), header=None)

This way, you only call read_csv on the .txt files, not on whatever else happens to be in the folder.
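To make the single-loop idea concrete, here is a minimal, self-contained sketch. The helper name `read_txt_files` is made up for illustration and is not from the question. It also skips directories, since passing a directory to read_csv fails; whether that or the missing folder prefix caused the original OSError is only a guess.

```python
import os

import pandas as pd


def read_txt_files(folder):
    """Read every .txt file in `folder` into a DataFrame, keyed by file name.

    Hypothetical helper for illustration; `folder` plays the role of outPath.
    """
    frames = {}
    for name in sorted(os.listdir(folder)):
        # os.listdir gives bare names, so build the full path before reading.
        path = os.path.join(folder, name)
        # Skip sub-directories and anything that is not a .txt file.
        if not (name.endswith(".txt") and os.path.isfile(path)):
            continue
        frames[name] = pd.read_csv(path, header=None)
    return frames
```

Calling `read_txt_files(outPath)` would then give you one DataFrame per text file, which you can filter afterwards.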

CodePudding user response:

You can use glob, which returns the matching paths with the folder already included. Also, please don't loop over the DataFrame; it will be slower.

import glob
import pandas as pd

flist = glob.glob(outPath + '/*.txt')
for fn in flist:
    df = pd.read_csv(fn, low_memory=False, header=None)
    df1 = df.loc[~df[0].str.contains(pattern1, regex=False)]  # if you have a fixed string, regex=False will be much faster
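Since the question filters on two patterns, here is a rough sketch of how both could be applied without looping over the rows. The function name `drop_matching_rows` is hypothetical, and it assumes the patterns are fixed strings (hence `regex=False`) and that the text sits in column 0, as in the question.

```python
import pandas as pd


def drop_matching_rows(df, patterns):
    """Return the rows of `df` whose column 0 contains none of `patterns`.

    Hypothetical helper; patterns are treated as literal substrings.
    """
    col = df[0].astype(str)
    # Start with "no row matches" and OR in one vectorised check per pattern.
    mask = pd.Series(False, index=df.index)
    for p in patterns:
        mask |= col.str.contains(p, regex=False)
    return df.loc[~mask]
```

The loop here runs once per pattern, not once per row, so each check is still a single vectorised pass over the column. Inside the glob loop you would call it as `drop_matching_rows(df, [pattern1, pattern2])`.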