I am writing a simple script that iterates through the text files in a predefined folder and tries to find patterns using a pandas DataFrame. However, I get the error below on the marked line. Could anyone have a look? Thanks a lot!
OSError: Initializing from file failed
import os
import pandas as pd
for files in os.listdir(outPath):
    if files.endswith(".txt"):
        for f in os.listdir(outPath):
            df1=pd.read_csv(f,header=None)  # <-- error raised here
            for line in df1:
                df1=df1[~df1[0].str.contains(pattern1)]
                df2=df1[~df1[0].str.contains(pattern2)]
Answer:
You should only read the files that passed the check in the if block. With the current structure:
for files in os.listdir(outPath):
    if files.endswith(".txt"):
        for f in os.listdir(outPath):
the inner loop lists every file in the folder again, regardless of the check, so the if statement is useless. Instead, just do:
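import os
import pandas as pd
for f in os.listdir(outPath):
    if f.endswith(".txt"):
        # os.listdir returns bare file names, so join each one with the folder path
        df1 = pd.read_csv(os.path.join(outPath, f), header=None)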
This way, you only read the .txt files, not any other files in the folder.
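One more detail worth checking: os.listdir returns bare file names, not full paths, so pd.read_csv(f) looks for f in the current working directory rather than inside outPath. That is a likely contributor to the failure when the script is run from a different folder. Below is a minimal sketch, assuming every .txt file is headerless and comma-separated; the helper name read_txt_files is hypothetical.

```python
import os
import pandas as pd


def read_txt_files(folder):
    """Hypothetical helper: read every .txt file in `folder` into a DataFrame.

    Returns a dict mapping each file name to its DataFrame.
    """
    frames = {}
    for name in sorted(os.listdir(folder)):
        # os.listdir gives bare names; build the full path before opening
        path = os.path.join(folder, name)
        # skip non-.txt entries and any subdirectory whose name happens to end in .txt
        if name.endswith(".txt") and os.path.isfile(path):
            frames[name] = pd.read_csv(path, header=None)
    return frames
```

For example, frames = read_txt_files(outPath) works no matter which directory the script is started from.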
Answer:
You can use glob instead. Also, please don't loop over the DataFrame; it is slower, and str.contains already filters the whole column in one call.
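import glob
import os
import pandas as pd

flist = glob.glob(os.path.join(outPath, '*.txt'))  # full paths to every .txt file in outPath
for fn in flist:
    df = pd.read_csv(fn, low_memory=False, header=None)
    # if you have a fixed string, regex=False will be much faster; na=False treats empty cells as non-matches
    df1 = df.loc[~df[0].str.contains(pattern1, regex=False, na=False)]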
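Building on the glob approach, the question's two filters (drop rows containing pattern1 or pattern2) can be combined into one boolean mask instead of looping over the DataFrame. This is a minimal sketch. It assumes each file has a single text column 0 and that the patterns are plain substrings. The helper names drop_matching_rows and filter_folder are hypothetical.

```python
import glob
import os
import pandas as pd


def drop_matching_rows(df, patterns):
    """Hypothetical helper: keep only rows whose column 0 contains none of `patterns`.

    Patterns are matched as literal substrings (regex=False); empty cells count as non-matches.
    """
    mask = pd.Series(False, index=df.index)
    for pattern in patterns:
        # one vectorized pass over the whole column per pattern, no row loop
        mask |= df[0].str.contains(pattern, regex=False, na=False)
    return df.loc[~mask]


def filter_folder(folder, patterns):
    """Hypothetical helper: apply drop_matching_rows to every .txt file in `folder`.

    Returns a dict mapping each file path to its filtered DataFrame.
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        df = pd.read_csv(path, header=None)
        results[path] = drop_matching_rows(df, patterns)
    return results
```

For example, filter_folder(outPath, [pattern1, pattern2]) returns the filtered rows for each file. The loop runs only over the patterns, not over rows, so the cost stays close to a single str.contains call per pattern.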
