Pandas - Reading CSVs to dataframes in a FOR loop then appending to a master DF is returning a blank

Time:01-24

I've searched for about an hour for an answer to this and none of the solutions I've found are working. I'm trying to get a folder full of CSVs into a single dataframe, to output to one big csv. Here's my current code:

import os
import pandas as pd

sourceLoc = "SOURCE"
destLoc = sourceLoc + "MasterData.csv"
masterDF = pd.DataFrame([])

for file in os.listdir(sourceLoc):
    workingDF = pd.read_csv(sourceLoc + file)
    print(workingDF)
    masterDF.append(workingDF)

print(masterDF)

SOURCE is a folder path, but I've had to remove it as it's a work network path. The loop is reading the CSVs into the workingDF variable, since the data prints to the console when I run it, but it's also reporting 349 rows for each file. None of them have that many rows of data in them.

When I print masterDF, it prints:

Empty DataFrame
Columns: []
Index: []

My code is from this solution but that example is using xlsx files and I'm not sure what changes, if any, are needed to get it to work with CSVs. The Pandas documentation on .append and read_csv is quite limited and doesn't indicate anything specific I'm doing wrong.

Any help would be appreciated.

CodePudding user response:

You can use glob:

import glob
import pandas as pd
import os
path = "your path"
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join(path,'*.csv'))))
print(df)
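
One caveat with this one-liner: pd.concat raises a ValueError when it is given nothing to concatenate, which is what happens if the pattern matches no files (for example, a mistyped path). A minimal sketch of a guard, assuming an empty DataFrame is an acceptable result in that case; the helper name read_all_csvs and the temporary folder are made up for illustration:

```python
import glob
import os
import tempfile

import pandas as pd


def read_all_csvs(path):
    """Concatenate every *.csv directly inside `path` (hypothetical helper)."""
    files = sorted(glob.glob(os.path.join(path, "*.csv")))
    if not files:
        # pd.concat raises ValueError when there are no objects to concatenate
        return pd.DataFrame()
    # ignore_index=True renumbers rows instead of repeating each file's own index
    return pd.concat(map(pd.read_csv, files), ignore_index=True)


empty_dir = tempfile.mkdtemp()  # stand-in for a folder with no CSVs
print(read_all_csvs(empty_dir).empty)
```

Sorting the file list is optional; it only makes the row order reproducible between runs.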

CodePudding user response:

You can store them all in a list and pd.concat them at the end.

import os
import pandas as pd

# sourceLoc is the folder path from the question
dfs = [
    pd.read_csv(os.path.join(sourceLoc, file))
    for file in os.listdir(sourceLoc)
]

masterDF = pd.concat(dfs)
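
One thing to watch with os.listdir: it returns every entry in the folder, and the question writes MasterData.csv into that same folder, so a second run would likely read the previous output back in. A rough sketch that filters by extension and skips the output file; the temporary folder and the sample files are assumptions for illustration only:

```python
import os
import tempfile

import pandas as pd

source_loc = tempfile.mkdtemp()  # hypothetical stand-in for the real folder
dest_name = "MasterData.csv"     # output file name taken from the question

# Sample inputs, plus a stale output file and a non-CSV file that should both be skipped.
pd.DataFrame({"a": [1]}).to_csv(os.path.join(source_loc, "x.csv"), index=False)
pd.DataFrame({"a": [2]}).to_csv(os.path.join(source_loc, "y.csv"), index=False)
pd.DataFrame({"a": [99]}).to_csv(os.path.join(source_loc, dest_name), index=False)
with open(os.path.join(source_loc, "notes.txt"), "w") as fh:
    fh.write("not a csv\n")

# Keep only .csv files, excluding the master output itself.
csv_names = sorted(
    name for name in os.listdir(source_loc)
    if name.lower().endswith(".csv") and name != dest_name
)
dfs = [pd.read_csv(os.path.join(source_loc, name)) for name in csv_names]
master_df = pd.concat(dfs, ignore_index=True)
master_df.to_csv(os.path.join(source_loc, dest_name), index=False)
```

Writing the output to a different folder would avoid the problem without any filtering.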

CodePudding user response:

There are a couple of things wrong with your code, but the main one is that DataFrame.append returns a new dataframe instead of modifying the original in place. So you would have to do:

masterDF = masterDF.append(workingDF)
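
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a current install the line above raises an AttributeError. A minimal sketch of the same loop that collects the frames in a plain list and concatenates once at the end; the temporary folder and file names are made up for illustration:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real source folder: two small CSVs in a temp dir.
source_loc = tempfile.mkdtemp()
pd.DataFrame({"a": [1, 2]}).to_csv(os.path.join(source_loc, "one.csv"), index=False)
pd.DataFrame({"a": [3]}).to_csv(os.path.join(source_loc, "two.csv"), index=False)

frames = []
for name in sorted(os.listdir(source_loc)):
    working_df = pd.read_csv(os.path.join(source_loc, name))
    frames.append(working_df)  # list.append mutates in place, unlike DataFrame.append

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's index
master_df = pd.concat(frames, ignore_index=True)
print(master_df)
```

Concatenating once also avoids copying the growing dataframe on every iteration, which is what repeated appends did.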

I also like the approach taken by I_Al-thamary - concat will probably be faster.

One last thing I would suggest: instead of using glob, check out pathlib.

import pandas as pd
from pathlib import Path
path = Path("your path")
df = pd.concat(map(pd.read_csv, path.rglob("*.csv")))
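
Be aware that Path.rglob searches subfolders as well, whereas Path.glob("*.csv") stays in the top-level folder, which is closer to what the glob answer does. A small sketch of the difference, assuming a throwaway folder layout; the column name source_file is a made-up addition for tracing rows back to their file:

```python
import tempfile
from pathlib import Path

import pandas as pd

root = Path(tempfile.mkdtemp())  # hypothetical folder for illustration
(root / "sub").mkdir()
pd.DataFrame({"a": [1]}).to_csv(root / "top.csv", index=False)
pd.DataFrame({"a": [2]}).to_csv(root / "sub" / "nested.csv", index=False)

top_only = sorted(root.glob("*.csv"))     # top-level folder only
recursive = sorted(root.rglob("*.csv"))   # includes subfolders

# assign() adds a column naming the file each row came from
df = pd.concat(
    (pd.read_csv(p).assign(source_file=p.name) for p in recursive),
    ignore_index=True,
)
print(df)
```

If the folder has no subfolders, the two calls return the same files.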
     