Home > Software engineering >  Aggregating the unknown csv files for a particular folder
Aggregating the unknown csv files for a particular folder

Time:01-15

I have some number of csv files in a folder. waterfolder has water_202201.csv , water_202202.csvand water_202203.csv.I want to aggregate these 3 files. Of course, I can do like the code below.

import pandas as pd
import numpy as np
#-*-coding:utf-8-*-
dat1=pd.read_csv("C:/water/water_202201.csv")
dat2=pd.read_csv("C:/water/water_202202.csv")
dat3=pd.read_csv("C:/water/water_202202.csv")
frames=[data1,data2,data3]
result1=pd.concat(frames)
result1

But the question is how to aggregate if I don't know how many csv files are in the waterfolder and somehow I want to aggregate every csv files inside that folder. 202201 means 2022 January

CodePudding user response:

You can use pathlib to iterate over your folder:

import pathlib

data = {}
for file in pathlib.Path('C:/water').glob('water_*.csv'):
    date = file.stem.split('_')[-1]  # extract 202201 for water_202201.csv
    df = pd.read_csv(file)
    data[date] = df

# One-line version
data = {file.stem.split('_')[1]: pd.read_csv(file) 
           for file in pathlib.Path('./data/water').glob('water_*.csv')}

Now 2 possibilities:

Prefix the index by the date key:

df = pd.concat(data)

Without prefix:

df = pd.concat(data.values())
  •  Tags:  
  • Related