I have some number of csv files in a folder.
waterfolder has water_202201.csv , water_202202.csvand water_202203.csv.I want to aggregate these 3 files. Of course, I can do like the code below.
import pandas as pd
import numpy as np
#-*-coding:utf-8-*-
dat1=pd.read_csv("C:/water/water_202201.csv")
dat2=pd.read_csv("C:/water/water_202202.csv")
dat3=pd.read_csv("C:/water/water_202202.csv")
frames=[data1,data2,data3]
result1=pd.concat(frames)
result1
But the question is how to aggregate if I don't know how many csv files are in the waterfolder and somehow I want to aggregate every csv files inside that folder.
202201 means 2022 January
CodePudding user response:
You can use pathlib to iterate over your folder:
import pathlib
data = {}
for file in pathlib.Path('C:/water').glob('water_*.csv'):
date = file.stem.split('_')[-1] # extract 202201 for water_202201.csv
df = pd.read_csv(file)
data[date] = df
# One-line version
data = {file.stem.split('_')[1]: pd.read_csv(file)
for file in pathlib.Path('./data/water').glob('water_*.csv')}
Now 2 possibilities:
Prefix the index by the date key:
df = pd.concat(data)
Without prefix:
df = pd.concat(data.values())
