I'm trying to improve some code and stop it throwing a warning/error once it's finished (due to the use of next(iter).
I'm essentially creating a database engine, reading in a large CSV file in chunks, and appending the data here to a PostgreSQL database.
However once the while loop is finished, there's an error. I could probably add some error handling to suppress this, but wondering if there's a more efficient way to do this in general.
# Create database engine
engine = create_engine(# contains Postgres connection stuff)
# Read in CSV
iter = pd.read_csv("myfile.csv", iterator=True, chunksize=100000)
df = next(iter)
# Convert column to datetime
df.date_col = pd.to_datetime(df.date_col)
# Create table
df.head(0).to_sql(name = "mytable", con = engine, if_exists='replace')
# Add first chunk
df.to_sql(name="mytable", con=engine, if_exists='append')
# Add remaining CSV data to database table
while True:
df = next(iter)
df.date_col = pd.to_datetime(df.date_col)
df.to_sql(name="mytable", con=engine, if_exists='append')
CodePudding user response:
The problem here is usage of while(True) loop. Ideally to iterate an iterator, you can use a for loop.
Incase you want to use a while loop, then take a look at how to use iterator in while loop.
I would achieve what you are looking for using something like this:
_iter = pd.read_csv("myfile.csv", iterator=True, chunksize=100000)
df = next(_iter)
# Convert column to datetime
df.date_col = pd.to_datetime(df.date_col)
# Create table
df.head(0).to_sql(name = "mytable", con = engine, if_exists='replace')
# Add first chunk
df.to_sql(name="mytable", con=engine, if_exists='append')
for i in _iter:
i.date_col = pd.to_datetime(i.date_col)
i.to_sql(name="mytable", con=engine, if_exists='append')
Using a for loop avoids running into StopIteration exception which indicates end of an iterator. Read More on Iterators here.
