Pymongo AutoReconnect Error while uploading Pandas DataFrame

Time:02-05

I am trying to upload a medium-sized Pandas DataFrame to a MongoDB Serverless instance through:

client[db_name][collection_name].insert_many(df.to_dict(orient="records"))

However, at a certain point, the following exception is raised:

AutoReconnect: xxx-xx.xxxx.mongodb.net:00000: connection closed

How can I modify my code to successfully upload the file?

Additional info:

  • Pymongo version is 4.0.1
  • I'm using my local machine to upload the data
  • I've uploaded larger files before through the same setup (local machine & Atlas)

CodePudding user response:

1) I think you should add a '.' between the DB name and the collection name. (I am not sure about your line client[db_name][collection_name].)

2) You should also put the documents into a list and then pass that list to MongoDB, as below:

db.collection.insertMany([ <document 1>, <document 2>, ... ])

CodePudding user response:

A workaround that worked for me was to split the dataset into chunks and upload them sequentially:

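import numpy as np

# n_steps is the number of rows per chunk; the final boundary is len(df).
steps_l = list(np.arange(0, len(df), n_steps)) + [len(df)]
logger.debug(f"We got {len(steps_l) - 1} chunks.")

# Upload each [start, end) slice of the DataFrame in turn.
for start, end in zip(steps_l, steps_l[1:]):
    logger.debug(f"Chunk from position {start} to {end}")
    client[db_name][collection_name].insert_many(df.iloc[start:end].to_dict(orient="records"))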
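
The same chunking idea can be wrapped in a small helper that also retries a chunk when the connection drops. This is only a sketch, not part of the original workaround: the names iter_chunks and upload_in_chunks, the default chunk size, and the retry and wait settings are all assumptions made for illustration. The helper uses only the standard library and expects any object with an insert_many(list_of_dicts) method, such as a PyMongo collection. Note that a chunk that failed midway may already be partly written, so handling of duplicates on retry is left out here.

```python
import time


def iter_chunks(records, chunk_size):
    """Yield consecutive slices of `records`, each at most `chunk_size` long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def upload_in_chunks(collection, records, chunk_size=1000,
                     retry_on=(Exception,), max_retries=3, wait_s=1.0):
    """Insert `records` chunk by chunk, retrying a chunk that raises `retry_on`.

    `collection` is assumed to expose insert_many(list_of_dicts).
    Returns the number of records passed to successful insert_many calls.
    """
    inserted = 0
    for chunk in iter_chunks(records, chunk_size):
        for attempt in range(max_retries + 1):
            try:
                collection.insert_many(chunk)
                break  # this chunk went through, move on to the next one
            except retry_on:
                if attempt == max_retries:
                    raise  # out of retries: surface the original error
                # Wait a little longer after each failed attempt.
                time.sleep(wait_s * (attempt + 1))
        inserted += len(chunk)
    return inserted
```

With the setup from the question, a call might look like upload_in_chunks(client[db_name][collection_name], df.to_dict(orient="records"), retry_on=(pymongo.errors.AutoReconnect,)), so that only connection drops are retried and other errors surface immediately.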