I am trying to upload a medium-sized Pandas DataFrame to a MongoDB Serverless instance through:
client[db_name][collection_name].insert_many(df.to_dict(orient="records"))
However, at a certain point, the following exception is raised:
AutoReconnect: xxx-xx.xxxx.mongodb.net:00000: connection closed
How can I modify my code to successfully upload the file?
Additional info:
- PyMongo version is 4.0.1
- I'm using my local machine to upload the data
- I've uploaded larger files before through the same setup (local machine & Atlas)
Answer:
1) I think you should add a '.' between the DB name and the collection name (I am not sure about your line client[db_name][collection_name]).
2) Also, you should put the data into a list and then feed it to MongoDB, like below:
db.collection.insertMany([ <document 1>, <document 2>, ... ])
Answer:
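A workaround that worked was to split the dataset into chunks and upload them sequentially:
import logging
import numpy as np
logger = logging.getLogger(__name__)
n_steps = 1000  # rows per chunk (illustrative value; tune for your data)
# Chunk boundaries 0, n_steps, 2*n_steps, ... followed by len(df) as the last end
steps_l = list(np.arange(0, len(df), n_steps)) + [len(df)]
logger.debug(f"We got {len(steps_l) - 1} chunks.")
for start, end in zip(steps_l, steps_l[1:]):
    logger.debug(f"Chunk from position {start} to {end}")
    client[db_name][collection_name].insert_many(df.iloc[start:end].to_dict(orient="records"))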
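The same chunking idea can be packaged as a small reusable helper. This is a minimal sketch, not the poster's code: chunked, upload_in_chunks and the default chunk_size=1000 are hypothetical names and values chosen for illustration. It only assumes the target object has an insert_many(documents) method, as a PyMongo Collection does, and it accepts any iterable of dicts, such as df.to_dict(orient="records").

```python
from __future__ import annotations

from itertools import islice
from typing import Any, Iterable, Iterator, Mapping, Protocol, Sequence


class SupportsInsertMany(Protocol):
    """Anything with an insert_many(documents) method, e.g. a PyMongo Collection."""

    def insert_many(self, documents: Sequence[Mapping[str, Any]]) -> Any: ...


def chunked(items: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield consecutive lists of at most `size` items (the last one may be shorter)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    # islice pulls the next `size` items; an empty list means the input is exhausted
    while chunk := list(islice(it, size)):
        yield chunk


def upload_in_chunks(
    collection: SupportsInsertMany,
    records: Iterable[Mapping[str, Any]],
    chunk_size: int = 1000,  # hypothetical default; tune for your data/cluster
) -> int:
    """Insert records one chunk at a time and return the number of chunks sent."""
    n_chunks = 0
    for chunk in chunked(records, chunk_size):
        collection.insert_many(chunk)  # one insert_many call per chunk
        n_chunks += 1
    return n_chunks


# Hypothetical usage with the names from the question:
# upload_in_chunks(client[db_name][collection_name], df.to_dict(orient="records"))
```

Because the chunks are produced lazily from an iterator, no list of boundary positions is needed. Whether a particular chunk size avoids the AutoReconnect error will depend on the data and the cluster, so it may need some experimentation.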
