I am trying to archive each input payload received by my Java REST API as a separate file in S3. A date-wise folder will be created under the S3 bucket to store each day's request payloads.
Input can range from 1 request to up to one million requests per day. Each payload file is tiny, around 500 bytes.
The storage structure is as follows:
s3
|_ abc bucket
|_ 2022-01-12 -> (will contain all requests as separate files received on 12th Jan)
|_ 2022-01-13 -> (will contain all requests as separate files received on 13th Jan)
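S3 has no real folders: each "folder" in the layout above is simply the leading part of an object key. A minimal sketch of building such keys, assuming (hypothetically) that each payload is named by a random UUID with a `.json` suffix:

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.UUID;

public class DateWiseKey {
    // Builds a key such as "2022-01-12/<uuid>.json".
    // The UUID file name and the ".json" suffix are hypothetical choices.
    static String keyFor(LocalDate day, UUID requestId) {
        // LocalDate.toString() produces ISO-8601 form, e.g. "2022-01-12"
        return day + "/" + requestId + ".json";
    }

    public static void main(String[] args) {
        // Use UTC so the "day" does not depend on the server's time zone
        LocalDate today = LocalDate.now(ZoneOffset.UTC);
        System.out.println(keyFor(today, UUID.randomUUID()));
    }
}
```

With this scheme, every object uploaded on a given day shares the same key prefix (the date).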
My code currently uploads each file in parallel and lets S3 handle the load. I am surprised to see that the time per upload increases, not with the file size (which is small) but with the number of objects I am uploading. For ~1 million requests, the time for an upload operation to complete went from 0.4 seconds to as much as 4 seconds.
Is the upload latency caused by the growing folder size? Is there a best practice for creating subdirectories under each date-wise folder to increase upload speed?
Answer:
Amazon S3 supports 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket; requests beyond that rate may be throttled.
https://aws.amazon.com/premiumsupport/knowledge-center/s3-request-limit-avoid-throttling/
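A rough way to relate these figures to the workload is sketched below. It assumes load is spread evenly across prefixes, and the 10,000 PUT/s burst is a hypothetical example, not a measured value. For scale, one million requests spread evenly over a day average about 11.6 requests per second, so the per-prefix limit matters mainly when many uploads are fired in parallel at once.

```java
public class PrefixBudget {
    // Per-prefix PUT/COPY/POST/DELETE rate quoted above
    static final int PUT_PER_SEC_PER_PREFIX = 3_500;

    // Minimum number of prefixes needed to absorb a given peak PUT rate,
    // assuming requests are distributed evenly across prefixes (ceiling division).
    static int prefixesNeeded(long peakPutsPerSecond) {
        long n = (peakPutsPerSecond + PUT_PER_SEC_PER_PREFIX - 1) / PUT_PER_SEC_PER_PREFIX;
        return (int) Math.max(1, n);
    }

    // Average rate if a daily volume were spread evenly over 24 hours
    static double averagePerSecond(long requestsPerDay) {
        return requestsPerDay / 86_400.0;
    }

    public static void main(String[] args) {
        System.out.printf("average for 1M/day: %.2f req/s%n", averagePerSecond(1_000_000));
        System.out.println("prefixes for a 10,000 PUT/s burst: " + prefixesNeeded(10_000));
    }
}
```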
There is no limit to the number of prefixes you can have in a bucket, so the best approach is to partition your keys across multiple prefixes to avoid a bottleneck during simultaneous uploads.
My suggestion is to compute a hash dynamically and use it to name a prefix. You can find best practices for naming prefixes at https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-key-naming-pattern
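A minimal sketch of the hashed-prefix idea, assuming each payload has some unique request ID (hypothetical) and using the first two hex characters of its MD5 hash as a shard, which gives up to 256 prefixes per day. Placing the shard after the date is a choice made here to keep the question's date-wise layout; it is not the only option.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;

public class HashedKey {
    // Leading hex characters of the hash used as the shard (hypothetical choice):
    // 2 hex chars -> 256 possible shard prefixes under each date.
    static final int SHARD_HEX_CHARS = 2;

    static String shardFor(String requestId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(requestId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, SHARD_HEX_CHARS);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    // e.g. "2022-01-12/<shard>/req-000001.json"
    static String keyFor(LocalDate day, String requestId) {
        return day + "/" + shardFor(requestId) + "/" + requestId + ".json";
    }

    public static void main(String[] args) {
        System.out.println(keyFor(LocalDate.of(2022, 1, 12), "req-000001"));
    }
}
```

Because the shard sits under the date, all of one day's payloads can still be listed with the date prefix, while writes are spread over many distinct prefixes.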
