Perform action after all s3 files processed

Time:01-19

I have files uploaded onto my s3 bucket - always 6, twice per day. I am running a fairly standard setup:

  • S3 -> SNS -> SQS -> lambda

Each lambda processes the newly uploaded file and then deletes it. I know the exact number of files in each batch, but I cannot require the client to perform any other action (e.g. message on SNS).

Question: what's the easiest/best way to perform a certain action after processing all files?

My ideas:

  • Step Functions - not sure how?
  • Simply check in each lambda whether the S3 object count is zero (or check the SQS queue size?) - not sure whether this could race against a delete made immediately before (is the listing always consistent?) or run into similar issues.
  • CloudWatch alarm when SQS queue depth is zero -> SNS -> lambda - I guess it should work, not sure about the correct metric?

I would appreciate info on the best/simplest way to achieve it.

CodePudding user response:

If you are sure that all 6 files will have been processed by a certain time of day, you can simply create a scheduled CloudWatch rule (for example at 11:50 PM) that runs your validation and then deletes the files.

CodePudding user response:

You could use the number of files in the S3 bucket location to capture "file count" state. The processing lambda runs on each file-add, but conditionally initiates the delete and post-processing steps only when objectCount === 6.

How can we use S3 to keep track of file count? Lots of possibilities, here are two:

Option 1: defer processing until all 6 files have arrived

When triggered on OBJECT_CREATED, the lambda counts the S3 objects. If objectCount < 6 the lambda exits without further action. If 6 files exist, process all 6, delete the files and perform the post-processing action.
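
A minimal sketch of Option 1 in Python, assuming a boto3-style S3 client is passed in as `s3`. The names `handle_upload`, `list_keys`, `process` and `post_process` are hypothetical and only illustrate the flow; the batch size of 6 comes from the question.

```python
from typing import Any, Callable, List

EXPECTED_FILES = 6  # batch size stated in the question


def list_keys(s3: Any, bucket: str, prefix: str = "") -> List[str]:
    """Return every object key under the prefix, following pagination."""
    keys: List[str] = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def handle_upload(
    s3: Any,
    bucket: str,
    process: Callable[[str], None],
    post_process: Callable[[], None],
    prefix: str = "",
) -> bool:
    """Run on every OBJECT_CREATED event; act only once the batch is complete."""
    keys = list_keys(s3, bucket, prefix)
    if len(keys) < EXPECTED_FILES:
        return False  # batch incomplete: exit without further action

    for key in keys:
        process(key)  # hypothetical per-file processing

    # Delete the whole batch in one call, then run the final action.
    s3.delete_objects(
        Bucket=bucket,
        Delete={"Objects": [{"Key": k} for k in keys]},
    )
    post_process()
    return True
```

In a real handler, `s3` would presumably be `boto3.client("s3")`. Note that if the last two uploads arrive almost simultaneously, two invocations may both see 6 objects, so the processing and final action should be safe to run twice or be guarded in some way.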

Option 2: use S3 tags to indicate PROCESSED status

When triggered on OBJECT_CREATED, the lambda processes the new file and adds a PROCESSED tag to the S3 Object. The lambda then counts the S3 objects with PROCESSED tags. If 6, delete the files and perform the post-processing action.
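
Option 2 could be sketched roughly as follows, again assuming a boto3-style S3 client passed in as `s3`. The tag name `status`, the function names, and the `process` / `post_process` callbacks are hypothetical; the sketch also assumes the bucket holds far fewer than 1000 objects, so a single listing call is enough.

```python
from typing import Any, Callable, List

EXPECTED_FILES = 6  # batch size stated in the question
PROCESSED_TAG = {"Key": "status", "Value": "PROCESSED"}  # hypothetical tag


def processed_keys(s3: Any, bucket: str) -> List[str]:
    """Return the keys of all objects that carry the PROCESSED tag."""
    page = s3.list_objects_v2(Bucket=bucket)  # assumes one page of results
    done: List[str] = []
    for obj in page.get("Contents", []):
        tags = s3.get_object_tagging(Bucket=bucket, Key=obj["Key"])["TagSet"]
        if PROCESSED_TAG in tags:
            done.append(obj["Key"])
    return done


def handle_upload_tagged(
    s3: Any,
    bucket: str,
    key: str,
    process: Callable[[str], None],
    post_process: Callable[[], None],
) -> bool:
    """Process one new file, tag it, and finish the batch if it was the last."""
    process(key)  # hypothetical per-file processing

    # put_object_tagging replaces the object's whole tag set.
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [PROCESSED_TAG]},
    )

    done = processed_keys(s3, bucket)
    if len(done) < EXPECTED_FILES:
        return False  # other files are still pending

    s3.delete_objects(
        Bucket=bucket,
        Delete={"Objects": [{"Key": k} for k in done]},
    )
    post_process()
    return True
```

Compared with Option 1, each file is processed as soon as it arrives, at the cost of one extra tagging read per object on every invocation. Two invocations finishing at the same moment could still both count 6 tagged objects, so the final action should tolerate running more than once.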

In any case, think through race conditions and other error states. This is often where "best" and "simplest" conflict.

N.B. Step Functions could be used to chain together the processing steps, but they don't offer a different way to keep track of file count state.
