I'm working on a Lambda function that needs a list of all the folders in an S3 bucket. I need to traverse each folder and get all of its subfolders until the end of the tree is reached.
I implemented this in boto3 by calling list_objects_v2 recursively with different prefixes. It works, but it is very slow, and for buckets with a lot of folders the Lambda exceeds its 15-minute timeout.
Is there a more efficient way of doing this?
Update: Sample output. This is what I currently get by calling list_objects_v2 recursively:
L1/
L1/hist/
L1/hist/2022-01-03
L1/hist/2022-01-01
...
CodePudding user response:
You can enumerate through all of the objects in the bucket, and find the "folder" (really the prefix up until the last delimiter), and build up a list of available folders:
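import boto3

seen = set()
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
# The paginator follows continuation tokens, so every object is visited
for page in paginator.paginate(Bucket='bucket-name'):
    # 'Contents' is absent when a page has no objects
    for obj in page.get('Contents', []):
        key = obj['Key']
        # The "folder" is everything before the last "/"
        folder = key[:key.rindex("/")] if '/' in key else ""
        if folder not in seen:
            seen.add(folder)
            print(folder)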
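Note that the loop above records only the immediate parent of each object, so a folder that holds nothing but subfolders would not be printed unless a placeholder object exists for it. If every level of the tree is needed, as in the sample output in the question, one possible approach is to expand each key into all of its ancestor prefixes. The following is a minimal sketch under that assumption; the helper name `folders_from_keys` and the sample keys are hypothetical, and the function works on plain key strings so it can be fed from the paginator loop above.

```python
def folders_from_keys(keys):
    """Return every folder prefix, including intermediate ones, implied by the keys."""
    folders = set()
    for key in keys:
        # Drop the last segment (the object name); what remains are folder names
        parts = key.split("/")[:-1]
        # Add each ancestor: "L1/", then "L1/hist/", and so on
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]) + "/")
    return sorted(folders)


# Hypothetical keys, loosely modeled on the sample output in the question
sample_keys = [
    "L1/hist/2022-01-03/data.csv",
    "L1/hist/2022-01-01/data.csv",
    "readme.txt",
]
for folder in folders_from_keys(sample_keys):
    print(folder)
```

In the paginated loop, the keys could be collected with something like `keys.append(obj['Key'])` and passed to the helper once listing finishes. This still reads every object in the bucket, but it needs one listing pass instead of one call per folder.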
CodePudding user response:
The list_objects_v2() call returns the objects in the bucket, up to 1,000 per call. The Key of each object includes the full path of the object.
Therefore, you can simply extract the paths from the Keys of all objects:
import boto3
s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='my-bucket')
# folder1/folder2/foo.txt --> folder1/folder2
# 'Contents' is absent when no objects are returned, hence .get()
paths = {
    obj['Key'][:obj['Key'].rfind('/')]
    for obj in response.get('Contents', [])
    if '/' in obj['Key']
}
for path in sorted(paths):
    print(path)
If your bucket contains more than 1000 objects, then you will either need to loop through the results using ContinuationToken or use a paginator. See: list_objects_v2 paginator
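The paginator route could look roughly like the sketch below. It is an illustration rather than a definitive implementation: the function name `list_folder_paths` is hypothetical, and the client is passed in as a parameter (an assumption made here so the function can be exercised without AWS credentials).

```python
def list_folder_paths(s3_client, bucket):
    """Collect the folder part of every key in the bucket, across all pages."""
    # get_paginator handles the ContinuationToken bookkeeping between pages
    paginator = s3_client.get_paginator("list_objects_v2")
    paths = set()
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is absent on a page that has no objects
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if "/" in key:
                # folder1/folder2/foo.txt --> folder1/folder2
                paths.add(key[: key.rfind("/")])
    return sorted(paths)
```

It would be called as `list_folder_paths(boto3.client('s3'), 'my-bucket')`, where `my-bucket` is the placeholder bucket name used above.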
