I have several JSON files stored in S3, each containing multiple dictionaries. I need to read each line and rename some of the keys. My code works flawlessly in my local environment, but it fails in Lambda, usually with the error "Expecting property name enclosed in double quotes".
Example JSON:
{
"request": 123,
"key1": [
{
"timestamp_unix": 98321,
"key_2": "Portugal"
}
]
}
{
"request": 456,
"key1": [
{
"timestamp_unix": 35765,
"key_2": "China"
}
]
}
Local code:
import json
with open("myfile.json", "r") as f:
    my_file = [json.loads(line) for line in f]
for j in my_file:
    # rename "key_2" to "key2" inside the first item of "key1"
    j["key1"][0]["key2"] = j["key1"][0].pop("key_2")
AWS code:
import boto3
import json
s3 = boto3.resource("s3")
obj = s3.Object("my-bucket", "path_to/myfile.json")
json_string = obj.get()["Body"].read().decode("utf-8") # this is where my json object is read in with single quotes instead of double quotes
my_file = [json.loads(line) for line in json_string] # error error error
I also tried:
import boto3
import json
s3_client = boto3.client("s3")
obj = s3_client.get_object(Bucket="my-bucket", Key="path_to/myfile.json")
json_string = obj["Body"].read().decode() # this is where my json object is read in with single quotes instead of double quotes
my_file = [json.loads(line) for line in json_string] # error error error
I also removed the decode() call altogether, but this didn't work either. I don't want to (and can't) change the underlying JSON files to store the dicts in a list.
How can I read JSON files containing multiple dictionaries with boto3?
Answer:
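The quotes are not the problem. read().decode() returns one string, and iterating over a string yields single characters, so json.loads() is called on just "{" and raises the error. The boto3 equivalent of for line in f: is to use the iter_lines() method of the response body.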
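# with s3.Object; with the client API use obj["Body"] instead
body = obj.get()["Body"]  # streaming body: do not call read() first
my_file = [json.loads(line) for line in body.iter_lines()]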
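The behaviour can be checked without S3. The following is only a minimal sketch: the helper names first_error, load_json_lines and rename_key are hypothetical, io.BytesIO merely stands in for the S3 body, and it assumes the file holds one JSON object per line. In Lambda, the iterable passed to load_json_lines would be the body's iter_lines().

```python
import io
import json

# Two records, one JSON object per line (assumed layout of the S3 file).
raw = (
    b'{"request": 123, "key1": [{"timestamp_unix": 98321, "key_2": "Portugal"}]}\n'
    b'{"request": 456, "key1": [{"timestamp_unix": 35765, "key_2": "China"}]}\n'
)


def first_error(text):
    """Loop over a str the way the question does and report the first failure."""
    for chunk in text:  # a str yields one character at a time, not one line
        try:
            json.loads(chunk)
        except json.JSONDecodeError as exc:
            return chunk, exc.msg
    return None


def load_json_lines(lines):
    """Parse one JSON document per non-empty line; lines may be bytes or str."""
    return [json.loads(line) for line in lines if line.strip()]


def rename_key(records, outer="key1", old="key_2", new="key2"):
    """Rename `old` to `new` in every item of record[outer] (the question only touches item 0)."""
    for record in records:
        for item in record[outer]:
            item[new] = item.pop(old)
    return records


# The question's loop: json.loads() sees only "{" and fails.
failing_chunk, message = first_error(raw.decode("utf-8"))

# Line-wise parsing: iterating a BytesIO yields one bytes line at a time.
records = rename_key(load_json_lines(io.BytesIO(raw)))
```

If the body has already been read and decoded into a string, passing json_string.splitlines() to load_json_lines should give the same result, at the cost of holding the whole file in memory.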
