I have a JSON string with values like this:
[
{
"id": 1,
"object_k_id": "",
"object_type": "report",
"object_meta": {
"source_id": 0,
"report": "Customers"
},
"description": "Daily metrics for all customers",
"business_name": "",
"business_logic": "",
"owners": [
"[email protected]",
null
],
"stewards": [
"[email protected]",
''
],
"verified_use_cases": [
null,
null,
"c4a48296-fd92-3606-bf84-99aacdf22a20",
null
],
"classifications": [
null
],
"domains": []
}
]
But the final format I want has the nulls and the empty items removed from the lists, something like this:
[
{
"id": 1,
"object_k_id": "",
"object_type": "report",
"object_meta": {
"source_id": 0,
"report": "Customers"
},
"description": "Daily metrics for all customers",
"business_name": "",
"business_logic": "",
"owners": [
"[email protected]"
],
"stewards": [
"[email protected]"
],
"verified_use_cases": [
"c4a48296-fd92-3606-bf84-99aacdf22a20"
],
"classifications": [],
"domains": []
}
]
I want the output to exclude nulls and empty strings so that it looks cleaner. I need to do this recursively for all the lists in all the JSON documents I have.
Better still, it would be helpful if I could do it in one pass rather than looping through each element myself.
Only the lists need cleaning, though.
Can anyone please help me with this? Thanks in advance.
CodePudding user response:
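import json

def recursive_dict_clean(d):
    # Walk the dict in place; only list values are filtered.
    for k, v in d.items():
        if isinstance(v, list):
            # Keeps truthy items only, so 0 and False are dropped along with None and ''.
            v[:] = [i for i in v if i]
        if isinstance(v, dict):
            recursive_dict_clean(v)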
data = json.loads("""[{
"id": 1,
"object_k_id": "",
"object_type": "report",
"object_meta": {
"source_id": 0,
"report": "Customers"
},
"description": "Daily metrics for all customers",
"business_name": "",
"business_logic": "",
"owners": [
"[email protected]",
null
],
"stewards": [
"[email protected]"
],
"verified_use_cases": [
null,
null,
"c4a48296-fd92-3606-bf84-99aacdf22a20",
null
],
"classifications": [
null
],
"domains": []
}]""")
for d in data:
    recursive_dict_clean(d)
print(data)
Output:
[{'id': 1,
'object_k_id': '',
'object_type': 'report',
'object_meta': {'source_id': 0, 'report': 'Customers'},
'description': 'Daily metrics for all customers',
'business_name': '',
'business_logic': '',
'owners': ['[email protected]'],
'stewards': ['[email protected]'],
'verified_use_cases': ['c4a48296-fd92-3606-bf84-99aacdf22a20'],
'classifications': [],
'domains': []}]
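P.S.: Your JSON string is not valid: the empty string in "stewards" is written with single quotes (''), and JSON requires double quotes ("").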
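The function above filters by truthiness, so 0 and False would be dropped from a list too, and it does not reach lists nested inside other lists or dicts that sit inside lists. A minimal sketch of a variant that returns a cleaned copy and removes only None and "" from lists at any depth could look like this. The names clean_lists and EMPTY are hypothetical, and the choice of which values count as empty is an assumption based on the question:

```python
import json

EMPTY = (None, "")  # assumption: the only values treated as "empty" inside lists


def clean_lists(node):
    """Return a copy of node with None and "" removed from every list."""
    if isinstance(node, dict):
        # Dict values are kept even when they are "" or None; only recurse.
        return {key: clean_lists(value) for key, value in node.items()}
    if isinstance(node, list):
        # Drop empty items, then recurse into whatever remains.
        return [clean_lists(item) for item in node if item not in EMPTY]
    return node


raw = ('[{"id": 0, "name": "", "tags": ["a", null, "", 0, false], '
       '"nested": [[null, "x"], {"k": [null]}]}]')
cleaned = clean_lists(json.loads(raw))
print(json.dumps(cleaned))
# [{"id": 0, "name": "", "tags": ["a", 0, false], "nested": [["x"], {"k": []}]}]
```

Because the function accepts any decoded value, it can be called once on the top-level list instead of once per element.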
CodePudding user response:
You can parse your JSON into Python objects, apply the function below to each dict in the top-level list, and then serialize the result to JSON again:
def clean_dict(input_dict):
    output = {}
    for key, value in input_dict.items():
        if isinstance(value, dict):
            output[key] = clean_dict(value)
        elif isinstance(value, list):
            output[key] = []
            for item in value:
                # Test each item, not the enclosing list.
                if isinstance(item, dict):
                    output[key].append(clean_dict(item))
                elif item not in [None, '']:
                    output[key].append(item)
        else:
            output[key] = value
    return output
Thanks to N.O
CodePudding user response:
You can use the built-in object_pairs_hook parameter to clean the data as it is decoded from your string.
https://docs.python.org/3/library/json.html#json.load
The hook function runs every time the decoder finishes parsing a JSON object (where it would otherwise build a dict). It removes all None items from list values using a simple list comprehension, and otherwise leaves the data alone and lets the decoder do its thing.
#!/usr/bin/env python3
import json
data_string = """[
{
"id": 1,
"object_k_id": "",
"object_type": "report",
"object_meta": {
"source_id": 0,
"report": "Customers"
},
"description": "Daily metrics for all customers",
"business_name": "",
"business_logic": "",
"owners": [
"[email protected]",
null
],
"stewards": [
"[email protected]",
""
],
"verified_use_cases": [
null,
null,
"c4a48296-fd92-3606-bf84-99aacdf22a20",
null
],
"classifications": [
null
],
"domains": []
}
]"""
def json_hook(obj):
    # obj is the list of (key, value) pairs for one decoded JSON object.
    return_obj = {}
    for k, v in obj:
        if isinstance(v, list):
            v = [x for x in v if x is not None]
        return_obj[k] = v
    return return_obj
data = json.loads(data_string, object_pairs_hook=json_hook)
print(json.dumps(data, indent=4))
Result:
[
{
"id": 1,
"object_k_id": "",
"object_type": "report",
"object_meta": {
"source_id": 0,
"report": "Customers"
},
"description": "Daily metrics for all customers",
"business_name": "",
"business_logic": "",
"owners": [
"[email protected]"
],
"stewards": [
"[email protected]",
""
],
"verified_use_cases": [
"c4a48296-fd92-3606-bf84-99aacdf22a20"
],
"classifications": [],
"domains": []
}
]
In your example you remove the "" value from stewards. If you want that behaviour, you can replace x is not None with x not in (None, "") in the list comprehension, but it seemed like that might have been a mistake, since you left empty strings in other places.
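As a rough sketch of that variant (the name drop_empty_hook and the sample input are made up for illustration), the hook below also drops empty strings from list values. It also shows a limitation worth knowing: object_pairs_hook is only called for JSON objects, so a list that is the top-level value, or a list nested directly inside another list, is not filtered.

```python
import json


def drop_empty_hook(pairs):
    """object_pairs_hook: build a dict, filtering None and "" out of list values."""
    result = {}
    for key, value in pairs:
        if isinstance(value, list):
            value = [item for item in value if item not in (None, "")]
        result[key] = value
    return result


text = '{"stewards": ["a", "", null], "name": "", "matrix": [[null, 1], null]}'
data = json.loads(text, object_pairs_hook=drop_empty_hook)
print(data)
# {'stewards': ['a'], 'name': '', 'matrix': [[None, 1]]}

# The hook is never called for a top-level array, so it is left untouched.
top_level = json.loads('[null, 1]', object_pairs_hook=drop_empty_hook)
print(top_level)
# [None, 1]
```

If the data can contain lists of lists or a bare top-level list with nulls, a recursive pass over the decoded result is the safer choice.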
