One step in our system extracts a single key and its value from a JSON file into a dedicated file, which a separate script (not relevant here) then processes.
A representative subset of the original JSON file looks like:
{
"version" : null,
"produced" : "2021-01-01T00:00:00+0000",
"other": "content here",
"items" : [
{
"code" : "AA",
"name" : "Example 1",
"prices" : [ "other", "content", "here" ]
},
{
"code" : "BB",
"name" : "Example 2",
"prices" : [ "other", "content", "here" ]
}
]
}
The current output, given that subset as input, is simply:
[
{
"code" : "AA",
"name" : "Example 1",
"prices" : [ "other", "content", "here" ]
},
{
"code" : "BB",
"name" : "Example 2",
"prices" : [ "other", "content", "here" ]
},
...
]
Previously, we extracted the whole "items" portion using jq with a very straightforward command (which worked fine):
cat file.json | jq '.items' > file.items.json
Recently, however, the original JSON file has grown drastically, causing the script to fail with an out-of-memory error. One obvious solution is to use jq's --stream option, but I am stuck on how to convert the above command into a valid filter in jq's stream syntax:
cat file.json | jq --stream '...' > file.items.json
Any advice on what to use as a filter for this command would be greatly appreciated. Thanks in advance!
CodePudding user response:
You should use the --stream flag in combination with the fromstream builtin:
jq --stream --null-input '
fromstream(inputs | select(.[0][0] == "items"))[]
' file.json
[
{
"code": "AA",
"name": "Example 1",
"prices": [
"other",
"content",
"here"
]
},
{
"code": "BB",
"name": "Example 2",
"prices": [
"other",
"content",
"here"
]
}
]
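To see why the test on .[0][0] picks out the right events, it may help to look at what --stream actually emits. The following is only a sketch, assuming a hypothetical cut-down input written to a temporary file (the file contents and variable names are made up for illustration):

```shell
# Hypothetical cut-down input, written to a temporary file for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"version": null, "items": [{"code": "AA"}, {"code": "BB"}]}
EOF

# With --stream, every input is an event: [path, leaf] for a scalar value,
# or just [path] when the enclosing array/object closes.
events=$(jq -c --stream '.' "$sample")
printf '%s\n' "$events"
# [["version"],null]
# [["items",0,"code"],"AA"]
# [["items",0,"code"]]
# [["items",1,"code"],"BB"]
# [["items",1,"code"]]
# [["items",1]]
# [["items"]]

rm -f "$sample"
```

The first element of each event is the path, so .[0][0] is the top-level key the event belongs to. Selecting on "items" drops the events for every other key before fromstream rebuilds anything.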
The demo illustrates the syntax only, not the efficiency or memory consumption: jqplay.org lacks the --stream option, so I had to stream your original input using tostream instead.
Note: Although it works for the sample data, do not try to shortcut using
jq --stream --null-input 'fromstream(inputs).items' file.json
directly on your large JSON file, as it only reconstructs the entire input JSON entity, thus defeating the purpose of using --stream (clarified by @peak).
CodePudding user response:
If a stream of the {code, name, prices} objects is acceptable, then you could go with:
< input.json jq --stream -n '
fromstream( 2 | truncate_stream(inputs | select(.[0][0] == "items")) )'
This has minimal memory requirements, since only one item is held in memory at a time. Whether the saving is significant depends on the value of .items|length.
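As a rough illustration of what that command produces, here is a sketch run against a hypothetical cut-down input (the temporary file and its contents are assumptions, and a reasonably recent jq is assumed). Adding -c puts each item on its own line, which makes the result easy to consume line by line:

```shell
# Hypothetical cut-down input, written to a temporary file for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"version": null, "items": [{"code": "AA", "prices": ["x"]}, {"code": "BB", "prices": ["y"]}]}
EOF

# Stripping the first two path components ("items" and the array index)
# makes fromstream emit each item as soon as its closing event arrives.
objects=$(jq -c --stream -n '
  fromstream(2 | truncate_stream(inputs | select(.[0][0] == "items")))
' "$sample")
printf '%s\n' "$objects"
# {"code":"AA","prices":["x"]}
# {"code":"BB","prices":["y"]}

rm -f "$sample"
```

Note that the output is a sequence of separate JSON objects rather than a single array, so the downstream script would need to accept that form.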
