I have a multiline string containing some text followed by a JSON object, in the following format:
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
I want to extract the JSON using sed by removing the text before it, i.e. everything up to and including MY_JSON: (note the trailing space).
My current solution:
# $str contains above multiline string
$ echo "$str" | sed '/MY_JSON: /d'
I get the following output:
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
But I want the following output:
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
So the idea is to delete everything up to the first occurrence of {. But that doesn't work: it doesn't delete the lines before the line where the pattern matches, and it deletes the whole matching line instead of just the part before the {.
How can I best achieve this with sed?
CodePudding user response:
You may use this sed:
sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}' file
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
Command Details:
1,/MY_JSON:/: Match from line 1 to the line that matches MY_JSON:.
{/MY_JSON:/!d; s/^MY_JSON: *//;}: Within that range, delete all lines except the last one, then remove MY_JSON: from that line.
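To try this on a shell variable instead of a file, here is a minimal sketch. The variable names str and json and the shortened sample text are assumptions made for illustration. Quoting "$str" matters: unquoted, the shell collapses the newlines into spaces and sed sees a single line.

```shell
# Hypothetical sample: some text, then a JSON object introduced by "MY_JSON: ".
str='Some random text
More text before the JSON:
MY_JSON: {
  "foo": [
    {
      "bar": "baz"
    }
  ]
}'

# Quote "$str" so the newlines survive; printf avoids echo's portability quirks.
json=$(printf '%s\n' "$str" | sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}')
printf '%s\n' "$json"
```

Because the range starts at line 1 and its end is only searched for from line 2 onward, this assumes MY_JSON: is not on the very first line of the input.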
CodePudding user response:
Using sed
$ sed 's/^[a-zA-Z][^{]*//;/^$/d' input_file
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
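A sketch of how this behaves, assuming that no line of the JSON itself starts with a letter (true for the sample, but not guaranteed in general). The input and result variable names and the shortened sample are hypothetical.

```shell
input='Some random text
MY_JSON: {
  "foo": [
    { "bar": "baz" }
  ]
}'

# 1) On lines that start with a letter, delete everything up to (not including)
#    the first "{". Lines that contain no "{" become empty.
# 2) /^$/d then deletes the lines emptied by step 1.
result=$(printf '%s\n' "$input" | sed 's/^[a-zA-Z][^{]*//;/^$/d')
printf '%s\n' "$result"
```

Note that blank lines already present in the input are deleted as well, which is harmless for JSON.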
CodePudding user response:
If the file has only one JSON structure
Input
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
sed '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/p;}' -n
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
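The command collects the whole input in the hold space and runs a single substitution over it at the end. A commented sketch, assuming GNU sed (where . also matches the newlines inside the pattern space); the sample text and variable names are hypothetical:

```shell
input='Some text: with a colon
MY_JSON: {
  "foo": [
    { "bar": "baz" }
  ]
}
trailing text'

# 1h   : copy line 1 into the hold space
# 1!H  : append every later line to the hold space
# $    : on the last line, g pulls the whole input back into the pattern space,
#        and the substitution keeps only the text from the first "{" to the last "}".
result=$(printf '%s\n' "$input" | sed -n '1h;1!H;${g;s/^[^:]*:[^{]*\({.*}\).*/\1/p;}')
printf '%s\n' "$result"
```

The pattern needs at least one colon before the first {, and because .* is greedy it keeps everything up to the last } in the input, which is why this form only suits a single JSON structure.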
If the file has multiple JSON structures
Input
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
some
My: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
sed '/^[^{]*{/,/^}/!d;s/^[^{]*{/{/g'
OR
sed '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/;p}' -n | sed -n '/^[^{]*{/,/^}/{;p}' | sed 's/^[^{]*{/{/g'
To keep titles such as MY_JSON:, drop the final s/^[^{]*{/{/g substitution from the commands above.
Output
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
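One caveat on the range version: /^[^{]*{/,/^}/ ends at the first line that begins with }, so it assumes nested closing braces are indented and only the outermost } sits in column 1. The substitution also runs on every line in the range, so anything in front of a { on a nested line is stripped too (only indentation in this sample). A sketch with hypothetical, indented two-block input:

```shell
input='Some random text
MY_JSON: {
  "foo": [
    { "bar": "baz" }
  ]
}
some
My: {
  "n": 1
}
trailing text'

# Delete every line outside a "title {" ... "}" range, then strip the text
# in front of the first "{" on each remaining line.
result=$(printf '%s\n' "$input" | sed '/^[^{]*{/,/^}/!d;s/^[^{]*{/{/g')
printf '%s\n' "$result"
```

With this input the nested object line loses its leading spaces, which leaves the JSON valid. A nested line such as "key": { would lose its key, so that layout needs a different approach.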
