sed multiline delete everything before first occurrence of pattern

Time:01-29

I have a multiline string containing some text followed by a JSON, so it has the following format:

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

I want to extract the JSON using sed by removing the text before, so everything until (and including) MY_JSON: (note the trailing space).

My current solution:

# $str contains above multiline string
$ echo "$str" | sed '/MY_JSON: /d'

I get the following output:

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

But I want the following output:

{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

The idea is to select everything up to the first occurrence of { and delete it, but that doesn't work: the lines before the matching line are not deleted, and the matching line is deleted entirely instead of just the part before the {.

How can I best achieve this with sed?

CodePudding user response:

You may use this sed:

sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}' file

{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

Command Details:

  • 1,/MY_JSON:/: Match from line 1 to the line that matches MY_JSON:
  • {/MY_JSON:/!d; s/^MY_JSON: *//;}: Within that range, delete every line that does not match MY_JSON:, then strip the MY_JSON: prefix (and any spaces after it) from the matching line.
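
Since the question keeps the text in a shell variable rather than a file, the same sed program can be fed from that variable. A minimal sketch, assuming a POSIX-style shell; the value of str is a shortened, made-up sample, and the variable has to be double-quoted so the line breaks survive:

```shell
# Hypothetical sample input, shortened from the question.
str='Some random text
MY_JSON: {
  "foo": "bar"
}'

# Quote "$str" to preserve the newlines, then apply the same sed program.
json=$(printf '%s\n' "$str" | sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}')
printf '%s\n' "$json"
```

One caveat: with the 1,/re/ address form, sed only starts looking for the end of the range on line 2. If MY_JSON: were on the very first line, the range would run on to the next match or to the end of the input. GNU sed offers 0,/re/ for that case.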

CodePudding user response:

Using sed

$ sed 's/^[a-zA-Z][^{]*//;/^$/d' input_file
{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

CodePudding user response:

If the file has only one JSON structure

Input

It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

sed '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/p;}' -n

Output

{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}
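
The one-liner above is dense, so here is the same program spelled out with comments. This is only a sketch, assuming GNU sed (where . and negated bracket expressions also match the newlines inside the pattern space) and a made-up sample input:

```shell
# Hypothetical sample input with a text line before the JSON.
input='Intro line:
MY_JSON: {
  "foo": "bar"
}'

json=$(printf '%s\n' "$input" | sed -n '
  # Line 1: copy it into the hold space.
  1h
  # Every other line: append it to the hold space.
  1!H
  # Last line: fetch the collected text and keep only the span
  # from the first "{" after the first colon through the last "}".
  ${
    g
    s/^[^:]*:[^{]*\({.*}\).*/\1/p
  }
')
printf '%s\n' "$json"
```

Because the whole input is held in memory and .* is greedy, this suits small inputs with a single JSON block; with several blocks it would capture everything from the first { to the last }.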

If the file has multiple JSON structures

Input

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}
some
My: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

sed '/^[^{]*{/,/^}/!d;s/^[^{]*{/{/g'

OR

sed '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/;p}' -n | sed -n '/^[^{]*{/,/^}/{;p}' | sed 's/^[^{]*{/{/g'

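In the commands above, drop the final s/^[^{]*{/{/g step to retain titles such as MY_JSON: in the output. Note that this s step also strips the indentation in front of nested { lines.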

Output

{
  "foo": [
{
      "bar": "baz",
     (...) // more content here
    }
   ]
}
{
  "foo": [
{
      "bar": "baz",
     (...) // more content here
    }
   ]
}
{
  "foo": [
{
      "bar": "baz",
     (...) // more content here
    }
   ]
}
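
The note above about retaining titles such as MY_JSON: can be sketched as follows. The two-block input is made up for illustration; the range address alone selects each block, and leaving out the s command keeps the text in front of the opening brace:

```shell
# Hypothetical input containing two labelled JSON blocks.
input='Some random text
MY_JSON: {
  "a": 1
}
some
My: {
  "b": 2
}'

# Keep only the lines from one containing "{" through one starting with "}".
# Without the s command the labels stay in place.
labelled=$(printf '%s\n' "$input" | sed '/^[^{]*{/,/^}/!d')
printf '%s\n' "$labelled"
```

This relies on every closing brace of a top-level block sitting at the start of its line, as in the sample input.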