I have a text file structured like this:
[timestamp1] header with space
[timestamp2] data1
[timestamp3] data2
[timestamp4] data3
[timestamp5] ..
[timestamp6] footer with space
[timestamp7] junk
[timestamp8] header with space
[timestamp9] data4
[timestamp10] data5
[timestamp11] ...
[timestamp12] footer with space
[timestamp13] junk
[timestamp14] header with space
[timestamp15] data6
[timestamp16] data7
[timestamp17] data8
[timestamp18] ..
[timestamp19] footer with space
I need to find each part between a header and a footer and save it in a separate file. For example, file1 should contain (with or without timestamps; it doesn't matter):
data1
data2
data3
..
and the next packet should be saved as file2, and so on. This seems like a routine task, but I haven't found a solution yet.
I have this sed command that finds the first packet.
sed -n "/header/,/footer/{p;/footer/q}" file
But I don't know how to iterate that over the next matches. Maybe I should delete the first match after copying it to another file and repeat the same command?
CodePudding user response:
A very naive approach, coded fast, could be improved, but seems to work, in awk:
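BEGIN {
    i = 1        # number of the current output file (file1, file2, ...)
    write = 0    # 1 while we are between a header and a footer
}
{
    if ($0 ~ /header/) {           # lines carry a timestamp, so match by regex
        write = 1
    } else if ($0 ~ /footer/) {
        write = 0
        close("file" i)            # done with this file
        i = i + 1
    } else if (write == 1) {
        print $0 > ("file" i)      # parenthesize the concatenated file name
    }
}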
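The question says the timestamps are optional in the output, so here is a minimal sketch of the same flag-and-counter idea that also drops the leading bracketed prefix. It assumes every line starts with `[something]` followed by a single space; the names `sample.txt`, `part`, `n`, `out` and `writing` are made up for illustration.

```shell
# Small sample in the layout from the question (timestamps are placeholders).
cat > sample.txt <<'EOF'
[timestamp1] header with space
[timestamp2] data1
[timestamp3] data2
[timestamp4] footer with space
[timestamp5] junk
[timestamp6] header with space
[timestamp7] data3
[timestamp8] footer with space
EOF

awk '
/header/ { n++; out = "part" n; writing = 1; next }       # start a new output file
/footer/ { if (writing) close(out); writing = 0; next }   # finish the current file
writing  { sub(/^\[[^]]*\] /, ""); print > out }          # strip "[...] " prefix, then write
' sample.txt
```

With this sample, part1 should hold data1 and data2, and part2 should hold data3, all without the timestamp prefix.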
CodePudding user response:
I would use GNU AWK for this task in the following way. Let the content of file.txt be
[timestamp1] header with space
[timestamp2] data1
[timestamp3] data2
[timestamp4] data3
[timestamp5] ..
[timestamp6] footer with space
[timestamp7] junk
[timestamp8] header with space
[timestamp9] data4
[timestamp10] data5
[timestamp11] ...
[timestamp12] footer with space
[timestamp13] junk
[timestamp14] header with space
[timestamp15] data6
[timestamp16] data7
[timestamp17] data8
[timestamp18] ..
[timestamp19] footer with space
then
awk '/header/{c+=1;p=1;next}/footer/{close("file" c);p=0}p{print $0 > ("file" c)}' file.txt
produces file1 with content
[timestamp2] data1
[timestamp3] data2
[timestamp4] data3
[timestamp5] ..
and file2 with content
[timestamp9] data4
[timestamp10] data5
[timestamp11] ...
and file3 with content
[timestamp15] data6
[timestamp16] data7
[timestamp17] data8
[timestamp18] ..
Explanation: the code has three pattern-action pairs. For a line containing header, I increase the counter c by 1, set the flag p to 1, and skip to the next line so that no other action is taken (this is why the header line itself is not written). For a line containing footer, I close the file named file followed by the current counter value and set the flag p to 0. For lines where p is true, I print the current line ($0) to the file named file followed by the current counter value. If required, adjust /header/ and /footer/ so that they match only the lines that really are header and footer lines.
(tested in GNU Awk 5.0.1)
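As a rough illustration of that last point, the patterns can be anchored so that a data line which merely mentions header or footer does not start or end a block. This sketch assumes the marker lines are exactly a bracketed timestamp, one space, and the literal text "header with space" or "footer with space"; the file names `strict_sample.txt` and `strict1`, `strict2` are hypothetical.

```shell
# Sample where data lines contain the words "header" and "footer".
cat > strict_sample.txt <<'EOF'
[timestamp1] header with space
[timestamp2] data1 mentions header
[timestamp3] data2
[timestamp4] footer with space
[timestamp5] junk
[timestamp6] header with space
[timestamp7] data3 mentions footer
[timestamp8] footer with space
EOF

awk '
/^\[[^]]*\] header with space$/ { c++; p = 1; next }                      # real header line only
/^\[[^]]*\] footer with space$/ { if (p) close("strict" c); p = 0; next } # real footer line only
p { print > ("strict" c) }                                                # everything in between
' strict_sample.txt
```

Here the loose /header/ pattern would have treated "[timestamp2] data1 mentions header" as the start of a new block and dropped it; with the anchored patterns it should be written to strict1 like any other data line.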
