Assume I have a file input.txt with the following contents:
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
I want to create an output.txt which looks like this:
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
The approach proposed in "Copy contents from capture group to another subsequent line":
awk '/[0-9]{4}./ {match($0,"([0-9]{4})",n);}{gsub(/replace-with-code/,n[0]); print}' inputfile > outputfile
Returns an error. I have not been able to fix this issue. Can any awk magicians help me here?
CodePudding user response:
Here is an awk solution:
awk '{gsub(/replace-with-code/, p)} /[0-9]{4}\.$/ {p = $NF + 0} 1' file
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
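For readers who want the one-liner above spelled out, here is a commented sketch of the same idea. The function name fill_codes and the variable name code are made up for illustration, and the pattern uses [0-9]+ instead of [0-9]{4} on the assumption that some awk implementations do not support interval expressions. Note that $NF + 0 converts the field to a number, so a code with leading zeros would lose them.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin if none.
fill_codes() {
  awk '
    # Substitute the placeholder with the most recently remembered code.
    { gsub(/replace-with-code/, code) }
    # A line ending in digits plus a dot: last field + 0 strips the dot.
    /[0-9]+\.$/ { code = $NF + 0 }
    # Print every line, modified or not.
    { print }
  ' "$@"
}
```

Usage would be along the lines of: fill_codes input.txt > output.txt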
CodePudding user response:
If you are not worried about blank lines between the actual data lines, try the following GNU awk program. It uses GNU awk's match() function with the regex (^|\n)([^:]*: )([0-9]+)(\.\n[^\n]+\n[^:]*: ) to capture the values we need, and then prints them in the required order.
awk -v RS= '
match($0,/(^|\n)([^:]*: )([0-9]+)(\.\n[^\n]+\n[^:]*: )/,arr){
print arr[1] arr[2] arr[3] arr[4] arr[3]
}
' Input_file
Explanation: a detailed explanation of the regex used in the awk program above.
(^|\n) ##1st capturing group: matches the start of the record or a newline.
([^:]*: ) ##2nd capturing group: matches everything up to the first occurrence of colon followed by space.
([0-9]+) ##3rd capturing group: matches 1 or more digits.
(\.\n[^\n]+\n[^:]*: ) ##4th capturing group: matches a literal dot, a newline, one or more
##non-newline characters, another newline, and then everything up to the next colon followed by space.
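The three-argument form of match() used above is a GNU awk extension. For an awk without it, a line-by-line sketch using the standard RSTART and RLENGTH variables could look like the following. This is a different technique from the paragraph-mode program above, and the names fill_codes_portable and code are hypothetical. Because substr() returns a string, leading zeros in the code are kept.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin if none.
fill_codes_portable() {
  awk '
    # On a line ending in digits plus a dot, remember the digits only
    # (RLENGTH - 1 leaves out the trailing dot).
    match($0, /[0-9]+\.$/) { code = substr($0, RSTART, RLENGTH - 1) }
    # Fill in the placeholder with the remembered code, then print.
    { gsub(/replace-with-code/, code); print }
  ' "$@"
}
```

Usage would be along the lines of: fill_codes_portable Input_file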
CodePudding user response:
I would exploit GNU AWK's paragraph mode for this task in the following way. Let the content of file.txt be
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
then
awk 'BEGIN{RS="";ORS="\n\n"}{print gensub(/([0-9]+)(.*)replace-with-code/, "\\1\\2\\1", 1)}' file.txt
gives output
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
Explanation: setting RS to the empty string makes GNU AWK treat blank lines, rather than newlines, as the record separator. I set ORS to preserve those blank lines in the output. Then, for every record, I use the gensub function to capture the number, which I simply define as 1 or more digits, and whatever lies between that number and replace-with-code. I do not capture replace-with-code itself, as it will not be kept. The replacement then puts the captured pieces back in the desired order. Disclaimer: this solution assumes that you never have more than 1 consecutive blank line.
(tested in gawk 4.2.1)
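To see how paragraph mode splits the input, here is a small sketch that reports each record. The function name show_paragraphs is made up for illustration, and FS is set to a newline so that each line of a paragraph becomes one field. Any run of blank lines counts as a single separator on input, which is why the original number of blank lines cannot be recovered on output.

```shell
# Hypothetical helper: prints one summary line per blank-line-separated record.
show_paragraphs() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       { print NR ": " NF " lines, first: " $1 }' "$@"
}
```

Usage would be along the lines of: show_paragraphs file.txt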
CodePudding user response:
Skip all the gensub()/gsub(): no match(), no arrays, and no capture groups are needed. Just make it a basic if-then-else:
echo "${input_data….}" | mawk 'NF<=!__ || $NF =_=/replace-with-code$/ ? _ : $NF' FS=': ' OFS=': '
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
