I want to remove all text before and after a certain pattern.
<JOB APPLICATION="Daily" SUB_APPLICATION="Y#D5#4#M2F" JOBNAME="MLETTXXD-NONR_005" DESCRIPTION="" CREATED_BY="vpma" RUN_AS="ctmagt" CRITICAL="0" TASKTYPE="Dummy" NODEID="OPENFRAME" %%ENVIRONMENT MLETTXXD %%ORDERID %%RUNCOUNT %%JCL_STEP" CONFIRM="0" RETRO="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" TIMETO=">" JAN="1" FEB="1" MAR="1"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
Delete everything before and after JOBNAME="..." on the line that contains it.
The output should be:
JOBNAME="MLETTXXD-NONR_005"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
I tried the command below, but the second awk condition does not work:
awk '/JOBNAME=/{print $4} | /INCOND/{print $2}' inputfile.txt
CodePudding user response:
Using sed:
$ sed 's/.*\(JOBNAME[^ ]*\).*/\1/' input_file
JOBNAME="MLETTXXD-NONR_005"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
CodePudding user response:
One simple fix to OP's current awk code:
$ awk '/JOBNAME=/{$0=$4}1' inputfile.txt
JOBNAME="MLETTXXD-NONR_005"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
NOTES:
- $0=$4 says to replace the current line with the contents of the 4th field
- assumes OP's /INCOND/ pattern match is an attempt to print the rest of the lines of input, hence ...
- the standalone 1 says to print the current line
This has a few limitations:
- assumes the JOBNAME="..." string is always in the 4th space-delimited field of a line
- does not take into consideration multiple instances of the string in a single line
- assumes the string does not contain any white space
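To make the first limitation concrete, here is a small sketch using a made-up input line (not from OP's data) where JOBNAME="..." sits in the 3rd field instead of the 4th. The variable names line and result are only for illustration.

```shell
# Hypothetical line: JOBNAME is the 3rd space-delimited field, not the 4th
line='<JOB APPLICATION="Daily" JOBNAME="JOB_A" DESCRIPTION="" CREATED_BY="vpma"'

# Same awk as above: the line is replaced by whatever is in field 4
result=$(printf '%s\n' "$line" | awk '/JOBNAME=/{$0=$4}1')

printf '%s\n' "$result"    # prints DESCRIPTION="" rather than the job name
```

The line still matches /JOBNAME=/, so it gets rewritten, but with the wrong field.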
Addressing the limitations ...
First we'll add a new line to the input:
$ cat inputfile.txt
<JOB APPLICATION="Daily" SUB_APPLICATION="Y#D5#4#M2F" JOBNAME="MLETTXXD-NONR_005" DESCRIPTION="" CREATED_BY="vpma" RUN_AS="ctmagt" CRITICAL="0" TASKTYPE="Dummy" NODEID="OPENFRAME" %%ENVIRONMENT MLETTXXD %%ORDERID %%RUNCOUNT %%JCL_STEP" CONFIRM="0" RETRO="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" TIMETO=">" JAN="1" FEB="1" MAR="1"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
<JOB APPLICATION="Daily" JOBNAME="JOBNAME # 1" DESCRIPTION="" JOBNAME="Another Job" CREATED_BY="vpma"
A GNU awk idea:
awk '
BEGIN { FPAT="\\<JOBNAME=\"[^\"]*\"" }   # define field pattern as JOBNAME="..."
NF    { pfx=""                            # if we have a FPAT match then NF>0
        for (i=1;i<=NF;i++) {             # loop through our FPAT matches
            printf "%s%s",pfx,$i          # print each FPAT match to stdout
            pfx=OFS
        }
        print ""                          # terminate the line of FPAT matches
        next                              # go to next line of input
      }
1                                         # print all lines that do not have a FPAT match
' inputfile.txt
NOTES:
- GNU awk is needed for FPAT support (this allows us to define the format of the field; this replaces the use of FS, which defines the format of the field delimiter)
- the standalone 1 assumes OP wants to print all other lines of input that don't have a match to the string JOBNAME="..." (otherwise OP should update the sample input to contain lines that should not be printed)
This generates:
JOBNAME="MLETTXXD-NONR_005"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
JOBNAME="JOBNAME # 1" JOBNAME="Another Job"
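If GNU awk is not available, roughly the same result can be sketched with the standard match() function and a loop. This is an untuned alternative, not a drop-in equivalent: the function name extract_jobnames is made up for this example, and the \< word-boundary anchor is dropped because it is a GNU extension, so an attribute whose name merely ends in JOBNAME would also match.

```shell
# Hypothetical helper: print every JOBNAME="..." found on a line,
# or the line unchanged if it contains none.
extract_jobnames() {
  awk '
  {
      line = $0
      out  = ""
      # find the next JOBNAME="..." and then continue searching after it
      while (match(line, /JOBNAME="[^"]*"/)) {
          out  = out (out == "" ? "" : OFS) substr(line, RSTART, RLENGTH)
          line = substr(line, RSTART + RLENGTH)
      }
      print (out == "" ? $0 : out)
  }' "$@"
}
```

It can be called as extract_jobnames inputfile.txt, or fed from a pipe when no file name is given.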
CodePudding user response:
Use this Perl one-liner:
perl -pe 's{ .* ( JOBNAME="[^"]*" ) .* }{$1}x;' in_file > out_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
The regex uses these modifiers:
/x : Ignore whitespace and comments, for readability.
s{ .* ( JOBNAME="[^"]*" ) .* }{$1}; : Replace this pattern: .* - any character repeated 0 or more times, followed by JOBNAME="[^"]*", which has [^"]* - any character except ", repeated 0 or more times, followed by .*. Replace this pattern with $1: the first capture group, that is whatever was matched inside the parentheses.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
