I have a file, suffix.txt, which contains one string per line, for example:
ing
ness
es
ed
tion
I also have a text file, text.txt. It is given that text.txt consists only of lowercase letters, with no punctuation, for example:
the raining cloud answered the man all his interrogation and with all
questioned mind the princess responded
harness all goodness without getting irritated
I want to remove these suffixes from the words in text.txt, stripping at most one suffix from each original word. Thus I expect the following output:
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat
Note that tion was not removed from questioned since the original word didn't contain tion as a suffix. It would be really helpful if someone could answer this with sed commands.
I was using a naive script that doesn't seem to do the job:
#!/bin/bash
while read p; do
sed -i "s/$p / /g" text.txt;
sed -i "s/$p$//g" text.txt;
done <suffix.txt
CodePudding user response:
An awk:
$ awk '
NR==FNR { # generate a regex of suffixes
s=s (s==""?"(":"|") $0 # (ing|ness|es|ed|tion)$
next
}
FNR==1 {
s=s ")$" # well, above )$ is inserted here
}
{
for(i=1;i<=NF;i++) # iterate all the words and
sub(s,"",$i) # apply regex to each of them
}1' suffix.txt text.txt # output
Output:
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat
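To check this approach end to end, here is a minimal sketch that recreates the question's sample files in a temporary directory and runs the same awk logic through a helper function. The function name strip_suffixes and the temporary directory are assumptions made for illustration; they are not part of the answer above. The sketch also assumes the suffixes contain no regex metacharacters.

```shell
# Hypothetical helper: strip_suffixes SUFFIX_FILE TEXT_FILE
strip_suffixes() {
    awk '
    NR==FNR { re = re (re == "" ? "(" : "|") $0; next }  # first file: collect suffixes
    FNR==1  { re = re ")$" }                              # second file: close group, anchor at end
    { for (i = 1; i <= NF; i++) sub(re, "", $i) } 1       # at most one suffix removed per word
    ' "$1" "$2"
}

# Sample data copied from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' ing ness es ed tion > "$tmp/suffix.txt"
printf '%s\n' \
    'the raining cloud answered the man all his interrogation and with all' \
    'questioned mind the princess responded' \
    'harness all goodness without getting irritated' > "$tmp/text.txt"

strip_suffixes "$tmp/suffix.txt" "$tmp/text.txt"
```

Because the regex is anchored with $ and sub() runs once per field, a word such as questioned loses ed but keeps tion.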
CodePudding user response:
Kinda hairy but sed and unix tools only:
sed -E -f <(tr '\n' '|' <suffix.txt | sed -E 's/\|$//; s/\|/\\\\b|/g; s/$/\\\\b/' | xargs printf 's/%s//g') text.txt
The
tr '\n' '|' <suffix.txt | sed -E 's/\|$//; s/\|/\\\\b|/g; s/$/\\\\b/' | xargs printf 's/%s//g'
generates the substitution script of
s/ing\b|ness\b|es\b|ed\b|tion\b//g
This requires GNU sed for \b.
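As a variation on the generator above, the following sketch builds an equivalent substitution script with paste and printf, which avoids the trailing '|' cleanup and the backslash doubling needed for xargs. The function name build_script and the temporary directory are assumptions for illustration. It still relies on GNU sed for \b and assumes the suffixes contain no regex metacharacters.

```shell
# Hypothetical helper: print a one-line sed script for the suffixes in file $1.
build_script() {
    # paste -s joins all lines with '|'; printf turns \\ into a single backslash.
    printf 's/(%s)\\b//g\n' "$(paste -s -d '|' "$1")"
}

tmp=$(mktemp -d)
printf '%s\n' ing ness es ed tion > "$tmp/suffix.txt"

# Inspect the generated script before using it.
build_script "$tmp/suffix.txt"

# Apply it to one line of the sample text (GNU sed, extended regex).
printf '%s\n' 'questioned mind the princess responded' |
    sed -E "$(build_script "$tmp/suffix.txt")"
```

Grouping the alternatives in one pair of parentheses with a single trailing \b should behave the same as repeating \b after every suffix.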
It would be easier with perl, ruby, awk, etc.
Here is a GNU awk version:
gawk -i join 'FNR==NR {arr[FNR]=$1; next}
FNR==1{re=join(arr,1,length(arr),"\\>|"); re=re "\\>"}
{gsub(re,"")}
1
' suffix.txt text.txt
Both produce:
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat
CodePudding user response:
This might work for you (GNU sed):
sed -z 'y/\n/|/;s/|$//;s#.*#s/\\B(&)\\b//g#' suffixFile | sed -Ef - textFile
Convert suffixFile into sed commands in a file and pass that via a pipe to a second invocation of sed that amends the textFile.
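N.B. The generated sed command uses \B and \b so that a suffix is matched only at the end of a word, and only when at least one other word character precedes it.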
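A small sketch of what the leading \B changes, using a made-up input line in which some words are themselves suffixes. The function names strip_loose and strip_strict are hypothetical, the suffix list is hard-coded from the question, and GNU sed is assumed for \b and \B.

```shell
# \b only: any word ending in a suffix is trimmed, including words that ARE a suffix.
strip_loose()  { sed -E 's/(ing|ness|es|ed|tion)\b//g'; }

# \B before the group: the suffix must follow another word character.
strip_strict() { sed -E 's/\B(ing|ness|es|ed|tion)\b//g'; }

line='ed walked es boxes'

printf '%s\n' "$line" | strip_loose    # the standalone words "ed" and "es" vanish
printf '%s\n' "$line" | strip_strict   # → ed walk es box
```

With \b alone, a word that consists of nothing but a suffix is deleted entirely, leaving only the surrounding spaces; the \B guard keeps such words intact.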
CodePudding user response:
You can try this sed approach.
You will first need to create an array from suffix.txt:
suffix=($(cat suffix.txt))
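You can then use it for substitution within the main sed code. Note that this relies on the sample data: it skips two of the suffixes on lines containing "question" and does not anchor matches to the end of a word.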
sed "s/${suffix[0]}//;s/${suffix[1]}//g;/question/! {s/${suffix[2]}//};s/${suffix[3]}//g;/question/! {s/${suffix[4]}//}" text.txt
Output
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat
