I have two lists, list1 and list2, with one filename per line. I want a result containing every filename that appears only in list2 and not in list1, ignoring certain file extensions (but not all extensions) when comparing. I am using bash on Linux and would like to use only commands that need no extra installation. For the example lists I know every extension I want to ignore. I made an attempt, but it does not work at all and I don't know how to fix it. Apologies for my inexperience.
I wish to ignore the following extensions: .x .xy .yx .y .jpg
list1.txt:
text.x
example.xy
file.yx
data.y
edit
edit.jpg

list2.txt:
text
rainbow.z
file
data.y
sunshine
edit.test.jpg
edit.random

result.txt:
rainbow.z
sunshine
edit.test.jpg
edit.random
My try:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Edit: I forgot two requirements. Filenames can contain . characters, and not every filename has an extension. I know the extensions that must be ignored. I have amended the lists accordingly.
CodePudding user response:
An awk solution might be more efficient for this task:
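awk '
# Strip one ignored extension (.x, .xy, .y, .yx or .jpg) from a copy of the line
{ f=$0; sub(/\.(xy?|yx?|jpg)$/,"",f) }
# First file (list1.txt): remember each stripped name and move on
NR==FNR { a[f]; next }
# Second file (list2.txt): print the original line if its stripped name is new
!(f in a)
' list1.txt list2.txt > result.txt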
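As a quick check, the awk command can be run against the example lists from the question. This is only a sketch: the scratch directory and the printf lines that recreate the lists are additions for illustration, not part of the answer.

```shell
# Work in a scratch directory so existing files are not overwritten
cd "$(mktemp -d)" || exit 1

# Example lists copied from the question
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > list1.txt
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > list2.txt

# Same awk program as in the answer
awk '
{ f=$0; sub(/\.(xy?|yx?|jpg)$/,"",f) }
NR==FNR { a[f]; next }
!(f in a)
' list1.txt list2.txt > result.txt

cat result.txt
```

Because the program prints the original line rather than the stripped copy, names such as edit.test.jpg keep their extension in result.txt, and the order of list2.txt is preserved.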
CodePudding user response:
comm can do precisely this.
You can preprocess the input:
- strip the suffixes
- sort (comm expects sorted input)
- remove duplicates
ss()( sed 's/\.\(x\|xy\|yx\|y\|jpg\)$//' "$@" | sort -u )
comm -13 <(ss list1.txt) <(ss list2.txt) >result.txt
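To see what this produces on the example lists, here is a rough sketch. It reuses the ss function from above but writes the sorted lists to intermediate files (list1.sorted and list2.sorted are made-up names) instead of using process substitution, so it also runs in a plain POSIX shell. The \| alternation in the sed pattern is a GNU sed extension, and LC_ALL=C is set here only to make the sort order predictable.

```shell
# Work in a scratch directory so existing files are not overwritten
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # assumption: fixed collation so sort and comm agree

# Example lists copied from the question
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > list1.txt
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > list2.txt

# Strip the ignored suffixes, then sort and drop duplicates
ss()( sed 's/\.\(x\|xy\|yx\|y\|jpg\)$//' "$@" | sort -u )

ss list1.txt > list1.sorted
ss list2.txt > list2.sorted

# -13 hides lines unique to list1 and lines common to both
comm -13 list1.sorted list2.sorted > result.txt

cat result.txt
```

Note that comm compares and prints the stripped names, in sorted order, so edit.test.jpg appears as edit.test in result.txt. If the original names or the original order matter, the stripped name would have to be mapped back to the original line.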
Your code was:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Some issues that immediately jump out:
- syntax error: then/fi but no matching if
- you never access list1
- you don't quote variables when you use them, so whitespace and special characters will cause problems
- while read ... sed ... sed ... sed ... is inefficient: multiple invocations of sed instead of just one, and a loop that sed would perform implicitly
- sed expects file arguments, not strings
- sed -i will try to overwrite input file arguments
- you use result.txt as both input and output to sed but never assign any contents to it
- you try to use data ($line) as sed commands, instead of applying sed commands to that data
- because you used single-quotes, sed -i -e '$line' will attempt to run a (non-existent) sed command line on the last line of input ($)
- g option to s/// does nothing when search is anchored
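For comparison, here is one way the original loop could be repaired with these points in mind. It is a minimal sketch, not the only fix: list1.stripped is a made-up helper file, the scratch directory and printf lines just recreate the example lists, and sed -E (extended regular expressions) is assumed to be available, as it is in GNU sed.

```shell
# Work in a scratch directory so existing files are not overwritten
cd "$(mktemp -d)" || exit 1

# Example lists copied from the question
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > list1.txt
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > list2.txt

# One sed call strips the ignored extensions from every line of list1.txt
sed -E 's/\.(x|xy|yx|y|jpg)$//' list1.txt > list1.stripped

: > result.txt   # start with an empty result file

# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line; do
    # Feed the data to sed on stdin instead of passing it as a file name
    base=$(printf '%s\n' "$line" | sed -E 's/\.(x|xy|yx|y|jpg)$//')
    # -x: match the whole line, -F: fixed string, -q: no output
    if ! grep -qxF -- "$base" list1.stripped; then
        printf '%s\n' "$line" >> result.txt
    fi
done < list2.txt

cat result.txt
```

This still starts sed and grep once per line of list2.txt, so it remains slower than a single awk or comm pipeline on large lists; it is mainly meant to show the quoting, the if/then/fi structure, and how list1 enters the comparison.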
CodePudding user response:
I'd use join:
$ join -t. -j1 -v2 -o 2.1,2.2 <(sort list1.txt) <(sort list2.txt) | sed 's/\.$//'
rainbow.z
sunshine
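(The bit of sed is needed to turn sunshine. into sunshine. Note that -t. splits each line at the first ., so this compares only the part before the first dot and ignores every extension, not just the listed ones; with the amended lists it leaves out edit.test.jpg and edit.random.)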
