I have a parent directory with over 800 directories, each of these has a unique name. Some of these directories house a sub-directory called y in which a file called z, (if it exists) can be found.
I need to script a loop that will check each of the 800 for z, and if it's there, I need to append the name of the directory (the directory before y) into a text file. I'm not sure how to do this.
This is what I have
#!/bin/bash
for d in *; do
if [ -d "y"]; then
for f in *; do
if [ -f "x"]
echo $d >> IDlist.txt
fi
fi
done
CodePudding user response:
Let's assume that any foo/y/z is a file (that is, you do not have directories with such names). If you had a really large number of such files, storing all paths in a bash variable could lead to memory issues, and would advocate for another solution, but about 800 paths is not large. So, something like this should be OK:
declare -a names=(*/y/z)
printf '%s\n' "${names[@]%%/*}" > IDlist.txt
Explanation: the paths of all z files are first stored in array names, thanks to a glob pattern: */y/z. Then, a pattern substitution is applied to each array element to suppress the /y/z part: "${names[@]%%/*}". The result is printed, one name per line: printf '%s\n'.
If you also had directories named z, or if you had millions of files, find could be used, instead, with a bit of awk to retain only the leading directory name:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
awk -F/ '{print $2}' > IDlist.txt
If you prefer sed for the post-processing:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
sed 's|^\./\(.*\)/y/z|\1|' > IDlist.txt
These two are probably also more efficient (faster).
Note: your initial attempt could also work, even if using bash loops is far less efficient, but it needs several changes:
#!/bin/bash
for d in *; do
if [ -d "$d/y" ]; then
for f in "$d"/y/*; do
if [ "$f" = "$d/y/z" ]; then
printf '%s\n' "$d" >> IDlist.txt
fi
done
fi
done
As noted by @LéaGris, printf is better than echo because if d is the -e string, for instance, echo "$d" interprets it as an option of the echo command and does not print it.
But a simpler and more efficient version (even if not as efficient as the first proposal or the find-based ones) would be:
#!/bin/bash
for d in *; do
if [ -f "$d/y/z" ]; then
printf '%s\n' "$d"
fi
done > IDlist.txt
As you can see there is another improvement (also suggested by @LéaGris), which consists in redirecting the output of the entire loop to the IDlist.txt file. This will open and close the file only once, instead of once per iteration.
CodePudding user response:
This should solve it:
for f in */y/z; do
[ -f "$f" ] && echo ${f%%/*}
done
Note:
If there is a possibility of weird top level directory name like "-e", use printf instead of echo, as in the comment below.
CodePudding user response:
This should do it:
shopt -s nullglob
outfile=IDlist.txt
>$outfile
for found in */y/x
do
[[ -f $found ]] && echo "${found%%/*}" >>$outfile # Drop the /y/x part
done
The nullglob ensures that the loop is skipped if there is no match, and the quotes in the echo ensure that the directory name is output correctly even if it contains two successive spaces.
CodePudding user response:
You can first try to do some filtering using find
Below will list all z files recursively within current directory
Then let's say the one of the output was
./dir001/y/z
Then you can extract required part using multiple ways grep, sed, awk, etc
e.g. with grep
find . -type f | grep z | grep -E -o "y.*$"
will give
y/z
CodePudding user response:
This solution doesn't check that z is a file but I think it's worth showing compgen:
#!/bin/bash
compgen -G '*/y/z' | sed 's|/.*||' > IDlist.txt
