How to create a txt file with a list of directory names if directories have a certain file

I have a parent directory with over 800 directories, each with a unique name. Some of these directories contain a sub-directory called y, in which a file called z can be found, if it exists.

I need to script a loop that will check each of the 800 for z and, if it's there, append the name of the directory (the directory before y) to a text file. I'm not sure how to do this.

This is what I have:

#!/bin/bash

for d in *; do
    if [ -d "y"]; then
        for f in *; do
            if [ -f "x"]
                echo $d >> IDlist.txt
            fi
    fi 
done

CodePudding user response：

Let's assume that any foo/y/z is a file (that is, you do not have directories with such names). If you had a really large number of such files, storing all the paths in a bash array could cause memory issues and would call for another solution, but about 800 paths is not large. So something like this should be OK:

declare -a names=(*/y/z)
printf '%s\n' "${names[@]%%/*}" > IDlist.txt

Explanation: the paths of all z files are first stored in the array names, thanks to the glob pattern */y/z. Then a parameter expansion is applied to each array element to remove the longest suffix matching /*, that is, the /y/z part: "${names[@]%%/*}". The result is printed one name per line with printf '%s\n'.
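As a quick illustration of that expansion, here is a minimal sketch that applies it to a hand-written array instead of a real glob result. The names dir001 and "my dir" are made up for the example.

```shell
#!/bin/bash

# Hypothetical paths, standing in for what the glob */y/z might return.
names=("dir001/y/z" "my dir/y/z")

# %%/* removes the longest suffix that begins with a slash, so everything
# from the first slash onward is dropped from each array element.
stripped=("${names[@]%%/*}")

# One name per line; names containing spaces stay intact.
printf '%s\n' "${stripped[@]}"
```

Because the expansion is quoted, each element remains a single word even when the directory name contains spaces.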

If you also had directories named z, or if you had millions of files, find could be used, instead, with a bit of awk to retain only the leading directory name:

find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
  awk -F/ '{print $2}' > IDlist.txt

If you prefer sed for the post-processing:

find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
  sed 's|^\./\(.*\)/y/z|\1|' > IDlist.txt

These two are probably also more efficient (faster).

Note: your initial attempt could also work, even if using bash loops is far less efficient, but it needs several changes:

#!/bin/bash

for d in *; do
    if [ -d "$d/y" ]; then
        for f in "$d"/y/*; do
            if [ "$f" = "$d/y/z" ]; then
                printf '%s\n' "$d" >> IDlist.txt
            fi
        done
    fi
done

As noted by @LéaGris, printf is better than echo because if d is the -e string, for instance, echo "$d" interprets it as an option of the echo command and does not print it.
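A small sketch of that difference, assuming the script runs under bash (other shells may implement echo differently). The value -e is just the example name from the remark above.

```shell
#!/bin/bash

d='-e'   # hypothetical directory name that looks like an echo option

# bash's builtin echo takes -e as an option, so only a newline is printed
# and the captured output is empty.
echo_out=$(echo "$d")

# printf treats the name as data for the %s format and prints it unchanged.
printf_out=$(printf '%s\n' "$d")

printf 'echo gave:   [%s]\n' "$echo_out"
printf 'printf gave: [%s]\n' "$printf_out"
```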

But a simpler and more efficient version (even if not as efficient as the first proposal or the find-based ones) would be:

#!/bin/bash

for d in *; do
    if [ -f "$d/y/z" ]; then
        printf '%s\n' "$d"
    fi
done > IDlist.txt

As you can see, there is another improvement (also suggested by @LéaGris), which consists of redirecting the output of the entire loop to the IDlist.txt file. This opens and closes the file only once, instead of once per iteration.
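To try this loop without touching real data, here is a sketch that runs it against a throwaway directory tree. The directory names a, b and c and the use of mktemp are assumptions made for the demonstration only.

```shell
#!/bin/bash

# Throwaway fixture: a and c contain y/z, b has y but no z.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/y" "$tmp/b/y" "$tmp/c/y"
touch "$tmp/a/y/z" "$tmp/c/y/z"

cd "$tmp" || exit 1

# A single redirection covers the whole loop, so IDlist.txt is opened once.
for d in *; do
    if [ -f "$d/y/z" ]; then
        printf '%s\n' "$d"
    fi
done > IDlist.txt

cat IDlist.txt
```

Only a and c should end up in the list, since b/y contains no z.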

CodePudding user response：

This should solve it:

for f in */y/z; do
    [ -f "$f" ] && echo "${f%%/*}"
done

Note: If there is a possibility of a weird top-level directory name like "-e", use printf '%s\n' instead of echo.

CodePudding user response：

This should do it:

shopt -s nullglob
outfile=IDlist.txt
>$outfile
for found in */y/z
do
  [[ -f $found ]] && echo "${found%%/*}" >>$outfile # Drop the /y/z part
done

The nullglob ensures that the loop is skipped if there is no match, and the quotes in the echo ensure that the directory name is output correctly even if it contains two successive spaces.
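To see what nullglob changes, here is a minimal sketch that counts loop iterations in an empty directory, where the pattern matches nothing. The helper name count_matches and the temporary directory are made up for this illustration, and bash's default options (no failglob) are assumed.

```shell
#!/bin/bash

# Empty throwaway directory, so the pattern */y/z matches nothing.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical helper: prints how many times the loop body runs.
count_matches() {
    local n=0 f
    for f in */y/z; do
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}

shopt -u nullglob
without=$(count_matches)   # unmatched pattern is kept as a literal word

shopt -s nullglob
with=$(count_matches)      # unmatched pattern expands to nothing

printf 'iterations without nullglob: %s\n' "$without"
printf 'iterations with nullglob:    %s\n' "$with"
```

Without nullglob the loop body runs once with the literal string */y/z, which is why the -f test is still needed in that case.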

CodePudding user response：

You can first do some filtering with find, listing all z files recursively under the current directory.

Say one line of the output was

./dir001/y/z

You can then extract the required part in several ways: grep, sed, awk, etc.

e.g. with grep

find . -type f | grep z | grep -E -o "y.*$"

will give

y/z
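The grep above keeps the y/z tail, whereas the question asks for the leading directory name. Here is a sketch of one way to keep that part instead, assuming the ./dir/y/z layout from the question. It uses cut, which is not part of this answer, and the fixture names dir001 and dir002 are invented for the demonstration.

```shell
#!/bin/bash

# Throwaway fixture: only dir001 has y/z.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir001/y" "$tmp/dir002/y"
touch "$tmp/dir001/y/z"
cd "$tmp" || exit 1

# Matching paths look like ./dir001/y/z, so with "/" as the delimiter
# field 1 is "." and field 2 is the directory name.
result=$(find . -mindepth 3 -maxdepth 3 -type f -path './*/y/z' | cut -d/ -f2)

printf '%s\n' "$result"
```

Like the other line-based pipelines, this assumes directory names do not contain newlines.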

CodePudding user response：

This solution doesn't check that z is a file, but I think it's worth showing compgen:

#!/bin/bash

compgen -G '*/y/z' | sed 's|/.*||' > IDlist.txt