Extract the unique substrings at a given position range in a set of filenames

I have a set of filenames, all of the same length, in a given directory, and would like to find all the distinct substrings at a given position range in those filenames (many filenames share the same substring).

Specifically, the substring I am interested in starts at position 7 of the filename and is 10 characters long:

for file in *; do
    if [ -d "$file" ]; then
        file_basename=$(basename "$file")
        substr=${file_basename:7:10}   # offset 7 (0-based), length 10
    fi
done

I would like to write those unique substrings to either a file or a data structure that I can then loop through.

So the set of filenames

........12s456tyer..........
........12s436tyer..........
........12s456tyer..........
........12s436tyer..........

would lead to the 2 strings

12s456tyer
12s436tyer

CodePudding user response:

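If you use bash 4.0 or newer, you can collect the substrings in an associative array; assigning the same key twice simply overwrites the entry, so the stored values end up unique: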

declare -A distinct_substrings=()
shopt -s nullglob # If nothing matches, '*' expands to nothing instead of a literal '*'

for file in *; do
    if [[ -f $file ]]; then
        file_basename=${file##*/} # Not necessary if files are expanded from current dir.
        substr=${file_basename:7:10}
        distinct_substrings[$substr]=$substr
    fi
done

# Do stuff with "${distinct_substrings[@]}"
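
As a minimal sketch of that last step, the loop can be wrapped in a function and its result iterated over. The function name collect_substrings and its directory argument are illustrative and not part of the answer above. Bash does not guarantee any order when iterating an associative array, so the keys are piped through sort here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct substrings (offset 7, length 10)
# taken from the names of the regular files in directory $1, sorted.
# The body runs in a subshell, so nullglob does not leak into the caller.
collect_substrings() (
    dir=${1:-.}
    declare -A seen=()
    shopt -s nullglob
    for file in "$dir"/*; do
        [[ -f $file ]] || continue        # skip directories and other non-files
        name=${file##*/}                  # strip the leading directory part
        (( ${#name} > 7 )) || continue    # an empty key would be an error
        key=${name:7:10}
        seen[$key]=1                      # duplicate keys overwrite each other
    done
    if (( ${#seen[@]} )); then
        printf '%s\n' "${!seen[@]}" | sort
    fi
)

# Usage: loop over the distinct substrings found in the current directory.
while IFS= read -r substr; do
    printf 'found: %s\n' "$substr"
done < <(collect_substrings .)
```

Storing the substring as the key rather than the value means "${!seen[@]}" lists the distinct substrings directly.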

CodePudding user response:

You could do it without a loop:

printf '%s\n' * | cut -c 9-18 | sort -u >outputfile.txt
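
Note that cut -c counts characters from 1, whereas the bash expansion ${var:offset:length} counts from 0, so cut -c 9-18 selects the same characters as ${var:8:10}. A small sketch on the sample names from the question illustrates this (the helper name unique_range and its two arguments are made up for the example):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read names on stdin and print the distinct substrings
# spanning the 1-based character positions $1..$2, sorted.
unique_range() {
    cut -c "$1-$2" | sort -u
}

# The four sample filenames from the question.
names=(
    '........12s456tyer..........'
    '........12s436tyer..........'
    '........12s456tyer..........'
    '........12s436tyer..........'
)

# Prints 12s436tyer, then 12s456tyer.
printf '%s\n' "${names[@]}" | unique_range 9 18

# The 0-based equivalent of columns 9-18 is offset 8, length 10.
echo "${names[0]:8:10}"    # prints 12s456tyer
```

To match ${file_basename:7:10} as written in the question instead, the corresponding cut range would be 8-17.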

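To do it only on regular files and exclude directories, test each name before printing it (bash has no glob that matches only non-directories):

#!/usr/bin/env bash
shopt -s nullglob
for file in *; do
    [[ -f $file ]] && printf '%s\n' "$file"
done | cut -c 9-18 | sort -u >outputfile.txt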
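
To loop over the result afterwards, the file can be read back line by line, or loaded into an array with mapfile (bash 4.0 or newer). The following is only a sketch: a temporary file with two sample lines stands in for outputfile.txt so that the example is self-contained.

```shell
#!/usr/bin/env bash
# Stand-in for outputfile.txt, filled with sample data (illustration only).
outfile=$(mktemp)
printf '%s\n' 12s436tyer 12s456tyer >"$outfile"

# Option 1: stream the file one line at a time.
while IFS= read -r substr; do
    printf 'processing %s\n' "$substr"
done <"$outfile"

# Option 2: load every line into an indexed array, then loop over it.
mapfile -t substrings <"$outfile"
for substr in "${substrings[@]}"; do
    printf 'again: %s\n' "$substr"
done

rm -f -- "$outfile"
```

Streaming suits a single pass over the data; the array is more convenient when the substrings are needed more than once.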