Problem
I am trying to archive and compress some directories (and their contents) on a GNU/Linux machine and have the original directories (and their contents) removed afterwards.
Minimum reproducible example
Here is some code to recreate the situation on a GNU/Linux machine:
cd /tmp
mkdir find_and_tar
cd find_and_tar
mkdir files
mkdir texts
touch files/file1 files/file2 files/file3
touch texts/text1
tree should now give the following:
.
├── files
│   ├── file1
│   ├── file2
│   └── file3
└── texts
    └── text1
What I've tried so far
Now, my command to achieve the stated goal thus far is:
find . -mindepth 1 -type d -exec tar --remove-files -cJf {}.tar.xz {} \;
It does what it is supposed to do - tree now gives:
.
├── files.tar.xz
└── texts.tar.xz
BUT the command throws the following warnings:
find: ‘./texts’: No such file or directory
find: ‘./files’: No such file or directory
If I remove the --remove-files option, the warnings disappear, but obviously the original directories stay around.
Question(s)
- Why do these find warnings appear?
- How do I avoid them?
Version info
$ tar --version
tar (GNU tar) 1.30
$ find --version
find (GNU findutils) 4.6.0.225-235f
CodePudding user response:
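Your problem is that find is still processing the directory tree while tar is running. By default, find runs -exec on a directory before descending into it, so by the time it tries to descend, tar --remove-files has already deleted that directory.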
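The effect is not specific to tar. As a minimal sketch, the same warning should appear with any -exec command that deletes the matched directory. Here rm -r is a stand-in of my own, not the command from the question, and the scratch directory comes from mktemp:

```shell
# Work in a throwaway directory so nothing real is deleted.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir files texts
touch files/file1 texts/text1

# find matches each directory, runs rm -r on it, and only afterwards
# tries to descend into it. stderr is captured; LC_ALL=C keeps the
# message text in English.
warnings=$(LC_ALL=C find . -mindepth 1 -type d -exec rm -r {} \; 2>&1)

# Expect one "No such file or directory" line per removed directory.
printf '%s\n' "$warnings"
```

Both directories are gone afterwards, exactly as with tar --remove-files, and find complains about each of them.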
When you only need to process directories at the top level, your -maxdepth 1 will work. Two alternatives:
Use find's -depth option to process subdirectories first
This might be useful when you need to find directories at different levels:
find . -mindepth 1 -depth -type d -exec tar --remove-files -cJf {}.tar.xz {} \;
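To see why -depth helps, it may be useful to compare the order in which find reports entries with and without it. This is only an illustration using -print on a scratch directory (created with mktemp, my assumption), not part of the archiving command:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir files
touch files/file1

# Default order: a directory is reported before its contents.
preorder=$(find . -mindepth 1 -print)

# With -depth: the contents come first, then the directory itself.
postorder=$(find . -mindepth 1 -depth -print)

printf 'default:\n%s\n\nwith -depth:\n%s\n' "$preorder" "$postorder"
```

With -depth, find has already finished with a directory's contents by the time -exec runs on the directory itself, so it no longer matters that tar removes it.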
Avoid find
for d in */; do
tar --remove-files -cJf "${d%/}".tar.xz "${d%/}"
done
CodePudding user response:
Analysis
Adding the -D all option to the find call, to get more insight into what's going on, gave this (among other output):
Optimized command line:
( -mindepth 1 [est success rate 1] [real success rate 0/0=_] -a [est success rate 0.0922] [real success rate 0/0=_] [need type] -type d [est success rate 0.0922] [real success rate 0/0=_] ) -a [est success rate 0.0922] [real success rate 0/0=_] -exec tar [est success rate 1] [real success rate 0/0=_]
consider_visiting (early): ‘.’: fts_info=FTS_D , fts_level= 0, prev_depth=-2147483648 fts_path=‘.’, fts_accpath=‘.’
consider_visiting (late): ‘.’: fts_info=FTS_D , isdir=1 ignore=1 have_stat=1 have_type=1
consider_visiting (early): ‘./texts’: fts_info=FTS_D , fts_level= 1, prev_depth=0 fts_path=‘./texts’, fts_accpath=‘texts’
consider_visiting (late): ‘./texts’: fts_info=FTS_D , isdir=1 ignore=0 have_stat=1 have_type=1
consider_visiting (early): ‘./texts’: fts_info=FTS_DNR, fts_level= 1, prev_depth=1 fts_path=‘./texts’, fts_accpath=‘texts’
find: ‘./texts’: No such file or directory
consider_visiting (early): ‘./files’: fts_info=FTS_D , fts_level= 1, prev_depth=1 fts_path=‘./files’, fts_accpath=‘files’
consider_visiting (late): ‘./files’: fts_info=FTS_D , isdir=1 ignore=0 have_stat=1 have_type=1
consider_visiting (early): ‘./files’: fts_info=FTS_DNR, fts_level= 1, prev_depth=1 fts_path=‘./files’, fts_accpath=‘files’
find: ‘./files’: No such file or directory
consider_visiting (early): ‘.’: fts_info=FTS_DP, fts_level= 0, prev_depth=1 fts_path=‘.’, fts_accpath=‘.’
consider_visiting (late): ‘.’: fts_info=FTS_DP, isdir=1 ignore=1 have_stat=1 have_type=1
Please note that the second visit to e.g. ./texts differs from the first one in its fts_info value: FTS_DNR (a directory that cannot be read) instead of FTS_D. I concluded that find tries to recurse into the whole directory tree even after tar has already removed the top-level directory.
One can see the impact of this recursion by slightly adapting the scenario:
- adding a subfiles dir to files
- and running the find call without --remove-files and -maxdepth
This will lead to the subfiles dir being archived separately within the unarchived files dir.
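The adapted scenario can be checked without running tar at all, by listing which directories the find expression matches. A rough sketch, assuming a scratch directory from mktemp; the sort is only there to make the listing stable:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p files/subfiles texts

# Without -maxdepth, nested directories are matched as well,
# so each of them would get its own archive.
all_levels=$(find . -mindepth 1 -type d | LC_ALL=C sort)

# With -maxdepth 1, only the top-level directories are matched.
top_only=$(find . -mindepth 1 -maxdepth 1 -type d | LC_ALL=C sort)

printf 'all levels:\n%s\n\ntop level only:\n%s\n' "$all_levels" "$top_only"
```

The first listing includes ./files/subfiles, the second does not.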
Solution
Since I know that I want to archive the top level directories, adding the -maxdepth 1 option to my find call solved the problem for me.
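For completeness, here is a sketch of the full call on the example tree from the question. It assumes GNU find and GNU tar with xz support, and it runs in a scratch directory from mktemp rather than /tmp/find_and_tar:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir files texts
touch files/file1 files/file2 files/file3 texts/text1

# -maxdepth 1 stops find from descending into the directories that
# tar has just removed. Any output from find or tar is captured.
warnings=$(find . -mindepth 1 -maxdepth 1 -type d \
    -exec tar --remove-files -cJf {}.tar.xz {} \; 2>&1)

printf 'warnings: [%s]\n' "$warnings"
ls
```

On my reading, this should leave only files.tar.xz and texts.tar.xz behind and print no find warnings.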
