Below is the disk usage on my machine, as reported by docker system df -v:
CONTAINER ID IMAGE COMMAND LOCAL VOLUMES SIZE CREATED STATUS
a7ea593ae05e c265fa50d720 "bash" 0 20.7GB 21 hours ago Up 21 hours
This reports the correct amount of storage used by the container, which created a 19 GB file.
However, df -h shows the following disk space consumption:
Filesystem Size Used Avail Use% Mounted on
overlay 97G 72G 26G 75% /var/lib/docker/overlay2/aa15e95ad170e94462a8a2064b9f1bad62d3c2ea579f4e7ffcb16e159fbc1cea/merged
On checking further, I found that the folder /var/lib/docker/overlay2/9a58de48986643e6284123effe601e57099133ca55e471b548d059906b20434b was consuming 42 GB.
The disk usage statistics for this folder are:
20G /var/lib/docker/overlay2/9a58de48986643e6284123effe601e57099133ca55e471b548d059906b20434b/diff
23G /var/lib/docker/overlay2/9a58de48986643e6284123effe601e57099133ca55e471b548d059906b20434b/merged
42G /var/lib/docker/overlay2/9a58de48986643e6284123effe601e57099133ca55e471b548d059906b20434b
On further checking, I found the 20 GB file in both the merged and diff directories. This explains why the folder was consuming double the amount of storage.
I never run docker diff on a production instance, so this is causing me more harm than benefit. Is it possible to disable docker diff tracking for specific folders, so that a redundant copy of the files the container creates is not stored?
Please find below the steps to create a minimal reproducible example
Dockerfile
FROM ubuntu:18.04
RUN apt update -y
RUN apt upgrade -y
RUN apt-get install -y python3 python3-dev
RUN apt-get install -y python3-pip jq
RUN mkdir -p /opt/program/
WORKDIR "/opt/program/"
ADD ./ /opt/program/
CMD ["bash", "-c", "python3 -u program.py"]
program.py
import logging
import time
import argparse
import random

logging.basicConfig(level=logging.DEBUG)

if __name__ == "__main__":
    # parser = argparse.ArgumentParser()
    # parser.add_argument('--test')
    # args = parser.parse_args()
    # Build one million short lines (about 11 MB in total).
    l = ['test_' + str(i) for i in range(1000000)]
    with open('/tmp/test_abc123', 'w') as f:
        f.write('\n'.join(l))
    # Keep the container alive so the file can be inspected.
    while 1:
        time.sleep(1)
        logging.info("%s", 'test')
I executed the following commands
1. sudo docker build -f Dockerfile .
2. sudo docker run -d <image_id>
3. sudo docker container exec -it <container_id> bash
4. Verified that an 11 MB file is created in /tmp in the container.
5. Now, I search for the file in my host machine
satinders:/var/lib/docker/overlay2# ls -l $(find . -iname test_abc123)
-rw-r--r-- 1 root root 11888889 Jan 11 18:03 ./edf8615137c5b04e14116f6285757d4db170187282a5b51d78f4f77ca4c9b707/diff/tmp/test_abc123
-rw-r--r-- 1 root root 11888889 Jan 11 18:03 ./edf8615137c5b04e14116f6285757d4db170187282a5b51d78f4f77ca4c9b707/merged/tmp/test_abc123
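As a sanity check that both host-side entries really are the file written by program.py, the expected size can be recomputed from the same list expression. This is only a sketch; the function name expected_size is made up for illustration and is not part of the original program.

```python
def expected_size():
    """Byte count of the content program.py writes to /tmp/test_abc123."""
    # Same expression as in program.py: one million lines joined by newlines.
    lines = ['test_' + str(i) for i in range(1000000)]
    # The content is pure ASCII, so the UTF-8 byte count equals the character count.
    return len('\n'.join(lines).encode('utf-8'))


size = expected_size()
print(size)  # prints 11888889, the size reported by ls -l above
```

The value matches the 11888889 bytes shown for both the diff and merged paths, so both listings refer to the same content.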
Update
I found this in the documentation for docker diff:
List the changed files and directories in a container's filesystem since the container was created. Three different types of change are tracked:
A A file or directory was added
D A file or directory was deleted
C A file or directory was changed
Answer:
merged is an overlay filesystem mount: a combined view of the filesystem layers that takes up no disk space itself (similar to a bind mount). The diff directory holds the changes made in that layer, here the container's read-write layer (for the merged mount, I believe it is tracked as the upper directory). So the file was either created or modified in that layer, and it is stored only once, in diff; du counts it twice because it also walks the merged view.
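One way to measure a layer folder without counting the merged view is to skip anything that sits on a different device from the folder being measured, which is the same idea as du -x. The sketch below assumes the merged mount reports its own device number, distinct from the filesystem that holds diff; the function name usage_on_one_filesystem is made up for illustration.

```python
import os


def usage_on_one_filesystem(root):
    """Sum apparent sizes of non-directory entries under root, staying on one device."""
    root_dev = os.lstat(root).st_dev
    seen = set()  # (device, inode) pairs already counted, so hard links count once
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories on another device (assumed here to include a
        # mounted overlay "merged" view) so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if st.st_dev != root_dev or key in seen:
                continue
            seen.add(key)
            total += st.st_size
    return total
```

Called on a layer folder such as the 42G one in the question (as root, while the container is running), this should report a figure close to the diff size alone rather than diff plus merged.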
If the file also exists in a lower layer (e.g. in both the image layers and the container-specific layer), then you have likely modified it, triggering a copy-on-write. Such a change can be as small as a permission or ownership change on the file.
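To check whether a file really is stored in more than one layer, you can look for it under each layer's diff directory only and ignore merged. This is a rough sketch, assuming the usual overlay2 layout of <layer>/diff under the overlay2 root; the function name layers_containing is made up for illustration.

```python
import os


def layers_containing(overlay_root, rel_path):
    """Return the layer directory names whose diff/ tree contains rel_path."""
    hits = []
    for layer in sorted(os.listdir(overlay_root)):
        candidate = os.path.join(overlay_root, layer, "diff", rel_path)
        # lexists also reports symlinks and special files, so an entry that
        # only marks a deletion in an upper layer would be listed here too.
        if os.path.lexists(candidate):
            hits.append(layer)
    return hits
```

For the example in the question, layers_containing("/var/lib/docker/overlay2", "tmp/test_abc123") would be expected to return a single layer ID. More than one entry would point to a copy-on-write of a file that already existed in a lower layer.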
To disable the container diff and read-write layer, you can run your container with the --read-only flag. You will then be unable to create files in the container anywhere other than in a separate mount (a volume or tmpfs).
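As a small illustration of how the flags fit together, the sketch below assembles a docker run command line with --read-only plus the writable mounts the program needs. The helper name read_only_run_command and the volume name in the usage line are made up; only the docker flags themselves (-d, --read-only, --tmpfs, -v) come from the Docker CLI.

```python
def read_only_run_command(image, tmpfs_paths=(), volumes=()):
    """Build the argv for a detached, read-only container with writable mounts."""
    cmd = ["docker", "run", "-d", "--read-only"]
    for path in tmpfs_paths:
        # Each tmpfs mount is memory-backed and writable inside the container.
        cmd += ["--tmpfs", path]
    for source, target in volumes:
        # Each volume is stored on disk, outside the container's layers.
        cmd += ["-v", f"{source}:{target}"]
    cmd.append(image)
    return cmd


# Hypothetical usage: write large files to a named volume instead of /tmp.
print(" ".join(read_only_run_command("<image_id>", volumes=[("bigdata", "/data")])))
```

Because tmpfs lives in memory, a file of the size described in the question (about 19 GB) is probably better placed on a volume than on a tmpfs mount.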
