docker save - how to make it reproducible? #49068

Closed
ptoman-cisco opened this issue Dec 11, 2024 · 3 comments · Fixed by #48611

@ptoman-cisco

The output of docker save doesn't seem to be reproducible in some cases.

On my Mac:

$ sw_vers
ProductName:		macOS
ProductVersion:		14.7.1
BuildVersion:		23H222


$ docker info
Client:
 Version:    27.0.3
 Context:    desktop-linux
...
Server:
...
 Server Version: 27.0.3
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
...
 Kernel Version: 6.6.32-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
...

$ docker pull alpine:latest
latest: Pulling from library/alpine
...

$ docker save alpine:latest >alpine_latest1.tar
$ docker save alpine:latest >alpine_latest2.tar
$ diff alpine_latest1.tar alpine_latest2.tar
$ echo $?
0
# (i.e. no diff)

However, when doing the same on a RHEL9 VM host, I get a different output file each time:

$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.5 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.5"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.5 (Plow)"
...
$ docker info
Client: Docker Engine - Community
 Version:    27.3.1
 Context:    default
...
Server:
...
 Server Version: 27.3.1
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: syslog
 Cgroup Driver: systemd
 Cgroup Version: 2
...
 Kernel Version: 5.14.0-503.15.1.el9_5.x86_64
 Operating System: Red Hat Enterprise Linux 9.5 (Plow)
 OSType: linux
 Architecture: x86_64
...

$ docker save alpine:latest >alpine_latest1.tar
$ docker save alpine:latest >alpine_latest2.tar
$ diff alpine_latest1.tar alpine_latest2.tar
Binary files alpine_latest1.tar and alpine_latest2.tar differ

I remember testing the same thing on the same VM earlier this year (on RHEL 9.2/9.3, I think, with some earlier Docker version), and there was no difference in docker save output. What's causing this difference? Is there a way to make the output reproducible?

@tonistiigi transferred this issue from moby/buildkit Dec 11, 2024
@thaJeztah
Member

The difference between those setups is that your Docker Desktop engine is using the containerd image-store;

 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1

The containerd image-store is not yet the default on Linux installations, but on Docker Desktop it's enabled when doing a factory reset (or fresh install) and no images are present.
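To check which image store a given daemon is using, one option is to look for the `driver-type` line shown above in the `docker info` output. The following is a minimal sketch; the function name `uses_containerd_store` is made up for illustration, and it assumes `docker info` prints the line in the same form as in this thread.

```shell
# Hypothetical helper: reads `docker info` output on stdin and exits 0
# if the containerd snapshotter line from this thread is present.
uses_containerd_store() {
    # -F: match the string literally (the dots are not regex wildcards)
    grep -qF 'driver-type: io.containerd.snapshotter.v1'
}

# Example usage (requires a running Docker daemon):
#   docker info 2>/dev/null | uses_containerd_store \
#       && echo "containerd image store" \
#       || echo "non-containerd image store"
```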

The non-containerd image store is optimised for disk space; it only preserves images in their uncompressed (unpacked) form. When pulling an image from a registry, the compressed layers are extracted and discarded (but information about them is preserved). The distribution (compressed) format of those images is constructed when pushing (or exporting/saving) the image. Part of the information used to construct the distributable format contains timestamps, which causes the digest to differ. The other part is compression, which is not reproducible (or not guaranteed to be reproducible).
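To see which kind of difference you are hitting, it can help to check whether two saved archives differ in their extracted files or only in tar-level metadata such as timestamps. The sketch below does that with standard tools; `compare_tars` is a hypothetical helper name, not a Docker command, and it only inspects the outer archive, not the contents of individual layers.

```shell
# Hypothetical helper: classify how two tar archives differ.
# Prints "identical", "metadata-only", or "content".
compare_tars() {
    tmp=$(mktemp -d) || return 2
    mkdir "$tmp/a" "$tmp/b"
    # Extract both archives into separate scratch directories.
    if ! tar -xf "$1" -C "$tmp/a" || ! tar -xf "$2" -C "$tmp/b"; then
        rm -rf "$tmp"
        return 2
    fi
    if cmp -s "$1" "$2"; then
        # Byte-for-byte equal archives.
        echo "identical"
    elif diff -r "$tmp/a" "$tmp/b" >/dev/null; then
        # Same file names and contents; only tar headers
        # (timestamps, ordering, and so on) differ.
        echo "metadata-only"
    else
        # At least one extracted file differs or is missing.
        echo "content"
    fi
    rm -rf "$tmp"
}
```

With the files from the report, this would be run as `compare_tars alpine_latest1.tar alpine_latest2.tar`. A "content" result means some file inside the archive (for example a layer or a config file) changed between the two saves, and `diff -r` on the extracted directories will name it.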

The containerd image-store preserves images both in the "distribution" format (OCI image with compressed layers), and the "unpacked" (extracted) form. Doing so preserves the digest of images that were pulled from a registry, and (for images built locally) performs the compression once, but at the cost of more storage used for storing images (they're stored twice; once in the distributable, compressed, format, and once extracted).

You can configure the daemon to use the containerd image store, but if possible, I recommend doing so from a clean state (no images or containers present), because the two stores keep their data in different locations, and switching stores does not remove the data from the other store (so you may end up having data on disk that's not accessible while using the other store).

More information is available in the documentation.
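As a rough sketch of what that configuration looks like on a Linux host: the Docker documentation describes a `containerd-snapshotter` entry under `features` in the daemon configuration file. The helper below writes a minimal file with only that key. The function name `enable_containerd_store` is made up for illustration, the default path `/etc/docker/daemon.json` is the usual location but is an assumption about your setup, and the helper refuses to touch an existing file so it cannot clobber other settings.

```shell
# Hypothetical helper: write a minimal daemon config that enables the
# containerd image store. Pass a path as $1 to override the default.
enable_containerd_store() {
    conf=${1:-/etc/docker/daemon.json}
    if [ -e "$conf" ]; then
        # Do not overwrite existing settings; merge the key manually instead.
        echo "refusing to overwrite $conf; add the \"features\" key by hand" >&2
        return 1
    fi
    cat > "$conf" <<'EOF'
{
  "features": {
    "containerd-snapshotter": true
  }
}
EOF
}
```

After writing the file, the daemon has to be restarted (for example with `sudo systemctl restart docker` on a systemd host), and `docker info` should then report `driver-type: io.containerd.snapshotter.v1` as in the Docker Desktop output earlier in this thread.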

@ptoman-cisco
Author

@thaJeztah thank you for the explanation!

Apparently, something must have changed in the Docker binaries in the last year or so: we've been using the same setup (as code, in Ansible) for the last few years (dockerd service with --iptables=false and --storage-driver=overlay2 params), at least when running on RHEL.

From the documentation, it seems that overlay2/fs is still recommended and that containerd is still 'experimental'. Is that a correct understanding?

@thaJeztah
Member

Thanks for the extra context. Hmm.. so I wonder if that's the same issue as is reported in this ticket (wrong repository as it's not an issue on the CLI itself, but GitHub doesn't allow moving tickets between orgs 😅);

There's a pending PR for that issue, but I'd have to check whether it's already complete.
