Generate a dumps-enabled mediawiki image
Closed, ResolvedPublic

Description

Pretty much like we build the -debug image, we will need to create an additional image that installs:

This image's version will follow the same rules as the rest of the mediawiki images, and the image should be built by scap. I don't think it should add much to the overall duration of the scap build.

The easiest way to have the git repo in the image, instead of installing git, is to have a checkout on the deployment hosts and rsync it over like we do for the MediaWiki code.
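A minimal sketch of that copy step, with Python's `shutil.copytree` standing in for rsync (roughly `rsync -a --delete --exclude=.git`). The function name and directory layout are hypothetical, and this is not how scap actually implements it. It only illustrates the idea: the checked-out tree is shipped into the build context without git metadata, so the image needs neither git nor the repository history.

```python
import shutil
from pathlib import Path


def sync_checkout(checkout: Path, build_context: Path) -> Path:
    """Mirror a git checkout into a Docker build context, leaving out .git.

    Hypothetical helper. Like `rsync --delete`, the destination is rebuilt
    from scratch, so files removed upstream do not linger in the image.
    """
    dest = build_context / checkout.name
    if dest.exists():
        shutil.rmtree(dest)
    # Skip the .git directory: the image gets the code but no repo history.
    shutil.copytree(checkout, dest, ignore=shutil.ignore_patterns(".git"))
    return dest
```

The Dockerfile would then only need a plain `COPY` of the copied directory, e.g. to `/srv/deployment/dumps`.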

Event Timeline

Thanks for creating this ticket.

I noticed a few other things that our image will need, based on: modules/snapshot/manifests/dumps/packages.pp

There might be more and I will update the ticket again if I find any.

@Joe you said:

The easiest way to have the git repo in the image, instead of installing git, is to have a checkout on the deployment hosts and rsync it over like we do for the MediaWiki code.

Would it be a big problem to have the git command in the final image?
Could you see any other benefits to getting the dumps code with rsync from a deployment host, rather than as a git checkout from gerrit? Thanks.

You can retrieve a tarball directly from Gerrit, e.g.:

Direct commit: https://gerrit.wikimedia.org/g/operations/dumps/+archive/0d1f9be3610716a30b97df2ca671cc246c62c8f2.tar.gz
A tag: https://gerrit.wikimedia.org/g/operations/dumps/+archive/refs/tags/mwbzutils_0.0.4.tar.gz
A branch (you would probably want to pin to a sha1 instead): https://gerrit.wikimedia.org/g/operations/dumps/+archive/refs/heads/master.tar.gz

But you still need curl.
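If a Python 3 interpreter happens to be available in whichever stage does the fetching (an assumption; the thread only mentions curl), the standard library can stand in for curl. A minimal sketch built on the archive URLs above; `archive_url` and `fetch_and_extract` are hypothetical helper names, and `ref` is either a full sha1 or a full ref name such as `refs/tags/mwbzutils_0.0.4`.

```python
import io
import tarfile
import urllib.request
from pathlib import Path

# Base URL taken from the Gerrit archive examples above.
ARCHIVE_BASE = "https://gerrit.wikimedia.org/g/operations/dumps/+archive"


def archive_url(ref: str) -> str:
    """Build the archive URL for a sha1 or a full ref (refs/tags/..., refs/heads/...)."""
    return f"{ARCHIVE_BASE}/{ref}.tar.gz"


def fetch_and_extract(ref: str, dest: Path) -> None:
    """Download the tarball for `ref` and unpack it under `dest` (no git, no curl)."""
    dest.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(archive_url(ref)) as resp:
        data = resp.read()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        # Refuse entries that would land outside `dest` (e.g. "../etc/passwd").
        root = dest.resolve()
        for member in tar.getmembers():
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
```

Pinning to a sha1 keeps the result reproducible; a branch archive changes every time the branch moves.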

It is probably more practical to use a multi-stage build if that is possible:

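FROM docker-registry.wikimedia.org/bullseye:latest AS git-dumps
# Build-only stage: git is installed here and never reaches the final image.
RUN apt-get update \
  && apt-get install -y git ca-certificates \
  && git clone ... /srv/dumps \
  && git -C /srv/dumps checkout <whatever tag>

FROM <whatever mediawiki image>
# Only the checked-out tree is copied into the mediawiki image.
COPY --from=git-dumps /srv/dumps /path/to/where/you/want/it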

This way git is not in the mediawiki image and no side packages are affected/updated.

The reason I suggested it is that that's how we currently build the mediawiki code. I'm aware there are other ways to do it, but I wanted to stay consistent with the rest of the process :)

Would it be a big problem to have the git command in the final image?
Could you see any other benefits to getting the dumps code with rsync from a deployment host, rather than as a git checkout from gerrit? Thanks.

It's not a big problem, but in general we tend to only add to images what's strictly needed. Is there any other reason why having git in the image would be useful?

You're right, there's no reason why we would need git in the final image.
I was recalling that we use git in some of our other blubber/kokkuri-based builds, but now that I think about it, they're all multi-stage builds, as Antoine had suggested.

I don't really have any strong opinions on how this image is created, so I will stay out of it :-)

I started looking into this assuming we might want a new derivative image specifically for dumps, but then I realized this layer only adds about 150 MB. So it might make more sense to add the tools needed for dumps to our "debug" image instead, which keeps down the time it takes to build images during the release process.

For now, I'm going with the git clone approach in a multi-stage build, which is slightly slower than copying the files in, but overall it's OK IMHO. I should have this merged by next week :)

Joe changed the task status from Open to In Progress. Fri, Dec 13, 4:32 PM
Joe claimed this task.

Mentioned in SAL (#wikimedia-operations) [2024-12-16T17:15:43Z] <swfrench@deploy2002> Started scap sync-world: Deployment to pick up debug image changes - T381473

Mentioned in SAL (#wikimedia-operations) [2024-12-16T17:22:32Z] <swfrench@deploy2002> Finished scap sync-world: Deployment to pick up debug image changes - T381473 (duration: 06m 49s)

@Joe's change is now live. I've confirmed that the debug images now contain the dumps codebase at /srv/deployment/dumps as expected.

The first available image with this is docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-debug:2024-12-16-171556-publish-81.