Create anaconda .deb package with stacked conda user envs
Closed, Resolved (Public)

Description

The first step to implement Newpyter is to make it possible and easy to use conda with a read-only base Anaconda distribution (plus some extra packages we want) and a writeable 'stacked' conda user environment.

We will create a .deb package of Anaconda and include some extra tooling to create and activate stacked user environments.

Event Timeline

@elukey a tricky bit here is how HUGE anaconda is. Uncompressed it is around 6G. If we were to do like we do with e.g. Druid or SWAP where we commit the artifacts to git in order to use git-buildpackage, we'd be creating HUGE HUGE commits in gerrit every time we upgrade, which I don't think releng would enjoy.

I haven't committed any work yet because I'm not sure what to do about this. Assuming we were to commit the entire anaconda distribution to git, the process for upgrading the release would be something like this:

# cd to operations/debs/anaconda checkout.
cd debs/anaconda
git checkout master
# Remove all existing anaconda files.
git rm -r anaconda

# Download and install Anaconda distribution into ./anaconda
ANACONDA_VERSION="2020.02"
# (Yes, this is a giant bash script with the compressed distribution inside of it!)
wget https://repo.anaconda.com/archive/Anaconda3-${ANACONDA_VERSION}-Linux-x86_64.sh -O ~/Anaconda3-${ANACONDA_VERSION}-Linux-x86_64.sh
bash ~/Anaconda3-${ANACONDA_VERSION}-Linux-x86_64.sh -b -p ./anaconda

# Add extra packages we want and need in the WMF anaconda distribution
export http_proxy=http://webproxy.eqiad.wmnet:8080
export https_proxy=http://webproxy.eqiad.wmnet:8080
./anaconda/bin/conda install --yes -c conda-forge --file ./debian/extra/conda-requirements.txt
./anaconda/bin/pip install -r ./debian/extra/pip-requirements.txt

git add anaconda
git commit -m "Update anaconda distribution to version ${ANACONDA_VERSION}"

git checkout debian
# Edit debian/changelog with new version info:
dch -i

# Make sure all source package binaries are listed in debian/source/include-binaries
find anaconda/ -type f -exec file -F ' ' {} \;  | grep -viE 'text|empty' | awk '{print $1}' | sort > debian/source/include-binaries

git add debian && git commit -m "Release version ${ANACONDA_VERSION}"
# Build the new package
GIT_PBUILDER_AUTOCONF=no DIST=buster gbp buildpackage -sa -us -uc --git-builder=git-pbuilder

The worst part about this is the giant git rm -r anaconda and then later git add anaconda. That's crazy!
I think we should try to make the debian package repo without committing any anaconda files to it... but how to do this with git-buildpackage?

I think we could do this by only pushing the debian/ dir to gerrit, and including in instructions how to set up your local working copy of the git repo so that the master branch has the upstream release committed to it. (I don't mind committing to git locally, what I mind is uploading that stuff to gerrit.)

Or...we could avoid using git-buildpackage and somehow use pbuilder to dpkg-buildpackage directly? I've never done this, so we might want to ask Alex how to? Doing this would essentially be the same as using git-buildpackage with some special local git repo setup instructions idea, except there'd be no need to commit to a master/upstream branch; you'd just download and install Anaconda locally and build.

I'm inclined to proceed with the local git-buildpackage idea just because I think I know how to make it work. Let's discuss on Monday :)

I think we could do this by only pushing the debian/ dir to gerrit, and including in instructions how to set up your local working copy of the git repo so that the master branch has the upstream release committed to it.

In that case you don't even need to commit locally. Simply check out the git repo with debian/, fetch the source, and check that it's untampered. To kick off the build, you can simply run "DIST=buster pdebuild" on deneb.
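
A minimal sketch of the "check that it's untampered" step, assuming the installer's expected SHA-256 has been obtained out of band. `verify_sha256` is a hypothetical helper and `EXPECTED_SHA256` is a placeholder, not something taken from this task:

```shell
#!/bin/bash
# verify_sha256 FILE EXPECTED
# Succeeds only if EXPECTED is non-empty and equals FILE's SHA-256 digest.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" | awk '{print $1}')"
    [ -n "$expected" ] && [ "$actual" = "$expected" ]
}

# Illustrative usage, left commented out so that sourcing this file downloads nothing:
#   ANACONDA_VERSION="2020.02"
#   installer="Anaconda3-${ANACONDA_VERSION}-Linux-x86_64.sh"
#   wget "https://repo.anaconda.com/archive/${installer}" -O ~/"${installer}"
#   verify_sha256 ~/"${installer}" "${EXPECTED_SHA256}" || { echo "checksum mismatch" >&2; exit 1; }
#   bash ~/"${installer}" -b -p ./anaconda
#   DIST=buster pdebuild
```

The helper only compares digests; where the trusted digest comes from is left open here.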

Just to clarify, if we are able to skip the gerrit patch (that's great), will it be ok to create a deb that weighs GBs?

The size mentioned by Otto (6G) should be fine. IIRC there's a limitation within ar which imposes a maximum member size of 10 decimal digits of bytes (so ~9.3 GiB), but there were also some ways to work around that (would need to dig into it, but doesn't seem needed anyway).
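
The arithmetic behind that limit, as a quick sanity check: a 10-digit decimal size field tops out at 9,999,999,999 bytes. The names below (`AR_MAX_MEMBER_BYTES`, `fits_in_ar`, `ar_limit_gib`) are made up for this sketch:

```shell
#!/bin/bash
# Largest value that fits in a 10-character decimal size field.
AR_MAX_MEMBER_BYTES=9999999999

# fits_in_ar SIZE_BYTES: succeed if SIZE_BYTES fits in that field.
fits_in_ar() {
    [ "$1" -le "$AR_MAX_MEMBER_BYTES" ]
}

# Print the limit in GiB, rounded to one decimal place.
ar_limit_gib() {
    awk -v b="$AR_MAX_MEMBER_BYTES" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}
```

For example, `fits_in_ar $((6 * 1024 * 1024 * 1024))` succeeds for a 6 GiB payload.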

We also have sufficient space on apt1001.

The size mentioned by Otto (6G)

That's the uncompressed size, the actual .deb size is something more like 2ish GB (IIRC).

you can simply run "DIST=buster pdebuild" on deneb.

GREAT perfect. Will go that route then.

Change 594204 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/anaconda@debian] [WIP] Initial debian commit

https://gerrit.wikimedia.org/r/594204

@MoritzMuehlenhoff I wonder if you have some tips for building a correct .orig.tar.gz file. Upstream does not release one; the distribution is provided embedded in a .sh script. In addition, I prep the environment I want to create with some extra conda packages, so there is no upstream release of the package I am trying to build. gbp has always taken care of this for me. Currently I am doing:

tar -cvzf ../anaconda_2020.02.orig.tar.gz --exclude 'debian' --exclude '.git*' .

Also, since this package contains a lot of pre built binaries, I need to get a list of them all to use with source/include-binaries. I'm currently doing this like:

find anaconda/ -type f ! -size 0 -exec grep -IL . "{}" \; | sort > debian/source/include-binaries

Some combination of what I'm doing is not right. Lots of errors about missing symlinks:

dpkg-source: error: cannot represent change to anaconda/bin/configurable-http-proxy:
dpkg-source: error:   new version is symlink to ../lib/node_modules/configurable-http-proxy/bin/configurable-http-proxy
dpkg-source: error:   old version is nonexistent

I've tried adding the symlinks in source/include-binaries but I seem to get the same error either way.

I'm clearly doing something wrong! But using just pdebuild seems to be lacking a lot of stuff gbp gets me on the build server. E.g. The package files are all in ../ rather than in /var/cache/pbuilder.

dpkg-source: error: cannot represent change to anaconda/bin/configurable-http-proxy:
dpkg-source: error:   new version is symlink to ../lib/node_modules/configurable-http-proxy/bin/configurable-http-proxy
dpkg-source: error:   old version is nonexistent

See my comment in Gerrit.

But using just pdebuild seems to be lacking a lot of stuff gbp gets me on the build server. E.g. The package files are all in ../ rather than in /var/cache/pbuilder.

Only the source package ends up in "../", the final build result will still end up in /var/cache/pbuilder/result

Milimetric moved this task from Incoming to Data Exploration Tools on the Analytics board.

@elukey we want to include some extra packages in our globally installed anaconda distribution that are not included in upstream's anaconda. This includes the packages listed in T249078, as well as things like Neil's wmfdata.

I had considered renaming the repository and package to e.g. 'anaconda-wmf', but perhaps varying the debian version with 'wmf' in it, e.g. '2020.02~wmf0', is enough? We were planning on adding the 'wmf' variant in the version name anyway.

@elukey we want to include some extra packages in our globally installed anaconda distribution that are not included in upstream's anaconda. This includes the packages listed in T249078, as well as things like Neil's wmfdata.

I had considered renaming the repository and package to e.g. 'anaconda-wmf', but perhaps varying the debian version with 'wmf' in it, e.g. '2020.02~wmf0', is enough? We were planning on adding the 'wmf' variant in the version name anyway.

I'd prefer to have anaconda-wmf. The ~wmf suffix leads me to think that we are applying patches in the Debian package (on top of the official anaconda distro), whereas this seems to be more of a fork. But no strong opinion, anything that you prefer is fine :)

Ok, I think you are right. Will do that.

Change 610880 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/anaconda-wmf@debian] Initial debian commit

https://gerrit.wikimedia.org/r/610880

Change 594204 abandoned by Ottomata:
[operations/debs/anaconda@debian] Initial debian commit

Reason:
Renamed repo, so abandoned in favor of https://gerrit.wikimedia.org/r/c/operations/debs/anaconda-wmf/+/610880

https://gerrit.wikimedia.org/r/594204

Oh noooo! I just tried installing anaconda from my .deb for the first time.

[@stat1008:/usr/lib/anaconda-wmf] $ grep --binary-files=without-match -R 'home/otto' /usr/lib/anaconda-wmf  | wc -l
1619

[@stat1008:/usr/lib/anaconda-wmf] $ head -n 1 /usr/lib/anaconda-wmf/bin/conda
#!/home/otto/anaconda-wmf/anaconda-wmf/bin/python

I didn't realize it, but apparently the anaconda install process creates hardcoded shebang paths for the shell scripts! None of my previous testing included installing anaconda from a .deb, so the paths were always the same (in my home directory) and I didn't notice this.

Yargh... now what!? I could script either a postinstall or something in debian/rules to fix the shebang paths, but this just feels so hacky!
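
For illustration only, a rough sketch of what such a fix-up could look like, assuming GNU grep, xargs and sed, and assuming neither path contains `|` or regex metacharacters that matter. `rewrite_prefix` is a hypothetical helper, not what the package ended up shipping:

```shell
#!/bin/bash
# rewrite_prefix OLD NEW DIR
# Replace every occurrence of the build-time prefix OLD with NEW in text files under DIR.
rewrite_prefix() {
    local old="$1" new="$2" dir="$3"
    # grep: -r recurse, -I skip binary files, -l list matching files,
    #       -Z NUL-terminate names, -F treat OLD as a fixed string.
    # xargs: -0 read NUL-terminated names, -r do nothing if there are no matches.
    grep -rIlZ -F -- "$old" "$dir" | xargs -0 -r sed -i "s|${old}|${new}|g"
}

# Illustrative usage with the paths seen above (commented out):
#   rewrite_prefix /home/otto/anaconda-wmf/anaconda-wmf /usr/lib/anaconda-wmf /usr/lib/anaconda-wmf
```

This only touches text files; any binaries that embed the build prefix are skipped and would need separate handling.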

Change 610880 merged by Ottomata:
[operations/debs/anaconda-wmf@debian] Initial debian commit

https://gerrit.wikimedia.org/r/610880

Change 618106 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install anaconda-wmf on stat nodes

https://gerrit.wikimedia.org/r/618106

Change 618106 merged by Ottomata:
[operations/puppet@production] Install anaconda-wmf on stat nodes

https://gerrit.wikimedia.org/r/618106

Change 626448 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install anaconda-wmf on hadoop workerrs and clients

https://gerrit.wikimedia.org/r/626448

Change 626448 merged by Ottomata:
[operations/puppet@production] Install anaconda-wmf on hadoop workerrs and clients

https://gerrit.wikimedia.org/r/626448