(FY2023/2024-Q3-Q4)
Nov 13 2024
This is still happening: T373348: [trove] Database quota values are not updating correctly
Sep 13 2024
Aug 19 2024
Aug 16 2024
Replication lag is now back to zero on all clouddb* hosts. Upgrading all of them to Bookworm and to the latest minor version is tracked in T365424: Upgrade clouddb* hosts to Bookworm, so I'm going to close this task as Resolved, as I'm not seeing "frequent replag spikes" any more.
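For reference, the lag discussed here is ordinary MariaDB replica lag. A minimal spot-check on an individual clouddb host looks something like the sketch below; direct SQL access to the host is assumed and is not part of the original notes.

    -- Illustrative check only: on a replica, Seconds_Behind_Master in the
    -- output should read 0 once the host has fully caught up with its primary.
    SHOW SLAVE STATUS;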
Aug 14 2024
Aug 13 2024
Aug 12 2024
I repooled clouddb1019 and reverted my change to the --busy-time parameter on clouddb1015. Let's see what happens to replag in the next few days.
Change #1060919 merged by Andrew Bogott:
[operations/puppet@production] puppetserver-deploy-code: don't use sudo when checking current branch
Aug 10 2024
Aug 8 2024
Change #1060919 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] puppetserver-deploy-code: don't use sudo when checking current branch
Change #999218 merged by Andrew Bogott:
[operations/puppet@production] Add wmcs-empty-rbd-trash script
Change #1060784 merged by David Caro:
[operations/puppet@production] wmcs-backups: add empty_trash command
Change #1060784 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] wmcs-backups: add empty_trash command
Aug 7 2024
We are still experiencing a failure related to btmwiki at the beginning of each month.
I believe it has something to do with the grants on an-redacteddb1001.
I have created T371991: Investigate MariaDB grant issues with the btmwiki database on an-redacteddb1001 to track the work to fix it.
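For illustration only, the grants in question can be inspected directly on the MariaDB side along the lines below. The account name is a hypothetical placeholder; the real account used by the monthly job is not recorded in these notes.

    -- 'some_user'@'%' is a placeholder, not the actual account.
    SHOW GRANTS FOR 'some_user'@'%';

    -- List which accounts hold database-level grants on btmwiki:
    SELECT User, Host, Db FROM mysql.db WHERE Db LIKE 'btmwiki%';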
Aug 6 2024
So far the upgrade to Bookworm and MariaDB 10.6.18 seems to have helped. There is no replication lag, and the traffic shapes on clouddb1019 are looking "healthy" again, i.e. similar to the shapes they had before 2024-06-12, with big spikes but also clear breaks between the spikes:
Aug 5 2024
Change #1055502 merged by Andrew Bogott:
[operations/puppet@production] git-sync-upstream: remove gitpuppet user from networktests
Change #1058675 merged by Andrew Bogott:
[operations/puppet@production] git-sync-upstream: execute the entire script as gitpuppet
I skipped the reimage of clouddb1021 (see comments in T365424) and proceeded with the reimage of clouddb1019, which is now running Bookworm and MariaDB 10.6.18.
Aug 2 2024
@bd808 I'm interested in your opinion on this one. I created a pull request, but I'm also wondering whether anybody still wants this; maybe we should just ignore it until someone asks for it. The original request dates back to 2016 (see the parent task).
Aug 1 2024
@Zache hmm, this looks like a Superset bug to me. I found https://github.com/apache/superset/issues/15876, but they say it's resolved in Superset version 3, and we are running 3.11.
I updated the SQL query so that it fetches large numbers instead of zeroes (i.e. SELECT max(phash) FROM imagehash). Also attached a screenshot for consistency.
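For context, zeroes like these are consistent with 64-bit integers losing precision in JavaScript-based clients. One common workaround, shown here only as a hedged sketch and not necessarily the change that was made in this case, is to cast the value to a string inside the query:

    -- Casting to CHAR preserves the full 64-bit phash value in clients that
    -- would otherwise round or zero out large integers (illustration only).
    SELECT CAST(MAX(phash) AS CHAR) AS max_phash FROM imagehash;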
@Zache the query works for me:
I got an error when I tried to query bigint columns.
dhinus opened https://github.com/toolforge/quarry/pull/61
dhinus opened https://github.com/toolforge/superset-deploy/pull/27
Jul 31 2024
@cmooney can you advise what (if anything) needs doing here?
Change #1058675 had a related patch set uploaded (by JHathaway; author: JHathaway):
[operations/puppet@production] git-sync-upstream: execute the entire script as gitpuppet
Great - thank you!
Will you try first with clouddb1021 as we discussed a few days ago?
I filed T371485: Grafana MySQL charts can be inconsistent when zooming out to improve the Grafana charts.
Jul 30 2024
There was a big increase in traffic to clouddb1019 starting from 2024-06-12, when I rebooted the host, but the "Network activity" graph for cloudlb1002 shows that the increase actually started a couple of days earlier, on 2024-06-10:
An even better comparison is between clouddb1019 (the host struggling with replication lag) and clouddb1015 (the "web" s4 wikireplica).
The schema change took about 27 hours to complete in db1155 (Sanitarium host):
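As a general illustration (the tooling actually used for this schema change is not recorded here), the progress of a long-running ALTER can be watched from the server's process list:

    -- Show in-flight ALTER statements and how long they have been running.
    SELECT id, time, state, info
    FROM information_schema.processlist
    WHERE info LIKE 'ALTER%';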
Jul 29 2024
Replication resumed at around 2024-07-27 23:30 UTC and caught up with the primary around 16 hours later.