cloud-services-team (FY2023/2024-Q3-Q4) · Milestone
Archived · Public

Members (14)

Watchers

  • This project does not have any watchers.

Details

Description

(FY2023/2024-Q3-Q4)

Recent Activity

Nov 13 2024

fnegri added a comment to T359412: [trove] wrong quota_usages values in project tf-infra-test.

This is still happening: T373348: [trove] Database quota values are not updating correctly

Nov 13 2024, 4:00 PM · Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4)

Sep 13 2024

fnegri closed T347977: cloudcumin: allow wmcs-admin to run wikireplicas cookbooks and scripts, a subtask of T343330: WMCS cookbooks: provide shared hosts for people without global root privileges, as Declined.
Sep 13 2024, 2:24 PM · cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS

Aug 19 2024

fnegri archived cloud-services-team (FY2023/2024-Q3-Q4).
Aug 19 2024, 9:26 AM
fnegri removed a hashtag from cloud-services-team (FY2023/2024-Q3-Q4): #wmcs-current.
Aug 19 2024, 9:26 AM

Aug 16 2024

fnegri closed T367778: [wikireplicas] frequent replag spikes in clouddb hosts as Resolved.

Replication lag is now back to zero on all clouddb* hosts. Upgrading all of them to Bookworm and to the latest minor version is tracked in T365424: Upgrade clouddb* hosts to Bookworm, so I'm gonna close this task as Resolved, as I'm not seeing "frequent replag spikes" any more.

Aug 16 2024, 2:02 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services

Aug 14 2024

Andrew triaged T364457: Migrate eqiad1 hypervisors to Neutron OVS agent as High priority.
Aug 14 2024, 1:34 PM · cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS

Aug 13 2024

fnegri added a subtask for T367393: Allow Superset to query ToolsDB public databases: T372395: Improve idempotency detection with helm diff.
Aug 13 2024, 1:33 PM · cloud-services-team (FY2024/2025-Q1-Q2), superset.wmcloud.org

Aug 12 2024

fnegri added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

I repooled clouddb1019, and reverted my change in the --busy-time parameter of clouddb1015. Let's see what happens to replag in the next few days.

Aug 12 2024, 5:17 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services
gerritbot added a comment to T364492: Ownership confusion on cloud-local puppet servers.

Change #1060919 merged by Andrew Bogott:

[operations/puppet@production] puppetserver-deploy-code: don't use sudo when checking current branch

https://gerrit.wikimedia.org/r/1060919

Aug 12 2024, 1:57 PM · cloud-services-team (FY2024/2025-Q1-Q2), Patch-For-Review, Puppet-Infrastructure

Aug 10 2024

taavi removed a parent task for T352764: [builds-api] Add dashboards with the new statistics: T360190: [infra,builds-api,jobs-api,webservice] Provide metrics about build service and non-NFS adoption.
Aug 10 2024, 11:20 AM · Toolforge (Toolforge iteration 09), cloud-services-team (FY2023/2024-Q3-Q4)

Aug 8 2024

gerritbot added a comment to T364492: Ownership confusion on cloud-local puppet servers.

Change #1060919 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] puppetserver-deploy-code: don't use sudo when checking current branch

https://gerrit.wikimedia.org/r/1060919

Aug 8 2024, 8:03 PM · cloud-services-team (FY2024/2025-Q1-Q2), Patch-For-Review, Puppet-Infrastructure
Maintenance_bot removed a project from T356904: [cinder] [toolsdb] Deleting snapshot does not work: Patch-For-Review.
Aug 8 2024, 3:30 PM · Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), Data-Services
gerritbot added a comment to T356904: [cinder] [toolsdb] Deleting snapshot does not work.

Change #999218 merged by Andrew Bogott:

[operations/puppet@production] Add wmcs-empty-rbd-trash script

https://gerrit.wikimedia.org/r/999218

Aug 8 2024, 2:42 PM · Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), Data-Services
Maintenance_bot removed a project from T358774: [wmcs-backup] Backup snapshots of deleted volumes are never cleaned up: Patch-For-Review.
Aug 8 2024, 1:31 PM · cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS
gerritbot added a comment to T358774: [wmcs-backup] Backup snapshots of deleted volumes are never cleaned up.

Change #1060784 merged by David Caro:

[operations/puppet@production] wmcs-backups: add empty_trash command

https://gerrit.wikimedia.org/r/1060784

Aug 8 2024, 1:11 PM · cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS
gerritbot added a project to T358774: [wmcs-backup] Backup snapshots of deleted volumes are never cleaned up: Patch-For-Review.
Aug 8 2024, 10:12 AM · cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS
gerritbot added a comment to T358774: [wmcs-backup] Backup snapshots of deleted volumes are never cleaned up.

Change #1060784 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] wmcs-backups: add empty_trash command

https://gerrit.wikimedia.org/r/1060784

Aug 8 2024, 10:12 AM · cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS

Aug 7 2024

BTullis added a comment to T368066: Prepare and check storage layer for btmwiki.

We are still experiencing a failure relating to btmwiki at the beginning of each month.
It is something to do with the grants on an-redacteddb1001, I believe.
I have created T371991: Investigate MariaDB grant issues with the btmwiki database on an-redacteddb1001 to track the work to fix it.

Aug 7 2024, 4:53 PM · cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, DBA
Raymond_Ndibe added a project to T350687: [harbor] Move harbor data to object storage service: User-Raymond_Ndibe.
Aug 7 2024, 5:18 AM · cloud-services-team (FY2024/2025-Q1-Q2), User-Raymond_Ndibe, Goal, Toolforge

Aug 6 2024

fnegri added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

So far the upgrade to Bookworm and MariaDB 10.6.18 seems to have helped. There is no replication lag and traffic shapes in clouddb1019 are again looking "healthy", i.e. similar to the shapes they had before 2024-06-12 with big spikes but also clear breaks between the spikes:

Aug 6 2024, 1:30 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services

Aug 5 2024

gerritbot added a comment to T364492: Ownership confusion on cloud-local puppet servers.

Change #1055502 merged by Andrew Bogott:

[operations/puppet@production] git-sync-upstream: remove gitpuppet user from networktests

https://gerrit.wikimedia.org/r/1055502

Aug 5 2024, 6:37 PM · cloud-services-team (FY2024/2025-Q1-Q2), Patch-For-Review, Puppet-Infrastructure
gerritbot added a comment to T364492: Ownership confusion on cloud-local puppet servers.

Change #1058675 merged by Andrew Bogott:

[operations/puppet@production] git-sync-upstream: execute the entire script as gitpuppet

https://gerrit.wikimedia.org/r/1058675

Aug 5 2024, 6:07 PM · cloud-services-team (FY2024/2025-Q1-Q2), Patch-For-Review, Puppet-Infrastructure
bd808 added a comment to T367415: Allow Quarry to query its own database.

Aug 5 2024, 4:58 PM · cloud-services-team (FY2024/2025-Q1-Q2), Quarry
bd808 updated the task description for T367415: Allow Quarry to query its own database.
Aug 5 2024, 4:43 PM · cloud-services-team (FY2024/2025-Q1-Q2), Quarry
fnegri added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

I skipped the reimage of clouddb1021 (see comments in T365424) and proceeded with the reimage of clouddb1019, which is now running Bookworm and MariaDB 10.6.18.

Aug 5 2024, 3:18 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services
dcaro edited projects for T318479: Intermittent redis connection timeouts in Toolforge, added: Toolforge (Toolforge iteration 14); removed Toolforge (Toolforge iteration 13).
Aug 5 2024, 7:49 AM · Toolforge (Toolforge iteration 16), cloud-services-team (FY2024/2025-Q1-Q2)
dcaro edited projects for T314664: [infra] Decommission the Grid Engine infrastructure, added: Toolforge (Toolforge iteration 14); removed Toolforge (Toolforge iteration 13).
Aug 5 2024, 7:41 AM · Toolforge (Toolforge iteration 16), cloud-services-team (FY2024/2025-Q1-Q2), Goal

Aug 2 2024

fnegri updated subscribers of T367415: Allow Quarry to query its own database.

@bd808 I'm interested in your opinion on this one. I created a pull request, but I'm also wondering if anybody still wants it and maybe we should just ignore it until someone asks for it. The original request dates back to 2016 (see the parent task).

Aug 2 2024, 12:47 PM · cloud-services-team (FY2024/2025-Q1-Q2), Quarry

Aug 1 2024

fnegri added a comment to T367393: Allow Superset to query ToolsDB public databases.

@Zache hmm this looks like a Superset bug to me, I found https://github.com/apache/superset/issues/15876 but they say it's resolved in Superset version 3, and we are running 3.11.

Aug 1 2024, 3:32 PM · cloud-services-team (FY2024/2025-Q1-Q2), superset.wmcloud.org
Zache added a comment to T367393: Allow Superset to query ToolsDB public databases.

I updated the SQL query so that it fetches large numbers instead of zeroes (i.e. SELECT max(phash) FROM imagehash). Also screenshot for consistency.

Aug 1 2024, 3:08 PM · cloud-services-team (FY2024/2025-Q1-Q2), superset.wmcloud.org
fnegri added a comment to T367393: Allow Superset to query ToolsDB public databases.

@Zache the query works for me:

Aug 1 2024, 2:55 PM · cloud-services-team (FY2024/2025-Q1-Q2), superset.wmcloud.org
Zache added a comment to T367393: Allow Superset to query ToolsDB public databases.

I got an error when I tried to query bigint columns.

Aug 1 2024, 2:52 PM · cloud-services-team (FY2024/2025-Q1-Q2), superset.wmcloud.org
github-toolforge-bot added a comment to T367415: Allow Quarry to query its own database.

dhinus opened https://github.com/toolforge/quarry/pull/61

Aug 1 2024, 2:47 PM · cloud-services-team (FY2024/2025-Q1-Q2), Quarry
github-toolforge-bot added a comment to T367393: Allow Superset to query ToolsDB public databases.

dhinus opened https://github.com/toolforge/superset-deploy/pull/27

Aug 1 2024, 2:23 PM · cloud-services-team (FY2024/2025-Q1-Q2), superset.wmcloud.org

Jul 31 2024

Andrew added a comment to T341338: eqiad1: fix PTR delegations for 185.15.56.0/24.

@cmooney can you advise what (if anything) needs doing here?

Jul 31 2024, 9:54 PM · cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, User-aborrero
gerritbot added a comment to T364492: Ownership confusion on cloud-local puppet servers.

Change #1058675 had a related patch set uploaded (by JHathaway; author: JHathaway):

[operations/puppet@production] git-sync-upstream: execute the entire script as gitpuppet

https://gerrit.wikimedia.org/r/1058675

Jul 31 2024, 7:34 PM · cloud-services-team (FY2024/2025-Q1-Q2), Patch-For-Review, Puppet-Infrastructure
fnegri renamed T341769: [cloudvps] use a systemd timer for the OpenTofu tests to get logs from "[cloudvps] use a systemd timer for the terraform tests to get logs" to "[cloudvps] use a systemd timer for the OpenTofu tests to get logs".
Jul 31 2024, 3:39 PM · cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, Cloud-Services-Worktype-Maintenance, Cloud-Services-Origin-Alert, User-dcaro
fnegri renamed T341814: [cloudvps] puppetize the OpenTofu tests VM (tf-infra-test) from "[cloudvps] puppetize the terraform tests VM (tf-infra-test)" to "[cloudvps] puppetize the OpenTofu tests VM (tf-infra-test)".
Jul 31 2024, 3:39 PM · cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, Cloud-Services-Worktype-Maintenance, Cloud-Services-Origin-Alert, User-dcaro
Marostegui added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

Great - thank you!

Jul 31 2024, 2:11 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services
fnegri added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

Jul 31 2024, 2:09 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services
Marostegui added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

Will you try first with clouddb1021 as we discussed a few days ago?

Jul 31 2024, 2:05 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services
fnegri added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

I filed T371485: Grafana MySQL charts can be inconsistent when zooming out to improve the Grafana charts.

Jul 31 2024, 1:30 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services

Jul 30 2024

fnegri added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

There was a big increase in traffic to clouddb1019 starting from 2024-06-12, when I rebooted the host, but the "Network activity" graph for cloudlb1002 shows that the increase actually started a couple of days earlier, on 2024-06-10:

Jul 30 2024, 5:09 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services
fnegri attached a referenced file: F56788359: Screenshot 2024-07-30 at 16.21.39.png.
Jul 30 2024, 5:09 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services
fnegri attached a referenced file: F56790728: Screenshot 2024-07-30 at 18.44.01.png.
Jul 30 2024, 5:08 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services
fnegri added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

An even better comparison is between clouddb1019 (the host struggling with replication lag) and clouddb1015 (the "web" s4 wikireplica).

Jul 30 2024, 2:31 PM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services
fnegri added a comment to T367778: [wikireplicas] frequent replag spikes in clouddb hosts.

The schema change took about 27 hours to complete in db1155 (Sanitarium host):

Jul 30 2024, 10:55 AM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services

Jul 29 2024

joanna_borun lowered the priority of T346453: [cumin] [openstack] Openstack backend fails when project is not set from High to Medium.
Jul 29 2024, 2:54 PM · cloud-services-team (FY2024/2025-Q1-Q2), Patch-For-Review, Infrastructure-Foundations, Cloud-VPS, Cumin
fnegri closed T370760: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-07-23 as Resolved.

Replication resumed at around 2024-07-27 23:30 UTC and caught up with the primary around 16 hours later.

Jul 29 2024, 2:23 PM · cloud-services-team (FY2023/2024-Q3-Q4), Data-Services
fnegri moved T370760: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-07-23 from In progress to Done on the cloud-services-team (FY2023/2024-Q3-Q4) board.
Jul 29 2024, 2:22 PM · cloud-services-team (FY2023/2024-Q3-Q4), Data-Services