Page MenuHomePhabricator

vote.wikimedia.org's Special:Securepoll/list/1402 takes considerably longer in codfw than in eqiad, leading to timeouts
Closed, ResolvedPublic

Description

@Sotiale (one of the ACE2022 election scrutineers) reported that accessing Special:Securepoll/list/1402 at vote.wikimedia.org takes a very long time (dozens of seconds). They also mentioned that the special page timeouts with limit 250 votes and higher. According to operations/dns:geo-maps, their requests are routed to codfw.

For me (@Urbanecm), accessing vote.wikimedia.org works perfectly fine, and it is fairly quick, even when displaying 500 votes. However, my requests are routed to codfw, not eqiad. When I VPN'ed to USA, vote.wikimedia.org slowed down significantly, and I was not able to load https://vote.wikimedia.org/w/index.php?limit=250&title=Special%3ASecurePoll%2Flist%2F1402 at all (timeouted):

image.png (473×1 px, 68 KB)

This issue impacts scrutineering of ACE2022 (in-progress), as it means @Sotiale is not able to see many of the votes.

Event Timeline

I'll do scrutineering with VPN if needed (I've been advised to detour to Europe), but hopefully this issue gets resolved before then ;)

It can be either of two things:

  • schema drift in codfw like missing indexes (more likely)
  • GET request needing to read from DB_PRIMARY making cross-dc connections all the time, it could make it slow but not this slow.

Let me check and I'll get back to you.

Slightly unrelated but still. The special page makes 800 queries while it could make just forty by batching and not making three duplicate queries back to back for each user: https://logstash.wikimedia.org/goto/14f431d3527997ed9ac9b3972d393a03

The second one is likely the cause with combination of the comment above: https://logstash.wikimedia.org/app/dashboards#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2022.12.06?id=c7LN54QBIFwrQt9tdgKb

Point of order:

  • Make it read replica so it would read from the local replica, there is no need to be exact per millisecond here
  • Make the script batch and at least not make three duplicate queries per user.

Change 865089 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/SecurePoll@master] ListPager: Only call Voter::newFromId() if return value is needed

https://gerrit.wikimedia.org/r/865089

Change 865090 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/SecurePoll@master] Voter::newFromID() query from replica, not primary

https://gerrit.wikimedia.org/r/865090

Change 865089 merged by jenkins-bot:

[mediawiki/extensions/SecurePoll@master] ListPager: Only call Voter::newFromId() if return value is needed

https://gerrit.wikimedia.org/r/865089

Change 864921 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/SecurePoll@wmf/1.40.0-wmf.12] ListPager: Only call Voter::newFromId() if return value is needed

https://gerrit.wikimedia.org/r/864921

Change 864922 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/SecurePoll@wmf/1.40.0-wmf.13] ListPager: Only call Voter::newFromId() if return value is needed

https://gerrit.wikimedia.org/r/864922

Change 864921 merged by jenkins-bot:

[mediawiki/extensions/SecurePoll@wmf/1.40.0-wmf.12] ListPager: Only call Voter::newFromId() if return value is needed

https://gerrit.wikimedia.org/r/864921

Change 864922 merged by jenkins-bot:

[mediawiki/extensions/SecurePoll@wmf/1.40.0-wmf.13] ListPager: Only call Voter::newFromId() if return value is needed

https://gerrit.wikimedia.org/r/864922

Mentioned in SAL (#wikimedia-operations) [2022-12-06T15:33:57Z] <reedy@deploy1002> Synchronized php-1.40.0-wmf.12/extensions/SecurePoll/includes/Pages/ListPager.php: T324556 (duration: 07m 13s)

^ in theory, that should have a marked improvement... Stopping numerous queries

Mentioned in SAL (#wikimedia-operations) [2022-12-06T15:41:33Z] <reedy@deploy1002> Synchronized php-1.40.0-wmf.13/extensions/SecurePoll/includes/Pages/ListPager.php: T324556 (duration: 07m 01s)

Thanks for the quick patch & deployment! @Sotiale mentioned it works now.

It's probably worth exploring/implementing some of the other stuff Amir mentioned (especially reads going to DB_PRIMARY when I suspect for cases like the voter lists, they probably don't need to), but can be forked into other tasks

The batching *may* still be useful; but the triplicate queries, in theory have gone from 3 to zero, especially in this workload, as the infomration being requested isn't actually needed, so doesn't need to be queried.

yeah, it works now but I suggest merging the other patch first before closing this ticket. Krinkle is already reviewing it.

Change 865090 merged by jenkins-bot:

[mediawiki/extensions/SecurePoll@master] Make Voter::newFromID() and Context::getVoter() expose DB index

https://gerrit.wikimedia.org/r/865090

Ladsgroup assigned this task to Reedy.

The page is loading in codfw, the improvements have been merged. Calling it done.