
Issues with translatable pages on Wikidata due to revision id overflow
Closed, Resolved | Public | 8 Estimated Story Points | BUG REPORT

Description

Anything that uses the revtag table (transver, fuzzy tag, translatable pages with tags/marks) is breaking in various ways because revision ids now overflow the rt_revision column.

See also https://www.wikidata.org/w/index.php?title=Wikidata:Translators%27_noticeboard&diff=0&oldid=2108897068

Original report

Steps to reproduce

  1. Edit a translation unit manually to prefix it with !!FUZZY!!. Example.

Actual result

  1. !!FUZZY!! appears literally in the translation page source. Example.

Expected result

  1. The translation unit is wrapped in <div class="mw-translate-fuzzy"> (assuming it’s a block translation unit and doesn’t have nowrap attribute). Example from a week ago, when it still worked.

Software version

Regression from 1.43-wmf.4 (T361398).

Impact

It’s unfortunately fairly common that newbies/anons press Confirm translation in Tux even though the translation clearly needs to be updated (in this specific case, a complete sentence is simply missing in the translation), probably testing the interface. Before this regression, manually prepending !!FUZZY!! could partly revert these otherwise unrevertable edits (only partly because the diff no longer appears in Tux, since Translate no longer knows when the translation was really last updated). Now I don’t think it’s possible to revert these test edits even partly.

Event Timeline

Nikerabbit renamed this task from Manual !!FUZZY!! no longer working in page translation to Manual !!FUZZY!! no longer working in page translation in Wikidata. (May 16 2024, 8:52 AM)
Nikerabbit triaged this task as High priority. (May 16 2024, 9:11 AM)
Nikerabbit subscribed.
[email protected](wikidatawiki)> select * from revtag order by rt_revision desc limit 20;
+-------------+-----------+-------------+------------+
| rt_type     | rt_page   | rt_revision | rt_value   |
+-------------+-----------+-------------+------------+
| tp:transver | 120029430 |  2147483647 | 1098799050 |
| tp:transver | 120029424 |  2147483647 | 1132541301 |
| tp:transver | 120027693 |  2147483647 | 1976226182 |
| tp:transver | 120027628 |  2147483647 | 1976226179 |
| tp:transver | 120027163 |  2147483647 | 144873300  |
| tp:transver | 120027157 |  2147483647 | 667410785  |
| tp:transver | 120027155 |  2147483647 | 493090379  |
| tp:transver | 120027139 |  2147483647 | 162343439  |
| tp:transver | 120027127 |  2147483647 | 144872871  |
| tp:transver | 120027124 |  2147483647 | 144872806  |
| tp:transver | 120027119 |  2147483647 | 1239191230 |
| tp:transver | 120027115 |  2147483647 | 1239191227 |
| tp:transver | 120027114 |  2147483647 | 145088425  |
| tp:transver | 120027111 |  2147483647 | 144873158  |
| tp:transver | 120027106 |  2147483647 | 145088417  |
| tp:transver | 120023115 |  2147483647 | 146309988  |
| tp:transver | 120023113 |  2147483647 | 147402787  |
| tp:transver | 120023111 |  2147483647 | 147402785  |
| tp:transver | 120023109 |  2147483647 | 147402783  |
| tp:transver | 120023106 |  2147483647 | 147402781  |
+-------------+-----------+-------------+------------+
20 rows in set (0.001 sec)

rt_revision is getting "truncated" to highest possible value supported by the field.
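
To illustrate the "truncation" above, here is a minimal Python sketch of how a signed 32-bit column ends up holding 2147483647 for every larger id. It assumes the server is not in strict mode, so an out-of-range value is clamped to the column bound rather than rejected; `store_in_signed_int` is a hypothetical helper, not part of Translate or MariaDB.

```python
# Sketch only: models a signed INT(11) column on a server that is NOT in
# strict mode (assumption), where out-of-range values are clamped to the
# column bounds instead of raising an error.

INT_MIN = -(2**31)     # lower bound of a signed 32-bit INT
INT_MAX = 2**31 - 1    # upper bound: 2147483647, the value seen in rt_revision


def store_in_signed_int(value: int) -> int:
    """Return the value the column would keep (hypothetical helper)."""
    return max(INT_MIN, min(INT_MAX, value))


# Hypothetical revision ids just below, at, and above the limit.
for rev_id in (2147483646, 2147483647, 2147483648, 2150000000):
    print(rev_id, "->", store_in_signed_int(rev_id))
```

Every id above the limit maps to the same stored value, which would explain why all the newest rows in the listing share one rt_revision.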

[email protected](wikidatawiki)> show create table revtag\G
*************************** 1. row ***************************
       Table: revtag
Create Table: CREATE TABLE `revtag` (
  `rt_type` varbinary(60) NOT NULL,
  `rt_page` int(11) NOT NULL,
  `rt_revision` int(11) NOT NULL,
  `rt_value` blob DEFAULT NULL,
  PRIMARY KEY (`rt_type`,`rt_page`,`rt_revision`),
  KEY `rt_revision_type` (`rt_revision`,`rt_type`)
) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
1 row in set (0.001 sec)

We should change the type to BIGINT UNSIGNED to match core.
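
As a rough check of why the wider type helps, the sketch below compares the upper bounds of a signed 32-bit INT and a 64-bit BIGINT UNSIGNED. The helper `fits` and the sample id are hypothetical and only illustrate the ranges.

```python
# Sketch only: compares the ranges of the old and proposed column types.

INT_SIGNED_MAX = 2**31 - 1         # int(11), signed
BIGINT_UNSIGNED_MAX = 2**64 - 1    # bigint(20) unsigned


def fits(revision_id: int, column_max: int) -> bool:
    """True if a non-negative id can be stored without clamping (hypothetical helper)."""
    return 0 <= revision_id <= column_max


# Hypothetical id: the first one that no longer fits the old column.
next_revision = INT_SIGNED_MAX + 1

print(fits(next_revision, INT_SIGNED_MAX))       # False
print(fits(next_revision, BIGINT_UNSIGNED_MAX))  # True
```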

Nikerabbit renamed this task from Manual !!FUZZY!! no longer working in page translation in Wikidata to Issues with translatable pages due to revision id overflow. (May 16 2024, 5:29 PM)
Nikerabbit raised the priority of this task from High to Unbreak Now!.
Nikerabbit updated the task description.
Nikerabbit set the point value for this task to 8.

I've marked this UBN! because this is not limited to fuzzy handling.

Nikerabbit renamed this task from Issues with translatable pages due to revision id overflow to Issues with translatable pages on Wikidata due to revision id overflow. (May 16 2024, 5:32 PM)

Change #1032721 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] schema: Convert revtag integer columns to unsigned bigint columns

https://gerrit.wikimedia.org/r/1032721

Change #1032721 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] schema: Convert revtag integer columns to unsigned bigint columns

https://gerrit.wikimedia.org/r/1032721

Change #1034055 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] SchemaHookHandler: Add unsinged bigint schema change for revtag table

https://gerrit.wikimedia.org/r/1034055

Reedy subscribed.

Needs a task creating for WMF prod for the schema change to be applied as per https://wikitech.wikimedia.org/wiki/Schema_changes

Change #1034055 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] SchemaHookHandler: Add unsinged bigint schema change for revtag table

https://gerrit.wikimedia.org/r/1034055

Schema change got deployed to wikidatawiki.

MariaDB [wikidatawiki_p]> desc revtag;
+-------------+---------------------+------+-----+---------+-------+
| Field       | Type                | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| rt_type     | varbinary(60)       | NO   |     | NULL    |       |
| rt_page     | bigint(20) unsigned | NO   |     | NULL    |       |
| rt_revision | bigint(20) unsigned | NO   |     | NULL    |       |
| rt_value    | blob                | YES  |     | NULL    |       |
+-------------+---------------------+------+-----+---------+-------+
4 rows in set (0,002 sec)

MariaDB [wikidatawiki_p]>
abi_ updated Other Assignee, added: Zabe.

Yeah, with one of the revisions being 2147483647, and one of the pages being translatable... Seems almost certain!

MariaDB [wikidatawiki]> select page_namespace, count(page_id) from page INNER JOIN revtag ON (page_id = rt_page) where rt_revision = 2147483647 group by page_namespace;
+----------------+----------------+
| page_namespace | count(page_id) |
+----------------+----------------+
|              4 |              2 |
|             12 |              1 |
|           1198 |           1010 |
+----------------+----------------+
3 rows in set (0.029 sec)
MariaDB [wikidatawiki]> select distinct rt_type from revtag where rt_revision = 2147483647;
+-------------+
| rt_type     |
+-------------+
| fuzzy       |
| tp:mark     |
| tp:tag      |
| tp:transver |
+-------------+
4 rows in set (0.002 sec)
MariaDB [wikidatawiki]> select rt_type, count(rt_type) from revtag where rt_revision = 2147483647 group by rt_type;
+-------------+----------------+
| rt_type     | count(rt_type) |
+-------------+----------------+
| fuzzy       |             15 |
| tp:mark     |              1 |
| tp:tag      |              2 |
| tp:transver |           1042 |
+-------------+----------------+
4 rows in set (0.002 sec)

I think translate_reviews.trr_revision might also be affected (found via T365445):

mysql:research@dbstore1009.eqiad.wmnet [wikidatawiki]> SELECT trr_revision, COUNT(*) FROM translate_reviews WHERE trr_revision >= 2130000000 GROUP BY trr_revision ORDER BY trr_revision ASC;
+--------------+----------+
| trr_revision | COUNT(*) |
+--------------+----------+
|   2131243224 |        1 |
|   2131243602 |        1 |
|   2131243677 |        1 |
|   2131243747 |        1 |
|   2134240427 |        1 |
|   2134249853 |        1 |
|   2136744814 |        1 |
|   2136746153 |        1 |
|   2138381191 |        1 |
|   2138382550 |        1 |
|   2140210428 |        2 |
|   2142160688 |        2 |
|   2147483647 |        4 |
+--------------+----------+
13 rows in set (0.032 sec)
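
A small Python sketch of how the result above can be read: rows stored exactly at the signed 32-bit maximum are suspect, while lower values are plausible genuine ids. The data is copied from the query output; treating every row at the cap as clamped is an assumption, since one genuine revision could carry that exact id.

```python
# Sketch only: splits the (trr_revision, count) rows from the query output
# into rows sitting at the signed 32-bit cap and rows below it.

INT_MAX = 2**31 - 1  # 2147483647

rows = [
    (2131243224, 1), (2131243602, 1), (2131243677, 1), (2131243747, 1),
    (2134240427, 1), (2134249853, 1), (2136744814, 1), (2136746153, 1),
    (2138381191, 1), (2138382550, 1), (2140210428, 2), (2142160688, 2),
    (2147483647, 4),
]

# Assumption: a value equal to INT_MAX most likely comes from clamping.
at_cap = sum(count for revision, count in rows if revision == INT_MAX)
below_cap = sum(count for revision, count in rows if revision < INT_MAX)

print("rows at the cap:", at_cap)
print("rows below the cap:", below_cap)
```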

I'm not sure what help having numbers under the headings are, especially when two of them are the number 2...

And all three only contain one line each

This one seems done. Cleanup of wrong data is being discussed in T365355: Maintenance script to regenerate revtag for list of pages.

Thanks all for fixing this!

> I'm not sure what help having numbers under the headings are, especially when two of them are the number 2...
>
> And all three only contain one line each

It’s a process: it starts with the first step, and continues with the second step – the latter is branched into actual and expected.