
Move as much grammar transformation code as possible from PHP and JS to separate generic data files
Open, Low, Public

Description

Core MediaWiki has code for simple grammar transformations, used with {{GRAMMAR}} in messages. This code has to be written (and tested) separately in PHP and JS (and also in jquery.i18n).

The logic is supposed to be the same for the backend and the frontend, so it makes sense to make as much of the code and the data as possible shared between the (programming) languages. For most (human) languages the idea is the same: if a word matches a pattern, transform it according to a rule.

@Nikerabbit and I (@Amire80) went over most of the current code that does this, and as far as we can see, it can be replaced with pairs of regular expression patterns and replacements.
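As a rough illustration of the pattern-and-replacement idea, here is a minimal JavaScript sketch. The data shape (a grammatical form mapped to an ordered list of `[ pattern, replacement ]` pairs), the function name `convertGrammar`, and the rules themselves are assumptions made up for this example; they are not real rules for any language and not necessarily the format the patches use.

```javascript
// Hypothetical rule data, shaped as it might appear in a shared JSON file.
// The rules are invented for illustration and do not describe a real language.
const grammarRules = {
	genitive: [
		[ '(.+)ia$', '$1ii' ],
		[ '(.+)a$', '$1y' ],
		[ '(.+)$', '$1a' ]
	]
};

/**
 * Apply the first rule whose pattern matches the word.
 * Returns the word unchanged if the form is unknown or nothing matches.
 * (Hypothetical helper, not an existing MediaWiki function.)
 */
function convertGrammar( word, form, rules ) {
	const hasForm = Object.prototype.hasOwnProperty.call( rules, form );
	const formRules = hasForm ? rules[ form ] : [];

	for ( const [ pattern, replacement ] of formRules ) {
		const regex = new RegExp( pattern, 'u' );
		if ( regex.test( word ) ) {
			// Rules are ordered: the first match wins.
			return word.replace( regex, replacement );
		}
	}
	return word;
}

console.log( convertGrammar( 'Wikipedia', 'genitive', grammarRules ) ); // Wikipedii
console.log( convertGrammar( 'Meta', 'genitive', grammarRules ) ); // Mety
console.log( convertGrammar( 'Wikinews', 'locative', grammarRules ) ); // Wikinews
```

Keeping the rules as plain strings with `$1`-style backreferences means the same data could in principle be read by PHP as well, since `preg_replace` also accepts `$1` in replacements; the PHP side would still need to wrap each pattern in delimiters.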

The plan is more or less this:

  • Make sure that all the relevant unit tests are written.
  • Make the tests common to PHP and JS (T115218).
  • Find the patterns for each language, convert them from PHP and JS to regular expressions in JSON files, and delete the PHP and JS code.
  • Optional 1: Move these JSON files, the tests, and the PHP and JS logic that processes them from core to a separate library.
  • Optional 2: Allow sites to provide custom grammar rules (and possibly move custom $wgGrammarForms from PHP arrays to a more data-based format, but this requires some thought).

Event Timeline

Amire80 raised the priority of this task from to Low.
Amire80 updated the task description.
Amire80 added subscribers: Amire80, Nikerabbit.
Amire80 set Security to None.

Change 241645 had a related patch set uploaded (by Amire80):
Make the code for grammar data processing common

https://gerrit.wikimedia.org/r/241645

Change 241499 had a related patch set uploaded (by Amire80):
Make grammar data loadable as an RL module and usable in JS

https://gerrit.wikimedia.org/r/241499

Change 245184 had a related patch set uploaded (by Amire80):
Move the Ukrainian grammar rules from PHP and JS to JSON

https://gerrit.wikimedia.org/r/245184

Change 241499 merged by jenkins-bot:
Make grammar data loadable as an RL module and usable in JS

https://gerrit.wikimedia.org/r/241499

Change 241645 merged by jenkins-bot:
Make the code for grammar data processing common

https://gerrit.wikimedia.org/r/241645

Change 245184 merged by jenkins-bot:
Move the Ukrainian grammar rules from PHP and JS to JSON

https://gerrit.wikimedia.org/r/245184

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)