REST page/html endpoints should support language variants
Closed, Resolved (Public)

Description

When a client requests rendered page content from the page/{title}/html or revision/{title}/html endpoint, it should be able to specify the desired language variant in the Accept-Language header. If necessary and possible, the endpoint should apply language variant conversion (e.g. Cyrillic Serbian to Roman Serbian, Traditional Chinese to Simplified Chinese, etc.).
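To illustrate the requested behavior from the client side, here is a minimal Python sketch that builds (but does not send) such a request. The base URL, the page title, and the helper name build_page_html_request are assumptions made for illustration; "sr-el" is used as an example variant code.

```python
from urllib.parse import quote
from urllib.request import Request


def build_page_html_request(base_url: str, title: str, variant: str) -> Request:
    """Build a GET request for page/{title}/html that asks for a language variant.

    base_url is a hypothetical REST root; adjust it to the wiki being queried.
    """
    # Percent-encode the title so spaces and slashes stay inside one path segment.
    url = f"{base_url.rstrip('/')}/page/{quote(title, safe='')}/html"
    # The desired variant travels in the Accept-Language header, not in the URL.
    return Request(url, headers={"Accept-Language": variant})


# Hypothetical wiki and title; nothing is sent over the network here.
request = build_page_html_request(
    "https://wiki.example.org/w/rest.php/v1", "Главна страна", "sr-el"
)
print(request.full_url)
```

Passing the resulting object to urllib.request.urlopen would perform the single request; no second call to the transform endpoint is involved.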

Currently, language variant conversion is supported by the /transform/from/html/to/html endpoint, but the client would have to supply the HTML of the content to be converted. So clients would have to first load the content HTML, then send it back for conversion. The goal of this ticket is to avoid the additional round trip.

Implementation Notes

The conversion is currently implemented in ParsoidHandler::languageConversion. However, we intend to remove ParsoidHandler in the near future. All behavior should be factored out of that class, into either service objects or handler "helper" objects.

The "helper" used for getting page HTML is HtmlOutputRendererHelper. This is where language conversion needs to happen in order to fulfill this ticket. However, we need to retain the old behavior as well, offering language conversion as a separate transformation. It would probably be best to extract ParsoidHandler::languageConversion into a service object, which can then be called from HtmlOutputRendererHelper as well as from ParsoidHandler.

Considerations

We will likely want to be able to mix and match different transformations that can be applied to page content HTML. ParsoidHandler::pb2pb currently supports two: language variant conversion, and redlinks. Since we are likely to add more such transformations (e.g. TOC generation), perhaps a factory approach would be appropriate. We also already have an HTMLTransformFactory and HTMLTransform (for transforming HTML to wikitext). We could introduce a LanguageConversion that performs the conversion, and give HTMLTransformFactory a getLanguageConversion() method.
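The factory idea can be sketched roughly as follows. This is a language-neutral illustration in Python, not the MediaWiki implementation: the class and method names mirror the ones proposed above, while the conversion table, apply_all, and add_redlinks_placeholder are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

# A transformation takes HTML text and returns transformed HTML text.
HtmlTransformation = Callable[[str], str]


@dataclass
class LanguageConversion:
    """Toy variant conversion: a character-for-character replacement table."""

    target_variant: str
    table: Dict[str, str]

    def __call__(self, html: str) -> str:
        return html.translate(str.maketrans(self.table))


class HtmlTransformFactory:
    """Hands out transformation objects so callers do not construct them directly."""

    def __init__(self) -> None:
        # Tiny illustrative table; a real converter needs far more than this.
        self._tables = {"sr-el": {"д": "d", "а": "a", "н": "n"}}

    def get_language_conversion(self, target_variant: str) -> LanguageConversion:
        if target_variant not in self._tables:
            raise ValueError(f"unsupported variant: {target_variant}")
        return LanguageConversion(target_variant, self._tables[target_variant])


def add_redlinks_placeholder(html: str) -> str:
    """Stand-in for a second transformation; returns the HTML unchanged."""
    return html


def apply_all(html: str, transformations: Iterable[HtmlTransformation]) -> str:
    """Apply the transformations in order, feeding each result into the next."""
    for transform in transformations:
        html = transform(html)
    return html


factory = HtmlTransformFactory()
steps = [factory.get_language_conversion("sr-el"), add_redlinks_placeholder]
print(apply_all("<p>дан</p>", steps))
```

Because every transformation shares one call signature, both the page HTML helper and the standalone transform endpoint could reuse the same objects, and a later addition such as TOC generation would be one more entry in the list.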

Event Timeline

Change 833627 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/core@master] Add ParserOutputFromPageBundleTrait

https://gerrit.wikimedia.org/r/833627

Change 833627 merged by jenkins-bot:

[mediawiki/core@master] Add PageBundleParserOutputConverter

https://gerrit.wikimedia.org/r/833627

Change 835611 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/core@master] Introduce LanguageVariantConverter

https://gerrit.wikimedia.org/r/835611

abi_ changed the task status from Open to In Progress. Sep 28 2022, 8:08 AM

Change 835611 merged by jenkins-bot:

[mediawiki/core@master] Introduce LanguageVariantConverter

https://gerrit.wikimedia.org/r/835611

Change 838146 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Re-apply: Introduce LanguageVariantConverter

https://gerrit.wikimedia.org/r/838146

Change 838146 merged by jenkins-bot:

[mediawiki/core@master] Re-apply: Introduce LanguageVariantConverter

https://gerrit.wikimedia.org/r/838146

Change 840098 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/core@master] LanguageVariantConverter: Use language code from PageConfig if set

https://gerrit.wikimedia.org/r/840098

Change 840098 merged by jenkins-bot:

[mediawiki/core@master] LanguageVariantConverter: Use content language code from HTTP header

https://gerrit.wikimedia.org/r/840098

Change 841144 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/core@master] page/html endpoint: Support variant conversion

https://gerrit.wikimedia.org/r/841144

Change 842702 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/VisualEditor@master] Add HtmlTransformFactory as depdendency for HtmlOutputRendererHelper

https://gerrit.wikimedia.org/r/842702

Change 842705 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/core@master] HtmlOutputRendererHelper: Make HtmlTransformFactory mandatory

https://gerrit.wikimedia.org/r/842705

Change 850670 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/core@master] revision/html endpoint: Support variant conversion

https://gerrit.wikimedia.org/r/850670

Change 841144 merged by jenkins-bot:

[mediawiki/core@master] page/html endpoint: Support variant conversion

https://gerrit.wikimedia.org/r/841144

Change 842702 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Add HtmlTransformFactory as depdendency for HtmlOutputRendererHelper

https://gerrit.wikimedia.org/r/842702

Change 842705 merged by jenkins-bot:

[mediawiki/core@master] HtmlOutputRendererHelper: Make HtmlTransformFactory mandatory

https://gerrit.wikimedia.org/r/842705

Change 850670 merged by jenkins-bot:

[mediawiki/core@master] revision/html endpoint: Support variant conversion

https://gerrit.wikimedia.org/r/850670

abi_ closed this task as Resolved. Dec 6 2022, 6:33 PM
abi_ moved this task from Backlog to Done on the User-abi_ board.

We've added support for variant conversion to both endpoints:

  • page/{title}/html
  • revision/{title}/html

The APIs use the Accept-Language header to determine the target language to be used for conversion.
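For readers unfamiliar with the header, the following Python sketch shows one plausible way a server could pick a target variant from an Accept-Language value. It is an assumption-laden illustration, not the code merged in the changes above: parse_accept_language, pick_variant, and the set of supported variants are hypothetical, and only exact tag matches are considered.

```python
from typing import List, Optional, Set, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (tag, q) pairs ordered by descending q, ties kept in header order."""
    entries = []
    for index, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # q defaults to 1 when the parameter is absent
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((tag.strip().lower(), quality, index))
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(tag, quality) for tag, quality, _ in entries]


def pick_variant(header: str, supported: Set[str]) -> Optional[str]:
    """Pick the most preferred tag that exactly matches a supported variant."""
    for tag, quality in parse_accept_language(header):
        if quality > 0 and tag in supported:
            return tag
    return None  # no match: serve the content without conversion


print(pick_variant("zh-hant;q=0.8, zh-hans", {"zh-hans", "zh-hant"}))
```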