Reduce or eliminate the need for the user to touch <translate> tags and unit markers
Open, High, Public

Description

Lots of pages on mediawiki.org have "<translate>" and "<!--T:123-->" bits strewn about, making them hard to read and edit in source, and hard to edit in Visual Editor.

Translation should not rely on making large parts of pages uneditable!

Per discussion it looks like some small tweaks to VE's treatment of the extension tag may simplify things without introducing too much breakage, until a more VE-native solution is available.

To-do: file detailed bugs and replace this one.


See also: use cases for which <translate> tags are considered better than structured translation: T116235: [Epic] CentralNotice translation should move closer to MediaWiki i18n standards and the code cleaned up

Event Timeline

There are a very large number of changes, so older changes are hidden.

You can play with the tags, see existing documentation: https://www.mediawiki.org/wiki/Help:Extension:Translate

Thanks, will try to decipher the code...

I think that perhaps VE could just treat those tags as plaintext for now if that would cause fewer problems.

That would probably help a great deal.

I do not understand why tags spanning sections would be a problem though, especially when they never appear in rendered output.

It's confusing as heck when trying to edit, and can easily lead to mismatched tag pairs during page refactoring.

Translate should complain and prevent saving if it detects mismatched translate tags.
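A minimal sketch of what such a save-time check could look like, in Python. This is not the Translate extension's actual validation code; the function name is hypothetical, and the sketch assumes that `<translate>` tags must not nest and ignores tags that appear inside `<nowiki>` or comments.

```python
import re

# Matches either <translate> or </translate>; group 1 is "/" for a closing tag.
TAG_RE = re.compile(r"<(/?)translate>")


def find_translate_tag_errors(wikitext):
    """Return a list of human-readable problems with <translate> tag pairing."""
    errors = []
    open_pos = None  # offset of the currently open <translate>, if any
    for m in TAG_RE.finditer(wikitext):
        if m.group(1) == "/":
            if open_pos is None:
                errors.append(f"</translate> at offset {m.start()} has no opening tag")
            else:
                open_pos = None
        elif open_pos is not None:
            # Assumption: nesting is treated as an error in this sketch.
            errors.append(
                f"<translate> at offset {m.start()} is nested inside "
                f"the one opened at offset {open_pos}"
            )
        else:
            open_pos = m.start()
    if open_pos is not None:
        errors.append(f"<translate> at offset {open_pos} is never closed")
    return errors


if __name__ == "__main__":
    # A stray opener left behind at the end of a section.
    for problem in find_translate_tag_errors("== Culture ==\n<translate>\nLorem ipsum."):
        print(problem)
```

An editor integration could refuse to save whenever the returned list is non-empty and show the messages to the user.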

@Nikerabbit I see no information on that page about the <!--T:\d+--> items. Is it safe to remove them?

Is it safe to refactor large multi-paragraph <translate>...</translate> chunks into multiple single-paragraph or sub-paragraph <translate>...</translate> chunks?

Ok... https://www.mediawiki.org/wiki/Help:Extension:Translate/Page_translation_administration#markup seems to indicate that any change whatsoever will break stuff. Which makes it very ....... very.... fragile.

In T131516#2169540, @brion wrote:

@Nikerabbit I see no information on that page about the <!--T:\d+--> items. Is it safe to remove them?

That would lose connection to all existing translations.

Is it safe to refactor large multi-paragraph <translate>...</translate> chunks into multiple single-paragraph or sub-paragraph <translate>...</translate> chunks?

Yes to multi to single, as long as the T-comments are preserved. For sub-paragraphs you would again disconnect all existing translations.

Ok... https://www.mediawiki.org/wiki/Help:Extension:Translate/Page_translation_administration#markup seems to indicate that any change whatsoever will break stuff. Which makes it very ....... very.... fragile.

Not any change, the page should explain how it works. The stuff is split into units delimited by translate tags or empty lines. The T-comments identify the units so that they can be changed or moved without losing connection to existing translations.
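The unit model described above can be illustrated with a small Python sketch. It is an approximation for discussion, not the extension's real parser: the function name is hypothetical, and it only handles well-formed `<translate>` blocks, splitting on empty lines and reading `<!--T:n-->` markers.

```python
import re

TRANSLATE_RE = re.compile(r"<translate>(.*?)</translate>", re.DOTALL)
MARKER_RE = re.compile(r"<!--T:(\d+)-->")


def extract_units(wikitext):
    """Return a list of (unit_id_or_None, text) for every translation unit.

    Units are delimited by <translate> tags or empty lines. A unit without
    a T-comment (for example, a newly written paragraph) gets None as its id.
    """
    units = []
    for block in TRANSLATE_RE.findall(wikitext):
        for chunk in re.split(r"\n\s*\n", block):
            chunk = chunk.strip()
            if not chunk:
                continue
            marker = MARKER_RE.search(chunk)
            unit_id = int(marker.group(1)) if marker else None
            text = MARKER_RE.sub("", chunk).strip()
            units.append((unit_id, text))
    return units


if __name__ == "__main__":
    page = (
        "<translate>\n"
        "== Culture == <!--T:1-->\n"
        "\n"
        "<!--T:2-->\n"
        "Lorem ipsum dolor.\n"
        "\n"
        "A paragraph that has not been marked yet.\n"
        "</translate>"
    )
    for unit_id, text in extract_units(page):
        print(unit_id, repr(text))
```

Because each unit carries its own identifier, reordering the paragraphs in the source changes the order of the returned list but not the ids, which is what keeps existing translations attached.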

Hmm, there are a lot of weird recommendations in there too, such as the recommendation to put multiple levels of markup together. For instance, this looks so obviously wrong:

Wrong:

== <translate>Culture</translate> ==

Wrong:

<translate>== Culture ==</translate>

Suggested segmentation:

<translate>
== Culture ==

Lorem ipsum dolor.
</translate>

The first "wrong" recommendation looks clearly right.

The second looks right according to the doc next to it ("Headers can in principle be tied to the following paragraph, but it is better to have them separated.") but is labeled as wrong.

The "right" one looks clearly wrong, leaving a stray "<translate>" at the top of the previous section that won't appear within the section during section editing in the source editor. (This might contribute to the weird spanning problem, in that it encourages people to add more stuff at the end of the previous section, after the <translate> opener but before the == line.)

The suggested one is based on the practical reason that it is the only one which does not break wikitext section editing completely.

https://www.mediawiki.org/w/index.php?title=Testpage12354&diff=2089226&oldid=2089225 seemed to work fine using the first "Wrong" option. Can you tell me what doesn't work with this?

I marked the page for translation, now you can see it doesn't work anymore.

You broke the markup:

== <translate><!--T:3-->
Test2</translate> ==

It does, indeed, now not work. Why did you add the newline?

Note I can't even remove the incorrect newlines manually because the extension claims "Translation unit markers in unexpected position." Seems like some pretty bad breakage in the translate code:

  1. should not add newline
  2. should not require a newline

I'd like to apologize to @Nikerabbit, my tone's been not cool on this thread. Taking out my frustration on you is not OK.

I think we can make some improvements in the short term to get VE and Translate to play a little nicer until we have something more VE-native to migrate to... Will break out into smaller bugs, probably on the VE end. If we basically treat a <translate> like a <div> I think it should mostly not explode... I hope. :)

brooke renamed this task from "<translate> extension is a usability nightmare for editing" to "<translate> extension usability issues for editing". Apr 1 2016, 6:01 PM
brooke updated the task description.

Brion, I understand very well why you feel frustrated about the markup and it might be actually helpful that you raise awareness of this issue.

All help is much appreciated as our team is very small and trying hard to find a balance between new feature development and maintaining and supporting existing features.

I marked the page for translation, now you can see it doesn't work anymore.

To clarify, the edit you saw was done automatically by the tool available to translation admins when I registered the page for translation. The tool automatically adds the T-comments and whitespace change you can see in the diff.

In T131516#2169804, @brion wrote:

I'd like to apologize to @Nikerabbit, my tone's been not cool on this thread. Taking out my frustration on you is not OK.

I don't think an apology is warranted. The Translate extension's syntax and general behavior are rage-inducing. Highlighting this frustration is completely legitimate, in my opinion.

These various usability issues contribute to why I'm so wary of seeing the Translate extension enabled on additional Wikimedia wikis. Wikitext is already scary and painful enough without this extension.

I mentioned this in-person to Brion, but sharing for a wider audience:

Ultimately, I don't think a string-based translation annotation system is ever going to be fully compatible with a DOM-based editor. We can add more and more hacks to try to paper over the differences, but there will always be instances and edge cases which are not plausibly fixable.

Niklas is entirely right; we need to work at full speed on the proper DOM-level fragment concept in MediaWiki so that we can make translation a first-tier feature of MediaWiki. In the meantime, we're left with a very unsatisfactory situation.

I don't think an apology is warranted. The Translate extension's syntax and general behavior are rage-inducing. Highlighting this frustration is completely legitimate, in my opinion.

This contribution is not useful, Max. If you do not wish to contribute in a positive way to this discussion, I suggest you spend your time on other things.

In T131516#2169498, @brion wrote:

I think that perhaps VE could just treat those tags as plaintext for now if that would cause fewer problems.

That would probably help a great deal.

Can a VisualEditor/Parsoid person split this actionable item to its own task, please?

It's not actionable because it's already the case.

Well, they're treated like extension blocks currently; you can edit the text within them (one block at a time) through a popup dialog but it's awkward. :)

I'm a bit unsure whether we need a full interface on T55974 or if we just need the <translate>...</translate> blocks to be treated as editable DOM spans rather than aliens that block editing.

These various usability issues contribute to why I'm so wary of seeing the Translate extension enabled on additional Wikimedia wikis. Wikitext is already scary and painful enough without this extension.

This. I get that there are technical issues too, but it's a serious problem. Would it be possible to get some Design input here on possible other approaches?

In T130567#2201177, I raised the notion that our issues with translation extension markup may be symptomatic of a much deeper issue to resolve, where we have a lot of tradeoffs to balance in pursuit of a more ideal solution.

@Nikerabbit defends the tradeoffs in the existing solution quite well, and more to the point, it's deployed, solving a problem and hasn't killed us yet. It's quite possible to make an argument that it is working well enough for us now, and that we have bigger problems we should solve.

I hope this doesn't introduce stop energy toward good incremental improvements that make <translate> markup more robust and simultaneously make editing support smoother and easier. Is this an issue that has solutions close to the surface, or do useful changes in this area require thinking about deeper parts of the system?

In the email thread I suggested to start prototyping an alternative solution that does not require explicit <translate> tags.

From Translate's end, what is needed:

  • Array of source text units to translate
  • A method that can take the source page (or information extracted from it) and translated array, to create a translation page composed of the translations.
  • Continuity in the array keys, so that if edits are made to the page, the keys do not change when paragraphs are moved or changed slightly in content.

The current implementation does the above with <translate> tags and T-comments, by pre-parsing the wikitext before the parser gets to it. The new system should ideally use heuristics with human augmentation. It can be based on either wikitext or Parsoid output, but ideally, at least in the beginning, the translatable parts would be converted to wikitext as the primary storage format. Support for visual translation can be added later incrementally, in my opinion.
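To make the three requirements concrete, here is a hedged Python sketch of the data contract only. The function names and the `(key, text)` representation are assumptions made for illustration; they are not Translate's API, and real pages also contain non-translatable content that this sketch leaves out.

```python
def compose_translation_page(units, translations):
    """Build a translation page from ordered source units.

    units:        ordered list of (key, source_text) pairs
    translations: dict mapping key -> translated text
    Units without a translation fall back to the source text.
    """
    return "\n\n".join(translations.get(key, source) for key, source in units)


def changed_keys(old_units, new_units):
    """Return keys whose source text differs between two page revisions.

    Only meaningful if keys are stable across edits (the third requirement):
    a moved unit keeps its key, so only real text changes are reported.
    """
    old = dict(old_units)
    return [key for key, text in new_units if key in old and old[key] != text]


if __name__ == "__main__":
    source = [("1", "== Culture =="), ("2", "Lorem ipsum dolor.")]
    finnish = {"1": "== Kulttuuri =="}
    print(compose_translation_page(source, finnish))

    # The paragraph was edited and moved above the heading.
    edited = [("2", "Lorem ipsum dolor sit."), ("1", "== Culture ==")]
    print(changed_keys(source, edited))
```

Any replacement for `<translate>` tags and T-comments would have to provide the same two inputs: an ordered list of units and keys that survive edits.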

Anyone trying to build the heuristics should look at Special:PagePreparation and the documentation at https://www.mediawiki.org/wiki/Help:Extension:Translate/Page_translation_administration. To get started, something like this should do:

  1. One section for each heading
  2. One section for each paragraph
  3. One section for each image caption
  4. One section for each list item (this would be an improvement over the current system actually)

This could then be augmented by the user in VisualEditor and wikitext by marking some elements as not translatable or translatable (e.g. make it possible to change image name for localisation). Parsoid would likely be used to keep track of each part, and Special:PageTranslation might need additional UI to correct mappings (old to new) when heuristics fail to detect changes properly.
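The four starting heuristics listed above could be prototyped along these lines in Python. This is a rough sketch working on raw wikitext with regular expressions, not on Parsoid output; the function name is hypothetical, and the caption pattern is a deliberate simplification (it treats the last parameter of a `[[File:...]]` link as the caption and does not handle links nested inside captions).

```python
import re

HEADING_RE = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")
LIST_ITEM_RE = re.compile(r"^[*#]+\s*(.*)$")
# Simplification: the last pipe-separated parameter is assumed to be the caption.
CAPTION_RE = re.compile(r"^\[\[(?:File|Image):[^|\]]+(?:\|[^|\]]*)*\|([^|\]]+)\]\]$")

LINE_PATTERNS = (
    ("heading", HEADING_RE),
    ("list-item", LIST_ITEM_RE),
    ("caption", CAPTION_RE),
)


def segment(wikitext):
    """Return a list of (kind, text) candidate translation units."""
    units = []
    paragraph = []  # lines of the paragraph currently being collected

    def flush():
        if paragraph:
            units.append(("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in wikitext.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # an empty line ends the current paragraph
            continue
        for kind, pattern in LINE_PATTERNS:
            m = pattern.match(stripped)
            if m:
                flush()
                units.append((kind, m.group(m.lastindex)))
                break
        else:
            paragraph.append(stripped)
    flush()
    return units


if __name__ == "__main__":
    sample = (
        "== Culture ==\n\nLorem ipsum\ndolor.\n\n"
        "* First\n* Second\n[[File:Foo.png|thumb|A caption]]\n"
    )
    for unit in segment(sample):
        print(unit)
```

A real implementation would more plausibly walk the Parsoid DOM, as suggested above, which avoids most of the regex edge cases.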

Given the shift in focus of this report, I propose to change the summary to "Invent automatic segmentation in translation units without <translate> tags" aka the dark magic solution.

Well, they're treated like extension blocks currently; you can edit the text within them (one block at a time) through a popup dialog but it's awkward. :)

I'd still like a separate report for this. There seems to be some low hanging fruit.

Given the shift in focus of this report, I propose to change the summary to "Invent automatic segmentation in translation units without <translate> tags" aka the dark magic solution.

That can be one of the blockers for this task. It alone is not sufficient to solve the problem this task is about. Keep this task as tracker.

As a workaround, wouldn't it be possible to have an option to automatically enable translation on all pages in a namespace, without the need for <translate> tags?
I know this would not be suitable for a lot of wikis but it can help small wikis (including mine) that want to translate all of their content.

(I have no idea what is on topic in this report, so please forgive me if I'm going off topic.)

As a workaround, wouldn't it be possible to have an option to automatically enable translation on all pages in a namespace, without the need for <translate> tags?

The translate extension would still need to add unit markers.

I know this would not be suitable for a lot of wikis but it can help small wikis (including mine) that want to translate all of their content.

We certainly need to facilitate the mass-addition of pages for translation. Our first step was with page migration tools.

On a wiki where translation is needed into only a handful of languages or fewer, I can imagine a need to reduce the work of the translation admins even at the cost of additional work for translators. I think a first step doesn't need to be especially complex: we could add a special page which mass-marks pages for translation in a given namespace, adding <languages/>\n<translate> at the beginning and </translate> at the end of each of them, and letting the admin confirm or reject each page.
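The text transformation such a special page would apply is simple; a Python sketch under stated assumptions follows. The function name is hypothetical, and placing the tags on their own lines and skipping pages that already contain `<translate>` are choices made for this sketch rather than anything the proposal specifies. The Translate extension would still need to add unit markers when the page is marked.

```python
def wrap_for_translation(wikitext):
    """Wrap a whole page in <languages/> and <translate>...</translate>.

    Pages that already contain a <translate> tag are returned unchanged,
    so they can be reviewed by hand instead of being double-wrapped.
    """
    if "<translate>" in wikitext:
        return wikitext
    body = wikitext.strip("\n")
    return "<languages/>\n<translate>\n" + body + "\n</translate>\n"


if __name__ == "__main__":
    print(wrap_for_translation("== Culture ==\n\nLorem ipsum dolor.\n"))
```

A confirmation step could show the admin a diff between the original and the wrapped text before anything is saved.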

Nemo_bis renamed this task from "<translate> extension usability issues for editing" to "Reduce or eliminate the need for the user to touch <translate> tags and unit markers". Jun 24 2017, 8:55 PM

I added this issue to MediaWiki Stakeholders' Group/TechConf Input:

https://www.mediawiki.org/wiki/MediaWiki_Stakeholders%27_Group/TechConf_Input#Improved_Translate_-_VisualEditor_integration

Please add your endorsement if you would like to make this issue more visible.

As far as I can predict, that markup is going to remain as long as wikitext is the primary format for content. [...] Once HTML content is better supported, we can consider using some heuristics to automatically detect translatable parts (with option for manual user intervention where it goes wrong).

I deduce the main reason for those cursed <!--T--> tags is to give identifiers to translation units that the software can then use to track changes to them? So if someone writes "Hello world" and it gets translated to "Hola mundo", and then someone else changes the original to "Goodbye world", how does the software know to match "Hola mundo" to "Goodbye world" and tell the user that the translation needs to be updated? Enter <!--T--> tags, right?

Now, my question-idea is: can't we use a diff tool to "measure" the difference in bytes between any two strings and use that rather than <!--T--> tags to track if they're the same? So in the previous example, when someone changes "Hello world" to "Goodbye world", the software would hit a dedicated table in the database, search for all the translated strings (in the current page or even the entire wiki), run the diff function for each, and return the closest match as the "suggested translation", which the user can then choose to accept, modify, update, etc.

Is this what you meant by "heuristics"? I'm sure my suggestion is overly simplistic and naive, but maybe it fires some better idea?

Yes, but not for translations. The idea would be that we run a similarity matching algorithm in the "mark page for translation" view. This would reduce the need for translation unit identifiers by increasing the burden on the translation admin by some unknown amount.
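As a rough illustration of that idea, the following Python sketch matches the source units of a new page revision against the previously marked units using the standard library's `difflib.SequenceMatcher`. The function name, the greedy matching strategy, and the 0.5 threshold are all assumptions made for this example; nothing here reflects an existing Translate feature.

```python
from difflib import SequenceMatcher


def match_units(old_units, new_texts, threshold=0.5):
    """Suggest which old unit id each new source unit corresponds to.

    old_units: dict mapping unit id -> source text at the last marking
    new_texts: list of source unit texts in the new revision
    Returns a list of (new_text, matched_id_or_None, score). Each old id
    is used at most once; matching is greedy in the order of new_texts.
    """
    unused = dict(old_units)
    results = []
    for text in new_texts:
        best_id, best_score = None, 0.0
        for unit_id, old_text in unused.items():
            score = SequenceMatcher(None, old_text, text).ratio()
            if score > best_score:
                best_id, best_score = unit_id, score
        if best_id is not None and best_score >= threshold:
            del unused[best_id]
            results.append((text, best_id, best_score))
        else:
            # Too different from every remaining unit: treat as new.
            results.append((text, None, best_score))
    return results


if __name__ == "__main__":
    old = {1: "Hello world", 2: "Lorem ipsum dolor sit amet."}
    new = ["Lorem ipsum dolor sit amet, consectetur.", "Goodbye world"]
    for text, unit_id, score in match_units(old, new):
        print(f"{unit_id!s:>4}  {score:.2f}  {text}")
```

The suggested mapping would then be presented to the translation admin for correction, which is where the extra burden mentioned above comes from.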