A number of ParserOutput methods currently take a Title or a Title-related type, such as LinkTarget, as an argument. This leads to two problems:
- Parsoid cannot directly define an API with these methods, because the Title class is part of core and Parsoid can't reference it without introducing a circular dependency between core and Parsoid. Furthermore, the Title class is too enmeshed in core to be easily refactored into an independent library.
- Eager resolution of strings into Titles causes inefficient database churn. It is preferable to use types that either avoid database queries entirely or, where queries are required (for example, to look up the article ID associated with a title), perform them in batches.
'''OLD PLAN:'''
To solve both of these issues, we propose to introduce a LazyLinkTarget class and to accept LazyLinkTarget|LinkTarget as arguments to ParserOutput. We will defer resolution of a LazyLinkTarget as long as possible, so that resolution can happen in a single batch, and only when needed.
We'll also introduce a LazyLinkTarget factory type in Parsoid, which turns strings (or ints, see comment below) into LazyLinkTarget objects. LazyLinkTarget itself can be a marker interface in standalone Parsoid, since Parsoid never needs to interrogate a LazyLinkTarget in any way. In core, the LazyLinkTarget factory would be instantiated in such a way that title resolution occurs in batches.
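As a rough illustration of the factory idea, here is a minimal, self-contained sketch that does not depend on MediaWiki. Every name in it (PendingLinkTarget, LazyLinkTargetFactory, newFromText(), resolveAll(), and the batch lookup callback) is hypothetical and chosen only for this example; the real design may look quite different.

```php
<?php
declare( strict_types=1 );

// Hypothetical marker interface: standalone Parsoid never inspects it.
interface LazyLinkTarget {
}

// Hypothetical core-side implementation that only remembers the raw text.
final class PendingLinkTarget implements LazyLinkTarget {
	/** @var string */
	public $text;
	/** @var int|null Page ID, or null until the batch has been resolved */
	public $pageId = null;

	public function __construct( string $text ) {
		$this->text = $text;
	}
}

// Hypothetical factory: hands out targets and resolves them all in one go.
final class LazyLinkTargetFactory {
	/** @var PendingLinkTarget[] */
	private $pending = [];
	/** @var callable Takes a list of title strings, returns text => page ID */
	private $batchLookup;

	public function __construct( callable $batchLookup ) {
		$this->batchLookup = $batchLookup;
	}

	public function newFromText( string $text ): LazyLinkTarget {
		$target = new PendingLinkTarget( $text );
		$this->pending[] = $target;
		return $target;
	}

	public function resolveAll(): void {
		if ( !$this->pending ) {
			return;
		}
		$texts = array_values( array_unique(
			array_map( fn ( $t ) => $t->text, $this->pending )
		) );
		// One lookup for the whole batch instead of one per target.
		$ids = ( $this->batchLookup )( $texts );
		foreach ( $this->pending as $target ) {
			// 0 stands in for "no such page" in this sketch.
			$target->pageId = $ids[$target->text] ?? 0;
		}
		$this->pending = [];
	}
}

// Usage with a fake lookup table standing in for the database.
$lookupCalls = 0;
$factory = new LazyLinkTargetFactory(
	static function ( array $texts ) use ( &$lookupCalls ): array {
		$lookupCalls++;
		$fakeTable = [ 'Main_Page' => 1, 'Help:Contents' => 2 ];
		return array_intersect_key( $fakeTable, array_flip( $texts ) );
	}
);
$a = $factory->newFromText( 'Main_Page' );
$b = $factory->newFromText( 'Help:Contents' );
$c = $factory->newFromText( 'No_such_page' );
$factory->resolveAll();
```

In this sketch the lookup callback runs once for all three targets, which is the property the batching proposal is after. Standalone Parsoid would only ever see the marker interface.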
Note that, for instance, ParserOutput::addLink() ultimately does:
  if ( $id === null ) {
      $page = MediaWikiServices::getInstance()->getPageStore()->getPageForLink( $link );
      $id = $page->getId();
  }
  $this->mLinks[$ns][$dbk] = $id;
Part of this task is changing mLinks to store LazyLinkTargets as well, so that getPageForLink()->getId() runs only at the last possible moment and on a full batch of link targets at once.
(comments re the above code in particular: "The whole thing with optional $id is weird. It can be 0 since not all links can have a page ID, but if it's skipped we do a DB lookup later. it's a foot-gun - use this in any kind of a loop without the ID provided and you end up killing performance completely.")
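To make the deferral concrete, the following is a minimal sketch of a link table that records links without an ID and fills in the missing IDs in a single batch when the table is read. It is not the ParserOutput implementation: DeferredLinkTable, its methods, and the batch lookup callback are all hypothetical names, and the fake lookup stands in for whatever batched query core would use.

```php
<?php
declare( strict_types=1 );

// Hypothetical stand-in for the mLinks bookkeeping, with deferred ID lookup.
final class DeferredLinkTable {
	/** @var array ns => dbkey => page ID; null means "not yet resolved" */
	private $links = [];
	/** @var callable Takes [ ns, dbkey ] pairs, returns ns => dbkey => ID */
	private $batchLookup;

	public function __construct( callable $batchLookup ) {
		$this->batchLookup = $batchLookup;
	}

	public function addLink( int $ns, string $dbKey, ?int $id = null ): void {
		// An explicit ID (including 0) wins; a missing ID never
		// overwrites one that was already recorded.
		if ( $id !== null || !isset( $this->links[$ns][$dbKey] ) ) {
			$this->links[$ns][$dbKey] = $id;
		}
	}

	public function getLinks(): array {
		$unresolved = [];
		foreach ( $this->links as $ns => $byKey ) {
			foreach ( $byKey as $dbKey => $id ) {
				if ( $id === null ) {
					$unresolved[] = [ $ns, $dbKey ];
				}
			}
		}
		if ( $unresolved ) {
			// Single batched lookup for everything still missing an ID.
			$ids = ( $this->batchLookup )( $unresolved );
			foreach ( $unresolved as [ $ns, $dbKey ] ) {
				$this->links[$ns][$dbKey] = $ids[$ns][$dbKey] ?? 0;
			}
		}
		return $this->links;
	}
}

// Usage with a fake lookup standing in for the database.
$queries = 0;
$table = new DeferredLinkTable(
	static function ( array $pairs ) use ( &$queries ): array {
		$queries++;
		$fake = [ 0 => [ 'Main_Page' => 1 ], 12 => [ 'Contents' => 2 ] ];
		$found = [];
		foreach ( $pairs as [ $ns, $dbKey ] ) {
			if ( isset( $fake[$ns][$dbKey] ) ) {
				$found[$ns][$dbKey] = $fake[$ns][$dbKey];
			}
		}
		return $found;
	}
);
$table->addLink( 0, 'Main_Page' );
$table->addLink( 12, 'Contents' );
$table->addLink( 0, 'Missing' );
$table->addLink( 4, 'Known', 7 );
$links = $table->getLinks();
```

Calling addLink() in a loop without an ID costs nothing here until getLinks() is called, at which point the three unresolved links are looked up together. That is the behaviour that would remove the foot-gun described in the comment above.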
'''NEW PLAN:'''
We'll just move the LinkTarget interface from core to Parsoid. Parsoid can then pass LinkTargets into all the ContentMetadataCollector/ParserOutput methods. In many cases we still need to resolve them lazily to page IDs, but that will be handled by ParserOutput and hidden from Parsoid.