Code Stewardship Review: Collection Extension
Open, Medium, Public

Description

Intro

The Collection extension has been generating multiple production errors for over a year. Some of the extension's functionality has been extracted into the Proton service. The book creation aspect of this extension, however, has not.

Number, severity, and age of known and confirmed security issues

See https://phabricator.wikimedia.org/maniphest/query/xa3bDF3ygt2r/#R for those who have access to Security issues

Was it a cause of production outages or incidents? List them.

TBD

Does it have sufficient hardware resources for now and the near future (to take into account expected usage growth)?

n/a

Is it a frequent cause of monitoring alerts that need action, and are they addressed timely and appropriately?

Yes, there have been some ongoing errors in production (e.g. see T197797, T203594, T223742, T224443, T189636).

When it was first deployed to Wikimedia production

2008 or earlier, according to https://www.mediawiki.org/w/index.php?title=Extension%3ACollection&type=revision&diff=221705&oldid=217405

Usage statistics based on audience(s) served

TBD

Changes committed in last 1, 3, 6, and 12 months

12m: 65 commits
6m: 48 commits
3m: 27 commits

Reliance on outdated platforms (e.g. operating systems)

n/a

Number of developers who committed code in the last 1, 3, 6, and 12 months

12m: 11 authors
6m: 7 authors
3m: 6 authors

Number and age of open patches

See https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/Collection

Number and age of open bugs

See https://phabricator.wikimedia.org/maniphest/query/zDqcGSnMZPl6/#R

Number of known dependencies?

TBD

Is there a replacement/alternative for the feature? Is there a plan for a replacement?

Unknown

Submitter's recommendation (what do you propose be done?)

None

Event Timeline

Restricted Application added a subscriber: Aklapper.

Thanks @Aklapper and @Krinkle for the additional detail on this CSR.

greg triaged this task as Medium priority. Jul 3 2019, 10:29 PM

Collection is currently only a gateway to the PediaPress print-on-demand bookshop, which I imagine is not used much. AIUI the plan is for PediaPress to eventually provide PDF rendering functionality though. (Proton only renders single pages and can't handle large ones, and its approach is very different (headless Chrome vs. LaTeX generation), so the two probably have different strengths and weaknesses.) Also, there is a community effort to render books to PDF, mediawiki2latex, which probably deserves to be exposed at some point. So Collection is still valuable IMO.

The code is rather horrible (mostly just due to age) but it's not doing anything particularly complicated (no actual PDF rendering involved, it's just a frontend for building a book definition in session storage, exporting/importing to/from wiki pages, and sending it to some background service) and would not be hard to upgrade / rewrite.

I would endorse the comments by Tgr so far as they go, but it is important to remember that making a collection is a necessary prerequisite to pulling it from MediaWiki2LaTeX. Also, the Collection extension was originally intended to build reading lists, whether for offline reading as a "book" or online reading simply as a more functional alternative to bookmarks/favourites lists. I do not know how much it is used for this, but the focus on "book" building does not make that use obvious to inexperienced users. I would hope that we now have a good opportunity to revisit this.

Removing task assignee due to inactivity, as this open task has been assigned for more than two years (see emails sent to assignee on May 26 and Jun 17, and T270544). Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be very welcome!

(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator.)

For everyone's info, currently no Code-Stewardship-Reviews are taking place as there is no clear path forward and as this is not prioritized work.
(Entirely personal opinion: I also assume lack of decision authority due to WMF not having a CTO currently. However, discussing this is off-topic for this task.)

So... it's been a couple of years. Any thoughts?

I think we should strip out all the pediapress/print logic at the very least.

  • The book creator is disabled on some of the biggest Wikipedias: dewiki, enwiki, fawiki, fiwiki, nowiki, ruwiki
  • It hasn't been maintained since around 2014
  • The UI has references to downloads all over the place, but these downloads are not actually being provided for books. All downloads are done via Electron/Proton
  • It uses jQuery UI
  • The JS style is very old and has lots of inline JS
  • It doesn't support dark mode

I see PediaPress still 'sort of' works, but I'm unsure how they now get their content. Does anyone know? [edit] Apparently we make some sort of bundle that we upload to their servers using $wgCollectionMWServeURL

[edit 2]: It seems the book creator link was disabled by dewiki in 2021, because PediaPress was NOT working. It seems that since that time, it has started working again?

The Book Creator has not been disabled anywhere. It's been hidden from the sidebar, but still works if you know about it I think. But that, and the decision to delete the book namespace, were both much more hard-fought than you think.

And there's also https://mediawiki2latex.wmcloud.org/

I personally think the entire collection extension should be undeployed and archived at this point. But it appears that at least some segment of the community still sees value in it, as comments like https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)/Archive_177#c-Steelpillow-2021-03-16T17:35:00.000Z-Aza24-2021-03-13T06:48:00.000Z show.

Well, having a stable way to download collections and books is a long-standing wish of the community (esp. Wikisource). But "anticipated future solutions" are not materializing without investment from the WMF or a volunteer, and neither seems to be forthcoming.

If a community needs a book namespace, it can be defined in the MediaWiki config. Alternatively, third-party alternatives to Collection may use a page in a different namespace (such as the project namespace) to define books.

We may also move the PediaPress feature to a toolforge tool, if it still works.

Drive-by comment to say that PediaPress PDF, EPUB, etc. rendering has probably been broken for a very long time and no one noticed. See T374888.

Change #1075157 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[mediawiki/extensions/Collection@master] Remove render_article, render_collection commands

https://gerrit.wikimedia.org/r/1075157

Change #1075157 merged by jenkins-bot:

[mediawiki/extensions/Collection@master] Remove render_article, render_collection commands

https://gerrit.wikimedia.org/r/1075157

I have interest in taking stewardship of this extension. As I mentioned elsewhere, I'm interested in restoring some of the book rendering capabilities of this extension. I'd like to maintain these functions for people that need them in a way that they don't conflict with the needs of the WMF.

One thing to consider: even if it is maintained, is this extension worth keeping in WMF production? A Toolforge/Cloud VPS tool may be easy to maintain (no need to use PHP) and have better security segmentation.