Implement MVP of OCR in Wikisource extension
Closed, Resolved | Public | 5 Estimated Story Points

Description

The new UI is not part of our ordinary UX, so the true MVP requires only:
  • An Extract button overlaying the image that points to Wikimedia OCR
  • Automatic language detection

Event Timeline

Change 682034 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Wikisource@master] Add basic OCR button for Wikimedia OCR

https://gerrit.wikimedia.org/r/682034

I've started looking at this and have a basic functioning system with two buttons in the toolbar (one for each engine).

I think we need a few things before this can proceed though:

  • The language code issue needs to be sorted out (T280617). Even if we just want to send a single language, matching the wiki's content language (which is what the current tools do), we still need to map these to the codes that the engines accept. The above patch just leaves out the language altogether, leaving it up to the engine's autodetection to sort it out; this works reasonably well for some languages (and fixing it can perhaps wait until the other stuff is done).
  • We probably want to adapt the tool to accept images coming from somewhere other than upload.wikimedia.org, so that developers can run both the Wikisource extension and the tool locally and have them talk to each other. I've created T280953 for this.
  • From our discussion the other day, it sounds like we might end up with some variation on the idea of having two buttons: one for OCR and one for the configuration. Whether these are in the toolbar or floating on the image I don't know, but it's easiest to start with them being in the toolbar. We do probably need some icons for them though (I just used the generic article icon in the patch above, so there are two matching icons and you have to hover to figure out which one is which).
  • What will the loading and error states look like? The simplest thing is to disable the textarea while the OCR is running, and re-enable it when the text is entered. Is that sufficient for a first patch? It means we don't need any messages or new UI elements.
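The language-code mapping raised in the first point could look roughly like the sketch below. It is only an illustration: the `Engine` names, the `LANGUAGE_CODES` table, its entries, and `engineLanguage` are all hypothetical, and the real mapping is the subject of T280617. An unknown language returns `undefined`, so the caller can omit the language and fall back to the engine's autodetection, as the patch currently does.

```typescript
// Hypothetical engine identifiers; only Tesseract is named in this task.
type Engine = "tesseract" | "google";

// Illustrative entries only. The real table is what T280617 has to work out.
const LANGUAGE_CODES: Record<Engine, Record<string, string>> = {
  tesseract: { en: "eng", fr: "fra", de: "deu" },
  google: { en: "en", fr: "fr", de: "de" },
};

// Map a wiki content-language code to the code a given engine accepts.
// Returns undefined when there is no mapping, so the caller can leave the
// language out and let the engine autodetect.
function engineLanguage(engine: Engine, wikiLang: string): string | undefined {
  const table = LANGUAGE_CODES[engine];
  // Own-property check so that keys such as "toString" do not match.
  return Object.prototype.hasOwnProperty.call(table, wikiLang)
    ? table[wikiLang]
    : undefined;
}
```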

  • The language code issue needs to be sorted out (T280617). Even if we just want to send a single language, matching the wiki's content language (which is what the current tools do), we still need to map these to the codes that the engines accept. The above patch just leaves out the language altogether, leaving it up to the engine's autodetection to sort it out; this works reasonably well for some languages (and fixing it can perhaps wait until the other stuff is done).

-> Down to wait until other higher project priorities are done; will hold off on that call until we estimate the complexity of T280617. Thanks for making it, Sam!

  • We probably want to adapt the tool to accept images coming from somewhere other than upload.wikimedia.org, so that developers can run both the Wikisource extension and the tool locally and have them talk to each other. I've created T280953 for this.

-> Can someone walk me through the flow for this? I'm a bit confused about what hosting it on the extension means. Do we have mocks for this extension UI, and if so can folks direct me to them? 😅 cc @nayoub @ifried

  • From our discussion the other day, it sounds like we might end up with some variation on the idea of having two buttons: one for OCR and one for the configuration. Whether these are in the toolbar or floating on the image I don't know, but it's easiest to start with them being in the toolbar. We do probably need some icons for them though (I just used the generic article icon in the patch above, so there are two matching icons and you have to hover to figure out which one is which).
  • What will the loading and error states look like? The simplest thing is to disable the textarea while the OCR is running, and re-enable it when the text is entered. Is that sufficient for a first patch? It means we don't need any messages or new UI elements.

-> Down for this approach-- or do we have any other loading state elements in our design components library that we could re-use?

Can someone walk me through the flow for this? I'm a bit confused about what hosting it on the extension means. Do we have mocks for this extension UI, and if so can folks direct me to them?

So far we've only been working on the tool, and testing it with images hosted on Commons, which are served from upload.wikimedia.org. Now that we're going to be working on the front-end in the Wikisource extension, we want to have both the tool and the extension running locally, and for them to talk to each other. This means the existing validation of image URLs fails, and so needs to be amended to also include whatever domain name is in use locally (often localhost, but it might be e.g. wikimedia-ocr.local). It also means that the tool has to accept cross-origin requests from the extension (these are two different issues, but might as well be done together in T280953).
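To make the two issues concrete, here is a minimal sketch of a host allow-list check and a cross-origin check. It is not the tool's actual code (this thread does not show the tool's implementation); the function names, the `DEFAULT_ALLOWED_HOSTS` constant, and the idea of passing the extra local hosts and origins in as lists are all assumptions for illustration.

```typescript
// Hosts that image URLs may come from. Only upload.wikimedia.org is taken
// from the discussion; local hosts would be added via configuration.
const DEFAULT_ALLOWED_HOSTS: string[] = ["upload.wikimedia.org"];

// Issue 1: validate the image URL against an allow-list of host names.
function isAllowedImageUrl(
  imageUrl: string,
  allowedHosts: string[] = DEFAULT_ALLOWED_HOSTS
): boolean {
  let parsed: URL;
  try {
    parsed = new URL(imageUrl);
  } catch (_err) {
    return false; // not an absolute URL
  }
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    return false;
  }
  // Compare the whole host name, so "upload.wikimedia.org.evil.example"
  // does not pass as "upload.wikimedia.org".
  return allowedHosts.indexOf(parsed.hostname) !== -1;
}

// Issue 2: decide what to send back in Access-Control-Allow-Origin.
// Returns undefined when no CORS header should be sent.
function corsAllowOrigin(
  requestOrigin: string | undefined,
  allowedOrigins: string[]
): string | undefined {
  if (requestOrigin === undefined) {
    return undefined;
  }
  return allowedOrigins.indexOf(requestOrigin) !== -1 ? requestOrigin : undefined;
}
```

A local setup would then pass something like `["upload.wikimedia.org", "localhost"]` as the host list, and the extension's local origin as the only allowed origin.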

Down for this approach-- or do we have any other loading state elements in our design components library that we could re-use?

Sounds like Nicolas will figure out the details here. For the first patch, I'll stick with only disabling the textarea.
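As a rough sketch of the "only disable the textarea" approach: the `TextBox` interface and `runOcrInto` below are hypothetical names, and `fetchText` stands in for whatever call actually retrieves the OCR text. The one design point worth showing is the `finally` block, which re-enables the box whether the request succeeds or fails.

```typescript
// Minimal shape of the element being touched; a browser's
// HTMLTextAreaElement satisfies this interface.
interface TextBox {
  disabled: boolean;
  value: string;
}

// Disable the text box while OCR runs, insert the text on success, and
// always re-enable the box afterwards.
async function runOcrInto(
  textBox: TextBox,
  fetchText: () => Promise<string>
): Promise<void> {
  textBox.disabled = true;
  try {
    textBox.value = await fetchText();
  } finally {
    // Runs on success and on failure, so an error never leaves the box
    // stuck in the disabled state.
    textBox.disabled = false;
  }
}
```

On failure the promise still rejects, so the caller can decide how to report the error once a design for error states exists.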

ldelench_wmf renamed this task from [PLACEHOLDER] Implement OCR in Wikisource extension to Implement OCR in Wikisource extension. Apr 29 2021, 11:38 PM
ldelench_wmf set the point value for this task to 5.
NRodriguez renamed this task from Implement OCR in Wikisource extension to Implement MVP of OCR in Wikisource extension. Apr 29 2021, 11:39 PM

Ready for review: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikisource/+/682034 (dependent on https://github.com/wikimedia/wikimedia-ocr/pull/24).

It adds the button, defaulting to no language (auto-detecting) and Tesseract.

Also adds a $wgWikisourceEnableOcr feature flag, in addition to a configurable tool URL $wgWikisourceOcrUrl.
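For illustration, the behaviour described above (feature flag, configurable tool URL, Tesseract by default, no language sent) might be sketched on the client side as follows. This is not the code from the patch: the `OcrConfig` shape, `buildOcrRequestUrl`, and the query parameter names `engine`, `image`, and `langs[]` are assumptions made for the example.

```typescript
// Hypothetical client-side view of the two settings named above.
interface OcrConfig {
  enabled: boolean; // mirrors $wgWikisourceEnableOcr
  toolUrl: string; // mirrors $wgWikisourceOcrUrl
}

// Build the request URL for the OCR tool, or return null when the feature
// is switched off. The parameter names are assumed, not confirmed here.
function buildOcrRequestUrl(
  config: OcrConfig,
  imageUrl: string,
  lang?: string
): string | null {
  if (!config.enabled) {
    return null;
  }
  const url = new URL(config.toolUrl);
  url.searchParams.set("engine", "tesseract"); // default engine
  url.searchParams.set("image", imageUrl);
  if (lang !== undefined) {
    // Omitted by default, leaving language detection to the engine.
    url.searchParams.append("langs[]", lang);
  }
  return url.toString();
}
```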

Down for this approach-- or do we have any other loading state elements in our design components library that we could re-use?

Sounds like Nicolas will figure out the details here. For the first patch, I'll stick with only disabling the textarea.

FWIW, my experience with Phe's OCR tool (whose gadget just sets .disable() on #wpTextBox1) is that this is not particularly intuitive for end users. When there's a failure they describe it as the OCR button "greying out the text", and if it is slow it is hard for them to understand what the state of the text box is and what is happening. That is, just using .disable() is a reasonable fallback (if a widget isn't available or is too expensive to adopt), but not at all an optimal user experience.

This kind of thing is also a fairly obvious reusable component that should exist in whatever UI widget system ends up being used: a few years ago as a jQuery UI plugin, up until last year as a widget in OOUI, and after the adoption of Vue.js as whatever-the-actual-UI-widget-story-will-be (Bootstrap?). Logically "disabling" a DOM element representing some visible part of the page, with default and optionally custom styling, and plopping a loading indicator over it is relevant in lots of scenarios. As someone writing Gadgets and user scripts on enWS, I've needed this for both my own OCR toy and various tools manipulating the header and footer fields. This is also the same general behaviour as modal dialog boxes (which need to both visually and functionally prevent interaction with the page "below" the dialog) and progress bars of various stripes.

FWIW, my experience with Phe's OCR tool (whose gadget just sets .disable() on #wpTextBox1) is that this is not particularly intuitive for end users.

Absolutely, I totally agree! This isn't the final design at all, it's just the simplest thing to get to the next step; it'll probably never actually be deployed like this. @nayoub is working on figuring out a better system, with a cancel button and spinner etc.

Change 682034 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] Add basic OCR button for Wikimedia OCR

https://gerrit.wikimedia.org/r/682034

dom_walden subscribed.

This code change has already been tested on beta here: T282080#7085163.

Moving straight into Product sign-off.