Implement MVP of OCR in Wikisource extension
Closed, Resolved · Public · 5 Estimated Story Points

Description

The new UI is not part of our ordinary UX, so the true MVP requires only:
An Extract button overlaying the image that points to Wikimedia OCR
Auto-detection of the language

Details

	Subject	Repo	Branch	Lines +/-
	Add basic OCR button for Wikimedia OCR	mediawiki/extensions/Wikisource	master	+228 -11


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Samwilson	T280848 Implement MVP of OCR in Wikisource extension
		Resolved		Samwilson	T280953 Allow images and requests to come from localhost

Event Timeline

HMonroy created this task. Apr 21 2021, 8:11 PM

Restricted Application added a subscriber: Aklapper. Apr 21 2021, 8:11 PM

JJMC89 added a project: Wikimedia OCR. Apr 21 2021, 8:12 PM

HMonroy mentioned this in T275547: Wikisource OCR: Move Wikimedia OCR gadget to Wikisource extension. Apr 21 2021, 8:13 PM

HMonroy mentioned this in T280580: Add functional staging support for Wikimedia OCR. Apr 21 2021, 8:17 PM

Samwilson mentioned this in T280212: Wikisource OCR: Tesseract OCR gadget button to Wikisource extension. Apr 23 2021, 1:58 AM

Change 682034 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Wikisource@master] Add basic OCR button for Wikimedia OCR

https://gerrit.wikimedia.org/r/682034

gerritbot added a project: Patch-For-Review. Apr 23 2021, 5:53 AM

I've started looking at this and have a basic functioning system with two buttons in the tool bar (one for each engine).

I think we need a few things before this can proceed though:

The language code issue needs to be sorted out (T280617). Even if we just want to send a single language, matching the wiki's content language (which is what the current tools do), we still need to map these to the codes that the engines accept. The above patch just leaves out the language altogether, leaving it up to the engine's autodetection to sort it out; this works reasonably well for some languages (and fixing it can perhaps wait till the other stuff is done).
We probably want to adapt the tool to accept images coming from somewhere other than upload.wikimedia.org, so that developers can run both the Wikisource extension and the tool locally and have them talk to each other. I've created T280953 for this.
From our discussion the other day, it sounds like we might end up with some variation on the idea of having two buttons: one for OCR and one for the configuration. Whether these are in the toolbar or floating on the image I don't know, but it's easiest to start with them being in the toolbar. We do probably need some icons for them though (I just used the generic article icon in the patch above, so there are two matching icons and you have to hover to figure out which one is which).
What will the loading and error states look like? The simplest thing is to disable the textarea while the OCR is running, and re-enable it once the text has been inserted. Is that sufficient for a first patch? It means we don't need any messages or new UI elements.
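The language-code mapping mentioned in the first point could look roughly like the sketch below. This is only an illustration, not code from the patch: the function name `toEngineLangs` and the table `TESSERACT_LANGS` are hypothetical, the table holds just a few example entries using Tesseract's three-letter codes, and the real mapping is what T280617 is about. An unmapped language yields an empty list, which matches the patch's current behaviour of leaving detection to the engine.

```typescript
// Hypothetical mapping from wiki content-language codes to Tesseract codes.
// Only a few illustrative entries; the full list is the subject of T280617.
const TESSERACT_LANGS: Record<string, string> = {
  en: "eng",
  fr: "fra",
  de: "deu",
  bn: "ben",
};

// Returns the engine language codes to send, or an empty list when the wiki
// language has no known mapping, so the engine's autodetection takes over.
function toEngineLangs(wikiLang: string): string[] {
  const code = TESSERACT_LANGS[wikiLang.toLowerCase()];
  return code === undefined ? [] : [code];
}
```

For example, `toEngineLangs("fr")` gives a one-element list containing `"fra"`, while an unknown code such as `"xx"` gives an empty list.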

In T280848#7028658, @Samwilson wrote:

I've started looking at this and have a basic functioning system with two buttons in the tool bar (one for each engine).

I think we need a few things before this can proceed though:

The language code issue needs to be sorted out (T280617). Even if we just want to send a single language, matching the wiki's content language (which is what the current tools do), we still need to map these to the codes that the engines accept. The above patch just leaves out the language altogether, leaving it up to the engine's autodetection to sort it out; this works reasonably well for some languages (and fixing it can perhaps wait till the other stuff is done).

-> Happy to wait until other, higher-priority project work is done; I'll hold off on that call until we estimate the complexity of T280617. Thanks for making it, Sam!

We probably want to adapt the tool to accept images coming from somewhere other than upload.wikimedia.org, so that developers can run both the Wikisource extension and the tool locally and have them talk to each other. I've created T280953 for this.

-> Can someone walk me through the flow for this? I'm a bit confused about what hosting it on the extension means. Do we have mocks for this extension UI, and if so can folks direct me to them? 😅 cc @nayoub @ifried

From our discussion the other day, it sounds like we might end up with some variation on the idea of having two buttons: one for OCR and one for the configuration. Whether these are in the toolbar or floating on the image I don't know, but it's easiest to start with them being in the toolbar. We do probably need some icons for them though (I just used the generic article icon in the patch above, so there are two matching icons and you have to hover to figure out which one is which).

What will the loading and error states look like? The simplest thing is to disable the textarea while the OCR is running, and re-enable it once the text has been inserted. Is that sufficient for a first patch? It means we don't need any messages or new UI elements.

-> Down for this approach-- or do we have any other loading state elements in our design components library that we could re-use?

Samwilson added a subtask: T280953: Allow images and requests to come from localhost. Apr 29 2021, 1:07 AM

Can someone walk me through the flow for this? I'm a bit confused about what hosting it on the extension means. Do we have mocks for this extension UI, and if so can folks direct me to them?

So far we've only been working on the tool, and testing it with images hosted on Commons, which are served from upload.wikimedia.org. Now that we're going to be working on the front end in the Wikisource extension, we want to have both the tool and the extension running locally and talking to each other. This means the existing validation of image URLs fails, and so it needs to be amended to also accept whatever domain name is in use locally (often localhost, but it might be e.g. wikimedia-ocr.local). It also means that the tool has to accept cross-origin requests from the extension (these are two different issues, but they might as well be done together in T280953).
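To make the image-URL part concrete, here is a minimal sketch of a host allow-list that can be extended for local development. It is an assumption about the shape of the check, not the tool's actual validation code (the tool itself is not written in TypeScript): `isAllowedImageUrl`, `DEFAULT_IMAGE_HOSTS` and the `extraHosts` parameter are all hypothetical names. The cross-origin side is a separate server setting and is not shown.

```typescript
// Hypothetical allow-list check; names and structure are illustrative only.
const DEFAULT_IMAGE_HOSTS = ["upload.wikimedia.org"];

// Returns true when imageUrl is an absolute http(s) URL whose hostname is
// either a default host or one of the extra hosts configured locally.
function isAllowedImageUrl(imageUrl: string, extraHosts: string[] = []): boolean {
  let parsed: URL;
  try {
    parsed = new URL(imageUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return false;
  }
  const allowed = [...DEFAULT_IMAGE_HOSTS, ...extraHosts];
  return allowed.includes(parsed.hostname);
}
```

With no extra hosts, a URL on `localhost` is rejected; passing `["localhost", "wikimedia-ocr.local"]` as `extraHosts` accepts it, which is the behaviour a local setup would need.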

Down for this approach-- or do we have any other loading state elements in our design components library that we could re-use?

Sounds like Nicolas will figure out the details here. For the first patch, I'll stick with only disabling the textarea.
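For reference, the "only disable the textarea" approach amounts to something like the sketch below. It is a simplified illustration rather than the patch itself: `TextBox` stands in for the real textarea element, and `runOcrInto` and `fetchText` are hypothetical names for the handler and for whatever call fetches the OCR text.

```typescript
// Minimal stand-in for the textarea; the real element has many more members.
interface TextBox {
  disabled: boolean;
  value: string;
}

// Disables the text box while the OCR request is in flight and re-enables it
// afterwards, whether the request succeeded or failed.
async function runOcrInto(textBox: TextBox, fetchText: () => Promise<string>): Promise<void> {
  textBox.disabled = true;
  try {
    textBox.value = await fetchText();
  } finally {
    textBox.disabled = false;
  }
}
```

The `finally` block is the important part: without it, a failed request would leave the text box disabled.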

NRodriguez updated the task description. Apr 29 2021, 11:36 PM

NRodriguez updated the task description.

ldelench_wmf renamed this task from [PLACEHOLDER] Implement OCR in Wikisource extension to Implement OCR in Wikisource extension. Apr 29 2021, 11:38 PM

ldelench_wmf set the point value for this task to 5.

NRodriguez renamed this task from Implement OCR in Wikisource extension to Implement MVP of OCR in Wikisource extension. Apr 29 2021, 11:39 PM

Ready for review: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikisource/+/682034 (dependent on https://github.com/wikimedia/wikimedia-ocr/pull/24 ).

It adds the button, defaulting to no language (auto-detecting) and Tesseract.

Also adds a $wgWikisourceEnableOcr feature flag, in addition to a configurable tool URL $wgWikisourceOcrUrl.
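As a rough illustration of how the two settings could drive the button's request, here is a sketch. It is not taken from the patch: `OcrConfig` and `buildOcrRequestUrl` are hypothetical, and the query parameter names `engine`, `image` and `langs[]` are assumptions about the tool's API rather than confirmed names. When no languages are given the parameter is omitted, which corresponds to the auto-detecting default.

```typescript
// Hypothetical client-side view of the two settings.
interface OcrConfig {
  enabled: boolean; // stands in for $wgWikisourceEnableOcr
  toolUrl: string; // stands in for $wgWikisourceOcrUrl
}

// Builds the request URL for the OCR tool, or returns null when the feature
// flag is off. Parameter names are assumed for illustration.
function buildOcrRequestUrl(config: OcrConfig, imageUrl: string, langs: string[] = []): string | null {
  if (!config.enabled) {
    return null;
  }
  const url = new URL(config.toolUrl);
  url.searchParams.set("engine", "tesseract"); // default engine
  url.searchParams.set("image", imageUrl);
  for (const lang of langs) {
    url.searchParams.append("langs[]", lang); // omitted entirely when langs is empty
  }
  return url.toString();
}
```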

In T280848#7044278, @Samwilson wrote:

Down for this approach-- or do we have any other loading state elements in our design components library that we could re-use?

Sounds like Nicolas will figure out the details here. For the first patch, I'll stick with only disabling the textarea.

FWIW, my experience with Phe's OCR tool (whose gadget just sets .disable() on #wpTextBox1) is that this is not particularly intuitive for end users. When there's a failure they describe it as the OCR button "greying out the text", and if it is slow it is hard for them to understand what the state of the text box is and what is happening. i.e. just using .disable() is a reasonable fallback (if a widget isn't available or too expensive to adopt), but not at all an optimal user experience.

This kind of thing is also a fairly obvious reusable component that should exist in whatever UI widget system ends up being used: a few years ago as a jQuery UI plugin, up until last year as a widget in OOUI, and after the adoption of Vue.js as whatever-the-actual-UI-widget-story-will-be (Bootstrap?). Logically "disabling" a DOM element representing some visible part of the page, with default and optionally custom styling, and plopping a loading indicator over it is relevant in lots of scenarios. As someone writing Gadgets and user scripts on enWS I've needed this for both my own OCR toy and various tools manipulating the header and footer fields. This is also the same general behaviour as modal dialog boxes (which need to both visually and functionally prevent interaction with the page "below" the dialog) and progress bars of various stripes.
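One way to picture such a reusable component is as a small state model that keeps "running" and "failed" apart, so a slow request and a failed one no longer look the same to the user. The sketch below is purely illustrative and framework-agnostic: `OcrState`, `OverlayView` and `viewFor` are hypothetical names, not an existing widget in jQuery UI, OOUI or Vue.

```typescript
// The three states an overlay component would need to distinguish.
type OcrState =
  | { kind: "idle" }
  | { kind: "running" }
  | { kind: "failed"; message: string };

// What the UI should show for a given state.
interface OverlayView {
  inputDisabled: boolean;
  showSpinner: boolean;
  errorText: string | null;
}

// Maps a state to its presentation: the input is only disabled while the
// request is running, and a failure is shown as a message, not as a greyed-out box.
function viewFor(state: OcrState): OverlayView {
  switch (state.kind) {
    case "idle":
      return { inputDisabled: false, showSpinner: false, errorText: null };
    case "running":
      return { inputDisabled: true, showSpinner: true, errorText: null };
    case "failed":
      return { inputDisabled: false, showSpinner: false, errorText: state.message };
  }
}
```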

FWIW, my experience with Phe's OCR tool (whose gadget just sets .disable() on #wpTextBox1) is that this is not particularly intuitive for end users.

Absolutely, I totally agree! This isn't the final design at all, it's just the simplest thing to get to the next step; it'll probably never actually be deployed like this. @nayoub is working on figuring out a better system, with a cancel button and spinner etc.

Samwilson mentioned this in T282050: Wikisource OCR: add loading state improvements. May 6 2021, 12:29 AM

Change 682034 merged by jenkins-bot:

[mediawiki/extensions/Wikisource@master] Add basic OCR button for Wikimedia OCR

https://gerrit.wikimedia.org/r/682034

ReleaseTaggerBot added a project: MW-1.37-notes (1.37.0-wmf.5; 2021-05-11). May 6 2021, 6:00 AM

Samwilson moved this task from Review/Feedback 💬 to QA 🐛 on the Community-Tech (CommTech-Sprint-1) board. May 27 2021, 2:12 AM