Page MenuHomePhabricator

Page previews can consume new summary-HTML endpoint
Closed, ResolvedPublic3 Estimated Story Points

Assigned To
Authored By
Jdlrobson
May 11 2017, 8:48 AM
Referenced Files
F8532497: image.png
Jun 26 2017, 4:55 PM
F8532495: image.png
Jun 26 2017, 4:55 PM
F8516121: Screen Shot 2017-06-23 at 7.54.27 PM.png
Jun 23 2017, 5:55 PM
F8508603: image.png
Jun 22 2017, 9:46 PM
F8508617: image.png
Jun 22 2017, 9:46 PM
F8508627: image.png
Jun 22 2017, 9:46 PM
F8508599: image.png
Jun 22 2017, 9:46 PM
F8508609: image.png
Jun 22 2017, 9:46 PM

Description

T165017 introduces a new endpoint that returns HTML. It should be possible for page previews to consume this new API.

AC

QA Notes

@ABorbaWMF (or whoever) make sure you refer to T165018#3354869 before submitting any new bug reports as they may have already been written up. This task is about bug discovery and not fixing them all at once, which'll require larger architectural changes.

Post-sign-off tasks

Dev Notes

  • This'll require a new gateway as we'll want to test this in isolation. The wgPopupsAPIUseRESTBase configuration variable controls whether or not to consume the RESTBase Page Summary API. Perhaps this configuration variable needs revisiting, e.g. it could be superseded by a variable that can have the values "mediawiki", "restbase", and "restbase-experimental"?
    • Alternatively the endpoint can be configurable e.g. wgPopupsApiRESTBaseEndpoint and the new endpoint can simply be used with wgPopupsAPIUseRESTBase
  • Currently the renderer does additional processing on the plain text extract that it gets from the preview model (see ext.popups.renderer#renderExtract), this could be extracted to the preview model, i.e. ext.popups.PreviewModel#getHtml. The new gateway could return a specialised instance of ext.popups.PreviewModel.

Related Objects

StatusSubtypeAssignedTask
Resolvedovasileva
ResolvedJdlrobson
DuplicateNone
DuplicateNone
Resolvedovasileva
Resolvedovasileva
Resolvedovasileva
Resolvedphuedx
Resolvedphuedx
DuplicateNone
ResolvedJdlrobson
ResolvedJdlrobson
DuplicateNone
Duplicateovasileva
Resolvedovasileva
DuplicateNone
DeclinedNone
DuplicateJdlrobson
ResolvedMhurd
Declined JMinor
Resolvedphuedx
Resolved Pchelolo
ResolvedJdlrobson
Declined Pchelolo
Resolvedphuedx
DeclinedJdlrobson
DuplicateNone
Resolved Fjalapeno
Resolvedphuedx
Declinedpmiazga
DeclinedNone
Resolvedphuedx
DeclinedNone
Resolved Pchelolo
Resolved bearND
Resolved Mholloway
ResolvedMSantos
Resolved Mholloway
InvalidNone
ResolvedJdlrobson
InvalidNone
DuplicateNone
ResolvedJdlrobson
ResolvedJdlrobson
ResolvedJdlrobson
ResolvedJdlrobson
Resolvedphuedx
Resolved bearND
Resolved Mholloway
DuplicateNone
ResolvedJdlrobson
ResolvedJdlrobson
Resolvedphuedx
ResolvedJdlrobson
ResolvedJdlrobson
Resolved bearND
ResolvedJdlrobson
Resolved Mholloway
Resolved Mholloway
ResolvedJdlrobson
ResolvedJdlrobson
Resolved bearND

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 358415 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Remove unused wgPopupsAPIUseRESTBase config variable

https://gerrit.wikimedia.org/r/358415

Quoting myself from my review of rEPOP8d2b7212c03c: Handle Restbase HTML response:

I'd like to chew over the pattern of exposing the extract parser as part of the gateway object level rather than exposing it at the factory level. I'm not sure that I can capture my thinking in Gerrit though. I'll comment on the associated Phab task.

The gateway factories should return objects whose shapes follow the *Gateway interfaces.

Prior to your change, we seem to have exposed the internals of the gateway modules as part of the objects returned by the top-level factory function, e.g.

module.exports = factory( fetch ) {
  return {
    interfaceFn: function () {
      return fetch( 'Foo' )
        .then( helper );
    },
    helper: helper
  };
};

function helper() {
  return 'bar';
}

We then exercise helper directly rather than via the interfaceFn method in our unit tests.

As part of rEPOP8d2b7212c03c: Handle Restbase HTML response, we've continued this pattern for the MediaWiki API gateway but tidied up the RESTBase gateways. We should apply the same refactoring to all of them.

This is outside of the scope of your change but it might be worth a @TODO.

Change 357808 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Implement html/rest.js gateway which handles HTML Restbase responses

https://gerrit.wikimedia.org/r/357808

My favourite screenshot from testing @pmiazga's change locally – in a "wow, that's nice!" sorta way:

Screen Shot 2017-06-14 at 16.06.04.png (431×321 px, 92 KB)

/cc @ovasileva @Nirzar

@ABorbaWMF I'll assign this to you when the Beta Cluster and the staging server have been configured for you to test this change on.

Change 359123 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[operations/mediawiki-config@master] pagePreviews: Consume HTML from RESTBase endpoint

https://gerrit.wikimedia.org/r/359123

My favourite screenshot from testing @pmiazga's change locally – in a "wow, that's nice!" sorta way:

Screen Shot 2017-06-14 at 16.06.04.png (431×321 px, 92 KB)

/cc @ovasileva @Nirzar

:D

Change 359123 merged by jenkins-bot:
[operations/mediawiki-config@master] pagePreviews: Consume HTML from RESTBase endpoint

https://gerrit.wikimedia.org/r/359123

^ I can confirm that PP is rendering the HTML part of the RESTBase response on the Beta Cluster.

On reading web staging when I hover over a link I see a text extract in HTML as provided by the new [summary] endpoint.

@ovasileva: Since there's an instance of RESTBase running on the Beta Cluster, I've opted to do this there. Let me know if this is the wrong way round.

phuedx added a subscriber: pmiazga.

@ABorbaWMF: This is now available to test on the Beta Cluster. This is both a general change to the layout of the codebase as well as a change to where the content of the previews come from, you should check that there aren't any regressions. I guess @ovasileva and @Nirzar are going to be looking at the test articles in the description.

Change 359203 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Hygiene: Capture jQuery at construction

https://gerrit.wikimedia.org/r/359203

Change 359204 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Hygiene: Move createGateway to gateway/index.js

https://gerrit.wikimedia.org/r/359204

Change 359205 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Fix the npm script test:node

https://gerrit.wikimedia.org/r/359205

Change 359206 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Hygiene: Simplify gateways

https://gerrit.wikimedia.org/r/359206

Change 359207 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Hygiene: Rename builder vars on require preview/model

https://gerrit.wikimedia.org/r/359207

I've submitted some hygiene changes related to this patch. No change in functionality should occur.

Testing on the Beta cluster with the Dog article. I noticed problems with some of the previews. I'm not sure what is common among these preview types, but they are appearing as blank windows. Check out the video:

Change 359203 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Hygiene: Capture jQuery at construction

https://gerrit.wikimedia.org/r/359203

@ABorbaWMF thank you for your finding. The problem is that the TextExtracts return

<p>(Blank.)</p>

(Blank.) gets stripped and empty <p> stays. The content is not empty as there is an empty <p> tag. I'll add a logic to show a generic preview when preview contains only empty elements (like p, b), no text nodes.

This comment was removed by ovasileva.

Nice spot @ABorbaWMF! This is a genuine bug.

Devs: the HTML returned in a response from RESTBase is wrapped in a p. All of the articles that produce empty previews in @ABorbaWMF's video have the same content: "(Blank.)", meaning that the extract_html property of the RESTBase response is '<p>(Blank.)</p>' and '<p></p>' after processing, i.e. it ain't empty.

I think this should be fixed by stripping the wrapping p wrapper element from the extract_html property as it causes this issue and also messes with the padding/margins of the preview.

Screen Shot 2017-06-16 at 1.31.29 PM.png (301×399 px, 54 KB)

We have math! And it's looking pretty nice

@Nirzar - the last line is a bit cut off still. Thoughts?

Screen Shot 2017-06-16 at 1.33.50 PM.png (283×360 px, 57 KB)

@Nirzar - the last line is a bit cut off still. Thoughts?

@ovasileva: This might be a result of not stripping the wrapper p that I mentioned in T165018#3354254.

Change 359204 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Hygiene: Move createGateway to gateway/index.js

https://gerrit.wikimedia.org/r/359204

I think this should be fixed by stripping the wrapping p wrapper element from the extract_html property as it causes this issue and also messes with the padding/margins of the preview.

We also have to handle the extract containing more than one p, when the first one possibly being empty.

giant comment with all current issues I've found. Let's discuss first before we open separate bugs

Multiple bold elements in preview

Steps to reproduce

  1. go to: https://en.wikipedia.beta.wmflabs.org/wiki/Dog
  2. Hover over “Canidae”

Expected behavior: Canidae should be bolded
Observed behavior: Canidae and canid are bolded

NOTE: this behavior is acceptable, @ovasileva to add to requirements

Certain templates are not stripped from previews

Steps to reproduce

  1. go to: https://en.wikipedia.beta.wmflabs.org/wiki/Dog
  2. Hover over “Dog anatomy”

Expected behavior: preview reads “Dog anatomy….”
Observed behavior: expand language template appears

Similar behavior when hovering over “great dane”

Parentheticals stripped improperly

Steps to reproduce

  1. go to: https://en.wikipedia.beta.wmflabs.org/wiki/Planet
  2. Hover over “India”

Observed behavior: script error is stripped, but the parentheses remain

Extract cut off/too long

NOTE: I think this is the same as T165018#3354714 Steps to reproduce
  1. Go to: https://en.wikipedia.beta.wmflabs.org/wiki/Birdman_(film)
  2. Hover over Alejandro G. Iñárritu

Observed behavior:

Screen Shot 2017-06-16 at 2.34.36 PM.png (253×376 px, 52 KB)

Expected behavior: last line of text is not cut off/does not appear

Horizontal gradient appearing incorrectly

Steps to reproduce

  1. Go to: https://en.wikipedia.beta.wmflabs.org/wiki/Category:Living_people
  2. Hover over “Andrew Miller”

Expected behavior: Horizontal gradient does not appear
Observed behavior: top of horizontal gradient overlaps text

Screen Shot 2017-06-16 at 3.13.24 PM.png (422×392 px, 143 KB)

Screen Shot 2017-06-16 at 3.12.08 PM.png (260×355 px, 50 KB)

Extracts for lists displaying partial previews

Steps to reproduce

  1. Go to: https://en.wikipedia.beta.wmflabs.org/wiki/Marko_Saaresto
  2. Hover over “Finnish”

Expected: generic preview
Observed: preview ends with the first paragraph

NOTE: this isn't technically a bug, but might as well fix it now. I think the best way would be to display the generic preview for all extracts that end in a colon

Previews do not concatenate multiple paragraphs

Steps to reproduce

  1. Page: https://en.wikipedia.beta.wmflabs.org/wiki/Category:Articles_with_%27species%27_microformats
  2. Hover over “european mink”

Observed behavior: multiple paragraphs (also broken length and gradient)

Screen Shot 2017-06-16 at 3.24.19 PM.png (456×414 px, 202 KB)

Expected behavior: paragraphs should be concatenated
Screen Shot 2017-06-16 at 3.28.51 PM.png (477×371 px, 198 KB)

Welcome to the land of HTML and scope increase!

Extract cut off/too long
Previews do not concatenate multiple paragraphs

These are the same problem.
@ovasileva paragraph concatentation is going to be very difficult (i'm not sure possible?) HTML elements themselves have different margins/line heights so the approach we have right now to fix it with fixed heights will not work

Extracts for lists displaying partial previews

This specific problem is an issue with disambiguation pages and tracked in T130671.
If you take a look at https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland they are ordered by headings. There is no list and we can't show every single page.

Horizontal gradient appearing incorrectly

looks like we need to reopen T165974

Certain templates are not stripped from previews

This is an editor problem and we already have a mechanism to easily fix this: https://www.mediawiki.org/wiki/Extension:Popups#How_can_I_remove_content_from_a_page_preview.3F

Multiple bold elements in preview

Welcome to HTML :) Note HTML can obviously contain bold items. On top of this we try to bold the title of the page we are linking to. IMO it's dangerous to unbold things as they might be bolded for a good reason, but we could update the TextExtracts to strip bold elements.. although we should think about that very very carefully. TextExtracts API is used for many purposes.

Parentheticals stripped improperly

This is because we are doing something very hacky to remove them. All this is captured in T91344.

Multiple bold elements in preview

There is no logic for bolding title elements. If in orignal text extract Title is not bolded the HTML preview won't show bold titles. If we want to make Title bold also in HTML previews we have to implement that. Now the only processing applied to the extract is strip parenthesis and removing trailing ellipsis. Nothing else.

Certain templates are not stripped from previews

This shouldn't happen on production as beta cluster doesn't have all templates. Those extra texts are visible only because template parser failed.

Extract cut off/too long

This one is tricky as HTML can hold styling and add some margins to text. Probably the best solution would be to add a gradient on the bottom so preview always smoothly transition to transparent.

Horizontal gradient appearing incorrectly

look comment above

Extracts for lists displaying partial previews

This is a general error (happens also in plain text extract). This is the content we retrieve from TextExtracts API, The following is a list of bands from Finland: is everything we got. Please file a separate ticket for this one.

Previews do not concatenate multiple paragraphs

This one is probably most tricky. We can concatenate paragraphs in Frontend but in HTML world we every element can hold some styling. This is a standalone task.

This one is tricky as HTML can hold styling and add some margins to text. Probably the best solution would be to add a gradient on the bottom so preview always smoothly transition to transparent.

We need to think hard about fixing this. Can we strip out style elements? I want to avoid vertical gradient for this.

@Nirzar We can strip all style/cass attributes and we additionally can define a default styles for every tag (ul/ol/p/img) inside page preview.

By default we strip those elements:

			"table",
			"div",
			"style",
			"ul.gallery",
			".mw-editsection",
			"sup.reference",
			"ol.references",
			".error",
			".nomobile",
			".noprint",
			".noexcerpt",
			".sortkey"

@pmiazga what are the elements we allow? only ul/ol/p

what about b, strong?

TextExtracts allows everything else other than the list plus wikimedia-config adds couple more: .metadata and 'pan.coordinates, span.geo-multi-punct, span.geo-nondefault' #coordinates
Everything other stays - so <b>, <strong>, <i>, <inderline>, <span>, <ol>, <ul> even <img> and others are allowed.

Change 359206 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Hygiene: Simplify gateways

https://gerrit.wikimedia.org/r/359206

Change 359207 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Hygiene: Rename builder vars on require preview/model

https://gerrit.wikimedia.org/r/359207

@ovasileva - I have an idea for multiple paragraphs, much easier way would be stripping everything from the second paragraph. We would show only the first paragraph and proceeding elements that are not paragraphs (for example Planet - we would show first paragraph and a list).

pmiazga updated the task description. (Show Details)

Change 359954 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/Popups@master] Do not show empty HTML previews

https://gerrit.wikimedia.org/r/359954

@pmiazga - re: paragraphs - I think that makes sense

@phuedx - for empty html previews - for now, I'm not removing <p> element as we might do some more advanced crunching later. I decided to use $().text() to look for text existence. With that approach I can remove both simple structures like <p></p> and bit more complex like <p><ol><li></li></ol></p>.

Now I'm wondering, are there any text extracts that contain an image only? eg <img src="..." />. My solution will render a generic preview instead of an image.

just documenting, we decided to use the generic preview for image-only previews during standup.

I've -1ed https://gerrit.wikimedia.org/r/#/c/359954/2 on the basis that we should be very disciplined about removing the client side hackery and move this logic into the service. I'd like a few more thoughts about whether I'm being unreasonable.

I don't think this blocks completing this task however. The sign off criteria has been met...

I've -1ed https://gerrit.wikimedia.org/r/#/c/359954/2 on the basis that we should be very disciplined about removing the client side hackery and move this logic into the service. I'd like a few more thoughts about whether I'm being unreasonable.

I think this is as good a place as any to draw a line in the sand. This brief test has unearthed a bunch of issues that need careful consideration – least of all, us writing down a definition of what constitutes "a preview" – and any feeling that you're being unreasonable is likely misdirected frustration that we haven't given ourselves much time to plan this work much further than this.

@ovasileva: Were we expecting @ABorbaWMF to do more QA on this? If so, then I'll move this task back to Needs QA.

phuedx updated the task description. (Show Details)

@phuedx - yup, I think @ABorbaWMF will do some more exploratory qa (if any bugs are found, they'll be filed separately however)

Looks mostly fixed. I have been testing on beta using the pages listed above by @ovasileva and some other pages as well. I tried on Chrome, Firefox, Safari, Edge, and Opera and on Mac and Windows.

Results:
Lists cut-off at the bottom (new bug?)

image.png (768×1 px, 250 KB)

Multiple bold Items looks fine
image.png (480×854 px, 332 KB)

Templates (still happens)
image.png (480×854 px, 154 KB)

Parenthesis (still happens)
image.png (768×1 px, 352 KB)

Long Extract looks fine
image.png (480×854 px, 119 KB)

Horizontal Gradient/Andrew Miller looks fine
image.png (480×854 px, 142 KB)

image.png (480×854 px, 57 KB)

Extracts for lists displaying partial previews (may be broken? list items are missing)
image.png (768×1 px, 379 KB)

Concatenate multiple paragraphs (not exactly like expected, but looks good)
image.png (768×1 px, 321 KB)

Moving this to Ready for Signoff as @ABorbaWMF's hasn't discovered any issues that we didn't already know about – this is a good thing, right?

@ABorbaWMF - did you get a chance to test through multiple browsers?

Formulas don't seem to be appearing in Chrome:

Screen Shot 2017-06-23 at 7.54.27 PM.png (278×397 px, 50 KB)

@ABorbaWMF - final check, could you go over these in across a few different browsers?

  • Article: Lie group, hover over lie algebra, formulas should appear in preview
  • Article: Earth, hover over planet, list should appear
  • Article: Berlin, hover over Germany

@ovasileva, yes I tried multiple browsers. What article are you using for the math page previews?

@ABorbaWMF - lie group, hovering over lie algebra. @Nirzar first noticed this, I could replicate as well

Sorry I meant to hit submit earlier on this one. I am also seeing the issue with math formulas, but only on some browsers/systems.

Looks good on:
Mac - Safari
Win or Mac - Firefox

image.png (480×854 px, 166 KB)

Looks bad on:
Mac or Win - Chrome
Mac or Win - Opera
Win - IE
Win - Edge

image.png (768×1 px, 356 KB)

Change 359954 abandoned by Pmiazga:
Do not show empty HTML previews

Reason:
Instead of moving more logic into frontend we decided to create a service in the backend to crunch page previews data. With that approach this change is not necessary.

https://gerrit.wikimedia.org/r/359954

Change 358415 abandoned by Pmiazga:
Remove unused wgPopupsAPIUseRESTBase config variable

Reason:
Abandoned as change is already merged into config

https://gerrit.wikimedia.org/r/358415