Page MenuHomePhabricator

Flow as a Beta feature: enable, disable and reenable doesn't seem to work
Closed, ResolvedPublic

Description

Current example with full trace (since it's testwiki): https://test.wikipedia.org/wiki/User_talk:Test_2016-10-03_optin_5

Non-testwiki: https://www.wikidata.org/wiki/User_talk:BDavis_(WMF) .

From here

Dereckson

Lionel Scheepmans reports on fr.wikipedia to have enabled, then disabled Flow. The wikitext archived talk page was successfully restored. But then, they toggled on the beta preference again. Nothing happens, while the user expected to get back Flow.

Lionel Scheepmans

See also this page : https://fr.wikipedia.org/wiki/Discussion_utilisateur:Lionel_Scheepmans/Flow_Archive_1

And this one : https://fr.wikipedia.org/w/index.php?title=Sujet:T63k4mt1p090if46&topic_showPostId=t63pgjfh8ikwd7nd&fromnotif=1#flow-post-t63pgjfh8ikwd7nd

I've also remove {{Page de discussion wikitexte convertie en Flow|archive=Discussion utilisateur:Lionel Scheepmans/Flow Archive 1|{{Page de discussion wikitexte convertie en Flow|archive=Discussion utilisateur:Lionel Scheepmans/Flow Archive 1|date=2016-06-18}}date=2016-06-18}} from my talkpage after disabled Flow.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I'll continue this tomorrow. But if other people have time to work on this, you may find P4150 helpful when debugging (that disables the Flow board while you debug live, before using it you have to create the account, create a wikitext page, and enable Flow)

Details below (the exceptions/DBs are from different rounds of testing, but it's 100% reproducible):

Basically, Flow thinks the page moved (Flow to archive), but core does not:

mysql:research@x1-analytics-slave [flowdb]> SELECT * FROM flow_workflow WHERE workflow_wiki = 'testwiki' AND workflow_title_text = 'Test_2016-10-03_optin_5/Flow_Archive_1';
+-------------+---------------+--------------------+------------------+----------------------------------------+---------------+--------------------------------+------------------+---------------------+------------------------+------------------+--------------------+---------------+
| workflow_id | workflow_wiki | workflow_namespace | workflow_page_id | workflow_title_text                    | workflow_name | workflow_last_update_timestamp | workflow_user_id | workflow_lock_state | workflow_definition_id | workflow_user_ip | workflow_user_wiki | workflow_type |
+-------------+---------------+--------------------+------------------+----------------------------------------+---------------+--------------------------------+------------------+---------------------+------------------------+------------------+--------------------+---------------+
| ^6c���@(V�     | testwiki      |                  3 |            92885 | Test_2016-10-03_optin_5/Flow_Archive_1 |               | 20161004032103                 |             NULL |                   0 | NULL                   | NULL             | NULL               | discussion    |
+-------------+---------------+--------------------+------------------+----------------------------------------+---------------+--------------------------------+------------------+---------------------+------------------------+------------------+--------------------+---------------+
1 row in set (0.00 sec)
mysql:research@s3-analytics-slave [testwiki]> SELECT * FROM page JOIN revision ON page_latest = rev_id WHERE page_title = 'Test_2016-10-03_optin_5';
+---------+----------------+-------------------------+-------------------+--------------+------------------+-------------+---------------+----------------+--------------------+-------------+----------+-----------------------+-------------------+-----------------+------------------+--------------------+-----------+--------+----------+-------------+-----------------------------------------------------------------+----------+------------------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
| page_id | page_namespace | page_title              | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random   | page_touched   | page_links_updated | page_latest | page_len | page_no_title_convert | page_ext_reviewed | page_ext_stable | page_ext_quality | page_content_model | page_lang | rev_id | rev_page | rev_text_id | rev_comment                                                     | rev_user | rev_user_text          | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        | rev_content_model | rev_content_format |
+---------+----------------+-------------------------+-------------------+--------------+------------------+-------------+---------------+----------------+--------------------+-------------+----------+-----------------------+-------------------+-----------------+------------------+--------------------+-----------+--------+----------+-------------+-----------------------------------------------------------------+----------+------------------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
|   92885 |              3 | Test_2016-10-03_optin_5 |                   |            0 |                0 |           1 | 0.15455462817 | 20161004025001 | 20161004031936     |      293407 |        1 |                     0 |              NULL |            NULL |             NULL | flow-board         | NULL      | 293407 |    92885 |      317669 | /* This page has been converted into a Flow discussion board */ |    26836 | Flow talk page manager | 20161004024951 |              0 |           0 |       1 |             0 | olohtzpss01qyls4ipio74tyvfmws7x | flow-board        | application/json   |
+---------+----------------+-------------------------+-------------------+--------------+------------------+-------------+---------------+----------------+--------------------+-------------+----------+-----------------------+-------------------+-----------------+------------------+--------------------+-----------+--------+----------+-------------+-----------------------------------------------------------------+----------+------------------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
1 row in set (0.59 sec)

AFAICT, there are two leads I have so far:

  1. The transaction flow doesn't seem to be working as intended. The move is not committed after TitleMoveComplete (this is *probably* close to the root issue).
  2. Strange exception below.
exception 'MWException' with message 'Content of revision  (User_talk:Test_2016-10-03_optin_10/Archive_1) could not be loaded for validation!' in /srv/mediawiki/php-1.28.0-wmf.20/includes/Revision.php:1555
Stack trace:
#0 /srv/mediawiki/php-1.28.0-wmf.20/includes/Revision.php(1410): Revision->checkContentModel()
#1 /srv/mediawiki/php-1.28.0-wmf.20/includes/MovePage.php(522): Revision->insertOn()
#2 /srv/mediawiki/php-1.28.0-wmf.20/includes/MovePage.php(260): MovePage->moveToInternal()
#3 /srv/mediawiki/php-1.28.0-wmf.20/extensions/Flow/includes/Import/OptInController.php(142): MovePage->move()
#4 /srv/mediawiki/php-1.28.0-wmf.20/extensions/Flow/includes/Import/OptInController.php(113): Flow\Import\OptInController->movePage()
#5 /srv/mediawiki/php-1.28.0-wmf.20/extensions/Flow/includes/Import/OptInUpdate.php(50): Flow\Import\OptInController->disable()
#6 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(247): Flow\Import\OptInUpdate->doUpdate()
#7 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(209): DeferredUpdates::runUpdate()
#8 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(125): DeferredUpdates::execute()
#9 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(278): DeferredUpdates::doUpdates()
#10 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(91): DeferredUpdates::tryOpportunisticExecute()
#11 /srv/mediawiki/php-1.28.0-wmf.20/extensions/Flow/Hooks.php(1760): DeferredUpdates::addUpdate()
#12 /srv/mediawiki/php-1.28.0-wmf.20/includes/Hooks.php(195): FlowHooks::onUserSaveOptions()
#13 /srv/mediawiki/php-1.28.0-wmf.20/includes/user/User.php(5191): Hooks::run()
#14 /srv/mediawiki/php-1.28.0-wmf.20/includes/user/User.php(3891): User->saveOptions()
#15 /srv/mediawiki/php-1.28.0-wmf.20/includes/api/ApiOptions.php(126): User->saveSettings()
#16 /srv/mediawiki/php-1.28.0-wmf.20/includes/api/ApiMain.php(1427): ApiOptions->execute()
#17 /srv/mediawiki/php-1.28.0-wmf.20/includes/api/ApiMain.php(480): ApiMain->executeAction()
#18 (1): ApiMain->execute()

This exception is logged but not handled by OptInUpdate.

There is then a rollback (of core, I think).

exception 'DBTransactionError' with message 'Explicit transaction still active. A caller may have caught an error.' in /srv/mediawiki/php-1.28.0-wmf.20/includes/libs/rdbms/loadbalancer/LoadBalancer.php:1028
Stack trace:
#0 /srv/mediawiki/php-1.28.0-wmf.20/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1357): Closure$LoadBalancer::approveMasterChanges()
#1 /srv/mediawiki/php-1.28.0-wmf.20/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1048): LoadBalancer->forEachOpenMasterConnection()
#2 /srv/mediawiki/php-1.28.0-wmf.20/includes/libs/rdbms/lbfactory/LBFactory.php(230): LoadBalancer->approveMasterChanges()
#3 /srv/mediawiki/php-1.28.0-wmf.20/includes/libs/rdbms/lbfactory/LBFactoryMulti.php(407): Closure$LBFactory::forEachLBCallMethod()
#4 /srv/mediawiki/php-1.28.0-wmf.20/includes/libs/rdbms/lbfactory/LBFactory.php(233): LBFactoryMulti->forEachLB()
#5 /srv/mediawiki/php-1.28.0-wmf.20/includes/libs/rdbms/lbfactory/LBFactory.php(303): LBFactory->forEachLBCallMethod()
#6 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(248): LBFactory->commitMasterChanges()
#7 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(209): DeferredUpdates::runUpdate()
#8 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(125): DeferredUpdates::execute()
#9 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(278): DeferredUpdates::doUpdates()
#10 /srv/mediawiki/php-1.28.0-wmf.20/includes/deferred/DeferredUpdates.php(91): DeferredUpdates::tryOpportunisticExecute()
#11 /srv/mediawiki/php-1.28.0-wmf.20/extensions/Flow/Hooks.php(1760): DeferredUpdates::addUpdate()
#12 /srv/mediawiki/php-1.28.0-wmf.20/includes/Hooks.php(195): FlowHooks::onUserSaveOptions()
#13 /srv/mediawiki/php-1.28.0-wmf.20/includes/user/User.php(5191): Hooks::run()
#14 /srv/mediawiki/php-1.28.0-wmf.20/includes/user/User.php(3891): User->saveOptions()
#15 /srv/mediawiki/php-1.28.0-wmf.20/includes/api/ApiOptions.php(126): User->saveSettings()
#16 /srv/mediawiki/php-1.28.0-wmf.20/includes/api/ApiMain.php(1427): ApiOptions->execute()
#17 /srv/mediawiki/php-1.28.0-wmf.20/includes/api/ApiMain.php(480): ApiMain->executeAction()
#18 (1): ApiMain->execute()
#19 (1): include()
#20 {main}

But Flow doesn't rollback since it commits earlier (when core *should* have committed), so it causes inconsistent state.

I'll be helping with this, since I am still able to create new cases in production.

In that comment you also raised the priority of this to UBN! Is this a blocker for wmf21 rolling forward this week?

In that comment you also raised the priority of this to UBN! Is this a blocker for wmf21 rolling forward this week?

No. This is not a regression from wmf20, and we are going to disable the Flow beta feature everywhere until we've fixed this problem.

Change 314042 had a related patch set uploaded (by Mattflaschen):
Flow opt in: Temporarily disable all, MW.org is redundant

https://gerrit.wikimedia.org/r/314042

Change 314042 merged by jenkins-bot:
Flow opt in: Temporarily disable all, MW.org is redundant

https://gerrit.wikimedia.org/r/314042

Change 314197 had a related patch set uploaded (by Mattflaschen):
WIP: Make more types of exceptions loggable

https://gerrit.wikimedia.org/r/314197

Still working on this, but AutoCommitUpdate looks promising.

Change 314197 merged by jenkins-bot:
Make more types of exceptions loggable

https://gerrit.wikimedia.org/r/314197

Change 314452 had a related patch set uploaded (by Catrope):
Make more types of exceptions loggable

https://gerrit.wikimedia.org/r/314452

Change 314452 merged by jenkins-bot:
Make more types of exceptions loggable

https://gerrit.wikimedia.org/r/314452

Mentioned in SAL (#wikimedia-operations) [2016-10-06T00:08:22Z] <dereckson@tin> Synchronized php-1.28.0-wmf.21/extensions/Flow/: Make more types of exceptions loggable ([[Gerrit:314452]], T135545, T138310) (duration: 01m 12s)

Change 314479 had a related patch set uploaded (by Mattflaschen):
Revision->insertOn: Set READ_LATEST flag

https://gerrit.wikimedia.org/r/314479

Change 314480 had a related patch set uploaded (by Mattflaschen):
Improve error-handling for OptInController

https://gerrit.wikimedia.org/r/314480

Okay, this works locally (but it already did...)

The main fix for production is the core one. Basically, this ensures that when two revisions are created in the same transaction, using a newly-created text_id, it will be fetchable inside the second revision. I think this alone should fix it.

Still working on this, but AutoCommitUpdate looks promising.

I refactored it to use this, but then had second thoughts. Although it shortens the transaction, and may make it more visible on UI inspection what failed, it's not a good experience for the user. If we can consistently either succeed 100% or rollback everything 100%, that's better.

On the other hand, I don't know if we want a transaction with multiple edits/moves.

I went ahead and changed it to MWCallableUpdate instead. Not sure if this is strictly necessary, but it has a feature "Abort if this DB is rolled back". Basically, this prevents the update from running if there is an earlier DB rollback. I did not confirm whether it would otherwise run in such a case.

So I'm open to second opinions on AutoCommitUpdate/MWCallableUpdate/neither.

However, the Flow commit also has better logging (which I definitely want to go in) and explicit DB rollback.

Change 314479 merged by jenkins-bot:
Revision->insertOn: Set READ_LATEST flag

https://gerrit.wikimedia.org/r/314479

Change 314626 had a related patch set uploaded (by Catrope):
Revision->insertOn: Set READ_LATEST flag

https://gerrit.wikimedia.org/r/314626

Change 314626 merged by jenkins-bot:
Revision->insertOn: Set READ_LATEST flag

https://gerrit.wikimedia.org/r/314626

Mentioned in SAL (#wikimedia-operations) [2016-10-06T23:37:54Z] <dereckson@tin> Synchronized php-1.28.0-wmf.21/includes/Revision.php: Revision->insertOn: Set READ_LATEST flag (T138310) (duration: 00m 49s)

My only lead at the moment is this exception:

2016-10-07 14:13:21 mediawiki-vagrant wiki: sb_debug: error in runUpdate: The Title object yields no ID. Perhaps the page doesn't
 exist? User talk:New user 27782803383838631792717581077141228306/Archive 1 #0 /vagrant/mediawiki/includes/content/AbstractConten
t.php(234): LinksUpdate->__construct()
#1 /vagrant/mediawiki/includes/page/WikiPage.php(2291): AbstractContent->getSecondaryDataUpdates()
#2 /vagrant/mediawiki/includes/page/WikiPage.php(1902): WikiPage->doEditUpdates()
#3 /vagrant/mediawiki/includes/libs/rdbms/database/Database.php(2645): Closure$WikiPage::doModify()
#4 /vagrant/mediawiki/includes/deferred/AtomicSectionUpdate.php(33): Database->doAtomicSection()
#5 /vagrant/mediawiki/includes/deferred/DeferredUpdates.php(250): AtomicSectionUpdate->doUpdate()
#6 /vagrant/mediawiki/includes/deferred/DeferredUpdates.php(222): DeferredUpdates::runUpdate()
#7 /vagrant/mediawiki/includes/deferred/DeferredUpdates.php(125): DeferredUpdates::execute()
#8 /vagrant/mediawiki/includes/MediaWiki.php(901): DeferredUpdates::doUpdates()
#9 /vagrant/mediawiki/includes/MediaWiki.php(730): MediaWiki->restInPeace()
#10 (): Closure$MediaWiki::doPostOutputShutdown()
#11 {main}

The page it references (User_talk:New_user_.../Archive_1) is the wikitext archived talk page that is supposed to have just been moved back to be the main user talk page.

If I change the call to movePage() to DO leave a redirect, this error does not occur and I can't reproduce the issue locally.

Catrope lowered the priority of this task from Unbreak Now! to High.Oct 7 2016, 9:18 PM

With the beta feature disabled and https://gerrit.wikimedia.org/r/314626 reducing the incidence of this bug (but not to zero yet), I'm taking this off UBN status for now.

Change 315527 had a related patch set uploaded (by Catrope):
Re-enable Flow beta feature on frwikiquote

https://gerrit.wikimedia.org/r/315527

Change 315527 merged by jenkins-bot:
Re-enable Flow beta feature on frwikiquote

https://gerrit.wikimedia.org/r/315527

Mentioned in SAL (#wikimedia-operations) [2016-10-12T18:17:51Z] <ebernhardson@mira> Synchronized wmf-config/InitialiseSettings.php: SWAT T138310 Re-enable Flow beta feature on frwikiquote (duration: 00m 50s)

It seems to be working on fr.wikiquote. :)

@SBisson are you still able to break it locally? If so, does it break permanently, or only temporarily if you reload it in the middle of the operation.

https://gerrit.wikimedia.org/r/#/c/314480/ still needs review.

I am creating a sub-task for fixing the existing pages.

@SBisson are you still able to break it locally? If so, does it break permanently, or only temporarily if you reload it in the middle of the operation.

I can still break it locally but not on test.wp or fr.wikiquote (where we re-enabled it this week). It'd be good if you could also test on those.

https://gerrit.wikimedia.org/r/#/c/314480/ still needs review.

I've looked at it last week. I'll add comment on the patch. I don't mind most of the changes but I'm not sure about the nested MWCallableUpdate.

I am creating a sub-task for fixing the existing pages.

Good. Is there a systematic way to find those?

Mattflaschen-WMF updated the task description. (Show Details)
Mattflaschen-WMF updated the task description. (Show Details)
Mattflaschen-WMF updated the task description. (Show Details)

Checked in wmf.23 testwiki: Enable Flow on user talk page->Disable-> Re-enable. The same sequence was done with updates added. Works fine, archived pages look fine - no issues.

Any schedule about re-enabling that Beta feature on wikis?

I'm drafting the Collaboration newsletter. What's new about that task? Can we announce something?

I'm drafting the Collaboration newsletter. What's new about that task? Can we announce something?

The error itself is fixed. I think we'll re-enable the feature after fixing the existing broken pages in production (T148057: Fix user talk pages already in inconsistent state due to to T138310).

Change 324764 had a related patch set uploaded (by Catrope):
Re-enable the Flow beta feature

https://gerrit.wikimedia.org/r/324764

Checked on testwiki with 1.29.0-wmf.4 - enable/disable/re-enable works for new users and for existing users. Also, checked the availability of archived pages, archiving empty Flow boards, moving archived Flow page and enabling 'Flow on user talk page' again - a new Flow board will be created in this case.

My comments from T150073: Preferences 'Restore all default settings' option does not apply to 'Flow on user talk' Beta feature
Re-checked with 1.29.0-wmf.4 - no errors in the case

when a user attempts to disable Flow by unchecking the option, the User talk page displays the following error (found at testwiki).

  • 'Restore all default settings' still does not apply to 'Flow on user talk page' option; users still see their User talk page in Flow format.
  • The workaround for a user that is stuck with the user talk page in Flow format is to check 'Flow on user talk' ->Save -> uncheck 'Flow on user talk'->Save. The previously archived user talk wikitext page will be displayed.
  • another thing to check after the fix - if during the workaround described above, after check 'Flow on user talk' ->Save, a user will go to his User talk page, he will see again the intro.

Screen Shot 2016-12-01 at 11.13.08 AM.png (452×760 px, 73 KB)

I scheduled this for deployment at 4pm PST today, and then totally forgot about it, so I wasn't there to supervise at 4pm and my patches were skipped. ARGH.

I've rescheduled for 11am PST (19:00 UTC) on Monday Dec 5.

Change 324764 merged by jenkins-bot:
Re-enable the Flow beta feature

https://gerrit.wikimedia.org/r/324764

Mentioned in SAL (#wikimedia-operations) [2016-12-05T19:10:52Z] <thcipriani@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:324764|Re-enable the Flow beta feature]] T138310 (duration: 00m 45s)

The Flow beta feature has now been re-enabled on all wikis.

Checked in frwiki, kabwiki, mediawiki - for enabling/disabling/re-enabling for existing/new users - all works.

Change 314480 merged by jenkins-bot:
[mediawiki/extensions/Flow@master] Improve error-handling for OptInController

https://gerrit.wikimedia.org/r/314480

Change 405743 had a related patch set uploaded (by Sbisson; owner: Mattflaschen):
[mediawiki/extensions/Flow@wmf/1.31.0-wmf.17] Improve error-handling for OptInController

https://gerrit.wikimedia.org/r/405743

Change 405743 merged by jenkins-bot:
[mediawiki/extensions/Flow@wmf/1.31.0-wmf.17] Improve error-handling for OptInController

https://gerrit.wikimedia.org/r/405743