Description
This needs to be scheduled, widely announced, and then the switch flipped in between dump runs. I'd like to see it happen for the Feb 1 run, which gives us a little over two months to get the word out to folks and let them update their dump processing scripts.
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T238972 switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots
Resolved | | daniel | T238959 Make TextPassDumperTest work with 0.11 dump schema
Open | | None | T239650 Parsing xml dumps will need to be corrected after T238972
Resolved | | daniel | T240213 Write integration tests for XML dumps with multiple MCR slots per revision
Resolved | | daniel | T246074 Improve performance when writing multi-content revisions to XML dumps
Event Timeline
I'm going to send an email announcement to wikitech and xmldatadumps-l. Someone on the research and wikidata lists should forward the announcement there. Adding the relevant projects (sorry if they aren't right, please feel free to move this around where it belongs).
https://lists.wikimedia.org/pipermail/wikitech-l/2019-November/092821.html Email sent to wikitech-l and xmldatadumps-l. @leila would you be willing to forward to the research mailing lists? @hoo are you on the wikidata mailing list and can you forward it there? Thanks in advance :)
Forwarded to wikidata-tech for now, not sure if it should also be on wikidata-l proper.
Example: a wikitext-only revision might change from
<revision>
  <!-- ... -->
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="16" xml:space="preserve">Wikitext content</text>
  <sha1>basgq6oyo0kf51ykrohsumsutvpda86</sha1>
</revision>
to
<revision>
  <!-- ... -->
  <origin>2748</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="16" sha1="basgq6oyo0kf51ykrohsumsutvpda86" xml:space="preserve">Wikitext content</text>
  <sha1>basgq6oyo0kf51ykrohsumsutvpda86</sha1>
</revision>
– almost the same, but there is now a sha1 attribute on the <text> tag and the <origin> is new.
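For dump processing scripts that need to handle both the old and the new output during the transition, a minimal Python sketch might look like the following. It assumes each `<revision>` fragment is handled on its own, without the XML namespace that real dump files declare on the root element; the function name `main_text_sha1` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def main_text_sha1(revision_xml: str) -> Optional[str]:
    """Return the sha1 of the main text of one <revision> fragment.

    Prefers the sha1 attribute on the revision's own <text> element
    (new schema) and falls back to the revision-level <sha1> element
    (old schema, where a revision has a single content blob).
    """
    rev = ET.fromstring(revision_xml)
    text = rev.find("text")  # direct child only, so nested slots are ignored
    if text is not None and text.get("sha1"):
        return text.get("sha1")
    return rev.findtext("sha1")
```

The fallback only makes sense for revisions with a single slot; for multi-slot revisions the revision-level `<sha1>` and the per-slot values differ, as the next example shows.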
Example: a WikibaseMediaInfo revision might change from
<revision>
  <!-- ... -->
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="0" xml:space="preserve" />
  <sha1>q27phnond5qrm8u8zpnwo17ll81tohw</sha1>
</revision>
to
<revision>
  <!-- ... -->
  <origin>2224</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="0" sha1="phoiac9h4m842xq45sp7s6u21eteeq1" xml:space="preserve" />
  <content>
    <role>mediainfo</role>
    <origin>2590</origin>
    <model>wikibase-mediainfo</model>
    <format>application/json</format>
    <text bytes="371" sha1="oropqlvv0q2n9spse1s6autcvay4vqz" xml:space="preserve">{"type":"mediainfo","id":"M902","labels":[],"descriptions":[],"statements":{"P25":[{"mainsnak":{"snaktype":"value","property":"P25","hash":"183074b9158e8b72cc95b7f6c16d5ba5ab5d9544","datavalue":{"value":{"entity-type":"item","numeric-id":503,"id":"Q503"},"type":"wikibase-entityid"}},"type":"statement","id":"M902$d8b8679f-4a1e-bfc4-499b-67ee67f9e155","rank":"normal"}]}}</text>
  </content>
  <sha1>q27phnond5qrm8u8zpnwo17ll81tohw</sha1>
</revision>
– the <text> is still empty (it’s a file), but the entity content is new.
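A script that wants the extra slots could collect them roughly as sketched below. This is only an illustration based on the example above, not a reference parser: it assumes namespace-free `<revision>` fragments, treats the top-level `<text>` as the `main` slot, and the name `collect_slots` and the dictionary layout are invented here.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def _slot(container: ET.Element) -> Dict[str, str]:
    """Read model, format, text and sha1 from an element's direct children."""
    text = container.find("text")
    return {
        "model": container.findtext("model") or "",
        "format": container.findtext("format") or "",
        "text": (text.text or "") if text is not None else "",
        "sha1": (text.get("sha1") or "") if text is not None else "",
    }


def collect_slots(revision_xml: str) -> Dict[str, Dict[str, str]]:
    """Map slot role -> slot data for one <revision> fragment."""
    rev = ET.fromstring(revision_xml)
    slots = {}
    if rev.find("text") is not None:
        # The main slot's fields sit directly under <revision>.
        slots["main"] = _slot(rev)
    for content in rev.findall("content"):
        # Each additional slot is wrapped in <content> with a <role>.
        role = content.findtext("role") or ""
        slots[role] = _slot(content)
    return slots
```

For the MediaInfo example, this would yield a `main` entry with empty text and a `mediainfo` entry whose `text` is the JSON string, which can then be passed to `json.loads`.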
This is pending https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/556346/ and related patches, so we're looking at March 1 if all goes well.
Removing task assignee due to inactivity, as this open task has been assigned for more than two years. See the email sent to the task assignee on February 06th 2022 (and T295729).
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome.
If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator.