Page MenuHomePhabricator

RFC: Abstract schemas and schema changes
Closed, ResolvedPublic

Description

At some point this should be a TechCom-RFC, but at the moment it's still in the drafting stage.

Problem

MediaWiki claims to support five databases: MySQL/MariaDB, SQLite, PostgreSQL ("PG"), Microsoft SQL Server ("MSSQL"), and Oracle Database. For normal runtime-type queries, we have abstractions that make these all mostly work pretty well.

But at the DDL level it's a completely different story. One major piece of (and source of) technical debt is the fact that MediaWiki does not have a database schema, it has four. And most schema changes have to be written five times, one for each supported database. In practice, this means schema changes for the less-supported databases are often omitted, or when not omitted are often merged without being tested.

We can improve the situation by abstracting the schema and schema change definitions, with code per database to translate that into the actual DDL statements.

Approved solution

Create a PHP interface (or base class) for schemas and schema changes. We implement this in top of Doctrine DBAL. Schemas and schema changes will be defined in JSON files conforming to https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema, which will be read into Schema or SchemaDiff objects. DBAL will then take care of generating SQL for any supported RDBMS.

In addition, database support (in the form of a subclass of the Database base class) should be made pluggable. Extensions that want to provide support for a database backend would provide a Database subclass as well as a suitable implementation of the schema and schema change interface. This would be trivial for any database that is supported by DBAL.

Notes:

  • This means we drop support for MSSQL and Oracle RDBMS from MediaWiki core, since DBAL support for them is insufficient and/or the schema for these databases has diverged from the main line schema. WMF will not continue support for these database backends. Volunteers have shown interest in bringing back support for these backends in form of extensions.
  • For schema definitions, we go with JSON for now. But the we may want to switch to YAML later, for easier editing. JSON and YAML can easily be converted into one another.
  • If someone wants to introduce a schema change, there should be a new deployable file which can contain several schema changes. Existing schema change (json) files should not be changed to perform additional changes.

Old proposals

Proposal #1

We should write a schema and schema change abstraction layer to integrate with MediaWiki's existing runtime database abstraction. Details are on-wiki at https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema and https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema/DB_Requirements, but in short:

  • We would have one schema, expressed as a structure in a JSON file. We would have one definition of each schema change, expressed as a structure in a JSON file.
  • Database-specific classes would exist to turn the schema or schema-change into SQL statements, much as we have database-specific subclasses of Wikimedia\Rdbms\Database.
  • We'd also tighten up some of the other database-level things: limited identifier lengths, index name uniqueness, data type consistency, charset consistency, etc.

The reason we didn't go with this:

  • It's lots of work to write a schema and schema change abstraction from scratch.

Proposal #2

Try to integrate Doctrine Migrations for schema creation and updates.

Pros (compared to Proposal #1):

  • We wouldn't have to implement all the database-specific logic ourself.
  • Probably a larger community fixing any bugs that exist.
  • Familiar system for (some subset of) PHP developers, simplifying onboarding of devs.

The reasons we didn't go with this:

  • We'd have to have code to translate MediaWiki's DB connection info to Doctrine's format, and otherwise translate between Doctrine conventions and MediaWiki conventions.
  • We may have to custom-implement a "mwtimestamp" type, or else standardize all DBs on using 14-byte strings.
  • We may still have to work around issues like MSSQL's different treatment of NULLs in unique indexes.

Related Objects

StatusSubtypeAssignedTask
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedNone
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedNone
ResolvedReedy
ResolvedLadsgroup
ResolvedLadsgroup
OpenNone
OpenNone
ResolvedNone
ResolvedNone
ResolvedJdforrester-WMF
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedDaimona
ResolvedMarostegui
Resolved Bstorm
ResolvedDaimona
ResolvedUrbanecm
ResolvedMarostegui
Resolvedrook
DuplicateNone
ResolvedUmherirrender
InvalidNone
InvalidNone
ResolvedReedy
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
InvalidNone
ResolvedReedy
ResolvedReedy
InvalidNone
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedJdforrester-WMF
ResolvedReedy
ResolvedReedy
ResolvedReedy
InvalidNone
DeclinedNone
ResolvedReedy
InvalidNone
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedPhysikerwelt
InvalidNone
ResolvedUmherirrender
ResolvedReedy
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedUmherirrender
ResolvedReedy
ResolvedUmherirrender
ResolvedUmherirrender
InvalidNone
ResolvedReedy
ResolvedUmherirrender
ResolvedSTran
ResolvedUmherirrender
ResolvedTchanders
ResolvedUmherirrender
InvalidNone
InvalidNone
InvalidNone
Resolvedawight
ResolvedReedy
InvalidNone
ResolvedLadsgroup
InvalidNone
ResolvedUmherirrender
ResolvedUmherirrender
OpenNone
ResolvedNone
OpenNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 634679 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Vary timestamp default value in abstract schema

https://gerrit.wikimedia.org/r/634679

Change 634748 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Expand DoctrineSchemaBuilderTest

https://gerrit.wikimedia.org/r/634748

Change 634679 merged by jenkins-bot:
[mediawiki/core@master] Vary timestamp default value per platform in abstract schema

https://gerrit.wikimedia.org/r/634679

Change 634748 merged by jenkins-bot:
[mediawiki/core@master] Expand DoctrineSchemaBuilderTest

https://gerrit.wikimedia.org/r/634748

Change 636235 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Fix ip_changes.ipc_rev_id column default value

https://gerrit.wikimedia.org/r/636235

Change 636235 merged by jenkins-bot:
[mediawiki/core@master] Fix ip_changes.ipc_rev_id column default value

https://gerrit.wikimedia.org/r/636235

Change 673718 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] Abstract schema: Handle MySQL Float/Double precision types

https://gerrit.wikimedia.org/r/673718

Change 673718 merged by jenkins-bot:
[mediawiki/core@master] Abstract schema: Handle MySQL Float/Double precision types

https://gerrit.wikimedia.org/r/673718

Change 673785 had a related patch set uploaded (by Ammarpad; owner: Ammarpad):
[mediawiki/core@master] DoctrineSchemaBuilder: Make 'table_options' top-level attribute

https://gerrit.wikimedia.org/r/673785

Change 673785 merged by jenkins-bot:
[mediawiki/core@master] DoctrineSchemaBuilder: Make 'table_options' top-level attribute

https://gerrit.wikimedia.org/r/673785

Change 676856 had a related patch set uploaded (by Ammarpad; author: Ammarpad):

[mediawiki/core@master] DoctrineSchemaBuilder: Do not add prefix placeholder for Postgres at all

https://gerrit.wikimedia.org/r/676856

Change 676856 merged by jenkins-bot:

[mediawiki/core@master] DoctrineSchemaBuilder: Do not add prefix placeholder for Postgres at all

https://gerrit.wikimedia.org/r/676856

Suggestion, how about allowing "comment": [ "foo", "bar" ] as in extension.json/config/key/description? If nothing else it makes for potentially more pretty text due to line breaks.

Ladsgroup claimed this task.

The core schema is now fully abstract. We have a system for abstract schema changes that most recent schema changes in core use (among several extensions.). This is done. Of course more work can be done in every aspect (specially migrating extensions to abstract schema) but that should not be counted under implementation of this RFC.

Change 496421 abandoned by Ladsgroup:

[mediawiki/core@master] [POC] USe of Doctrine DBAL for oracle schema and schema changes

Reason:

Abstract schema has been long deployed \o/

https://gerrit.wikimedia.org/r/496421

Is there a maintenance script, procedure or advice to update an old database schema to the current one?

I have a wiki just updated to 1.39 but with many tables and fields still in latin1_swedish_ci that are starting to cause trouble. Should I go one-by-one manually updating the table schemas to the current ones?

Thanks for all the good work and for any help !

Is there a maintenance script, procedure or advice to update an old database schema to the current one?

update.php? ;)

What version are you upgrading from?

@Reedy wrote:
update.php? ;)
What version are you upgrading from?

Hah no, I wish it were that simple. I already run update.php when I upgraded from 1.35 to 1.39 and the wiki has been working ok for some weeks now. However, tracking down this bug with DiscussionTools, I narrowed it down to a database or data issue, and noticed that many of the tables and fields were latin1_swedish_ci rather than binary (both before and after the upgrade). Should this have been fixed by some update.php? Is there some other recommended way to fix it? Thanks again!

@Reedy wrote:
update.php? ;)
What version are you upgrading from?

Hah no, I wish it were that simple. I already run update.php when I upgraded from 1.35 to 1.39 and the wiki has been working ok for some weeks now. However, tracking down [https://www.mediawiki.org/wiki/Topic:X9zviki367qeczo5 this bug with DiscussionTools], I narrowed it down to a database or data issue, and noticed that many of the tables and fields were latin1_swedish_ci rather than binary (both before and after the upgrade). Should this have been fixed by some update.php? Is there some other recommended way to fix it? Thanks again!

AFAIK no, update.php won't have ever attempted to fix it. Because MW (in theory) has never "cared" which you actually use.

There's various (MW) support articles about changing it, but there's always gotcha's.

Probably depends on your $wgDBTableOptions settings? Doctrine can generate a schema update command given an actual and a desired schema, but that will probably result in character conversion which is not what you want...
In any case, not really related to this task.