At some point this should be a TechCom-RFC, but at the moment it's still in the drafting stage.
Problem
MediaWiki claims to support five databases: MySQL/MariaDB, SQLite, PostgreSQL ("PG"), Microsoft SQL Server ("MSSQL"), and Oracle Database. For normal runtime-type queries, we have abstractions that make these all mostly work pretty well.
But at the DDL level it's a completely different story. One major piece of (and source of) technical debt is the fact that MediaWiki does not have a database schema, it has four. And most schema changes have to be written five times, one for each supported database. In practice, this means schema changes for the less-supported databases are often omitted, or when not omitted are often merged without being tested.
We can improve the situation by abstracting the schema and schema change definitions, with code per database to translate that into the actual DDL statements.
Approved solution
Create a PHP interface (or base class) for schemas and schema changes. We implement this in top of Doctrine DBAL. Schemas and schema changes will be defined in JSON files conforming to https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema, which will be read into Schema or SchemaDiff objects. DBAL will then take care of generating SQL for any supported RDBMS.
In addition, database support (in the form of a subclass of the Database base class) should be made pluggable. Extensions that want to provide support for a database backend would provide a Database subclass as well as a suitable implementation of the schema and schema change interface. This would be trivial for any database that is supported by DBAL.
Notes:
- This means we drop support for MSSQL and Oracle RDBMS from MediaWiki core, since DBAL support for them is insufficient and/or the schema for these databases has diverged from the main line schema. WMF will not continue support for these database backends. Volunteers have shown interest in bringing back support for these backends in form of extensions.
- For schema definitions, we go with JSON for now. But the we may want to switch to YAML later, for easier editing. JSON and YAML can easily be converted into one another.
- If someone wants to introduce a schema change, there should be a new deployable file which can contain several schema changes. Existing schema change (json) files should not be changed to perform additional changes.
Old proposals
Proposal #1
We should write a schema and schema change abstraction layer to integrate with MediaWiki's existing runtime database abstraction. Details are on-wiki at https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema and https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema/DB_Requirements, but in short:
- We would have one schema, expressed as a structure in a JSON file. We would have one definition of each schema change, expressed as a structure in a JSON file.
- Database-specific classes would exist to turn the schema or schema-change into SQL statements, much as we have database-specific subclasses of Wikimedia\Rdbms\Database.
- We'd also tighten up some of the other database-level things: limited identifier lengths, index name uniqueness, data type consistency, charset consistency, etc.
The reason we didn't go with this:
- It's lots of work to write a schema and schema change abstraction from scratch.
Proposal #2
Try to integrate Doctrine Migrations for schema creation and updates.
Pros (compared to Proposal #1):
- We wouldn't have to implement all the database-specific logic ourself.
- Probably a larger community fixing any bugs that exist.
- Familiar system for (some subset of) PHP developers, simplifying onboarding of devs.
The reasons we didn't go with this:
- We'd have to have code to translate MediaWiki's DB connection info to Doctrine's format, and otherwise translate between Doctrine conventions and MediaWiki conventions.
- We may have to custom-implement a "mwtimestamp" type, or else standardize all DBs on using 14-byte strings.
- We may still have to work around issues like MSSQL's different treatment of NULLs in unique indexes.