Page MenuHomePhabricator

Adjust code reading from wb_terms table to use full entity ID
Closed, ResolvedPublic

Description

term_full_entity_id has been added to wb_terms table to store the full entity ID serialization.
With https://gerrit.wikimedia.org/r/#/c/346169 making the code write to both "old" numeric id column and the "new" full ID column, it becomes possible to also use the new column for read operations.

As "full id" values for existing entries in wb_terms will be populated using maintenance script, initially database table would be in a transition state with some records having values in both entity ID column, and some having only the "old" numeric id value. Reading code should be able to handle such cases.
One way to handle this, as said in https://phabricator.wikimedia.org/T159851#3140376, would be to introduce three-state feature flag: 1) read only from numeric id column, 2) read only from the full id column, 3) read from the new column, if it does not exist or the value is null, read the numeric column.
As discussed with @daniel today, the alternative way would be to not add a a feature flag and instead always read all columns of wb_terms, and use the "full id" value if it is there, and fall back to the "numeric id" column if it there is no "full id" value or it is null.
As we talked with @daniel it might that in case of this schema change the latter approach might be good enough. It might make it easier without needing to deal with the extra burden introducing and then removing the feature flag brings.

Related Objects

StatusSubtypeAssignedTask
OpenNone
OpenNone
DuplicateNone
OpenFeatureNone
OpenFeatureNone
DuplicateNone
ResolvedNone
ResolvedNone
ResolvedNone
DuplicateNone
InvalidLydia_Pintscher
OpenNone
OpenNone
StalledNone
OpenNone
ResolvedAddshore
Resolvedthiemowmde
ResolvedAddshore
DeclinedNone
OpenNone
Resolvedhoo
ResolvedLydia_Pintscher
ResolvedNone
DeclinedNone
InvalidLydia_Pintscher
ResolvedLadsgroup
ResolvedAddshore
ResolvedLadsgroup
DeclinedNone
ResolvedLadsgroup

Event Timeline

To be clear: this is something which is NOT required to be done before applying the schema change to the database during the data center switch over. It could be implemented later.

I checked the codebase and it seems it's not easily doable to do it without a feature flag. The code and queries will become complex and messy and I'm not sure if we want that. There are three places that depend on 'term_entity_id': TermSqlIndex, SqlEntitiesWithoutTermFinder, SqlEntityInfoBuilder some use it in WHERE (for delete/update/select) and some use in SELECT fields and process it later in other places (those parts need to handle it both cases too).

@Ladsgroup Is it going to be less messy with the feature flag? You still need conditionals in the same places, plus you have to inject the flag...

I mean it will be messy without a flag.

I mean it will be messy without a flag.

Yea... but I think it will be just as messy with a flag. Maybe even worse.

@Ladsgroup so, I think both options are nasty in their own way :) Since you have to write it, I'll leave it to you to decide which kind of mess you prefer!

I'd say we go with feature flag. My reason is that we need to have this flag temporarily and we remove it later (after one or two mediawiki release I'd say). In that case removing the parts related to the old method would be much easier.

Change 351641 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] [WIP] Two wb_terms reading modes

https://gerrit.wikimedia.org/r/351641

Change 352115 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Rename "hasFullEntityIdColumn" to "writeFullEntityIdColumn"

https://gerrit.wikimedia.org/r/352115

Change 352131 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] [WIP] Use two reading mode in SqlEntitiesWithoutTermFinder

https://gerrit.wikimedia.org/r/352131

Change 352157 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] [WIP] Two reading modes of wb_terms in SqlEntityInfoBuilder

https://gerrit.wikimedia.org/r/352157

Change 352115 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Rename "hasFullEntityIdColumn" to "writeFullEntityIdColumn"

https://gerrit.wikimedia.org/r/352115

Change 352131 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Use two reading modes in SqlEntitiesWithoutTermFinder

https://gerrit.wikimedia.org/r/352131

Change 352157 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Two reading modes of wb_terms in SqlEntityInfoBuilder

https://gerrit.wikimedia.org/r/352157

Change 351641 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Two wb_terms reading modes in TermSqlIndex

https://gerrit.wikimedia.org/r/351641

Change 354093 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Test TermSqlIndex::deleteTermsOfEntity in the read full ID mode

https://gerrit.wikimedia.org/r/354093

Change 354575 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Read config for two wb_terms reading modes

https://gerrit.wikimedia.org/r/354575

Change 354093 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Test TermSqlIndex::deleteTermsOfEntity in the read full ID mode

https://gerrit.wikimedia.org/r/354093

Probably this can be closed now.

Yes, you are right. I thought this one got merged.

Change 354575 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Read config for two wb_terms reading modes

https://gerrit.wikimedia.org/r/354575

WMDE-leszek moved this task from Review to Done on the Wikidata-Former-Sprint-Board board.

All code should be there. Now we have to think about how to deploy this.