As a consumer of AB test results, I want accurate results so I can make sound decisions.
Looking at data from the commonswiki mediasearch AB test between 2019-09-10T16:00 and 17:00, there are thirteen events where the frontend logged one bucket but the backend logged a different one. I'm not sure we've checked this specifically before, i.e. joined frontend and backend logs and compared the recorded buckets. If the frontend and backend disagree on test buckets, the data is less reliable: events attributed to the wrong bucket mix the two populations, which pulls the per-bucket stats towards the same values and makes real differences between buckets harder to detect.
Example bad request:
- search_id: 163dliqu8lj2hpsgn9cvrbdwo
- mediawiki_cirrussearch_request logged http params: cirrusUserTesting=control
- event.SearchSatisfaction logged subTest: mediasearch_commons_int
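
For the investigation, the frontend/backend comparison could be sketched roughly as follows. This is only an illustration in plain Python, not the actual query: the record layout (`search_id` and `bucket` keys) is an assumption, it assumes the bucket has already been extracted from `subTest` on the frontend side and from the `cirrusUserTesting` http param on the backend side, and the second search_id is made up for the example.

```python
from collections import defaultdict


def find_bucket_mismatches(frontend_events, backend_requests):
    """Return {search_id: (frontend_bucket, backend_buckets)} for disagreements.

    Both inputs are iterables of dicts with hypothetical keys
    "search_id" and "bucket"; the real log schemas differ.
    """
    # Collect every bucket the backend recorded for each search_id.
    backend_buckets = defaultdict(set)
    for request in backend_requests:
        backend_buckets[request["search_id"]].add(request["bucket"])

    mismatches = {}
    for event in frontend_events:
        search_id = event["search_id"]
        seen = backend_buckets.get(search_id)
        if not seen:
            continue  # no backend record to compare against
        if seen != {event["bucket"]}:
            mismatches[search_id] = (event["bucket"], sorted(seen))
    return mismatches


# Sample rows: the first mirrors the example bad request above,
# the second is a made-up request where both sides agree.
frontend = [
    {"search_id": "163dliqu8lj2hpsgn9cvrbdwo", "bucket": "mediasearch_commons_int"},
    {"search_id": "hypothetical-matching-id", "bucket": "control"},
]
backend = [
    {"search_id": "163dliqu8lj2hpsgn9cvrbdwo", "bucket": "control"},
    {"search_id": "hypothetical-matching-id", "bucket": "control"},
]

mismatches = find_bucket_mismatches(frontend, backend)
```

Collecting the backend buckets as a set per search_id means a search session whose backend requests themselves landed in more than one bucket is also reported as a mismatch.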
This ticket covers the investigation; new tickets should be created for the solution.