We currently have an independent schema for collecting information about usage of the "did you mean" feature in search. This should be integrated into the search satisfaction data collection so that all the data is in one place.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Integrate did you mean collection into search satisfaction | mediawiki/extensions/WikimediaEvents | master | +160 -149
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | EBernhardson | T138087 Integrate "did you mean" data collection into search satisfaction schema
Resolved | | mpopov | T144424 Add a PaulScore approximation to discovery.wmflabs.org
Event Timeline
The main idea here is to record:
- Whether the query was auto-magically rewritten in the backend due to zero results on the initial query.
- When the user clicks did you mean. Currently we only record that two queries were performed, but not that the second query was one chosen by the did you mean suggestion.
- It might also be nice to record whether a search result page has a did you mean suggestion shown to the user.
@mpopov, wondering if you have any preferences. After looking this over, I'm thinking of the following:
I suppose first we have to think about what it is we are measuring. I think the overall goal is to be able to determine how satisfied a user is with their suggested search query, and to more directly measure improvements to the suggestions we provide. One example: we currently only use the query rewrite capability when the query has 0 results. We did this because "some results is better than none", but we didn't have a way to measure whether performing this style of rewrite on queries with few results would be any better.
Possible ways a user can interact with did you mean:
- Searches for something that provides results, but also has a did you mean suggestion[1]
- Searches for something that has no results, and we internally rewrite into a did you mean suggestion that has results[2]
- Searches for something that has no results, and we internally rewrite into a did you mean suggestion that has results[2]; the user then clicks the "Search instead for '<original query>'" link, which currently always returns 0 results. (This could do something different in the future if we change the requirements for when/how we rewrite the query. For example, instead of a total rewrite we could merge the results internally with (original) OR (rewritten^0.5) or some such, and do so on queries that have few results rather than only on those with none.)
- Searches for something that has no results, and we internally rewrite into a did you mean suggestion that also has no results[3]
- Searches for something that has no results, we internally rewrite into a did you mean suggestion that has results[2], and the user clicks the "Showing results for 'xyz'" link[4] and is presented with search results plus a new suggestion[5]
[1] https://en.wikipedia.org/wiki/?search=tayps&fulltext=1
[2] https://en.wikipedia.org/wiki/?search=weldng+defacts&fulltext=1
[3] https://en.wikipedia.org/wiki/?search=tayps+of+wlding+difacts&fulltext=1
[4] https://en.wikipedia.org/wiki/?search=weldng+defacts&fulltext=1
[5] https://en.wikipedia.org/wiki/?search=wedding+defects&fulltext=1
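The (original) OR (rewritten^0.5) idea from the third scenario could be sketched roughly as below. This is only an illustration: the function names, the threshold of 3 results, and the choice to keep the full rewrite at 0 results are all assumptions, and the output string mirrors the notation used above rather than any validated search backend syntax.

```python
from typing import Optional


def merge_query(original: str, rewritten: str, rewritten_boost: float = 0.5) -> str:
    """Combine both queries into one string, down-weighting the rewritten one.

    The format copies the "(original) OR (rewritten^0.5)" notation from the
    discussion; it is not checked against a real query parser.
    """
    if rewritten_boost <= 0:
        raise ValueError("rewritten_boost must be positive")
    return f"({original}) OR ({rewritten}^{rewritten_boost:g})"


def choose_query(
    original: str,
    suggestion: Optional[str],
    original_hits: int,
    few_results_threshold: int = 3,  # hypothetical cut-off for "few results"
) -> str:
    """Pick the query to run, given how many results the original returned."""
    if suggestion is None or original_hits >= few_results_threshold:
        # Nothing to rewrite to, or the original already did well enough.
        return original
    if original_hits == 0:
        # Current behaviour described above: total rewrite on zero results.
        return suggestion
    # Proposed behaviour: merge when there are few, but not zero, results.
    return merge_query(original, suggestion)


print(choose_query("tayps", "types", original_hits=2))
```

Logging which of these branches was taken is what would make it possible to compare a merged query against a total rewrite.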
Based on this list of possible ways a user can interact with the feature, I'm thinking we handle the following events:
- When a user clicks the did you mean suggestion
  - Log a click event, same code path as currently used. The position field will be null, and the inputLocation field, which was previously only used for autocomplete, will contain didyoumean.
- When a user arrives at a search result page from a did you mean click
  - Log a searchResultPage event, same code path as currently used. This feels a bit odd, but it seems to make sense to use the inputLocation field again with the didyoumean value.
- When a user arrives at a search result page with an internally rewritten did you mean, i.e. when the original query had no results and we instead presented the user with results for the rewritten query
  - Log a searchResultPage event, same code path as currently used. This also feels a bit odd, but does it make sense to use the inputLocation field yet again, with a didyoumean-internal value?
- I'm still thinking about how to handle the case of the user clicking links to the original or rewritten query on a result page that was already rewritten, but it will likely follow something along the lines above.
Alternatively, we could leave the inputLocation field alone and add a new field instead. inputLocation does in some ways feel like it captures the intent here, although perhaps only partially.
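To make the proposed events concrete, here is a minimal sketch of the payloads. The position and inputLocation fields, the click and searchResultPage events, and the didyoumean / didyoumean-internal values come from the proposal above; the action and query field names, the helper functions, and the precedence rule when both flags are set are assumptions for illustration only, not the actual schema or WikimediaEvents code.

```python
from dataclasses import asdict, dataclass
from typing import Optional

DIDYOUMEAN = "didyoumean"
DIDYOUMEAN_INTERNAL = "didyoumean-internal"


@dataclass
class SatisfactionEvent:
    action: str                          # hypothetical field name
    query: str                           # hypothetical field name
    position: Optional[int] = None       # null for did you mean clicks
    inputLocation: Optional[str] = None  # previously only set for autocomplete


def suggestion_click(suggested_query: str) -> SatisfactionEvent:
    """Event for a user clicking the did you mean suggestion."""
    return SatisfactionEvent("click", suggested_query, None, DIDYOUMEAN)


def result_page(
    query: str,
    via_suggestion_click: bool = False,
    internally_rewritten: bool = False,
) -> SatisfactionEvent:
    """Event for a user arriving at a search result page."""
    if via_suggestion_click:
        # Assumption: an explicit click wins if both flags are set. The
        # thread leaves this combined case open.
        location = DIDYOUMEAN
    elif internally_rewritten:
        location = DIDYOUMEAN_INTERNAL
    else:
        location = None
    return SatisfactionEvent("searchResultPage", query, None, location)


print(asdict(suggestion_click("welding defects")))
print(asdict(result_page("weldng defacts", internally_rewritten=True)))
```

Reusing inputLocation keeps the payload shape unchanged, at the cost of overloading a field that originally described where the query was typed.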
Gotten most of the way there: I have tests for several of the cases and they are passing. I realized I likely still need to add a field to the schema, though, as I have no way to indicate what kind of did you mean (if any) is being shown on the search result page.
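One possible shape for such a field is a small enumeration recorded with each search result page. The name DidYouMeanShown and its three values are hypothetical, chosen only to illustrate the idea; the field added in the merged patch may look different.

```python
from enum import Enum


class DidYouMeanShown(Enum):
    """Hypothetical values for a 'what kind of did you mean is shown' field."""

    NONE = "no"                  # no suggestion on the page
    SUGGESTION = "yes"           # a clickable suggestion is shown
    AUTOREWRITE = "autorewrite"  # results are for an internally rewritten query


def classify_result_page(has_suggestion: bool, was_rewritten: bool) -> DidYouMeanShown:
    """Map what the result page displays to a single field value."""
    if was_rewritten:
        # A rewritten page may still offer a further suggestion; this sketch
        # records the rewrite, since it determines which results were shown.
        return DidYouMeanShown.AUTOREWRITE
    if has_suggestion:
        return DidYouMeanShown.SUGGESTION
    return DidYouMeanShown.NONE


print(classify_result_page(has_suggestion=True, was_rewritten=False).value)
```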
Change 311654 had a related patch set uploaded (by EBernhardson):
Integrate did you mean collection into search satisfaction
Change 311654 merged by jenkins-bot:
Integrate did you mean collection into search satisfaction