Measure how often users click on more languages links
Closed, Resolved · Public · 8 Estimated Story Points

Description

Motivation
Users should be able to access terms in all entered languages, but there are too many to always show them all. Therefore, they are grouped into "more languages" (expanded by default) and "all entered languages" (collapsed by default). To learn more about how users interact with the language terms, we want to track how often these links are clicked.

Sample screenshot with the two links

Bildschirmfoto 2018-12-03 um 15.25.56.png (644×1 px, 128 KB)

Acceptance Criteria

  • Please provide the total number and percentage of page visits in which the "more languages" link was clicked at least once to collapse the languages. (Per page visit we only count "yes" or "no".)
  • Please provide the total number and percentage of page visits in which the "more languages" link was clicked at least once to expand the languages again. (Per page visit we only count "yes" or "no".)
  • Please provide the total number and percentage of page visits in which "all entered languages" was clicked to expand the list. (We are not interested in whether the link was clicked multiple times.)

Related ticket
T125404: [Wikidata] Tracking UI interaction

Tech Notes

This can be done with mw.track in JavaScript.
An example of mw.track in use, with event logging in a WMDE extension, can be found here: https://github.com/wikimedia/mediawiki-extensions-AdvancedSearch/blob/4d24d5f7da5bc7ab7f686bfb0c89e2eb65e73b70/modules/ext.advancedSearch.init.js#L40

We probably only want to create one event, called "TermBoxInteraction" or something like that, with one of the fields being the type of interaction: expand, collapse, or all, for example.
The JS will just have to remember whether it has already counted the click for each of the 3 cases, so that a single page visit only ever logs one click of each link type.
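
The "remember what was already counted" idea can be sketched roughly as follows. This is only an illustration, not the code from the eventual patch: `createTermboxTracker` is a hypothetical helper, the `track` argument stands in for mw.track so the sketch also runs outside MediaWiki, and the topic name, the `actionType` field, and the values `hide`, `show`, and `all` are assumptions modelled on the schema and query that appear later in this task.

```javascript
// Hypothetical helper: logs each interaction type at most once per page load.
// `track` is expected to behave like mw.track( topic, data ).
function createTermboxTracker(track) {
  // Action types that have already been logged during this page load.
  const alreadyLogged = new Set();

  return function logOnce(actionType) {
    if (alreadyLogged.has(actionType)) {
      return false; // already counted for this page visit, do nothing
    }
    alreadyLogged.add(actionType);
    // Assumed topic name: 'event.' followed by the schema name.
    track('event.WikibaseTermboxInteraction', { actionType: actionType });
    return true;
  };
}

// Stand-in for mw.track that just records what would be sent.
const sent = [];
const logOnce = createTermboxTracker((topic, data) => sent.push({ topic, data }));

// Simulated clicks: the second 'hide' is ignored.
logOnce('hide');
logOnce('show');
logOnce('hide');
logOnce('all');

console.log(sent.map((e) => e.data.actionType));
```

In a real page one would presumably pass `mw.track` itself and call `logOnce` from the click handlers of the two links; because the `Set` lives only in memory, the counting resets on every page load, which matches the "once per page visit" requirement.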

Page visit = page visit of the non-crawler user
Page visit data for entities can be taken from the page view tables in hadoop and regenerated afterward.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Note there are three Acceptance Criteria, but for whatever reason Phabricator does not show the last one as a list item.

xSavitar subscribed.

Fixed now! :)

It is (arguably, but I am convinced) not related to the development of the new feature.
If the point is this data should have probably been better collected earlier, this is of course true.

Sorry but if the data is needed for the hike then this is hike work unless there is a good reason.

Addshore set the point value for this task to 8.

Change 478224 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/Wikibase@master] Measure how often users click on more languages links

https://gerrit.wikimedia.org/r/478224

I created the patch counting the clicks as described above. But I think I need a more detailed introduction/tutorial, or some pairing, for working with Graphite and selecting the correct metrics from Hadoop, as stated in the description.

This is somewhat stalled until I'm (auto-)confirmed on meta.wikimedia.org. However, @Lydia_Pintscher already created the page for the new schema: https://meta.wikimedia.org/wiki/Schema:WikibaseTermboxInteraction and I will add it as soon as I'm allowed to do so. But if @Addshore wants to take this over before then, that would be fine as well :)

Michael changed the task status from Open to Stalled. · Dec 12 2018, 1:58 PM
Michael changed the task status from Stalled to Open. · Dec 19 2018, 10:59 AM

So, I created https://meta.wikimedia.org/wiki/Schema:WikibaseTermboxInteraction, but I'm not sure what data we should collect other than the action itself.

That might be enough data; it will be combined with the data seen at https://meta.wikimedia.org/wiki/Schema:EventCapsule, which will result in the final schema.

I made a small correction to the description of the schema changing wikidata -> wikibase.

We probably also want to specify more details about the one field we are tracking.
It is important for people using the data to know that each variation of the event will only be tracked once in a single page load / JS load.

The only other thing that I can think of wanting is some way to tie multiple events from a single request together. This could be done with a random number generated by the JS and sent with these events if multiple events are triggered.
However, the task described here doesn't need that, so let's not bother.

Change 478224 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Measure how often users click on more languages links

https://gerrit.wikimedia.org/r/478224

Can I please see some resulting data?

This still hasn't been deployed yet.
Will be deployed this week.

Query getting data from hadoop for January 2019

SELECT
    e.month AS month, e.day AS day,
    hideCount, showCount, allCount, termboxEntityViews,
    ( hideCount / termboxEntityViews * 100 ) AS percHideCount,
    ( showCount / termboxEntityViews * 100 ) AS percShowCount,
    ( allCount / termboxEntityViews * 100 ) AS percAllCount
FROM (
    -- Interaction events per day, one count per action type
    SELECT e.month AS month, e.day AS day,
        COUNT(CASE e.event.actionType WHEN 'hide' THEN 1 ELSE NULL END) AS hideCount,
        COUNT(CASE e.event.actionType WHEN 'show' THEN 1 ELSE NULL END) AS showCount,
        COUNT(CASE e.event.actionType WHEN 'all' THEN 1 ELSE NULL END) AS allCount
    FROM event.wikibasetermboxinteraction e
    WHERE e.year = 2019 AND e.month = 1
        AND e.wiki = 'wikidatawiki'
    GROUP BY e.month, e.day
    ORDER BY e.month, e.day
    LIMIT 100000
) e
LEFT OUTER JOIN (
    -- Desktop page views per day by users (not crawlers) in namespaces 0 and 120
    SELECT p.month AS month, p.day AS day,
        SUM(p.view_count) AS termboxEntityViews
    FROM wmf.pageview_hourly p
    WHERE p.year = 2019 AND p.month = 1
        AND p.project = 'wikidata'
        AND p.access_method = 'desktop'
        AND p.agent_type = 'user'
        AND ( p.namespace_id = 0 OR p.namespace_id = 120 )
    GROUP BY p.month, p.day
    ORDER BY p.month, p.day
    LIMIT 100000
) p
ON e.month = p.month AND e.day = p.day

And said data:

month  day  hidecount  showcount  allcount  termboxentityviews  perchidecount        percshowcount        percallcount
1      9    41         59         135       309963              0.01322738520404048  0.01903452992776557  0.04355358542793818
1      10   342        440        734       304415              0.11234663206477997  0.14453952663305028  0.24111821033786113
1      11   328        413        747       385801              0.08501792374825363  0.10705000764642911  0.19362313731690692
1      12   247        319        858       421485              0.05860232273983653  0.0756847811903152   0.20356596320153741
1      13   305        391        837       389771              0.0782510756315888   0.10031531335065974  0.2147414764053765
1      14   354        466        972       407247              0.0869251338868058   0.1144268711617274   0.23867579135021252
1      15   334        451        776       387644              0.08616152965091682  0.11634386189390265  0.2001836736799744
1      16   382        508        797       304297              0.1255352501010526   0.16694216505585005  0.26191516840455215

Data explanation:

  • termboxentityviews = number of item or property page views by users on desktop on the given day on wikidata.org
  • hidecount = number of those pageviews that had an interaction with "more languages" to collapse the box
  • showcount = number of those pageviews that had an interaction with "more languages" to expand the box
  • allcount = number of those pageviews that had an interaction with "all entered languages" to expand the box to all languages
  • perc*count = percentage of those pageviews that the given interaction occurred on
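
To make the perc*count columns concrete, here is a small sketch that recomputes them for one day from the raw counts, using the same arithmetic as the query (count / views * 100). The function name `interactionPercentages` and the object shape are made up for illustration; the numbers are the January 10 row from the data above.

```javascript
// Hypothetical helper: recompute the percentage columns for a single day.
function interactionPercentages(row) {
  // Same formula as the query: count / termboxEntityViews * 100
  const pct = (count) => (count / row.termboxEntityViews) * 100;
  return {
    percHideCount: pct(row.hideCount),
    percShowCount: pct(row.showCount),
    percAllCount: pct(row.allCount),
  };
}

// Raw counts for January 10, copied from the result table.
const jan10 = {
  hideCount: 342,
  showCount: 440,
  allCount: 734,
  termboxEntityViews: 304415,
};

console.log(interactionPercentages(jan10));
```

The values come out at roughly 0.11, 0.14, and 0.24 percent, so each interaction happens on well under one percent of the counted page views.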

I'm surprised to see that the showcount numbers are higher than the hidecount numbers, since by default users would only be able to see the "hide" link. The only explanation I can come up with is that most of the clicks originate from people who have the termbox hidden by default and then need to "unhide" it again, and that these people outnumber those who feel an urge to hide the termbox.
Or can you come up with any other explanation?

I'm surprised to see that the showcount numbers are higher than the hidecount numbers, since by default users would only be able to see the "hide" link.

I went back and checked the code, and the tracking definitely looks correct (the right way around).

The only explanation I can come up with is that most of the clicks originate from people who have the termbox hidden by default and then need to "unhide" it again, and that these people outnumber those who feel an urge to hide the termbox.

That indeed seems correct.
How does one actually make the choice of open vs closed persist?

Also, a note on the data: the interaction tracking will not include people who ask not to be tracked, whereas the page view data does include their page views.
So the percentages are probably ever so slightly lower than reality.

How does one actually make the choice of open vs closed persist?

Nothing to do. It remembers the last state in a cookie or your settings.

I am closing this now but I'd still like to see which decision this has helped us make.