
WMDE Banner Campaigns Comprehensive Report 2017-2020
Closed, Resolved; Public

Description

New editors - WMDE deep dive - analytics questions - phase 1
7th July 2020

Reference Document: New editors - WMDE deep dive - analytics questions

Background/ Reason why

The first campaign that included tracking ran in 2017. Since then, several campaigns with different content and user journeys have been carried out. To make strategic decisions on future activities to gain new editors, we want to comprehensively analyze past activities and their impact. Besides qualitative results, we need to consider quantitative results. Campaign reports usually cover a certain period during and after the campaign but not long-term effects; this comprehensive analysis should cover those effects.

Timing
Briefing: 9th July 2020
Delivery of report: week 31

General requirements
The report should be delivered as tables and charts in HTML, as in previous reports.
The report might be made publicly available in the future; this is not a requirement for this delivery deadline.
Communication will happen in Phabricator.
Based on this report we will have more questions, which should be addressed in a phase 2 in August.
The time span for this report is January 2017 until June 2020.

Analytics areas
Organic development of editor numbers
What is the age of the German Wikipedia Community in terms of account age?
since January 2017: How many actively registered in de-WP? - NOTE: mind the user_self_made field in wmf.mediawiki_history
How many of them edited (since registration until 30th June 2020):

  • 1 edit
  • 2 to 5 edits
  • 5 to 9 edits
  • 10 to 49 edits
  • 50 or more edits
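
A minimal sketch of how these edit classes could be tabulated, not the code used for the report. The class boundaries are an assumption: the list above has "2 to 5" and "5 to 9" overlapping at 5, so this sketch assigns 5 to the "5 to 9" class. The function names and the example data are hypothetical.

```python
from collections import Counter

# Assumed, non-overlapping class boundaries (inclusive). None means "no upper bound".
EDIT_CLASSES = [
    (1, 1, "1 edit"),
    (2, 4, "2 to 4 edits"),
    (5, 9, "5 to 9 edits"),
    (10, 49, "10 to 49 edits"),
    (50, None, "50 or more edits"),
]


def edit_class(n_edits):
    """Return the class label for an edit count; users without edits get "0 edits"."""
    if n_edits < 1:
        return "0 edits"
    for low, high, label in EDIT_CLASSES:
        if n_edits >= low and (high is None or n_edits <= high):
            return label
    raise ValueError(f"no class for edit count {n_edits}")


def tabulate_edit_classes(edit_counts):
    """Count users per edit class. edit_counts maps user_id -> number of edits."""
    return Counter(edit_class(n) for n in edit_counts.values())


# Hypothetical example data: user_id -> edits between registration and the cutoff date.
example_counts = {"u1": 0, "u2": 1, "u3": 5, "u4": 12, "u5": 50, "u6": 3}
class_table = tabulate_edit_classes(example_counts)
```

Keeping the boundaries in one table makes it easy to switch to a different convention (for example "2 to 5" and "6 to 9") without touching the counting logic.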

retention rate: How many newly registered users are active after (active = at least 1 edit)

  • 2 weeks after registration
  • 1 month after registration
  • 6 months after registration
  • 12 months after registration

How high is the retention rate of these active users compared to the number of registrations?
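
One possible reading of the retention question, sketched under stated assumptions rather than taken from the report: a user counts as retained for a window if they made at least one edit at or after that much time since registration, and only users whose window has already elapsed by the cutoff date are counted in the denominator. The window lengths in days, the function name, and the example data are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumed window lengths; months are approximated in days.
RETENTION_WINDOWS = {
    "2 weeks": timedelta(days=14),
    "1 month": timedelta(days=30),
    "6 months": timedelta(days=182),
    "12 months": timedelta(days=365),
}


def retention(registrations, edits, cutoff):
    """
    registrations: dict user_id -> registration datetime
    edits: dict user_id -> list of edit datetimes
    cutoff: end of the observation period
    Returns dict window label -> (eligible users, retained users, retention rate).
    """
    result = {}
    for label, delta in RETENTION_WINDOWS.items():
        eligible = 0
        retained = 0
        for user, registered in registrations.items():
            threshold = registered + delta
            if threshold > cutoff:
                continue  # window not yet observable for this user
            eligible += 1
            # Retained: at least one edit between the threshold and the cutoff.
            if any(threshold <= ts <= cutoff for ts in edits.get(user, [])):
                retained += 1
        rate = retained / eligible if eligible else None
        result[label] = (eligible, retained, rate)
    return result


# Hypothetical example data.
cutoff = datetime(2020, 6, 30, 23, 59, 59)
registrations = {
    "a": datetime(2019, 1, 1),
    "b": datetime(2019, 1, 1),
    "c": datetime(2020, 6, 1),
}
edits = {
    "a": [datetime(2019, 1, 2), datetime(2019, 8, 1)],
    "b": [datetime(2019, 1, 3)],
    "c": [datetime(2020, 6, 20)],
}
rates = retention(registrations, edits, cutoff)
```

Dividing by eligible users rather than by all registrations avoids penalizing recent cohorts; dividing by all registrations instead is a one-line change if that is the intended definition.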
Edit Classes (facets) x Account Age Classes (group, step: one year) x Time (horizontal) → do we observe always one and the same group of active editors, or do the newcomers join in to stay active editors? - start: 2017.

Direct campaigning effects
All previous campaigns should be covered. For the following questions we need both the total and the individual numbers per campaign.
since January 2017: How many actively registered in de-WP?
How many of them edited (since registration until 30th June 2020):

  • 1 edit or more
  • 2 to 5 edits
  • 5 to 9 edits
  • 10 to 49 edits
  • 50 or more edits

retention rate: How many newly registered users are active after (active = at least 1 edit)

  • 2 weeks after registration
  • 1 month after registration
  • 6 months after registration
  • 12 months after registration

How high is the retention rate of these active users compared to the number of registrations?
Of the people who started training modules (onboarding content used in 2018 in thank you, spring and summer campaign), how is the rate of still active users?

Event Timeline

GoranSMilovanovic renamed this task from Comprehensive Report 2017-2020 to WMDE Banner Campaigns Comprehensive Report 2017-2020. Jun 25 2020, 10:33 PM
GoranSMilovanovic claimed this task.
GoranSMilovanovic added a subscriber: Tobi_WMDE_SW.

@Verena

  • All datasets produced via PySpark;
  • Analyzing and visualizing the data now in R;
  • Reporting in R Markdown as soon as possible.

@Verena

  • All registration and revision datasets analyzed;
  • adding the campaign registered users facet now.

@Verena

  • All datasets complete;
  • reporting & visualization now.

@Verena @Tobi_WMDE_SW

Here is the Report:

Note 1. This Report focuses on what has happened since 2017 and until now, as described in the ticket. However, I have produced the aggregated datasets for user registrations and revisions since the onset of dewiki (more precisely: from everything that we have on dewiki in wmf.mediawiki_history), just in case.

Note 2. As you will observe, 49 WMDE campaign registered users - or 1.18% of their total number - were not matched with any data in the wmf.mediawiki_history table. There is nothing I can do about it. From my side, I have checked for the uniqueness of the user IDs and confirmed that they are indeed unique.
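
A check of this kind (uniqueness of the campaign user IDs, then the share left unmatched) could be sketched as follows. This is an illustration only, not the code behind the Report; the function name and the example IDs are hypothetical, and the real comparison would run against the user IDs found in wmf.mediawiki_history.

```python
def match_report(campaign_user_ids, history_user_ids):
    """Summarize how many campaign user IDs have no counterpart in the history data."""
    ids = list(campaign_user_ids)
    unique_ids = set(ids)
    # IDs present in the campaign data but absent from the history data.
    unmatched = unique_ids - set(history_user_ids)
    pct = round(100 * len(unmatched) / len(unique_ids), 2) if unique_ids else 0.0
    return {
        "total": len(ids),
        "all_unique": len(ids) == len(unique_ids),
        "unmatched": sorted(unmatched),
        "unmatched_pct": pct,
    }


# Hypothetical example: 4 campaign users, one of them missing from the history data.
summary = match_report([101, 102, 103, 104], [100, 101, 102, 104, 105])
```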

Note 3. The datasets used to produce this report are pretty huge (given our resources); I had to switch from using one heavily used server (stat1005) to another (stat1007) just to be able to pre-process everything. Any request to expand the scope of the datasets used to produce this Report (for example, focusing on all user revisions instead of analyzing only revisions on content pages as it is done here) will necessarily imply re-writing the code in PySpark and executing on the Analytics Cluster - and that will take a lot of time (to code, not to execute). The existing analytics code is in R, runs completely in RAM on our analytics server(s), and was developed as such for reasons of consistency (namely, all our previous New Editors analyses were developed in that way). Only the ETL here is done in PySpark.

Note 4. @Verena As for the Training Modules data - I still need to see what can be re-matched with the user registration datasets, and - more importantly - whether anything like that can be done in a consistent way (e.g. for all the existing campaigns that have used Training Modules).

GoranSMilovanovic lowered the priority of this task from High to Medium. Jul 17 2020, 8:12 PM

Hi Goran,

I have read and understood the report and your notes. We won't extend the scope then, and will see what results you can get for the Training Modules.

I have just one major remark on the report: I also need registrations and revisions per campaign to be able to compare the campaigns. Are you still on it or did you miss it out? (the edit class and account age comparison is not necessary for the campaign split).

@Verena

I also need registrations and revisions per campaign to be able to compare the campaigns.

Ok, on it.

@Verena See section 1.3 Campaign Registrations and Revisions: Overview for registrations and revisions per campaign.

@Verena

Of the people who started training modules (onboarding content used in 2018 in thank you, spring and summer campaign), how is the rate of still active users?

Please see Section 1.4 Training Modules in 2018 Campaigns: Active Users.

Note. I had to search manually through all campaign user registrations + training modules data for this. Taking that into account, I cannot guarantee that the result is anything more than approximate, but I also did not observe any inconsistencies prima facie in the 2018 data.

@GoranSMilovanovic

Hi Goran, when going through your campaign split in the latest update of the report, I wondered whether it would be much additional work to compute not only the number of revisions and registrations (in section 1.3), but also retention/retention rates (as you did in section 1.2.2) per campaign? I guess that's what @Verena was initially asking for.

@ChrisPins As @Verena put it in T256433#6328618

I have just one major remark on the report: I also need registrations and revisions per campaign to be able to compare the campaigns. Are you still on it or did you miss it out? (the edit class and account age comparison is not necessary for the campaign split).

so I don't see how you "... guess that's what @Verena was initially asking for." in T256433#6337701, but yes it can be done :)

In other words, what you need is a tabulation of active users (retention classes: two weeks, one month, six months, one year) per campaign; to be delivered in the next Report update.

@GoranSMilovanovic
Yes, exactly, it would be really great to have it! I spoke with @Verena when setting up the analytics briefing and we were especially interested in the retention numbers/rates per campaign. Probably this was not perfectly clear because Verena referred to the headline in the first part of the report (1.2 --> "Campaign Registrations and Revisions...") which included retention data. However, thanks a lot!

@ChrisPins

The answer to your question is now found under 1.3.2 Campaign User Retention in the following update of the Report:

We have 2 more requests:

  1. Organic Growth/ Age of Community: For the years 2001 to 2019 we need for every year the average age of all accounts who did at least one edit in this year. Age = number of years since registration. (I am aware that for a few accounts the registration date can't be retrieved from the database. Because this should be a relatively small number of accounts, we can neglect that here.)
  2. For 1.3 Campaign Registrations and Revisions: Overview we need one additional column: number of registered users who did at least one edit as of 30th June 2020
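
The account-age measure in request 1 could be computed along the following lines. This is a sketch under assumptions, not the code used for the report: age is taken as the difference in whole calendar years between the edit year and the registration year, accounts without a retrievable registration date are skipped (as the request allows), and the function name and example data are hypothetical.

```python
from collections import defaultdict


def mean_account_age_by_year(registration_year, edit_years):
    """
    registration_year: dict user_id -> year of registration
    edit_years: dict user_id -> iterable of years in which the user made edits
    Returns dict year -> mean account age (in years) of accounts active in that year.
    """
    ages = defaultdict(list)
    for user, years in edit_years.items():
        registered = registration_year.get(user)
        if registered is None:
            continue  # registration date not retrievable: neglected
        for year in set(years):  # count each account once per year
            if year >= registered:
                ages[year].append(year - registered)
    return {year: sum(values) / len(values) for year, values in sorted(ages.items())}


# Hypothetical example data; user "d" has no known registration year.
registration_year = {"a": 2001, "b": 2017, "c": 2019}
edit_years = {
    "a": [2001, 2019, 2019],
    "b": [2017, 2019],
    "c": [2019],
    "d": [2019],
}
mean_age = mean_account_age_by_year(registration_year, edit_years)
```

In the example, the accounts active in 2019 have ages 18, 2 and 0, so the 2019 mean is 20/3 years.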

Please let me know if it is feasible to get this data within the next two weeks or if the required time for executing this is too much for your available time.

Note: We will have further analytics requests within this project which I can specify on the 28th August after review with our ED. This again will have to be delivered within 2 weeks. We will take care to limit it to a few additional requests.

@Verena

Please let me know if it is feasible to get this data within the next two weeks or if the required time for executing this is too much for your available time.

It should be fine.

@Verena

Organic Growth/ Age of Community: For the years 2001 to 2019 we need for every year the average age of all accounts who did at least one edit in this year.

Please see section 2.1 Organic Growth/Age of Community.

For 1.3 Campaign Registrations and Revisions: Overview we need one additional column: number of registered users who did at least one edit as of 30th June 2020.

Please see 1.3 Campaign Registrations and Revisions: Overview.

Note: We will have further analytics requests within this project which I can specify on the 28th August after review with our ED. This again will have to be delivered within 2 weeks. We will take care to limit it to a few additional requests.

Ok. Please let me know of the requests as soon as you learn what needs to be done. Thank you.

@Verena @WMDE-leszek

We will have further analytics requests within this project which I can specify on the 28th August after review with our ED. This again will have to be delivered within 2 weeks. We will take care to limit it to a few additional requests.

Do we have any further requests here, or can this be resolved? Thanks.

No, we don't. Thank you for your work. I will close the ticket.