mpopov (Mikhail Popov)
Manager, Data Science

User Details

User Since
Jul 27 2015, 4:15 PM (487 w, 3 d)
Availability
Available
IRC Nick
bearloga
LDAP User
Bearloga
MediaWiki User
MPopov (WMF) [ Global Accounts ]

Using statistical analysis, Bayesian inference, machine learning, and software/data engineering to solve problems and inform decisions in Product Analytics

Recent Activity

Mon, Nov 25

mpopov updated the task description for T380800: Develop a Wikimedia-branded theme for use with Quarto.
Mon, Nov 25, 10:34 PM · Product-Analytics
mpopov moved T380800: Develop a Wikimedia-branded theme for use with Quarto from Triage to Backlog on the Product-Analytics board.
Mon, Nov 25, 10:24 PM · Product-Analytics
mpopov created T380800: Develop a Wikimedia-branded theme for use with Quarto.
Mon, Nov 25, 10:24 PM · Product-Analytics
mpopov added a comment to T286493: Investigate running Stan models on GPU.

Another possible path to investigate: PyMC with JAX backend on GPU

Mon, Nov 25, 10:00 PM · Product-Analytics
mpopov reopened T379303: Requesting access to analytics-privatedata-users group, sql_lab role, Kerberos Principal for Khantstop as "Open".

@Khantstop has reported that trying to kinit results in error

kinit: krb5_get_init_creds: unable to reach any KDC in realm WIKIPEDIA, tried 0 KDCs

Reopening, and I think this is now specifically for Data-Platform-SRE to assist with.

Mon, Nov 25, 9:46 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
mpopov updated the task description for T380525: Requesting access to deployment & stats private data access for jly.
Mon, Nov 25, 9:28 PM · Patch-For-Review, SRE, SRE-Access-Requests
mpopov added a comment to T380477: Jupyter/Conda: spawn new server with 'create and use new cloned env' times out.

Thank you so much, Ben!!! Can confirm that mv .conda/pkgs/ .conda/pkgs-bak on stat1008–1011 fixed it.

Mon, Nov 25, 4:55 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
mpopov created T380754: Fix pageviews datasets in Druid/Turnilo/Superset.
Mon, Nov 25, 2:52 PM · Movement-Insights, Data-Platform
mpopov added a comment to T380630: [Spike] Investigate Stable User Identification for Logged-Out A/B Testing.

Ooh, scratch that – as of T121646: Document usage policy for LocalStorage in MediaWiki code our localStorage wrapper has an expiry system https://doc.wikimedia.org/mediawiki-core/master/js/module-mediawiki.storage.html

Mon, Nov 25, 2:22 PM · FY2024-25 KR 3.1 Content Discovery, Web-Team, MinervaNeue, Spike

Fri, Nov 22

mpopov added a comment to T378115: Implement A/B test bucketing for mobile search recommendation.

By the way, if you haven't seen yet – Metrics Platform data contract now has experiments fragment (cf. T368326: Update Metrics Platform Client Libraries to accept experiment membership) that you are encouraged to use.

Fri, Nov 22, 6:15 PM · Web-Team, Web-Team-Backlog (FY2024-25 Q2 Sprint 4), FY2024-25 KR 3.1 Content Discovery
mpopov updated the task description for T286493: Investigate running Stan models on GPU.
Fri, Nov 22, 3:53 PM · Product-Analytics
mpopov removed a project from T286493: Investigate running Stan models on GPU: Analytics-Radar.
Fri, Nov 22, 3:09 PM · Product-Analytics
mpopov added a comment to T380593: Give Mikhail access to ml-labs.

Thank you!

Fri, Nov 22, 2:54 PM · Machine-Learning-Team
mpopov added a parent task for T380593: Give Mikhail access to ml-labs: T286493: Investigate running Stan models on GPU.
Fri, Nov 22, 2:53 PM · Machine-Learning-Team
mpopov added a subtask for T286493: Investigate running Stan models on GPU: T380593: Give Mikhail access to ml-labs.
Fri, Nov 22, 2:53 PM · Product-Analytics

Thu, Nov 21

mpopov updated the task description for T286493: Investigate running Stan models on GPU.
Thu, Nov 21, 3:35 PM · Product-Analytics
mpopov reopened T286493: Investigate running Stan models on GPU as "Open".

I wanna give this a try.

Thu, Nov 21, 3:31 PM · Product-Analytics
mpopov created T380477: Jupyter/Conda: spawn new server with 'create and use new cloned env' times out.
Thu, Nov 21, 3:13 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)

Wed, Nov 20

mpopov added a comment to T77058: Show Metrics Dashboard for Beta Features.

I think the underlying need is still there but the discussion of the topic has continued / been restarted in T361684: Create per-wiki user preference metrics

Wed, Nov 20, 4:46 PM · BetaFeatures
mpopov merged task T77058: Show Metrics Dashboard for Beta Features into T361684: Create per-wiki user preference metrics.
Wed, Nov 20, 4:43 PM · BetaFeatures
mpopov merged T77058: Show Metrics Dashboard for Beta Features into T361684: Create per-wiki user preference metrics.
Wed, Nov 20, 4:43 PM · Product-Analytics, Data Products
mpopov added a comment to T307969: Clean up Content & Topic dashboards in Superset.

Hi @cchen if you're still the owner of the listed dashboards that needed to be deleted can you please delete them? For the other ones mentioned in the ticket can you please make sure @Mayakp.wiki so she can do whatever is appropriate?

Wed, Nov 20, 4:31 PM · Movement-Insights

Fri, Nov 15

mpopov updated the task description for T372417: Switch from miniconda to miniforge.
Fri, Nov 15, 10:44 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), Patch-For-Review
mpopov awarded T373435: [SPIKE] Establish a technical strategy for a standard, generalized Click Through Rate (CTR) instrument a Love token.
Fri, Nov 15, 10:39 PM · Data Products (Data Products Sprint 22), Patch-For-Review

Thu, Nov 14

mpopov awarded T379546: Update the product-analytics DAGs to use miniforge instead of condaforge a Stroopwafel token.
Thu, Nov 14, 7:42 PM · Product-Analytics (Kanban)
mpopov added a comment to T379546: Update the product-analytics DAGs to use miniforge instead of condaforge.

Thank you, KC!

Thu, Nov 14, 7:42 PM · Product-Analytics (Kanban)

Wed, Nov 13

mpopov added a comment to T378440: Migrate airflow-analytics-product instance webserver to kubernetes.

By the way airflow-analytics-product.wikimedia.org would work for us :)

Wed, Nov 13, 8:10 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
mpopov updated the task description for T362788: Migrate Airflow to the dse-k8s cluster.
Wed, Nov 13, 8:07 PM · Data-Platform-SRE, Epic

Tue, Nov 12

mpopov assigned T379546: Update the product-analytics DAGs to use miniforge instead of condaforge to KCVelaga_WMF.
Tue, Nov 12, 8:58 PM · Product-Analytics (Kanban)

Thu, Nov 7

mpopov added a comment to T372417: Switch from miniconda to miniforge.
Thu, Nov 7, 3:00 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), Patch-For-Review

Thu, Oct 31

mpopov updated the task description for T243387: Analyse the time taken for publishing an article using translation.
Thu, Oct 31, 2:20 PM · LPL Analytics, Language-analytics, ContentTranslation

Wed, Oct 30

mpopov added a comment to T201501: Develop a framework for measuring user retention.

Should we close this out given https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary/Retention_Rate (on-wiki doc in-progress; finished doc here)?

Wed, Oct 30, 9:31 PM · Movement-Insights, Product-Analytics
mpopov added a comment to T364398: Add MW table 'cu_log' to data lake.

Thank you @Snwachukwu!

Wed, Oct 30, 7:29 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Temporary accounts, Data-Platform

Oct 25 2024

mpopov updated the task description for T378221: Requested input on analytical capabilities of GrowthBook Enterprise.
Oct 25 2024, 9:19 PM · Metrics Platform, Product-Analytics (Kanban)
mpopov moved T378221: Requested input on analytical capabilities of GrowthBook Enterprise from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Oct 25 2024, 8:30 PM · Metrics Platform, Product-Analytics (Kanban)
mpopov triaged T378221: Requested input on analytical capabilities of GrowthBook Enterprise as High priority.
Oct 25 2024, 8:06 PM · Metrics Platform, Product-Analytics (Kanban)
mpopov moved T371141: Analyze impact of Magru data center on unique devices in South America from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Oct 25 2024, 3:17 PM · Product-Analytics (Kanban)
mpopov edited projects for T371141: Analyze impact of Magru data center on unique devices in South America, added: Product-Analytics (Kanban); removed Product-Analytics.
Oct 25 2024, 3:17 PM · Product-Analytics (Kanban)

Oct 24 2024

mpopov added a comment to T355837: Add Prometheus support to statsd.js via mw.track().

@Ottomata asked me to note that for Product Analytics it would be really helpful if the data was available in data lake and could be accessed/reported with Superset (which we know how to use).

Oct 24 2024, 6:02 PM · MW-1.44-notes (1.44.0-wmf.5; 2024-11-25), MediaWiki-Platform-Team, MediaWiki-Engineering, Patch-For-Review, Event-Platform, Data-Engineering, Grafana, MediaWiki-extensions-WikimediaEvents, Observability-Metrics

Oct 17 2024

mpopov added a comment to T377490: Requesting access to airflow-analytics-product-admins for jebe.

I approve membership in airflow-analytics-product-admins

Oct 17 2024, 6:58 PM · Data-Platform-SRE (2024.10.19 - 2024.11.08), SRE, SRE-Access-Requests

Oct 16 2024

mpopov added a comment to T376752: Add cu_log_event and cu_private_event CheckUser tables to data lake.

@Tgr @Dreamy_Jazz: I think it would be good to document the use case(s) that motivates adding these tables to the data lake. For example, T364398: Add MW table 'cu_log' to data lake is required for calculating some metrics related to Temp Accounts.

Oct 16 2024, 6:43 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Privacy Engineering, CheckUser

Oct 11 2024

mpopov added a comment to T364398: Add MW table 'cu_log' to data lake.

Update: @Ahoelzl is going to sync with DPE about sqoop and will follow up after the group is aligned on making more MariaDB wiki replica data available in the Data Lake.

Oct 11 2024, 7:42 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Temporary accounts, Data-Platform

Oct 9 2024

mpopov closed T365292: Metrics to track - Invitation lists as Declined.

@ifried: I am going to decline this task for now as we do not have capacity to report on everything requested on a monthly basis.

Oct 9 2024, 7:12 PM · Product-Analytics, Campaigns-Product-Team, Event-Discovery
mpopov closed T365292: Metrics to track - Invitation lists, a subtask of T348779: [EPIC] Invitation Lists MVP, as Declined.
Oct 9 2024, 7:11 PM · Epic, Campaigns-Product-Team, Event-Discovery
mpopov moved T374940: Ensure performer attributes in schemas clarify if the user is a temporary account from Triage to Tracking on the Product-Analytics board.
Oct 9 2024, 1:54 PM · DPE Temporary Accounts, Event-Platform, Temporary accounts, Product-Analytics, Data-Platform, Data-Engineering

Sep 30 2024

mpopov closed T375887: Prepare Search Satisfaction for Temp Accounts as Invalid.

Ah, thank you for the clarification!

Sep 30 2024, 2:06 PM · Discovery-Search (Current work), Temporary accounts
mpopov closed T375887: Prepare Search Satisfaction for Temp Accounts, a subtask of T374942: [Epic] Update schemas and instrumentation code for temporary accounts, as Invalid.
Sep 30 2024, 2:06 PM · Trust and Safety Product Team, Product-Analytics, Data-Platform, Temporary accounts, Epic

Sep 28 2024

mpopov closed T190769: Notebook machine to double as RStudio Server? as Invalid.

Thanks for checking in, @BTullis! I think at this point I'd rather wait until Positron comes out and try to get that in our system. Positron is currently in very early stages of development and is a multilingual, data science-focused IDE that's a successor to RStudio and is being developed by the same company.

Sep 28 2024, 6:22 PM · Data-Engineering, Data-Engineering-Jupyter
mpopov closed T190769: Notebook machine to double as RStudio Server?, a subtask of T188275: Jupyter Notebooks TLC 2018-2019, as Invalid.
Sep 28 2024, 6:21 PM · Data-Engineering, Data-Engineering-Jupyter, Analytics
mpopov closed T190769: Notebook machine to double as RStudio Server?, a subtask of T224658: Newpyter - SWAP Juypter Rewrite, as Invalid.
Sep 28 2024, 6:21 PM · Analytics-Kanban, Analytics

Sep 27 2024

mpopov created T375887: Prepare Search Satisfaction for Temp Accounts.
Sep 27 2024, 2:51 PM · Discovery-Search (Current work), Temporary accounts

Sep 26 2024

mpopov closed T373379: Requesting access to airflow-analytics-product-admins group for jiawang as Resolved.

Thank you for updating membership, @jijiki!

Sep 26 2024, 4:22 PM · SRE, SRE-Access-Requests

Sep 25 2024

mpopov moved T365586: Measure user impact of using a central login / signup page from Triage to Backlog on the Product-Analytics board.
Sep 25 2024, 8:11 PM · Growth-Team, Product-Analytics, MediaWiki-Platform-Team, SUL3, MediaWiki-extensions-CentralAuth
mpopov placed T375636: Bug: Wikistats incorrectly reports editors for Wikifunctions and Wikidata up for grabs.
Sep 25 2024, 7:14 PM · Abstract Wikipedia team, Wikifunctions, Data-Platform
mpopov closed T375636: Bug: Wikistats incorrectly reports editors for Wikifunctions and Wikidata as Resolved.
Sep 25 2024, 7:14 PM · Abstract Wikipedia team, Wikifunctions, Data-Platform
mpopov added a comment to T375636: Bug: Wikistats incorrectly reports editors for Wikifunctions and Wikidata.

Ah, that would explain both! And yes, import log (particularly from 2023-07-31) shows quite a few templates getting imported from Meta wiki.

Sep 25 2024, 6:55 PM · Abstract Wikipedia team, Wikifunctions, Data-Platform
mpopov updated the task description for T375636: Bug: Wikistats incorrectly reports editors for Wikifunctions and Wikidata.
Sep 25 2024, 3:50 PM · Abstract Wikipedia team, Wikifunctions, Data-Platform
mpopov renamed T375636: Bug: Wikistats incorrectly reports editors for Wikifunctions and Wikidata from Bug: Wikistats incorrectly reports editors for Wikifunctions to Bug: Wikistats incorrectly reports editors for Wikifunctions and Wikidata.
Sep 25 2024, 3:04 PM · Abstract Wikipedia team, Wikifunctions, Data-Platform
mpopov added a comment to T375636: Bug: Wikistats incorrectly reports editors for Wikifunctions and Wikidata.

What's going on with Wikistats and Wikifunctions specifically? T370551: Bug: Cassandra Unique Devices not loading Wikifunctions mobile data

Sep 25 2024, 3:02 PM · Abstract Wikipedia team, Wikifunctions, Data-Platform
mpopov created T375636: Bug: Wikistats incorrectly reports editors for Wikifunctions and Wikidata.
Sep 25 2024, 2:57 PM · Abstract Wikipedia team, Wikifunctions, Data-Platform

Sep 24 2024

mpopov added a comment to T365292: Metrics to track - Invitation lists.

Update: the notebook and related code for reporting is now drafted.

Sep 24 2024, 7:15 PM · Product-Analytics, Campaigns-Product-Team, Event-Discovery
mpopov added a comment to T373379: Requesting access to airflow-analytics-product-admins group for jiawang.

That is correct – the original request should have been for airflow-analytics-product-admins, similar to T373194: Requesting access to airflow-analytics-product-admins for kcvelaga

Sep 24 2024, 2:40 PM · SRE, SRE-Access-Requests
mpopov renamed T373379: Requesting access to airflow-analytics-product-admins group for jiawang from Requesting access to deployment group for jiawang to Requesting access to airflow-analytics-product-admins group for jiawang.
Sep 24 2024, 2:37 PM · SRE, SRE-Access-Requests

Sep 20 2024

mpopov awarded T374942: [Epic] Update schemas and instrumentation code for temporary accounts a Like token.
Sep 20 2024, 2:22 PM · Trust and Safety Product Team, Product-Analytics, Data-Platform, Temporary accounts, Epic

Sep 19 2024

mpopov closed T372108: Document desired properties of an enrollment sampling algorithm as Resolved.

Looks great, thank you @phuedx!

Sep 19 2024, 3:47 PM · Data Products (Data Products Sprint 19), WMF-SDS 2 Sprinthackular 2024, Product-Analytics (Kanban), Metrics Platform
mpopov closed T372108: Document desired properties of an enrollment sampling algorithm, a subtask of T374471: Decide which bucketing/variant assignment system should we use, as Resolved.
Sep 19 2024, 3:47 PM · Growth-Team (Current Sprint), GrowthExperiments-Community-Updates, GrowthExperiments-Homepage, WMF-SDS 2 Sprinthackular 2024

Sep 12 2024

mpopov added a comment to T370170: Implement instrumentation for Community Wishlist.

Here's an example using pageviews tool of looking at traffic for:

Sep 12 2024, 7:58 PM · Community-Tech, Community Wishlist
mpopov moved T368674: Instrumentation for Community Wishlist from Next 2 weeks to Blocked on the Product-Analytics (Kanban) board.

Blocked because instrumentation implementation work (T370170) has been deprioritized while team focuses on addressing issues with core features.

Sep 12 2024, 6:28 PM · Community-Tech, Community Wishlist, Product-Analytics (Kanban)
mpopov changed the status of T368674: Instrumentation for Community Wishlist from Open to Stalled.
Sep 12 2024, 6:27 PM · Community-Tech, Community Wishlist, Product-Analytics (Kanban)
mpopov changed the status of T368674: Instrumentation for Community Wishlist, a subtask of T372773: Reporting for Community Wishlist, from Open to Stalled.
Sep 12 2024, 6:26 PM · Community Wishlist, Product-Analytics, Community-Tech
mpopov updated the task description for T370170: Implement instrumentation for Community Wishlist.
Sep 12 2024, 6:24 PM · Community-Tech, Community Wishlist

Sep 11 2024

mpopov added a comment to T372108: Document desired properties of an enrollment sampling algorithm.

@phuedx asked "Is locking all of the inputs acceptable?"

Sep 11 2024, 11:31 AM · Data Products (Data Products Sprint 19), WMF-SDS 2 Sprinthackular 2024, Product-Analytics (Kanban), Metrics Platform
mpopov added a comment to T372108: Document desired properties of an enrollment sampling algorithm.

Just checked with @phuedx and we're aligned on the terminology:

Sep 11 2024, 11:21 AM · Data Products (Data Products Sprint 19), WMF-SDS 2 Sprinthackular 2024, Product-Analytics (Kanban), Metrics Platform

Sep 10 2024

mpopov updated the task description for T369488: Develop a unified Automoderator Activity Dashboard (v1).
Sep 10 2024, 4:28 PM · Product-Analytics (Kanban), Automoderator, Moderator-Tools-Team
mpopov added a comment to T370170: Implement instrumentation for Community Wishlist.

The pageview metrics in the analytics tool do not include users who have ad-blocks and are blocking client-side analytics; to have an accurate denominator for pageviews we need instrumented impressions, otherwise, it would give lower traffic/clicks than actual ones.

I was under the impression that it's actually the opposite. The pageviews pipeline is built from server-side web request logs, so installing ad blockers or what have you shouldn't prevent pageviews from being recorded. Meanwhile requests to intake-analytics.wikimedia.org (what we used to use?) are blocked by EasyList which is used by ad blockers and privacy extensions. EasyList maintainers don't appear to have caught wind of i.e. meta.wikimedia.org/beacon/event yet, and hopefully it stays that way! I recall them being quite strict about what constitutes unwanted traffic even for 'legitimate analytics'.

Sep 10 2024, 2:44 PM · Community-Tech, Community Wishlist

Aug 23 2024

mpopov added a comment to T373194: Requesting access to airflow-analytics-product-admins for kcvelaga.

Approved

Aug 23 2024, 2:20 PM · Patch-For-Review, SRE, SRE-Access-Requests

Aug 19 2024

mpopov added a comment to T370170: Implement instrumentation for Community Wishlist.

@KSiebert

not applicable, data scientist will check if tracking works

Instrumentation QA is a two-step process. First, the engineer (whether the software engineer or a QTE) needs to QA the instrumentation as much as they can to make sure that it is producing the desired events and that there aren't any validation errors (see https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Validate_Events).

Aug 19 2024, 2:31 PM · Community-Tech, Community Wishlist
mpopov added a subtask for T372773: Reporting for Community Wishlist: T368674: Instrumentation for Community Wishlist.
Aug 19 2024, 2:23 PM · Community Wishlist, Product-Analytics, Community-Tech
mpopov added a parent task for T368674: Instrumentation for Community Wishlist: T372773: Reporting for Community Wishlist.
Aug 19 2024, 2:23 PM · Community-Tech, Community Wishlist, Product-Analytics (Kanban)
mpopov triaged T372773: Reporting for Community Wishlist as Medium priority.
Aug 19 2024, 2:23 PM · Community Wishlist, Product-Analytics, Community-Tech

Aug 16 2024

mpopov moved T372108: Document desired properties of an enrollment sampling algorithm from Doing to Needs Review on the Product-Analytics (Kanban) board.
Aug 16 2024, 7:40 PM · Data Products (Data Products Sprint 19), WMF-SDS 2 Sprinthackular 2024, Product-Analytics (Kanban), Metrics Platform
mpopov added a comment to T372108: Document desired properties of an enrollment sampling algorithm.

Will sample consistently if given the same starting value (e.g. if we're sampling on page ID, the same page ID will always return the same assignment).
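
One common way to get this kind of consistency is to hash the starting value instead of drawing a random number. The sketch below is only an illustration of that idea, not the algorithm discussed on the task: the function name `assign_bucket`, the `salt` parameter, and the bucket labels are all hypothetical.

```python
import hashlib


def assign_bucket(unit_id: str, salt: str, buckets: list[str]) -> str:
    """Deterministically map a unit (e.g. a page ID) to one of the buckets.

    Hypothetical helper: the same (unit_id, salt) pair always yields the
    same bucket because the result depends only on a hash of the inputs.
    """
    # Hash the salt together with the unit ID so that different experiments
    # (different salts) can produce independent assignments for the same unit.
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it
    # modulo the number of buckets to pick an index.
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]


# Repeated calls with the same page ID return the same assignment.
first = assign_bucket("12345", "example-experiment", ["control", "treatment"])
second = assign_bucket("12345", "example-experiment", ["control", "treatment"])
```

Because nothing here depends on stored state, any client or server that knows the salt can recompute the assignment for a given page ID.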

Aug 16 2024, 7:35 PM · Data Products (Data Products Sprint 19), WMF-SDS 2 Sprinthackular 2024, Product-Analytics (Kanban), Metrics Platform

Aug 15 2024

mpopov updated the task description for T368674: Instrumentation for Community Wishlist.
Aug 15 2024, 6:40 PM · Community-Tech, Community Wishlist, Product-Analytics (Kanban)

Aug 12 2024

mpopov updated subscribers of T370117: [Epic] Recommend Articles in Search on Android App.

Questions for @JTannerWMF & @SNowick_WMF:

Aug 12 2024, 7:04 PM · Design, Wikimedia-Design, Wikipedia-Android-App-Backlog (Android Release - FY2024-25), FY2024-25 KR 3.1 Content Discovery, Epic

Aug 8 2024

mpopov moved T372108: Document desired properties of an enrollment sampling algorithm from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Aug 8 2024, 9:07 PM · Data Products (Data Products Sprint 19), WMF-SDS 2 Sprinthackular 2024, Product-Analytics (Kanban), Metrics Platform
mpopov triaged T372108: Document desired properties of an enrollment sampling algorithm as Medium priority.
Aug 8 2024, 9:07 PM · Data Products (Data Products Sprint 19), WMF-SDS 2 Sprinthackular 2024, Product-Analytics (Kanban), Metrics Platform
mpopov awarded T371373: airflow-dags: Mutualization of _IMPORTED flag sensors creations a Like token.
Aug 8 2024, 7:40 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
mpopov updated the task description for T367057: [SPIKE] Document decision to use a single table per base schema.
Aug 8 2024, 6:29 PM · Data Products (Data Products Sprint 17), Spike, Documentation, Metrics Platform
mpopov added a comment to T368303: REQUEST: Add Special:AllEvents to allowlist for campaigns-product pageview tracking.

Hi @ifried! @Iflorez and I discussed your question yesterday and it's going to come down to how the feature actually works. Either:

  • Scenario A: User can switch between the tabs seamlessly (powered by JS) without needing to reload the page.
    • In this scenario there is only one pageview of this special page no matter how many times the user goes back and forth between the tabs.
  • Scenario B: When page loads it checks for ?tab=Events (default) or ?tab=Communities and switching between them causes the page to reload with different URI query parameter.
    • In this scenario there is one pageview for each time the user switches tabs.
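
The difference between the two scenarios can be sketched as a small counting function. This is an illustration only: `count_pageviews` is a hypothetical helper, and it assumes the initial page load counts as one pageview in both scenarios, with each reload in Scenario B adding one more.

```python
def count_pageviews(tab_switches: int, reloads_on_switch: bool) -> int:
    """Pageviews recorded for one visit to the special page.

    Assumption: the initial page load always counts once. Each tab switch
    adds a pageview only when switching triggers a full page reload
    (Scenario B); JS-powered switching (Scenario A) adds none.
    """
    initial_load = 1
    return initial_load + (tab_switches if reloads_on_switch else 0)


scenario_a = count_pageviews(tab_switches=3, reloads_on_switch=False)
scenario_b = count_pageviews(tab_switches=3, reloads_on_switch=True)
```

Under these assumptions, three tab switches give one pageview in Scenario A and four in Scenario B.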
Aug 8 2024, 6:22 PM · Data Products (Data products Sprint 18), Event-Discovery, Data-Platform

Aug 7 2024

mpopov added a comment to T369687: Develop a reusable Metrics Platform schema fragment for translation workflows.

@mforns: I just discussed this with @KCVelaga_WMF and confirmed that multiple pieces of information will be "all manifest at the same time, atomically"

Aug 7 2024, 3:34 PM · Data Products (Data products Sprint 18), Product-Analytics, LPL Analytics

Aug 1 2024

mpopov added a comment to T371404: Measuring Edits Rollbacks.

Apologies @Tchanders – I created the placeholder ticket based on misunderstanding/misinterpreting the term "rollback" when Niharika's request came in to have a rollback metric (no other details given) so I read "rollback" as rolling back Temp Accounts, and I see now that a much different thing was intended.

Aug 1 2024, 6:05 PM · Product-Analytics (Kanban), Temporary accounts
mpopov updated the task description for T371404: Measuring Edits Rollbacks.
Aug 1 2024, 6:04 PM · Product-Analytics (Kanban), Temporary accounts
mpopov updated subscribers of T371560: REQUEST: A useful namespace_canonical_name column in wmf_raw.mediawiki_project_namespace_map.

@larissagaulia: Can you please check with your team if T52655: Allow adding canonical names for custom namespaces is indeed still a blocker for this or if it has been resolved, just not linked to the ticket?

Aug 1 2024, 2:02 PM · Analytics-Canonical-Data, Data-Platform

Jul 31 2024

mpopov created T371560: REQUEST: A useful namespace_canonical_name column in wmf_raw.mediawiki_project_namespace_map.
Jul 31 2024, 9:18 PM · Analytics-Canonical-Data, Data-Platform
mpopov added a comment to T367057: [SPIKE] Document decision to use a single table per base schema.

@cjming @Ottomata: Another negative to document (and think about): event sanitization. We can configure sanitization/retention policies on a per-instrument basis since they are different streams/tables, but with the monostream/monotable we would lose that flexibility. Without changing how the current sanitization pipeline works, we would have a single entry in the allowlist for the monotable. We would have to reconsider how we evaluate risk when it comes to retaining sanitized data longer than 90 days.

Jul 31 2024, 4:42 PM · Data Products (Data Products Sprint 17), Spike, Documentation, Metrics Platform

Jul 30 2024

mpopov added a comment to T366627: [MPIC] Analyse risk of potential performance issues with static approach to stream configuration.

@Ottomata @xcollazo: I can't review all the discussion on this ticket but Andrew pointed me here:

Jul 30 2024, 4:13 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Data Products, Metrics Platform
mpopov updated subscribers of T371404: Measuring Edits Rollbacks.

How do we measure this in an automated/pipeline-able way without requiring manual data input?

Jul 30 2024, 3:22 PM · Product-Analytics (Kanban), Temporary accounts
mpopov added a comment to T346466: Investigate how temporary accounts are logged and categorized.

@Niharika: Is this still needed for any decision making with regards to Temp Accounts implementation or can I decline this?

Jul 30 2024, 3:19 PM · Temporary accounts, Product-Analytics, Anti-Harassment