Using statistical analysis, Bayesian inference, machine learning, and software/data engineering to solve problems and inform decisions in Product Analytics
User Details
- User Since
- Jul 27 2015, 4:15 PM (487 w, 3 d)
- Availability
- Available
- IRC Nick
- bearloga
- LDAP User
- Bearloga
- MediaWiki User
- MPopov (WMF)
Mon, Nov 25
@Khantstop has reported that running kinit results in the following error:
kinit: krb5_get_init_creds: unable to reach any KDC in realm WIKIPEDIA, tried 0 KDCs
Reopening, and I think this is now specifically for Data-Platform-SRE to assist with.
Thank you so much, Ben!!! Can confirm that mv .conda/pkgs/ .conda/pkgs-bak on stat1008–1011 fixed it.
Ooh, scratch that – as of T121646 (Document usage policy for LocalStorage in MediaWiki code), our localStorage wrapper has an expiry system: https://doc.wikimedia.org/mediawiki-core/master/js/module-mediawiki.storage.html
Fri, Nov 22
By the way, if you haven't seen it yet, the Metrics Platform data contract now has an experiments fragment (cf. T368326: Update Metrics Platform Client Libraries to accept experiment membership) that you are encouraged to use.
Thank you!
Thu, Nov 21
I wanna give this a try.
Wed, Nov 20
I think the underlying need is still there, but the discussion of the topic has continued (or been restarted) in T361684: Create per-wiki user preference metrics
Hi @cchen, if you're still the owner of the listed dashboards that need to be deleted, can you please delete them? For the other ones mentioned in the ticket, can you please make sure @Mayakp.wiki has owner access so she can do whatever is appropriate?
Fri, Nov 15
Thu, Nov 14
Thank you, KC!
Wed, Nov 13
By the way, airflow-analytics-product.wikimedia.org would work for us :)
Tue, Nov 12
Thu, Nov 7
Thu, Oct 31
Wed, Oct 30
Should we close this out given https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary/Retention_Rate (on-wiki doc in progress; finished doc here)?
Thank you @Snwachukwu!
Oct 25 2024
Oct 24 2024
@Ottomata asked me to note that, for Product Analytics, it would be really helpful if the data were available in the Data Lake and could be accessed and reported on with Superset (which we know how to use).
Oct 17 2024
I approve membership in airflow-analytics-product-admins
Oct 16 2024
@Tgr @Dreamy_Jazz: I think it would be good to document the use case(s) that motivate adding these tables to the Data Lake. For example, T364398 (Add MW table 'cu_log' to data lake) is required for calculating some metrics related to Temp Accounts.
Oct 11 2024
Update: @Ahoelzl is going to sync with DPE about Sqoop and will follow up once the group is aligned on making more MariaDB wiki replica data available in the Data Lake.
Oct 9 2024
@ifried: I am going to decline this task for now as we do not have capacity to report on everything requested on a monthly basis.
Sep 30 2024
Ah, thank you for the clarification!
Sep 28 2024
Thanks for checking in, @BTullis! At this point I'd rather wait until Positron comes out and then try to get it into our system. Positron is a multilingual, data science-focused IDE being developed by the same company as RStudio as its successor, and it is currently in a very early stage of development.
Sep 27 2024
Sep 26 2024
Thank you for updating membership, @jijiki!
Sep 25 2024
Ah, that would explain both! And yes, the import log (particularly from 2023-07-31) shows quite a few templates being imported from Meta-Wiki.
What's going on with Wikistats and Wikifunctions specifically? T370551: Bug: Cassandra Unique Devices not loading Wikifunctions mobile data
Sep 24 2024
Update: the notebook and related code for reporting is now drafted.
That is correct – the original request should have been for airflow-analytics-product-admins, similar to T373194: Requesting access to airflow-analytics-product-admins for kcvelaga
Sep 20 2024
Sep 19 2024
Looks great, thank you @phuedx!
Sep 12 2024
Here's an example of using the Pageviews tool to look at traffic for:
- Community Wishlist/Intake
- Community Wishlist/Intake/zh
- Community Wishlist/Focus areas/Repetitive tasks
- Community Wishlist/Wishes/Link "diff" and "hist" for category changes on Watchlist RecentChanges
- Community Wishlist/Wishes/Link "diff" and "hist" for category changes on Watchlist RecentChanges/zh
- Community Wishlist/Wishes/Link "diff" and "hist" for category changes on Watchlist RecentChanges/en
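The same per-page counts can also be pulled programmatically from the Wikimedia REST API per-article pageviews endpoint. This is a minimal sketch that only builds the request URLs; the helper name `per_article_url` is hypothetical, and the defaults (meta.wikimedia.org, all access methods, user agents only, daily granularity) and the August 2024 date range are assumptions chosen for illustration. Actually fetching the URLs needs an HTTP client and a descriptive User-Agent header.

```python
from urllib.parse import quote

# Wikimedia REST API endpoint for per-article pageview counts.
API_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(title, start, end, project="meta.wikimedia.org",
                    access="all-access", agent="user", granularity="daily"):
    """Build the per-article pageviews URL for one page title.

    start/end are YYYYMMDD strings. Spaces become underscores, and every
    other reserved character (including "/" in subpage titles and the
    double quotes in some wish titles) is percent-encoded so the whole
    title occupies a single path segment.
    """
    article = quote(title.replace(" ", "_"), safe="")
    return "/".join([API_BASE, project, access, agent, article,
                     granularity, start, end])


# Hypothetical date range, chosen only for illustration.
pages = [
    "Community Wishlist/Intake",
    "Community Wishlist/Intake/zh",
    "Community Wishlist/Focus areas/Repetitive tasks",
]
for page in pages:
    print(per_article_url(page, "20240801", "20240831"))
```

Encoding "/" matters here: translated subpages like ".../zh" are separate pages, so each one needs its own request, just as each gets its own line in the Pageviews tool.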
Blocked because instrumentation implementation work (T370170) has been deprioritized while the team focuses on addressing issues with core features.
Sep 11 2024
@phuedx asked "Is locking all of the inputs acceptable?"
Just checked with @phuedx and we're aligned on the terminology:
Sep 10 2024
Aug 23 2024
Approved
Aug 19 2024
Not applicable; the data scientist will check whether tracking works.
Instrumentation QA is a two-step process. First, the engineer (whether a software engineer or a QTE) needs to QA the instrumentation as thoroughly as they can to make sure that it is producing the desired events, including checking that there aren't any validation errors (see https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Validate_Events).
Aug 16 2024
Will sample consistently if given the same starting value (e.g. if we're sampling on page ID, the same page ID will always return the same assignment).
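One common way to get this kind of consistent sampling is to hash the unit ID together with a fixed salt and compare the hash, mapped to [0, 1), against the sampling rate. The sketch below illustrates that general technique; it is not necessarily how Metrics Platform implements sampling, and the function name `in_sample` and the salt value are hypothetical.

```python
import hashlib


def in_sample(unit_id, rate, salt="hypothetical-salt"):
    """Return True if unit_id (e.g. a page ID) is in the sample at `rate` (0..1).

    The decision depends only on (salt, unit_id), so the same page ID always
    gets the same assignment across runs and machines.
    """
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


print(in_sample(12345, 0.1))
```

A cryptographic hash from `hashlib` is used rather than Python's built-in `hash()`, which is randomized per process for strings and would break consistency between runs. A side effect of comparing a fixed bucket against the rate is that samples are nested: anything sampled at 10% is also sampled at 20%.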
Aug 15 2024
Aug 12 2024
Questions for @JTannerWMF & @SNowick_WMF:
Aug 8 2024
Hi @ifried! @Iflorez and I discussed your question yesterday and it's going to come down to how the feature actually works. Either:
- Scenario A: The user can switch between the tabs seamlessly (powered by JS) without needing to reload the page.
- In this scenario there is only one pageview of this special page no matter how many times the user goes back and forth between the tabs.
- Scenario B: When the page loads, it checks for ?tab=Events (default) or ?tab=Communities, and switching between them causes the page to reload with a different URI query parameter.
- In this scenario there is one pageview for each time the user switches tabs.
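The difference between the two scenarios can be modelled in a few lines: the initial load always counts as one pageview, and only Scenario B adds another pageview per tab switch, each of which can be attributed to a tab through its ?tab= parameter. This is a toy model rather than the actual pageview definition, and the page path and function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical special page path, used only for illustration.
BASE = "/wiki/Special:ExamplePage"


def tab_from_url(url, default="Events"):
    """Read the ?tab= query parameter, falling back to the default tab."""
    params = parse_qs(urlsplit(url).query)
    return params.get("tab", [default])[0]


def pageview_urls(tab_sequence, reload_on_switch):
    """Return the URLs that would be recorded as pageviews.

    tab_sequence: tabs the user sees in order, starting with the initial load.
    reload_on_switch: False for Scenario A (JS tabs), True for Scenario B.
    """
    if not tab_sequence:
        return []
    # The initial page load is always a pageview.
    views = [f"{BASE}?tab={tab_sequence[0]}"]
    if reload_on_switch:
        # Scenario B: each actual switch reloads the page with a new parameter.
        for previous, tab in zip(tab_sequence, tab_sequence[1:]):
            if tab != previous:
                views.append(f"{BASE}?tab={tab}")
    return views


visits = ["Events", "Communities", "Events", "Communities"]
print(len(pageview_urls(visits, reload_on_switch=False)))  # 1
print(len(pageview_urls(visits, reload_on_switch=True)))   # 4
```

In Scenario A, per-tab engagement would therefore need separate instrumentation (e.g. click events), since pageview data alone cannot tell the tabs apart.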
Aug 7 2024
@mforns: I just discussed this with @KCVelaga_WMF and confirmed that multiple pieces of information will be "all manifest at the same time, atomically"
Aug 1 2024
Apologies, @Tchanders. I created the placeholder ticket after misinterpreting the term "rollback" when Niharika's request for a rollback metric came in (no other details were given). I read "rollback" as rolling back Temp Accounts, and I see now that something quite different was intended.
@larissagaulia: Can you please check with your team whether T52655 (Allow adding canonical names for custom namespaces) is indeed still a blocker for this, or whether it has been resolved and just not linked to the ticket?
Jul 31 2024
@cjming @Ottomata: Another downside to document (and think about): event sanitization. Because each instrument currently has its own stream and table, we can configure sanitization/retention policies on a per-instrument basis, but with the monostream/monotable we would lose that flexibility. Without changing how the current sanitization pipeline works, we would have a single entry in the allowlist for the monotable, so we would have to reconsider how we evaluate risk when it comes to retaining sanitized data longer than 90 days.
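To make the trade-off concrete: a single allowlist entry has to be derived somehow from what each instrument would have asked for on its own. Taking the union retains fields some instruments never needed; taking the intersection drops fields others did need. This toy model assumes allowlists are plain sets of field names, which is a simplification of the real sanitization pipeline, and the instrument and field names are hypothetical.

```python
# Hypothetical per-stream allowlists: the fields each instrument's events may
# keep after the 90-day window. Instrument and field names are illustrative.
PER_STREAM_ALLOWLIST = {
    "instrument_a": {"action", "page_namespace"},
    "instrument_b": {"action", "page_namespace", "performer_edit_count_bucket"},
}


def monotable_allowlist(per_stream, strategy):
    """Collapse per-stream allowlists into the single entry a monotable allows."""
    field_sets = list(per_stream.values())
    if not field_sets:
        return set()
    if strategy == "union":
        return set().union(*field_sets)
    if strategy == "intersection":
        return set.intersection(*field_sets)
    raise ValueError(f"unknown strategy: {strategy}")


def retention_drift(per_stream, strategy):
    """Per instrument: (fields kept but never approved, approved fields lost)."""
    shared = monotable_allowlist(per_stream, strategy)
    return {
        name: (shared - fields, fields - shared)
        for name, fields in per_stream.items()
    }


for strategy in ("union", "intersection"):
    print(strategy, retention_drift(PER_STREAM_ALLOWLIST, strategy))
```

With separate streams/tables both drift sets are empty by construction, which is the flexibility the monotable would give up.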
Jul 30 2024
How do we measure this in an automated/pipeline-able way without requiring manual data input?
@Niharika: Is this still needed for any decision making with regards to Temp Accounts implementation or can I decline this?