Wikifunctions:Status updates/2024-02-07
Quarterly planning
Two weeks ago we held an internal planning meeting to sketch out what we want to work on in the current quarter. Last week we collected and wrote down the results, and we want to share those results with you.
Our team's overarching goal is to work towards supporting the pieces needed for Abstract Wikipedia.
- Type support for Wikidata prototype. We want to add more types to Wikifunctions, in particular with an eye on the types needed to integrate forms from lexicographic data on Wikidata. For this, we want to work on serializers and deserializers, renderers and parsers, as well as validators. Besides the recently enabled lists, the types we expect to enable are numbers, enumerations (e.g. for grammatical features), and lexemes. We will work together with the community on the exact list of types to support, but in the end we hope to have all types in place so that in the next quarter we can work towards accessing lexemes from Wikidata.
- Public API for Wikifunctions. We want to encourage the creation of third-party apps using functions from Wikifunctions, such as the integration with Lucas Werkmeister’s Wikidata Lexeme Forms, which currently uses an internal API. The first step is to design the API and decide on our approach.
- Simplify our end-to-end tests and improve their reliability. Our stack is a bit complicated compared to other systems at Wikimedia: it runs evaluator and orchestrator services on the backend as well as the wiki itself with the WikiLambda extension, and it relies on certain content being present in the wiki. Because of this, parts of our continuous integration test platform are currently not in great shape. This task is about improving that situation and increasing our confidence in our tests.
- Research for calling Wikifunctions inside Wikipedia content. One major goal of our project is to allow the Wikipedia and other Wikimedia communities to call functions from Wikifunctions within wiki pages, like they do with Commons media files or local templates and Lua scripts. In order to prepare for this we will be starting user research to get a better understanding of user needs and expectations.
- Develop a baseline understanding of our users and their motivations. To make sure that we are developing Wikifunctions in a user-informed way, we want to better analyze who our current users are and why they are here, based on the metrics we have so far.
- Establish a regular cadence of external communication. Our weekly newsletter is a good way to communicate with our immediate community, but we want to reach a wider audience as well. This will include regular posts on Diff.
- Develop a framework to measure the narrative difference between two Wikipedia articles in different languages. This is a research task that will take us on the path towards understanding how any two articles about the same topic in two different languages differ. Such a tool would help us provide guidance for growing Abstract Wikipedia, and it could also serve as an early warning system that might indicate a loss of diversity.
As this was our first planning effort of this kind, part of it is also learning to pick the right amount of work. At the end of the quarter we will look back at what we actually achieved and learn from that for the next quarter.
Thank you, Quiddity!
Last week was Nick Wilson’s last day on the Abstract Wikipedia team before he heads into a well-deserved break. Some of you may know Nick better by his wikinym “Quiddity”. Nick was part of the Abstract Wikipedia project from the very beginning, set up and organized the naming contest for Wikifunctions, and set up most of our communication structures.
Here are a few words from Nick as a goodbye message:
Thank you, all!
It has been fascinating, humbling, motivating, and deeply inspiring to be a part of these complex projects. I've appreciated getting to know (or become more deeply familiar with) so many aspects of our movement and our mission, and more of the people that steadily help everything to move forwards.
As always, I wish I could clone myself in order to continue focusing on this project forever, in addition to working in new areas. I'm eagerly looking forward to a function-powered future, and I hope to continue helping in a volunteer capacity after I take a short wikibreak.
Luca, who joined us in early 2022, will continue in the role of Community Relations Specialist on the project and take over Nick’s responsibilities. Please join me in thanking Nick for his work!
Recent changes in the software
As discussed above, part of our bigger work this quarter is type support leading to Wikidata support. As part of this, we've fixed and now re-enabled validation for types (T352295), which shouldn't have any visible effect yet but will unlock the ability to tell the user their input is wrong when trying to use a type (such as saying that a natural number cannot have non-digits in it, or a date in February cannot be on the 30th day).
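To make the idea of a validator a little more concrete, here is a minimal Python sketch of the two checks mentioned above. The function names are hypothetical, and the sketch does not reflect how validators are actually defined on Wikifunctions. Returning a list of messages rather than a plain true/false is our own assumption, chosen so that the user can be told what is wrong.

```python
import datetime

# Hypothetical validators, for illustration only; not Wikifunctions' actual
# validator functions. Each returns a list of error messages (empty = valid).

def validate_natural_number(value: str) -> list:
    if value == "":
        return ["A natural number cannot be empty."]
    # Check against ASCII digits explicitly; str.isdigit() would also accept
    # characters such as superscript digits.
    if any(character not in "0123456789" for character in value):
        return ["A natural number can only contain the digits 0 to 9."]
    return []

def validate_date(year: int, month: int, day: int) -> list:
    try:
        # datetime.date raises ValueError for impossible dates such as 30 February.
        datetime.date(year, month, day)
    except ValueError as error:
        return [f"Not a valid date: {error}"]
    return []

print(validate_natural_number("12a"))
print(validate_date(2024, 2, 30))
```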
We've now made a small tweak to the display of strings returned from functions, wrapping them in double-quotes, which makes it possible to distinguish "" and " " as different responses. This does not fix all issues with spaces in inputs, as the nature of HTML makes some possible return values hard to cope with in a scalable way that doesn't break other use cases. We plan to come back to this area with further improvements, and would love your thoughts on improvements you'd like to see.
We've also fixed the interface breaking when using a language that MediaWiki doesn't support, such as https://www.wikifunctions.org/view/elx/Z10000 – previously you would end up in a broken state, but now things work as designed (T356428). This also means that if you try to use a language that even Wikifunctions doesn't know about, such as https://www.wikifunctions.org/view/hellothisisnotalanguage/Z10000, the system will fall back to English (like the rest of MediaWiki) rather than mysteriously providing a mixture of English and mostly bugs.
Among the smaller changes, one is a follow-up to last week's removal of the noindex directive from our pages, which lets Google and other search engines see them (T355441). Alongside the main links like https://wikifunctions.org/view/ar/Z801, we still have some URLs that also work due to how MediaWiki controls pages, like …/wiki/Z801?uselang=ar or …/wiki/Special:ViewObject/ar/Z801; these will no longer claim to be "canonical" (T355546). Another is that we've moved all the linked software help pages on MediaWiki.org to the same place, in the "Help:Wikifunctions" hierarchy. We've also tweaked the layout of the language selector on pages that already have a number of translations: its padding is now in line with the rest of the form, instead of much wider (T355946).
We also added two new languages that MediaWiki supports: Ebira (Z1920) and Petjo (Z1921). These were added manually in production.
On the code side, we've made some minor improvements to our PHP code's test coverage of the critical ZObject classes, edging up towards 100% (T302599). We've made a series of clean-ups to our front-end code following on from previous work, removing unused components (T301868) and generally improving our data store code (T329107). This week is a "Fix It" week for the team, so we'll make further such improvements that will ship next week.
Function of the Week: is permutation
For this week’s Function of the Week I asked Nick for a favorite function, and he immediately answered that he loves lists and the unusual, so he would like an unusual function that deals with lists. Looking at the catalog of list functions, most of them were what you would expect from list operations, but one struck us as unusual in the sense that most programming languages do not support that functionality out of the box: is permutation (please feel free to correct us).
A permutation of a list is a second list that has exactly the same elements, but possibly in a different order. The function is permutation takes two lists and checks if they are permutations of each other. The function returns a Boolean value: true if the two lists are permutations of each other, and false if they are not.
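As a point of reference, the definition can be written down almost directly in Python: two lists are permutations of each other when every element occurs the same number of times in both. This is a minimal sketch, not one of the implementations on Wikifunctions; the name `is_permutation` is ours, and it assumes the list elements are hashable (strings, for example).

```python
from collections import Counter

def is_permutation(first: list, second: list) -> bool:
    # Counter tallies how often each element occurs; the lists are
    # permutations of each other exactly when the tallies agree.
    # Assumes the elements are hashable (for example, strings).
    return Counter(first) == Counter(second)

print(is_permutation(["b", "c", "a"], ["a", "b", "c"]))  # True
print(is_permutation(["b", "b", "a"], ["a", "a", "b"]))  # False
```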
In general, I have a bit of an issue with a function like this one: the function is defined on lists of objects, not on lists of a specific type, such as a list of strings or a list of Booleans. But not all types will necessarily have a function to check for equality, which this function needs in order to work. So this might lead to problems down the road, and I am not sure how best to handle it.
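One way to make the dependency on equality explicit is to pass the equality check in as a parameter. The Python sketch below is only an illustration (the name `is_permutation_with` and its `equals` parameter are hypothetical, not part of Wikifunctions): it needs nothing from the element type except that check, and it assumes the check behaves like a proper equality, i.e. it is reflexive, symmetric and transitive.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def is_permutation_with(first: List[T], second: List[T],
                        equals: Callable[[T, T], bool]) -> bool:
    # Lists of different lengths can never be permutations of each other.
    if len(first) != len(second):
        return False
    unmatched = list(second)  # copy, so the caller's list is left untouched
    for item in first:
        # Look for a not-yet-matched element of `second` equal to `item`.
        for index, candidate in enumerate(unmatched):
            if equals(item, candidate):
                del unmatched[index]  # each element may be matched only once
                break
        else:
            return False  # `item` has no partner left in `second`
    return True

print(is_permutation_with(["Apple", "b"], ["B", "apple"],
                          lambda x, y: x.lower() == y.lower()))  # True
```

With a case-insensitive `equals`, as in the usage line, `["Apple", "b"]` and `["B", "apple"]` count as permutations of each other. The price of needing only equality is quadratic running time, since each element may be compared with every remaining one.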
The function currently has five tests. Three pairs are not permutations: (a, a, a) / (a, b, c); (b, c) / (a, b, c); and (b, b, a) / (a, a, b). Two pairs are permutations: (b, c, a) / (a, b, c) and (a, b, a) / (a, a, b). It’s great to see five tests! I would also add some edge cases (what about empty lists?) and, given that the function is defined on lists of any type, also lists whose elements are more than single-character strings. But again, yay for five tests! With five tests, the function is in the top 10% of functions by number of tests.
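The five existing tests, together with the suggested edge cases, fit naturally into a small table of cases. In this Python sketch, `TEST_CASES` and `failing_cases` are hypothetical names, and the last four cases are our suggestions rather than tests that exist on Wikifunctions today.

```python
# The five existing tests, plus suggested edge cases.
# Each entry: (first list, second list, expected result).
TEST_CASES = [
    (["a", "a", "a"], ["a", "b", "c"], False),
    (["b", "c"], ["a", "b", "c"], False),
    (["b", "b", "a"], ["a", "a", "b"], False),
    (["b", "c", "a"], ["a", "b", "c"], True),
    (["a", "b", "a"], ["a", "a", "b"], True),
    # Suggested additional edge cases (not yet on Wikifunctions).
    ([], [], True),
    ([], ["a"], False),
    (["apple", "pear"], ["pear", "apple"], True),
    (["ab", "c"], ["a", "bc"], False),
]

def failing_cases(is_permutation):
    """Return the test cases for which `is_permutation` gives the wrong answer."""
    return [
        (first, second, expected)
        for first, second, expected in TEST_CASES
        if is_permutation(first, second) != expected
    ]

# Check a sorting-based candidate implementation against all cases.
print(failing_cases(lambda first, second: sorted(first) == sorted(second)))  # []
```

The last case shows why multi-character elements are worth testing: an implementation that, say, joined each list into a single string before comparing would wrongly treat `["ab", "c"]` and `["a", "bc"]` as permutations of each other.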
The function currently has three implementations, one in Python and two compositions. The Python implementation has two steps. It first takes a short-cut, checking whether the two lists have different lengths. If they do, it stops and returns False: two lists that have different lengths can never be permutations of each other. Once we know they have the same length, we compare whether the two lists, when sorted, are the same. If they are, the two lists are permutations of each other; if not, they are not. Sorting turns each input list into a canonical version with regard to its elements that is independent of the order, which is why this implementation works. In theory, we could drop the first step from this implementation, but having the check probably allows for some speed-up.
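The two steps can be sketched in Python roughly as follows. This is our reconstruction from the description above, not the code stored on Wikifunctions; the name `is_permutation` is ours.

```python
def is_permutation(first: list, second: list) -> bool:
    # Short-cut: lists of different lengths can never be permutations.
    if len(first) != len(second):
        return False
    # Sorting puts each list into a canonical, order-independent form,
    # so the lists are permutations exactly when the sorted copies match.
    # This assumes the elements can be ordered (e.g. all strings).
    return sorted(first) == sorted(second)

print(is_permutation(["a", "b", "a"], ["a", "a", "b"]))  # True
print(is_permutation(["b", "c"], ["a", "b", "c"]))       # False
```

Note that sorting requires the elements to be orderable, which is a stronger requirement than the equality check discussed earlier: in Python, sorting a list that mixes, for instance, strings and numbers raises a `TypeError`.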
One of the compositions works by using supersets: we have a function, is superset, that checks whether one list is a superset of the other. The composition uses is superset to check both that the first list is a superset of the second and that the second is a superset of the first. If both conditions are fulfilled (we check that by using and), we know that each list contains all the elements of the other, and therefore both contain the same elements. This makes them permutations of each other only if is superset also takes into account how often each element occurs: with a purely set-based check, (b, b, a) and (a, a, b) would each count as a superset of the other. Two of the tests time out, but unfortunately with an unusable error message. A bug has been reported (T356556).
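Whether the superset composition is correct depends on what "superset" means when elements repeat. The Python sketch below compares a set-based and a multiset-based superset check on one of the existing tests, (b, b, a) / (a, a, b). All names here are hypothetical, and we have not checked which of the two behaviors the is superset function on Wikifunctions actually has.

```python
from collections import Counter

def is_superset_set(first: list, second: list) -> bool:
    # Set-based: every distinct element of `second` occurs somewhere in `first`.
    return set(second) <= set(first)

def is_superset_multiset(first: list, second: list) -> bool:
    # Multiset-based: every element of `second` occurs in `first`
    # at least as many times as it occurs in `second`.
    have = Counter(first)
    return all(have[item] >= count for item, count in Counter(second).items())

def is_permutation_via(is_superset, first: list, second: list) -> bool:
    # The composition: each list is a superset of the other.
    return is_superset(first, second) and is_superset(second, first)

print(is_permutation_via(is_superset_set, ["b", "b", "a"], ["a", "a", "b"]))       # True (wrong)
print(is_permutation_via(is_superset_multiset, ["b", "b", "a"], ["a", "a", "b"]))  # False
```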
The other composition is more elaborate:
- First it checks if the two lists have equal lengths. If not, it returns false.
- If they do, it checks if the first list is empty. If it is, it returns true (because we know that the second list also must be empty, because of the previous check, and empty lists are permutations of each other).
- Next, it calls is permutation itself recursively. For the recursion step, it removes the first occurrence of the second list's first element from the first list, and compares the result with the rest of the second list (the second list without its first element).
In this last step we remove the same element from both lists, and are thus left with two shorter lists. If those are permutations of each other, the two lists with the removed elements would also be permutations of each other, no matter where those two elements are added in those lists.
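If the first element of the second list does not occur in the first list, the first list remains unchanged (a behavior checked by a test on the remove first matching element function). The recursive call then compares two lists of different lengths, so the length check returns false, as it should.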
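For readers who think in code, the recursive composition can be mirrored in Python. This is a sketch under our own naming: `remove_first_matching` and `is_permutation` stand in for the corresponding Wikifunctions functions, and the code follows the structure of the composition rather than reproducing it.

```python
def remove_first_matching(items: list, target) -> list:
    # Return a copy of `items` without the first element equal to `target`;
    # if there is no such element, the list comes back unchanged.
    for index, item in enumerate(items):
        if item == target:
            return items[:index] + items[index + 1:]
    return list(items)

def is_permutation(first: list, second: list) -> bool:
    if len(first) != len(second):   # step 1: lengths must match
        return False
    if not first:                   # step 2: both lists are empty
        return True
    # Step 3: remove the second list's first element from both lists and recurse.
    head, rest = second[0], second[1:]
    return is_permutation(remove_first_matching(first, head), rest)

print(is_permutation(["b", "c", "a"], ["a", "b", "c"]))  # True
print(is_permutation(["b", "b", "a"], ["a", "a", "b"]))  # False
```

In the second example, the second `a` cannot be found in `["b", "b"]`, so that list stays unchanged and the next call fails the length check. Python limits recursion depth (1000 frames by default), so this mirror is meant for illustration rather than for very long lists.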
The two compositions illustrate how different two compositions can be in terms of complexity: the composition using superset is a very simple composition, almost like a definition, combining just two existing functions. The other composition is much more elaborate, combining quite a number of different functions, using nested if conditions, and a recursive call. The latter composition is probably about half-way through the complexity level I would expect us to support for function compositions: I wouldn’t expect much more complex compositions than that in Wikifunctions at all. But, in the end, both compositions are great examples of how to use composition for creating higher-level functions.