Talk:Abstract Wikipedia/Licensing discussion
Comments are welcome in any language.
We would like to invite the community to discuss these recommendations and hopefully to find consensus around the licensing decision. The goal is to keep the discussion open for about four weeks, and, if necessary, to extend and restructure it. In case this turns out to be insufficient to reach a consensus, we might restructure the licensing discussion to focus solely on Wikifunctions for now, and then follow up with a discussion about Abstract Wikipedia.
To guide the license choice, it may be useful to consider and discuss the following questions: What are the long-term objectives of the projects, and how can a copyright license support these objectives? Should the people involved in creating Abstract Content receive credit? How valuable is it to preserve consistency and compatibility with the licenses on Wikipedia?
The specific question to the community is whether to follow the given recommendations, or whether a different proposal would be more to the liking of the community. Particular questions of interest surround the licensing of Function Implementations, and the licensing of Abstract Content (and thus Output Content).
Even if you are just supporting the recommendations above, it would be great to see your voice expressed explicitly, in order to get a better understanding of what the community tends towards. If no rough consensus can be reached, we will organize a formal vote.
Comments
Views on the recommendations
- … Support Thadguidry (talk) 22:17, 22 November 2021 (UTC)
- Thadguidry, could you please clarify whether you support CC0 for Abstract content, or CC BY-SA, or either? (As currently framed, the "recommendations" allow either, so your "support" vote could be interpreted as support for either.) --Andreas JN466 06:56, 24 November 2021 (UTC)
- … Support Kasyap (talk) 06:06, 23 November 2021 (UTC)
- Kasyap, could you please clarify whether you support CC0 for Abstract content, or CC BY-SA, or either? (As currently framed, the "recommendations" allow either, so your "support" vote could be interpreted as support for either.) --Andreas JN466 06:56, 24 November 2021 (UTC)
- Support. Specifically in favor of CC0 for Abstract Wikipedia because it seems highly questionable to me whether most of its articles are copyrightable at all. —MisterSynergy (talk) 13:12, 24 November 2021 (UTC)
- Support CC0 for Function Signatures, Abstract content, and Output content. Maybe CC0 for Function implementation, to the extent that function implementation can itself be abstract content / output content of yet other functions. I am still thinking but I think this is my position. Blue Rasberry (talk) 15:34, 24 November 2021 (UTC)
- Support CC0 for function signatures, Apache for function implementations, CC BY-SA for both Abstract content and Output content.
- I am specifically convinced that, while the fact that Jupiter is a planet is not copyrightable, a construct that compares Jupiter to all other planets in the Solar System is sufficiently elaborate to make me think it would be more defendable with a CC BY-SA license, than a CC0. In this, I totally agree with the recommendations from the dev team. Please note that this, to me, is a non-negotiable baseline - I will support any proposal that grants more freedom than this (i.e. CC0 for abstract/output content), but will fight tooth and nail against any proposal that will grant less freedom than this. Sannita - not just another it.wiki sysop 20:01, 24 November 2021 (UTC)
- Oppose CC0 ist nicht viral und sollte deshalb insgesamt verboten werden. CC0 widerspricht den Grundgedanken freier Lizenzen, daß sie sich selbst verbreiten. Ralf Roletschek (talk) 09:49, 25 November 2021 (UTC)
- Oppose the use of CC0 for abstract content/output content. End users should always be able to see that the information comes from Wikimedia, and re-users should be forced to use free licences for derivative works. (See "The rise of the underdog" for a more detailed discussion.) / Dem Endnutzer sollte stets klar sein, dass die Information von Wikimedia kommt (Rückverfolgbarkeit), und Nachnutzer, die abgeleitete Werke erstellen, sollten gezwungen sein, ebenfalls eine freie Lizenz zu verwenden. Das ist bei CC0 nicht gegeben. --Andreas JN466 10:08, 25 November 2021 (UTC)
- The purpose of copyright is to protect your creative expression; it does not allow you to force users into exercising a particular data provenance protocol. —MisterSynergy (talk) 10:35, 25 November 2021 (UTC)
- CC BY-SA requires attribution. It's why there is a "Wikipedia" link in the Knowledge Graph panel, and why voice assistants tell you when they read out a Wikipedia article (if I recall correctly, it took some prodding from the WMF to make them do so). --Andreas JN466 10:53, 25 November 2021 (UTC)
- The purpose of copyright is to protect your creative expression; it does not allow you to force users into exercising a particular data provenance protocol. —MisterSynergy (talk) 10:35, 25 November 2021 (UTC)
- Support the combination of CC-0 (function signatures), Apache [or MIT or LGPL] (code), CC-BY-SA (abstract content and generated output); Oppose other variations. To preserve history of generated output, I would recommend a) adding a link to generated output that points to the abstract content used to generate it, b) clarifying in terms of use that for re-users, it is sufficient to link to the generated output alone.--Eloquence (talk) 20:26, 26 November 2021 (UTC)
- ...
Support but with modifications
- I support most of it, and I'm not necessarily asking for a modification. However, I'd love to see at least an explanation for the choice of a permissive license for the implementation. See a more detailed question below. --Amir E. Aharoni (talk) 12:26, 23 November 2021 (UTC)
- I support the output content being licensed CC BY-SA, as long as we have a description of how attribution will work *and* an assurance that the output content CC BY-SA can be maintained *in practice*. What I'm worried about is a situation such as the following:
- A big tech actor takes the abstract content and the "lower level" software without attribution and then runs the software with Wikidata and the Abstract Content to get the same Output Content the Wikipedia might use, without any attribution required. Thus Wikipedia would have to attribute, but big tech would not have to attribute the same content. Can you reassure us that such a situation would not arise, and if it did that WMF legal would go to court to enforce the CC BY-SA license? I'll note that there are lots of cases of Wikipedia content being used without attribution, but no legal action is taken to enforce the license. Would the same situation arise here in practice? Smallbones (talk) 15:12, 24 November 2021 (UTC)
- Regarding non-compliance with the license violations, I want to refer to this section on English Wikipedia. Given that, do you still hold your recommendation for CC BY-SA, or do you prefer another license? --DVrandecic (WMF) (talk) 21:29, 25 November 2021 (UTC)
- Well, I do know for a fact that the WMF spoke to Amazon about their lack of attribution in Alexa – with, I believe, a positive result in the end. Admittedly though not the same as going to court. --Andreas JN466 00:36, 26 November 2021 (UTC)
- Regarding non-compliance with the license violations, I want to refer to this section on English Wikipedia. Given that, do you still hold your recommendation for CC BY-SA, or do you prefer another license? --DVrandecic (WMF) (talk) 21:29, 25 November 2021 (UTC)
- In general terms, I would adopt cc0 for data and functions, and cc-by-sa for the generated content. -Theklan (talk) 16:51, 24 November 2021 (UTC)
- Only support CC BY-SA for Abstract/output content, object to CC0 for Abstract/output content. Would prefer GPL to Apache. A key point to me is that re-users should have to indicate provenance. If lots of voice assistants, info panels etc. all end up agreeing with each other, end users should be able to see that this is because they all draw on the same Wikimedia source, and not because the entire world is in agreement. The other key point is that re-users should be forced to use the same licence for derivative works rather than converting free (gratis and libre) volunteer work into proprietary works. --Andreas JN466 18:29, 24 November 2021 (UTC)
- Support CC BY-SA and GPL licenses, as appropriate. I wrote my reasoning in the section below.--Strainu (talk) 19:40, 24 November 2021 (UTC)
Questions and clarifications
I hardly know what this discussion means.
- If this same content were coming from the most closed and restricted corporate actor, then what licenses would be applied to the various components? I want to know the precedent in the field.
- Can you show a third-party non-wiki example of these components in action anywhere, and also describe licenses in place for the parts
- What professional field produces this kind of content? Is this software, math notation, creative text, or what?
- Do different components routinely have different licenses?
- For which components, if any, are there past claims that they are uncopyrightable?
- Is this more of a casual wiki licensing decision, or is this a situation where the Wikipedia platform is an Internet giant and what we decide may potential influence a generation of development in a field?
- "Wikimedia Foundation's submission in response, we explained that AI algorithms should be treated like any other software tool and that the tool's user should be considered the copyright holder" - link to that please. That seems ethical and social position which needs to come from organized and informed Wikimedia community discussion, and not from a legal opinion. In the Wikipedia ecosystem a claim like that seems like it should come from discourse rather than from an office or role of designated authority. I am not aware of any past such community conversation.
Thanks. Blue Rasberry (talk) 00:07, 23 November 2021 (UTC)
- Hi, I will try my best to answer the questions, but note that I am not a lawyer. Also, in order to speed up the communication, I did not go through checking my answers by multiple departments etc., so please bear with me, I might make mistakes, have forgotten some things, and might not be aware of everything.
- I am not sure what you mean with precedent in this field, sorry. To the best of my knowledge we are the first large scale system aiming to allow a community to collaborate on and create multilingual content and generate language using a symbolic approach to make this content available to many. Other systems have been doing some parts of these, such as the Canadian weather service which has been using NLG technology for a long time in order to publish weather reports, etc. I do not know what kind of licenses they have used for their components. If I were to imagine the most closed and restricted actor, I would easily imagine that they don't publish the code at all, that they don't publish the abstract content at all, and that they merely publish the resulting natural language text under the normal rules of copyright. I hope that answers the question.
- No, I am not aware of any such system.
- I am not aware of any professional field producing text in many languages using a similar system.
- Yes. Think about Wikimedia in general: Wikipedia's text is published under CC BY-SA, Wikidata under CC 0, our software under various licenses (e.g. MediaWiki under GPL), and the images on Commons are also using a large number of licenses.
- The text by the Legal department discusses this, in particular for facts and for APIs.
- It is hard to predict the future. Wikidata's licensing decision has been used in several external places as an argument in favor of using CC 0 in other places too. Whether the decision for licensing the components of Wikifunctions and Abstract Wikipedia will have a similar effect or not remains to be seen.
- Here is the link to the document which has also been added into the project page by Quiddity.
- Thanks for asking! DVrandecic (WMF) (talk) 22:23, 23 November 2021 (UTC)
- @DVrandecic (WMF): Thank you for the decisive answers. I now have a better understanding that the choices we have to make are new. I was missing that context. This is helpful, I will think more. Blue Rasberry (talk) 22:40, 23 November 2021 (UTC)
Discussions
- How big of a decision is this, and how much can our decision affect future Wikimedia activities, organization, and expenses? If this is a relatively small Wikimedia community decision, then a typical talk page chat like this one is sufficient. If there are big issues here then perhaps we have a bigger conversation. I do not have a good understanding of the potential impact of the answers we have here, but I do have some intuition that the decisions made here will have effects at scale, and potentially apply licenses to much or almost all of the Wikimedia Movement's future output. I neither want to be hasty in setting standards, nor do I want to delay this discussion too much. If this really is a big decision, I would like diverse participation in the discussion. The Wikimedia Foundation is investing in this tool and while I trust the WMF for what it is, that organization does not reflect the diversity of our movement, and also it has a tendency to adopt practices which align with the commercial tech industry when there are other options. Again, I lack understanding of how important the decision here is, but I do feel hesitation when a big decision is put on the table and I cannot recognize who would benefit from the various choices that can be made. I think there is not supposed to be a choice here at all, but rather just a description of current perspectives, but we already know that there is a lot of diversity in licensing beliefs, laws, and actual user practices. Those things already are not aligned with each other.
- I would be more happy here the more that I thought that the outcome of this discussion had informed Wikimedia community participation within it.
- Big thanks to everyone who put this discussion together. This is a great topic to discuss. Blue Rasberry (talk) 22:57, 23 November 2021 (UTC)
- @Bluerasberry: I agree, I would love to see wide participation when discussing this question. Then again, I also understand that the discussion is abstract and about a difficult topic with many subtleties. I don't think it is easy to know and understand all the ramifications of the different decisions. I don't claim to do so myself, which is why we are having this conversation, to make the best possible decision.
- You write that a bigger issue needs a bigger conversation than a talk page. What do you mean with that? We have traditionally had very big decisions on talk pages, and I thought that's one of our strengths.
- On the one side, I do not think that the impact of the licensing decision for the code in Wikifunctions is as big as you describe: it would not apply licenses to all of the movement's future output. Function application to content from other sources, in general, would and should have no effect on the license of the output. If you put in a copyrighted text into a "capitalize" function, the result remains copyrighted text - the license of the code for this function that was applied did not make any difference.
- On the other side, the decision about the license of Abstract Content might have an impact on Wikipedia and how the content of Wikipedia can be reused. For example, if we choose a CC 0 license for Abstract Content we would now offer an alternative text to the handwritten text, without the restrictions of CC BY-SA.
- Ultimately, we believe that the copyright licenses used by the Wikimedia projects should empower as many users as possible to easily collaborate by sharing knowledge (see also the legal note on the use of CC ).
- It would be extremely helpful for me to understand which questions in particular you have. Then I can try to give some thoughts about them. Also, the page also offers some questions which might help us guide this discussion, in case the big picture gets too big. Thanks for clarifying! --DVrandecic (WMF) (talk) 00:13, 24 November 2021 (UTC)
- I do not know what I think, except that I am unsure how big of a decision this is. I am going to take a break from this conversation after making this post, but thanks if you have any brief response.
- Denny: you say "-the licensing decision for the code in Wikifunctions ... not apply licenses to all of the movement's future output", and also " if we choose a CC 0 license for Abstract Content we would now offer an alternative text to the handwritten text, without the restrictions of CC BY-SA".
- To me, this sounds like using data science to generate content for humans to read. I think this has the potential to quickly account for 50%+ of Wikimedia content, and possibly ~99%+. The idea is to convert Wikipedia's content to uncopyrightable statements, then use some algorithm to present the facts in every language for every reader.
- Denny, you say "I do not think that the impact of the licensing decision for the code in Wikifunctions is as big as you describe".
- If we have this process for extracting meaning from copyrighted Wikipedia text and converting it to uncopyrightable info, then when does our process get compared with anyone else doing the same thing to other media, like news, journalism, research, or novels? Can the same process convert scramble a Harry Potter book into a public domain approximation with renamed characters and changed situations? What will Disney's lawyers say when someone copies the Wikimedia community process and does approximate movie remakes of their content? I know there are lots of experiments out there with students and hobbyists, but when Wikipedia takes a position on this, then that seems like a milestone.
- Whatever the outcome of this, I want it to advocate for the community. The Wikimedia community will choose the option which advances the Wikimedia Mission. To make an informed choice, I wish that I better understood what corporate interests want here. You seem to be saying that corporate advocates have not yet taken a position. If corporations have, then I do not want anyone normalizing their position yet before the Wikimedia community has its own independent discussion on the matter.
- Questions:
- What percentage of Wikimedia readers will have a significantly different user experience as a result of this decision? I think the answer is ~100%
- What percentage of Wikimedia content will be modified by this process? I think the answer is ~100%
- Who is going to talk about what the Wikimedia community decides here? A lot of people, including outside the wiki community
- I will answer the questions on the main page with my opinion and position.
- What are the long-term objectives of the projects, and how can a copyright license support these objectives?
- The goal is a next-generation increase in the amount, quality, and production speed of Wikimedia content. Copyright licenses are inherently designed for the benefit of corporations and commercialization, and only the unusual activist licenses like those of Creative Commons can favor communities and public benefit. Almost no Wikimedia contributors make actionable claims on copyright with their contributions, and almost all Wikimedia community members would like to see increased use of free and open licensing if not government regulation compelling less corporate copyright protection for global community benefit. There are years of precedent of license-violating re-use of user contributions to Wikipedia, and none among the Wikimedia Foundation, Creative Commons, or existing community organizations have ever been able to defend content creators effectively from inappropriate content use through the Wikimedia platform. For the technical components which this project describes, unchecked inappropriate reuse is even easier and more likely. Copyright protection will not favor Wikimedia community members in this case but will favor corporate actors. I do not wish to adopt the same idealistic myths of user protection for functions and their output, because those myths favor corporations and not the community.
- Should the people involved in creating Abstract Content receive credit?
- No. When communities enter the copyright game they cannot compete with corporate players and commercialization, and business interests will overwhelm community contributions. The precedent to make here is not an internal Wikimedia community decision, but the demonstration that if Wikimedia projects take a position then that position is a societal norm which should apply even outside of the Wikimedia platform. Sooner or later corporations are going to take a position on this issue, and when they start suing each other and the general public, the courts are going to ask what the precedent is. If Wikimedia projects have already established a precedent which normalizes advocating for community and public benefit, then corporate interests will have much more difficulty in claiming that other positions are natural. The community has a lot to gain by normalizing its own claims and practices that AI generated media and as much of the related machinery as possible is ineligible for copyright. These claims are reasonable and sensible as well, even though soon corporate propaganda will claim otherwise in an attempt to capture and privatize the public commons.
- How valuable is it to preserve consistency and compatibility with the licenses on Wikipedia?
- It is only valuable if that past consistency supports Wikimedia community plans for the future.
- When Wikipedia was established the conversation on licensing was less developed. Activists are more experienced now, understand free and open media better, and were under less corporate competition. The discussion now matters more and is more informed than past discussions.
- Licenses with Wikipedia have changed in the past. Originally Wikipedia used GNU Public Document Licenses for copyright. Somewhere along the way, the Wikimedia Foundation changed the copyright text from saying that Wikipedia text was available with either CC licenses or GPDL to "Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply". We used to say that we had to mention the GDPL as terms of the license, but now that license commitment is fulfilled by saying "additional terms may apply". The social context for change was that some people wanted to distance themselves from the GPDL for whatever reasons. This did not happen with a broad discussion or published legal opinions. It just happened because enough people wanted it and no one objected. Likewise, Abstract Wikipedia can do new things and break precedent also. A similar conversation is happening with a transition from CC 3.0 to 4.0; the 3.0 license does not describe updates while the 4.0 license does, and there is discussion about transitioning licenses somehow through some process, converting some 3.0 licenses to 4.0. Somehow interpretations change with culture and time and what seemed impossible can just change.
- Rules of Wikimedia projects exist to serve the community. The community does not exist to serve the rules. When making big decisions, we do not need to assume that all precedents are a divine consensus from infallible early commentators. There are some guiding principles we carry forth, like the fifth pillar which says "ignore all rules", and which we can apply to say that if the community wants a change then we can enact it.
- What are the long-term objectives of the projects, and how can a copyright license support these objectives?
- thanks Blue Rasberry (talk) 15:30, 24 November 2021 (UTC)
Alternate proposals
- Both CC0 and CC-BY-SA SHOULD be supported for Abstract Content - Regardless Wikidata material about real-world entities, languages etc. MUST stay CC0 only
- It SHOULD be possible to publish Abstract Content and Output Content under either of CC0 and CC-BY-SA, depending on what the user chooses. If CC0 licensing is supported, content that is so licensed can and SHOULD be stored in Wikidata (subject to agreement by that project). One could expect that this would mostly involve either statements of fact that are not feasible for the current Wikidata model, but where the choice of some peculiar expression or presentation is largely immaterial, or literal, verbatim "translations" of claims, quotations, texts etc. from pre-existing sources in the public domain.
- A dedicated Abstract Wikipedia site, probably comparable to the current Wikipedia Commons, would then be used to work on CC-BY-SA content, including content that's imported from other Wikimedia projects. Regardless, CC-BY-SA -licensed formal content about real-world entities (including lexical and/or grammatical data about languages) MUST NOT be published on Wikidata and/or commingled with the existing CC0 material - this would make Wikidata licensing legally uncertain which would destroy much of its current usefulness to outside reusers! Hupaleju (talk)
Objections to the recommendations
- Object to CC0 for Abstract content/text output. A key point to me is that re-users should have to indicate provenance. If lots of voice assistants, info panels etc. all end up agreeing with each other, end users should be able to see that this is because they all draw on the same Wikimedia source, and not because the entire world is in agreement. Another key point to me is that re-users should be forced to use the same licence for derivative works rather than converting free (gratis and libre) volunteer work into proprietary works. --Andreas JN466 18:43, 24 November 2021 (UTC)
FAQs - Questions and answers
Some questions that showed up during the internal discussion and their answers.
- From a licensing perspective, does text generated with AW differ from text directly contributed by the community?
- That's exactly the question we have to answer with this discussion. Directly contributed text on the Wikipedias is published under a CC BY-SA license. Legal recommends that CC BY-SA would be an option for the Abstract Content and thus for the Output Content generated from the Abstract Content, so it may be the same if we decide so. But it could also be a different license, in particular CC0 or CC BY.
- Is there any way to ensure that a given language's community has the final word on what makes it into an article?
- Yes! Albeit not legally but technically: every Wikipedia language community will always have the opportunity to locally override Output Content coming from Abstract Wikipedia (or even to not enable it in the first place). The local communities retain the final decisions on the Wikipedia articles.
- Will the list of contributors to an Abstract Wikipedia article be accessible via a "History" tab within each Wikipedia page where it is used?
- It depends. There are (at least) two ways to create articles from Abstract Wikipedia: either by having individual function calls in the body of the article in the language Wikipedia, or by having the whole article created from the abstract content in Abstract Wikipedia. Regarding the first way, see below. In the second case: yes, we can link to the history of the abstract content easily.
- How will the "History" tab work if there is partial AW-content & partial local-content?
- The History tab does not currently include changes to imported content, be it through changes in templates, transcluded pages, media in Commons, etc. So if we wanted to do that that would be a bigger change throughout the ecosystem. Recent Changes though could easily integrate such changes.
- What about via the API if someone is crawling or spidering Wikipedia?
- For content or history? For history, see above - we usually do not do that currently with any form of included content. For content, depends on the exact API, whether one is asking for the raw content (in which case, no), or for the rendered content (in which case, yes).
For Function Implementation, consider following Wikimedia Commons
Hi, with the "Function Implementation" section, I'd recommend taking a similar approach to Commons - see commons:Commons:Licensing - i.e., allow all free licenses, as long as they meet certain requirements. Commons isn't mentioned in this section, so I'm not sure if this has been overlooked or considered but dismissed, but this seems to be a good way to store multiple smaller works, since it maximises the number that can be stored without causing license problems. Probably there are issues with combining code with multiple licenses together, but that same thing happens when you create a compilation of Commons media, so presumably it's manageable. If there are specific issues with individual licenses (e.g., GFDL for images), then add exceptions for them like commons:Commons:Licensing#GNU_Free_Documentation_License. Moreover, have a community process that lets you decide which licenses are acceptable - including new licenses that might appear in the future. Thanks. Mike Peel (talk) 07:52, 23 November 2021 (UTC)
- Thanks for the comment, Mike! Legal indeed outlined that possibility, and we are thinking about it. My current thinking is to launch with only a single license, and then see how things work out - and then, once we have more data and can see whether there is a need for other licenses, to implement support for multiple licenses if that would help the project. But I would like to hear arguments for and against using multiple licenses, and also if that's a requirement pre-launch or if that can wait. --DVrandecic (WMF) (talk) 22:40, 23 November 2021 (UTC)
Why permissive and not copyleft?
Most of the recommendation make sense to me, but why is the Apache license suggested for the implementation and not a copyleft license, such as GPL or AGPL? Why is it good for the license to be permissive? Won't there be advantages for using a stricter copyleft license to protect the usage?
AGPL, in particular, was in the news recently. I'm not a lawyer, and I'm not sure whether such a thing is useful for WikiFunctions, but my intuition tells me that it's a good thing to consider.
At the very least, the choice of a permissive license must be substantiated and not just declared. Amir E. Aharoni (talk) 12:24, 23 November 2021 (UTC)
- Hi Amir! In the section on Software, legal explains that their recommendation follows current best practices for our software development. Most of our software work is published under less restrictive licenses than the GPL, with some notable exceptions (such as MediaWiki itself). The recommendation is to follow our current best practices. Note that this is not a declaration, but a recommendation with a discussion, which is what we are doing here right now. --DVrandecic (WMF) (talk) 22:40, 23 November 2021 (UTC)
- MediaWiki itself is kind of a big thing. Most MediaWiki extensions are GPL, too (although notably, WikiLambda is MIT; why?). The content of our projects is CC-BY-SA, which is also copyleft.
- The reason for publishing the iOS app under a non-copyleft license is Apple's App Store policies, which are not friendly to copyleft, but the iOS app is far from being the most important thing in our universe, and it doesn't represent all the best practices.
- So just saying that permissive licenses are our current "best practices" is quite unclear, and it's not a good enough explanation. Amir E. Aharoni (talk) 06:14, 25 November 2021 (UTC)
- I would also find it more consistent to have a copyleft license for the function implementations. I understand from the previous answer that most of software work of the Wikimedia Foundation uses permissive licenses, but I'd argue MediaWiki itself is the exception that counts, and, most importantly, all the contributed content is currently copyleft. This means that the information remains free, even taken outside of Wikipedia and modified; and this is desirable. I don't think the function implementations should be handled differently. A copyfree license would mean that someone could modify the content, modify the function implementations, not provide the new implementation, effectively making the new content potentially very hard to use. I think the ideal license would be a dual (A)GPL + CC-BY-SA license, so we are sure about the compatibility, and we still allow re-using the functions for other things outside Wikipedia (maybe translated into another programming language), while guaranteeing that the modified functions will still be available. Copyleft is how Linux and many projects have thrived, including Wikipedia itself. I'm afraid a copyfree license for parts of the contributed content could weaken Wikipedia. The other licensing decisions otherwise look good to me --Raphael.jakse (talk) 07:33, 24 November 2021 (UTC)
License for other templates, modules, and gadgets
It's not exactly on the topic of WikiFunctions, but it's strongly related: Now that we're finally discussing the WikiFunctions license, can this be also an opportunity to discuss the license for the hundreds of thousands of templates, modules, and gadgets on the sites? They are much more similar to code than to content, but by default the same content license applies to them. Is it good? Perhaps we could give the editors a choice to apply GPL, Apache, BSD, CC0, CC-BY-SA or some other license to the templates, modules, and gadgets they write? It's probably possible to do it using code comments, edit summaries, or talk pages, but it would be nice to have something more structured, for example a dropdown next to the Publish button. Amir E. Aharoni (talk) 12:30, 23 November 2021 (UTC)
- Hi Amir! I think it would be great to have a discussion about the licensing of the Lua code published on the various projects, of the gadgets, and all the other code that is all over the projects. Personally, I am not sure whether CC BY-SA is a great choice for that, as it is a license that is not meant for source code. But I think that the discussion we are aiming for here right now is already complex enough as it is, and would invite that question to be tackled independently as I don’t see much of a dependency between these questions. --DVrandecic (WMF) (talk) 22:43, 23 November 2021 (UTC)
Please clarify which recommendation for Abstract content you would like people to support
The "overview and request" section says "We recommend to publish Abstract Content under the same license as the content of Wikipedia, i.e. CC BY-SA", but the two sections further down detailing the actual recommendations by the legal department and the development team say either CC0 or CC BY-SA might be good. The overview needs to be made consistent with the recommendations somehow, otherwise it won't be clear what people supported or commented on. --Andreas JN466 07:38, 24 November 2021 (UTC)
- I've edited the "Overview and request" section to make it consistent with the actual recommendations. See [1] --Andreas JN466 20:51, 24 November 2021 (UTC)
- Hi Andreas! I am sorry you found the text confusing. The overview stated "We recommend to publish Abstract Content under the same license as the content of Wikipedia, i.e. CC BY-SA." until you changed it. The recommendation by the development team also states "The development team further recommends to choose CC BY-SA for Abstract Content and Output Content.", and then follows up with a few paragraphs of text explaining why we prefer CC BY-SA. I have re-read the text, and whereas you claim that the development team recommends CC-0 for Abstract Content, I can not find such a sentence (and it would be wrong). Please let me know what led you to that reading so we can fix it. Because of that I am reverting your change to the summary. --DVrandecic (WMF) (talk) 22:19, 24 November 2021 (UTC)
- @DVrandecic (WMF): Well, then you won't revert this edit, correct? Because that literally said that the team recommends "to choose either CC0 or CC BY-SA for the Abstract Content and the Output Content." And if the wording after this edit of mine now indeed reflects your recommendation – which I would welcome – then you must also say that you differ from the recommendation of the legal department, because the legal department's recommendation is, and I quote, "Recommendation: Abstract content should be licensed under CC BY-SA or CC0."
- So, do you agree with the legal department that CC0 would be good, or do you disagree and recommend CC BY-SA only? Please be clear. Right now you're hedging, because the last paragraph of the team's recommendations brings in CC0 through the backdoor, where you say, On the other hand, one could argue that by putting Abstract Content under CC0 we open the space for a larger amount of possible reuse in applications we cannot even imagine yet, nevermind properly decide the legal framework for. CC0 most certainly allows the most freedom in the reuse of the Abstract Content. So right now, the section is written in such a way that agreeing with you means agreeing that abstract content "should be CC BY-SA and not CC0, but could also be CC0". In other words, agreeing with you right now, as the section stands, gives you carte blanche to do whatever you like later on, and claim you had community support for your position. Regards, --Andreas JN466 22:44, 24 November 2021 (UTC)
- I am sorry that the development team's recommendation is still confusing to you. The first paragraph says that the development team agrees with the recommendation by legal, and then reiterates said recommendation - which is that the Abstract Content should be published under CC 0 or CC BY-SA.
- The third paragraph then reads "The development team further recommends to choose CC BY-SA for Abstract Content and Output Content." That does not mean we differ from the recommendation by legal, but we follow it and we further specify the option CC BY-SA over the option CC 0 (because legal's recommendation says to choose either one or the other). But I consider that recommendation to be fully aligned with legal's recommendation. We are just more specific by choosing one of the two options.
- Your edit though makes it sound as if the legal team has only recommended to use CC BY-SA, which it has not. It said that it is one of two options. I am afraid your edit is misrepresenting the recommendation by the legal team. I am rewriting that to make it clearer that this is the summary of legal.
- Since you are asking explicitly: I agree with the legal department that CC 0 would be a good option. I also think that CC BY-SA would be an even better option.
- I am not sure how to state it more explicit which license the development team recommends for Abstract Content but with the sentence "The development team ... recommends to choose CC BY-SA for Abstract Content ..." and to have that also in the overview stated as "We recommend to publish Abstract Content under ... CC BY-SA.". I think both sentences are rather clear. I wonder if anyone else is also confused about what our recommendation is.
- And yes, we list a few arguments for both licenses, in order to get the ball rolling. I think that both licenses would have interesting consequences, and we will implement either of these licenses, whatever the community chooses. --DVrandecic (WMF) (talk) 21:32, 25 November 2021 (UTC)
- Well, that wording is better. You now make clear in the first paragraph that you are summarising Legal's recommendation, whereas previously you presented what Legal said as (also) the development team's own recommendation. You said, right in the first paragraph:
- The development team recommends to follow the recommendations by Legal, i.e. to choose CC0 as the license for Function Signatures; Apache for Function Implementations (and to start with a single license, and only when we recognize the need for multiple licenses to extend Wikifunctions to support multiple licenses); to choose either CC0 or CC BY-SA for the Abstract Content and the Output Content.
- If you still do not see the problem, could you ask someone outside WMF to read that sentence and then tell you what they understand the dev team's recommendation for abstract content is, based on that sentence?
- And even though the wording is now better – thank you for that! – there is still a problem. What I have objected to, from the beginning, is that from your "Overview and request" it is not even apparent to the casual bystander that CC0 is an option here – an option that is prominently featured in both your recommendation and even more so in Legal's recommendation. As I've said before, when I first read the overview, I thought, "There is nothing to see here. CC BY-SA – yay!" And then, when I read the rest of the page the next day, incl. that sentence above, I could hardly believe my eyes. I wonder how many people have read your overview since your announcement and stopped there, thinking, as I did at first, that there is no controversy here, no need to even get involved in the discussion. (In fact, people on Facebook did say exactly that.)
- Are you okay with people being misled like that by your overview? Is it your intention to keep people like Ralf above, who have a fundamental issue with CC0, away from a discussion which might result in CC0 being adopted? If not, then I'd ask that you kindly rework your overview accordingly, so people get an accurate idea of the discussion's scope.
- Incidentally, it would also help to clarify who "We" is in the "Overview and request", given that you have recommendations from both Legal and the development team on the page. By the same token, it is not clear what "these recommendations" in the "Overview and request" actually refers to. Does it refer to the preceding paragraph? Does it refer to the recommendations from both Legal and the development team given further below? --Andreas JN466 23:29, 25 November 2021 (UTC)
- Well, that wording is better. You now make clear in the first paragraph that you are summarising Legal's recommendation, whereas previously you presented what Legal said as (also) the development team's own recommendation. You said, right in the first paragraph:
We've been here before
I'm reminded of the early licence discussions for Wikidata – which, as we all know, ended up importing masses of data from CC BY-SA Wikipedia and republishing them under CC 0, even though this was categorically ruled out in the early stages:
Alexrk2, it is absolutely true that Wikidata under CC0 would not be allowed to import content from a Share-Alike data source. Wikidata does not plan to extract content out of Wikipedia at all. Wikidata will ''provide'' data that can be reused in the Wikipedias. And a CC0 source can be used by a Share-Alike project, be it either Wikipedia or OSM. But not the other way around. Do we agree on this understanding? --[[User:Denny Vrandečić (WMDE)|Denny Vrandečić (WMDE)]] ([[User talk:Denny Vrandečić (WMDE)|talk]]) 12:39, 4 July 2012 (UTC)
This past U-turn has left a sour taste for me and permanently undermined trust. It's left me with a feeling that the WMF generally wants to push CC 0 as the superior licence and that any statements to the contrary are mere lip service. So right now, my suspicion is that whatever anyone here says – we'll end up with CC 0 for Abstract content and output content sooner or later. This will remove traceability of content to the Wikimedia movement, while re-users will be free to make any derivative content proprietary. I find neither desirable.
Therefore I would like anyone who does not want CC 0 to be applied to Abstract content and output content to be very clear about it here. Then at least it can't be said later on that the entire community agreed with a recommendation that said CC 0 might be the best way to go – which is what both the WMF legal department's and the development team's recommendations are currently saying. --Andreas JN466 12:22, 24 November 2021 (UTC)
- I agree that anyone who does not want CC 0 to be applied to Abstract Content to state so on this page. Legal says that CC 0 is an option, and recommends to use either CC 0 or CC BY-SA. The development team further narrows down the recommendation to be CC BY-SA. --DVrandecic (WMF) (talk) 22:25, 24 November 2021 (UTC)
- It doesn't narrow it down at all. Even now, after this edit, the development team section starts by saying the team recommends to follow the recommendations by Legal and ends with a list of advantages CC0 would have over CC BY-SA. --Andreas JN466 23:39, 24 November 2021 (UTC)
- You say it doesn't narrow it down. The text says, summarized: "Legal recommends two options, CC BY-SA and CC 0. We recommend CC BY-SA.". For me that reads a lot like narrowing it down. -- DVrandecic (WMF) (talk) 21:36, 25 November 2021 (UTC)
- You say, summarised, "We recommend CC BY-SA, but one could argue that CC0 is more future-proof, and it most certainly allows the most freedom." That's about as narrow as a barn door. --Andreas JN466 23:29, 25 November 2021 (UTC)
- You say it doesn't narrow it down. The text says, summarized: "Legal recommends two options, CC BY-SA and CC 0. We recommend CC BY-SA.". For me that reads a lot like narrowing it down. -- DVrandecic (WMF) (talk) 21:36, 25 November 2021 (UTC)
- It doesn't narrow it down at all. Even now, after this edit, the development team section starts by saying the team recommends to follow the recommendations by Legal and ends with a list of advantages CC0 would have over CC BY-SA. --Andreas JN466 23:39, 24 November 2021 (UTC)
Let's stay compatible, let's stay copyleft
I support a copyleft license for all code (GPL, very likely) and copyrightable elements (CC BY-SA), as it is more fair to volunteers and external contributors. Also, it's highly likely that, at least in the beginning, a prime use-case will be people moving content around between Wikipedia and Abstract Wiki. The licenses should not prevent us from doing that.--Strainu (talk) 19:35, 24 November 2021 (UTC)
- Thanks for the comment, Strainu! Just a quick note: your reasoning about Abstract Content and Wikipedia content to both be under the same license is sound. Regarding the code though, a copyleft license such as GPL would *not* allow us to move code from Wikifunctions to the Wikimedia projects. Just because both GPL and CC BY-SA are copyleft licenses does not mean that they are compatible and that code can be moved from one license to another. --DVrandecic (WMF) (talk) 00:28, 25 November 2021 (UTC)
- DVrandecic (WMF), what I meant was that code should be license-compatible with other code we use and content should be compatible with other content.--Strainu (talk) 14:50, 25 November 2021 (UTC)
Do we really have a choice?
(This refers to “Abstract Content” and “Output Content” as defined on the front page. “Function signatures” and “Function implementations” are ignored here. Disclaimer: IANAL.)
The notion of this discussion is that we can pretty much freely choose a license for Abstract Wikipedia, with the options being the license CC-by-sa and the license-waiver CC0. However, this is based on the premise that the works to be created are copyrightable at all—which I doubt to be the case. After reading the WMF position on copyright of AI/ML generated content, I can understand how one could argue that these works may be technically copyrightable under some conditions, but I am not sure whether this really translates 1:1 to this situation.
“Abstract content” is pretty much non-copyrightable factual data taken from CC0-Wikidata, but with an added copyrightable selection and order/arrangement. The latter two aspects seem theoretically copyrightable. However, a threshold of creativity needs to be passed here in order to have these copyrightable aspects in the “Abstract content”. I suspect that this will often not be the case, since both selection and order/arrangement seem to be pretty much pre-determined by data availability from Wikidata, and text render function availability from Wikifunctions; I also expect generation of “Abstract content” to be a highly schematic process (read: no creativity required), so that much of if will eventually be created with automated processes.
The textual form of the “Output content” will be pretty much determined by the text render functions used. For copyrighted text, it legally is usually not a problem to extract the information (pretty much its “abstract content”), and to rephrase it into another textual form containing the very same information in the very same order. When done properly, this does not infringe existing copyright of the original textual representation. I fail to see how this “Output content” should be copyrightable to an extent that users are effectively restricted to comply with CC-by-sa. I also do not understand how the authors of the “Abstract content” should be considered as authors here when the actual textual representation is much rather a result of work done by function authors at Wikifunctions.
In summary, choosing CC-by-sa might give us the illusion that the content here is copyrightable and protected under the terms of that license, when effectively this might not be the case for large and substantial amounts of the content if one was to litigate this in court. This choice woutl also make it difficult to reuse data, since the legal situation from a reuser perspective seems to be unnecessarily complicated—which often scares potential reusers away entirely.
I would also like to mention that this situation is not really new. People are still surprised that facts are being imported from CC-by-sa Wikipedias to CC0 Wikidata. However, the CC-by-sa license does not make this legally impossible at all. Another notable example of a problematic licensing situation is OSM with its OdbL copyleft license which tries to put up a fence around the project’s data, but fails to be an effective protection as desired by many OSM contributors. —MisterSynergy (talk) 01:14, 25 November 2021 (UTC)
- The WMF lawyers seem to have looked at this, and came up with CC BY-SA as an option. The ultimate output of Wikifunctions is text, much like Wikipedia. Why would CC BY-SA make reuse of these texts more onerous than use of Wikipedia texts? --Andreas JN466 01:34, 25 November 2021 (UTC)
- I am not sure how much WMF lawyers have actually looked at the details. It all seems to be based on this paper, but the outcome is not really transferred to this situation. IMO a key problem is that “Abstract content” seems to be only weakly copyrightable at best due to its nature as factual data and the pretty low level of creativity required to generate it.
CC-by-sa for potentially not copyrightable content would confuse reusers. Regular Wikipedia articles are overwhelmingly indeed copyrightable, thus CC-by-sa is a sane choice there. —MisterSynergy (talk) 10:29, 25 November 2021 (UTC)- MisterSynergy As I understand it, abstract content means you compose sentences much as you do in ordinary language use. You have a database of knowledge about the world and its constituents and their possible relationships, and words for them to enable you to talk about them. Abstract content translates these statements into a kind of machine language, so they can then be more easily translated into any human language, using a limited vocabulary, but you are still fundamentally writing text, expressing ideas, etc., just as you do when you write in a human language. --Andreas JN466 10:46, 25 November 2021 (UTC)
- Here we are getting to the actual point. I would like to hear from WMF/Denny whether this question of “Abstract content” copyrightability has been evaluated explicitly, since my impression of the nature of “Abstract content” is a much simpler one. —MisterSynergy (talk) 10:52, 25 November 2021 (UTC)
- Good point. But it's not just that, actually. There is also a language translation process involved before you get to an output sentence like "Jupiter is the largest planet of the solar system". Each and every type of abstract content has to be assigned a human-made or human-reviewed translation into every human language to be served. There will be innumerable challenges based on the intricacies of languages' different grammatical structures – differences in terms of how grammatical gender is or isn't distinguished, case endings, verb endings, ergative languages etc. – all of which will require output checking and fine-tuning by humans. Translation is a creative act, even when assisted by AI; translations are copyrightable. --Andreas JN466 11:04, 25 November 2021 (UTC)
- Yes, but all of this is in scope of Wikifunctions, thus the code license applies here. The process of “Abstract content” generation is really not much else than a selection of facts to be included in the output, from all the facts available in Wikidata, and also its arrangement in the “Output content”. This process might principally include some copyrightable aspects as described above, but I am pretty sure that it is extremely schematic and can probably be automated to a large extent, e.g. by using some sort of generic “Abstract content” templates that fit for all data items of a given type (e.g. humans, cities, companies, etc.).
Generally, the difficult parts of “Abstract Wikipedia” are the data and lexicographical repository Wikidata (excessive to compile, we’re already 9 years into this), and Wikifunctions (complicated because it needs to generate the text). Licensing of both is pretty much undisputed. Any user can come and use these tools to generate their own texts, without using “Abstract content” from Abstract Wikipedia, but using information from Wikidata only. In some sense, the “Abstract content” is simply a demonstration how Wikifunctions can be used to generate text for a given data item and users are free to alter the output as much as they like. We can of course not require them to make any textual output of Wikifunctions functions available under a specific license. —MisterSynergy (talk) 13:17, 25 November 2021 (UTC)- Have a look at Abstract_Wikipedia/Examples/Jupiter. This seems more than "extremely schematic" to me. I don't agree that the resulting text lacks any kind of creativity, or "Schöpfungshöhe", and is just like an alphabetical series of phone book entries that could not really have had any other meaningful form. --Andreas JN466 13:30, 25 November 2021 (UTC)
- The Development Team actually put that very nicely, when they say:
- Editors have a very fine-grained selection of which facts are being displayed and which are not. In Wikidata we strive for completeness over careful selection.
- Editors have a very fine-grained control of the order the facts are being displayed in, constituting narrative elements, which are not available in Wikidata.
- We expect the natural language generation to allow editors to express to some degree of emphasis and selection of wording.
- All of these point towards Abstract Content being more similar to text than to a collection of facts, and therefore we suggest that we follow the same license that we use for the text in Wikipedia, which is CC BY-SA.
- That Jupiter text is a good example of that. --Andreas JN466 23:55, 25 November 2021 (UTC)
- Yes, but all of this is in scope of Wikifunctions, thus the code license applies here. The process of “Abstract content” generation is really not much else than a selection of facts to be included in the output, from all the facts available in Wikidata, and also its arrangement in the “Output content”. This process might principally include some copyrightable aspects as described above, but I am pretty sure that it is extremely schematic and can probably be automated to a large extent, e.g. by using some sort of generic “Abstract content” templates that fit for all data items of a given type (e.g. humans, cities, companies, etc.).
- Good point. But it's not just that, actually. There is also a language translation process involved before you get to an output sentence like "Jupiter is the largest planet of the solar system". Each and every type of abstract content has to be assigned a human-made or human-reviewed translation into every human language to be served. There will be innumerable challenges based on the intricacies of languages' different grammatical structures – differences in terms of how grammatical gender is or isn't distinguished, case endings, verb endings, ergative languages etc. – all of which will require output checking and fine-tuning by humans. Translation is a creative act, even when assisted by AI; translations are copyrightable. --Andreas JN466 11:04, 25 November 2021 (UTC)
- Here we are getting to the actual point. I would like to hear from WMF/Denny whether this question of “Abstract content” copyrightability has been evaluated explicitly, since my impression of the nature of “Abstract content” is a much simpler one. —MisterSynergy (talk) 10:52, 25 November 2021 (UTC)
- MisterSynergy As I understand it, abstract content means you compose sentences much as you do in ordinary language use. You have a database of knowledge about the world and its constituents and their possible relationships, and words for them to enable you to talk about them. Abstract content translates these statements into a kind of machine language, so they can then be more easily translated into any human language, using a limited vocabulary, but you are still fundamentally writing text, expressing ideas, etc., just as you do when you write in a human language. --Andreas JN466 10:46, 25 November 2021 (UTC)
- I am not sure how much WMF lawyers have actually looked at the details. It all seems to be based on this paper, but the outcome is not really transferred to this situation. IMO a key problem is that “Abstract content” seems to be only weakly copyrightable at best due to its nature as factual data and the pretty low level of creativity required to generate it.
Recommended reading: Heather Ford, The Rise of the Underdog
Heather published this as a chapter in the recent MIT Press Wikipedia@20 book (eds. Joseph Reagle and Jackie Koerner, open access).
As argued in that chapter, there are consequences for both end users and volunteers when "Wikipedia’s facts are now increasingly extracted without credit by artificial intelligence processes that consume its knowledge and present it as objective fact." --Andreas JN466 13:19, 25 November 2021 (UTC)
- I asked at the talk page of Wikimedia Enterprise API if it will be possible to make a video conference with the customers of the Wikimedia Enterprise-API. This can be a chance to understand how they process the information of the Wikimedia projects currently. Telling the source of a statement is an important thing and should be from my point of view also done if it is not necessary because of the license and so there it is from my point of view a responsibilty of the public who consumes content to think about the source. The consumer should pay attention that a source is mentioned and talk to responsible people if it is not. This is from my understanding independent of the license of the content. As far as I know there is used a kind of text mining in some KnowledgeGraphs like the GoogleKnowledgeGraph. For that I am not sure if a CC-BY-SA license can stop them from doing that and so the information could come also to the users. There were discussions as far as I read about a artificial intelligence system called Copilot offered by GitHub to propose code snippets to users and it is based on a big collection of Code of Free Software Projects and Free Texts. I read at Netzpolitik.org an German article with meaning of the author about that and so I am not completely sure if the mentioned legal view is a wider consensus.--Hogü-456 (talk) 19:31, 25 November 2021 (UTC)
Question about dates
The content page (overleaf) says, If no rough consensus can be reached, we will organize a formal vote. The mailing list announcement says that you hope to have a consensus by 20 December.
Could you reassure us please, DVrandecic (WMF) and Quiddity (WMF), that this potentially quite far-reaching matter won't be decided on a vote held four days before Christmas? And that if there is no consensus discernible by Christmas, we will have that vote sometime in January 2022, once everybody is back at their desks?
Everybody who's intending to celebrate Christmas this year would appreciate that, I'm sure. Thanks --Andreas JN466 17:19, 25 November 2021 (UTC)
- Yes, of course. Plus I'm re-titling this section to avoid anyone else being confused by it. Quiddity (WMF) (talk) 20:59, 25 November 2021 (UTC)
CC-BY?
As Wikinews use CC-BY, it may be an idea to use a license compatible with it (as abstract contents can be used in any Wikimedia wikis in the future). One preferred version:
- For contents that only combines constructors (such as Abstract_Wikipedia/Examples/Jupiter), we license them as CC-BY.
- For contents containing transitional texts (i.e. texts that is translated though not in abstract content, see d:Template:TranslateThis), those transitional texts are still under CC-BY-SA. In many cases (abstract) articles will contain translatable text that is not yet in abstract form.