Talk:Abstract Wikipedia/Licensing discussion: Difference between revisions
::{{re|DVrandecic (WMF)}} We agree on output content, but here I am referring only to abstract content for Abstract Wikipedia. Abstract content that is a derivative (“translation”) of public domain content can have CC BY-SA licensing only for the derivative. I oppose this because it applies restrictions on under-served language versions that do not apply to the well-served language version that uses (untranslated) content that is in the public domain. Of course, most Wikipedia articles are not wholly within the public domain, although they are required to indicate which parts are. Clearly, the BY-SA licensing does not (and cannot) apply to those parts. However, if we “translate” such an article into abstract content we would be compelled to license the whole “translation” as a BY-SA derivative. I propose, instead, to divide the original up according to its actually effective licensing and publish only the “translation” of BY-SA parts into abstract content as a BY-SA derivative. The “translation” of public domain parts of the original could then be published either as a BY-SA original or with a CC 0 waiver. I would oppose the BY-SA option (to repeat myself). Similar thinking might be applied to legitimate use of copyright material within the original article, but here I believe that we would not necessarily have the right to create the derivative. If we do, I would support BY-SA licensing for our “translation” of those parts.--[[User:GrounderUK|GrounderUK]] ([[User talk:GrounderUK|talk]]) 01:40, 7 December 2021 (UTC)
:::As has been clarified previously, this discussion is only about how the Abstract Content in *Abstract Wikipedia itself* should be licensed! If one specifically intends to create a "verbatim", literal translation of existing public domain content into Abstract Content, I agree that this should be CC0 licensed but it should also not be part of Abstract Wikipedia itself. Rather, it should probably be included in Wikidata. --[[User:Hupaleju|Hupaleju]] ([[User talk:Hupaleju|talk]]) 10:14, 7 December 2021 (UTC)
==== Discussion about the Licence for Abstract Content ====
Revision as of 10:14, 7 December 2021
Comments are welcome in any language.
We would like to invite the community to discuss these recommendations and hopefully to find consensus around the licensing decision. The goal is to keep the discussion open for about four weeks, and, if necessary, to extend and restructure it. In case this turns out to be insufficient to reach a consensus, we might restructure the licensing discussion to focus solely on Wikifunctions for now, and then follow up with a discussion about Abstract Wikipedia.
To guide the license choice, it may be useful to consider and discuss the following questions: What are the long-term objectives of the projects, and how can a copyright license support these objectives? Should the people involved in creating Abstract Content receive credit? How valuable is it to preserve consistency and compatibility with the licenses on Wikipedia?
The specific question to the community is whether to follow the given recommendations, or whether a different proposal would be more to the liking of the community. Particular questions of interest surround the licensing of Function Implementations, and the licensing of Abstract Content (and thus Output Content).
Even if you are just supporting the recommendations above, it would be great to see your voice expressed explicitly, in order to get a better understanding of what the community tends towards. If no rough consensus can be reached, we will organize a formal vote.
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- The first ten days of discussion are frozen to focus on newer discussion and avoid confusion.
Please see the #Restructuring the discussion to move on section and new request for specific feedback below. Thanks. Quiddity (WMF) (talk) 18:51, 2 December 2021 (UTC)
Comments
Views on the recommendations
- … Support Thadguidry (talk) 22:17, 22 November 2021 (UTC)
- Thadguidry, could you please clarify whether you support CC0 for Abstract content, or CC BY-SA, or either? (As currently framed, the "recommendations" allow either, so your "support" vote could be interpreted as support for either.) --Andreas JN466 06:56, 24 November 2021 (UTC)
- Andreas, either --Thadguidry (talk) 16:56, 29 November 2021 (UTC)
- … Support Kasyap (talk) 06:06, 23 November 2021 (UTC)
- Kasyap, could you please clarify whether you support CC0 for Abstract content, or CC BY-SA, or either? (As currently framed, the "recommendations" allow either, so your "support" vote could be interpreted as support for either.) --Andreas JN466 06:56, 24 November 2021 (UTC)
- Support. Specifically in favor of CC0 for Abstract Wikipedia because it seems highly questionable to me whether most of its articles are copyrightable at all. —MisterSynergy (talk) 13:12, 24 November 2021 (UTC)
- Support CC0 for Function Signatures, Abstract content, and Output content. Maybe CC0 for Function implementation, to the extent that function implementation can itself be abstract content / output content of yet other functions. I am still thinking but I think this is my position. Blue Rasberry (talk) 15:34, 24 November 2021 (UTC)
- Support CC0 for function signatures, Apache for function implementations, CC BY-SA for both Abstract content and Output content.
- I am specifically convinced that, while the fact that Jupiter is a planet is not copyrightable, a construct that compares Jupiter to all other planets in the Solar System is sufficiently elaborate to make me think it would be more defendable with a CC BY-SA license, than a CC0. In this, I totally agree with the recommendations from the dev team. Please note that this, to me, is a non-negotiable baseline - I will support any proposal that grants more freedom than this (i.e. CC0 for abstract/output content), but will fight tooth and nail against any proposal that will grant less freedom than this. Sannita - not just another it.wiki sysop 20:01, 24 November 2021 (UTC)
- Oppose CC0 is not viral and should therefore be prohibited altogether. CC0 contradicts the basic idea of free licences, which is that they propagate themselves. Ralf Roletschek (talk) 09:49, 25 November 2021 (UTC)
- Oppose the use of CC0 for abstract content/output content. End users should always be able to see that the information comes from Wikimedia (traceability), and re-users who create derivative works should be forced to use a free licence as well. CC0 ensures neither. (See "The rise of the underdog" for a more detailed discussion.) --Andreas JN466 10:08, 25 November 2021 (UTC)
- The purpose of copyright is to protect your creative expression; it does not allow you to force users into exercising a particular data provenance protocol. —MisterSynergy (talk) 10:35, 25 November 2021 (UTC)
- CC BY-SA requires attribution. It's why there is a "Wikipedia" link in the Knowledge Graph panel, and why voice assistants tell you when they read out a Wikipedia article (if I recall correctly, it took some prodding from the WMF to make them do so). --Andreas JN466 10:53, 25 November 2021 (UTC)
- Support the combination of CC-0 (function signatures), Apache [or MIT or LGPL] (code), CC-BY-SA (abstract content and generated output); Oppose other variations. To preserve history of generated output, I would recommend a) adding a link to generated output that points to the abstract content used to generate it, b) clarifying in terms of use that for re-users, it is sufficient to link to the generated output alone.--Eloquence (talk) 20:26, 26 November 2021 (UTC)
- Support CC0 for function signatures, Apache for function implementation, CC-BY-SA for abstract content and output content. But I would appreciate if we could use MIT for function implementation as well... --Ameisenigel (talk) 16:33, 27 November 2021 (UTC)
- Choices are being made as soon as what is grammatically (im)possible is defined. e.g. generate text(pronoun, language) = text (to take a currently polemical example, which by no means captures the full scope of the problem). I agree with Andreas, Eloquence, and Ameisenigel above: CC0 should be rejected in order to require attribution for any abstract/output content derived from these processes. SashiRolls (talk) 21:40, 29 November 2021 (UTC)
- Support for CC0 for signatures (because they are more like a mathematical formula than a creative work) and Apache for code (which is one of the widely used licences in software). But when it comes to a licence for the content, I Oppose CC0. I perceive the abstract content (as a whole) not as a set of pure information (which is the case with Wikidata contents) but as prose written in a specific artificial language. In my opinion this would be like declarative programming code, where there are also virtually only facts and declarations written. That's why I would like the authors of abstract articles to be protected under copyright just as the authors of other language versions are. Msz2001 (talk) 19:00, 1 December 2021 (UTC)
Support but with modifications
- I support most of it, and I'm not necessarily asking for a modification. However, I'd love to see at least an explanation for the choice of a permissive license for the implementation. See a more detailed question below. --Amir E. Aharoni (talk) 12:26, 23 November 2021 (UTC)
- I support the output content being licensed CC BY-SA, as long as we have a description of how attribution will work *and* an assurance that the output content CC BY-SA can be maintained *in practice*. What I'm worried about is a situation such as the following:
- A big tech actor takes the abstract content and the "lower level" software without attribution and then runs the software with Wikidata and the Abstract Content to get the same Output Content the Wikipedia might use, without any attribution required. Thus Wikipedia would have to attribute, but big tech would not have to attribute the same content. Can you reassure us that such a situation would not arise, and if it did that WMF legal would go to court to enforce the CC BY-SA license? I'll note that there are lots of cases of Wikipedia content being used without attribution, but no legal action is taken to enforce the license. Would the same situation arise here in practice? Smallbones (talk) 15:12, 24 November 2021 (UTC)
- Regarding non-compliance with the license, I want to refer to this section on English Wikipedia. Given that, do you still hold your recommendation for CC BY-SA, or do you prefer another license? --DVrandecic (WMF) (talk) 21:29, 25 November 2021 (UTC)
- Well, I do know for a fact that the WMF spoke to Amazon about their lack of attribution in Alexa – with, I believe, a positive result in the end. Admittedly though not the same as going to court. --Andreas JN466 00:36, 26 November 2021 (UTC)
- Exactly. There are many options in such a case, and Smallbones' request to give assurances about going to court would limit the WMF's options considerably. --DVrandecic (WMF) (talk) 00:28, 30 November 2021 (UTC)
- In general terms, I would adopt cc0 for data and functions, and cc-by-sa for the generated content. -Theklan (talk) 16:51, 24 November 2021 (UTC)
- Only support CC BY-SA for Abstract/output content, object to CC0 for Abstract/output content. Would prefer GPL to Apache. A key point to me is that re-users should have to indicate provenance. If lots of voice assistants, info panels etc. all end up agreeing with each other, end users should be able to see that this is because they all draw on the same Wikimedia source, and not because the entire world is in agreement. The other key point is that re-users should be forced to use the same licence for derivative works rather than converting free (gratis and libre) volunteer work into proprietary works. --Andreas JN466 18:29, 24 November 2021 (UTC)
- Support CC BY-SA and GPL licenses, as appropriate. I wrote my reasoning in the section below.--Strainu (talk) 19:40, 24 November 2021 (UTC)
Questions and clarifications
I hardly know what this discussion means.
- If this same content were coming from the most closed and restricted corporate actor, then what licenses would be applied to the various components? I want to know the precedent in the field.
- Can you show a third-party non-wiki example of these components in action anywhere, and also describe the licenses in place for the parts?
- What professional field produces this kind of content? Is this software, math notation, creative text, or what?
- Do different components routinely have different licenses?
- For which components, if any, are there past claims that they are uncopyrightable?
- Is this more of a casual wiki licensing decision, or is this a situation where the Wikipedia platform is an Internet giant and what we decide may potentially influence a generation of development in a field?
- "Wikimedia Foundation's submission in response, we explained that AI algorithms should be treated like any other software tool and that the tool's user should be considered the copyright holder" - link to that please. That seems an ethical and social position which needs to come from organized and informed Wikimedia community discussion, and not from a legal opinion. In the Wikipedia ecosystem a claim like that seems like it should come from discourse rather than from an office or role of designated authority. I am not aware of any past such community conversation.
Thanks. Blue Rasberry (talk) 00:07, 23 November 2021 (UTC)
- Hi, I will try my best to answer the questions, but note that I am not a lawyer. Also, in order to speed up the communication, I did not go through checking my answers by multiple departments etc., so please bear with me, I might make mistakes, have forgotten some things, and might not be aware of everything.
- I am not sure what you mean with precedent in this field, sorry. To the best of my knowledge we are the first large scale system aiming to allow a community to collaborate on and create multilingual content and generate language using a symbolic approach to make this content available to many. Other systems have been doing some parts of these, such as the Canadian weather service which has been using NLG technology for a long time in order to publish weather reports, etc. I do not know what kind of licenses they have used for their components. If I were to imagine the most closed and restricted actor, I would easily imagine that they don't publish the code at all, that they don't publish the abstract content at all, and that they merely publish the resulting natural language text under the normal rules of copyright. I hope that answers the question.
- No, I am not aware of any such system.
- I am not aware of any professional field producing text in many languages using a similar system.
- Yes. Think about Wikimedia in general: Wikipedia's text is published under CC BY-SA, Wikidata under CC 0, our software under various licenses (e.g. MediaWiki under GPL), and the images on Commons are also using a large number of licenses.
- The text by the Legal department discusses this, in particular for facts and for APIs.
- It is hard to predict the future. Wikidata's licensing decision has been used in several external places as an argument in favor of using CC 0 in other places too. Whether the decision for licensing the components of Wikifunctions and Abstract Wikipedia will have a similar effect or not remains to be seen.
- Here is the link to the document which has also been added into the project page by Quiddity.
- Thanks for asking! DVrandecic (WMF) (talk) 22:23, 23 November 2021 (UTC)
- @DVrandecic (WMF): Thank you for the decisive answers. I now have a better understanding that the choices we have to make are new. I was missing that context. This is helpful, I will think more. Blue Rasberry (talk) 22:40, 23 November 2021 (UTC)
I see what the function signatures and implementations may look like, and I agree that CC0 and the Apache License respectively are reasonable. But I have no idea what exactly the abstract content is going to look like. I'm quite sure that the example with Jupiter being the largest planet is an example of bare information that is not copyrightable. But the full article will certainly be more complex, and I wonder if the arrangement of facts and their selection is creative enough to be copyrighted. So what do you imagine the abstract content will look like? And how is the information produced by each statement going to be joined to produce readable text (something more complex than a first-grader's It's a pencil. It's blue. I like it. I use it very often)? Msz2001 (talk) 13:45, 1 December 2021 (UTC)
- @Msz2001: The Jupiter example is here: Abstract_Wikipedia/Examples/Jupiter. The text is basically the CC BY-SA text of the Simple English Wikipedia article. It reads as follows:
- Jupiter is the largest planet in the Solar System. It is the fifth planet from the Sun. Jupiter is a gas giant, both because it is so large and made up of gas. The other gas giants are Saturn, Uranus, and Neptune.
- Jupiter has a mass of 1.8986×10^27 kg (318 earths). This is more than twice the mass of all the other planets in the Solar System put together.
- Jupiter can be seen even without using a telescope. The ancient Romans named the planet after their god Jupiter (Latin: Iuppiter). Jupiter is the third brightest object in the night sky. Only the Earth's moon and Venus are brighter.
- Jupiter has 79 moons. Of these, around 50 are very small and less than five kilometres wide. The four largest moons of Jupiter are Io, Europa, Ganymede, and Callisto. They are called the Galilean moons, because Galileo Galilei discovered them. Ganymede is the largest moon in the Solar System. It is larger in diameter than Mercury. In 2018 another ten very small moons were discovered.
- Recreating that text in Abstract Wikipedia and licensing it under CC-0 would be a blatant infringement of the Simple English Wikipedia's CC BY-SA licensing terms. --Andreas JN466 18:25, 1 December 2021 (UTC)
- Many thanks for your answer. I see now what it's going to look like and I'm against the CC0 too. Msz2001 (talk) 18:29, 1 December 2021 (UTC)
Discussions
- How big of a decision is this, and how much can our decision affect future Wikimedia activities, organization, and expenses? If this is a relatively small Wikimedia community decision, then a typical talk page chat like this one is sufficient. If there are big issues here then perhaps we have a bigger conversation. I do not have a good understanding of the potential impact of the answers we have here, but I do have some intuition that the decisions made here will have effects at scale, and potentially apply licenses to much or almost all of the Wikimedia Movement's future output. I neither want to be hasty in setting standards, nor do I want to delay this discussion too much. If this really is a big decision, I would like diverse participation in the discussion. The Wikimedia Foundation is investing in this tool and while I trust the WMF for what it is, that organization does not reflect the diversity of our movement, and also it has a tendency to adopt practices which align with the commercial tech industry when there are other options. Again, I lack understanding of how important the decision here is, but I do feel hesitation when a big decision is put on the table and I cannot recognize who would benefit from the various choices that can be made. I think there is not supposed to be a choice here at all, but rather just a description of current perspectives, but we already know that there is a lot of diversity in licensing beliefs, laws, and actual user practices. Those things already are not aligned with each other.
- I would be happier with the outcome of this discussion the more confident I was that informed Wikimedia community participation had shaped it.
- Big thanks to everyone who put this discussion together. This is a great topic to discuss. Blue Rasberry (talk) 22:57, 23 November 2021 (UTC)
- @Bluerasberry: I agree, I would love to see wide participation when discussing this question. Then again, I also understand that the discussion is abstract and about a difficult topic with many subtleties. I don't think it is easy to know and understand all the ramifications of the different decisions. I don't claim to do so myself, which is why we are having this conversation, to make the best possible decision.
- You write that a bigger issue needs a bigger conversation than a talk page. What do you mean by that? We have traditionally had very big decisions on talk pages, and I thought that's one of our strengths.
- On the one side, I do not think that the impact of the licensing decision for the code in Wikifunctions is as big as you describe: it would not apply licenses to all of the movement's future output. Function application to content from other sources, in general, would and should have no effect on the license of the output. If you put a copyrighted text into a "capitalize" function, the result remains copyrighted text - the license of the code for this function that was applied did not make any difference.
- On the other side, the decision about the license of Abstract Content might have an impact on Wikipedia and how the content of Wikipedia can be reused. For example, if we choose a CC 0 license for Abstract Content we would now offer an alternative text to the handwritten text, without the restrictions of CC BY-SA.
- Ultimately, we believe that the copyright licenses used by the Wikimedia projects should empower as many users as possible to easily collaborate by sharing knowledge (see also the legal note on the use of CC ).
- It would be extremely helpful for me to understand which questions in particular you have. Then I can try to give some thoughts about them. The page also offers some questions which might help us guide this discussion, in case the big picture gets too big. Thanks for clarifying! --DVrandecic (WMF) (talk) 00:13, 24 November 2021 (UTC)
- I do not know what I think, except that I am unsure how big of a decision this is. I am going to take a break from this conversation after making this post, but thanks if you have any brief response.
- Denny: you say "the licensing decision for the code in Wikifunctions ... not apply licenses to all of the movement's future output", and also "if we choose a CC 0 license for Abstract Content we would now offer an alternative text to the handwritten text, without the restrictions of CC BY-SA".
- To me, this sounds like using data science to generate content for humans to read. I think this has the potential to quickly account for 50%+ of Wikimedia content, and possibly ~99%+. The idea is to convert Wikipedia's content to uncopyrightable statements, then use some algorithm to present the facts in every language for every reader.
- Denny, you say "I do not think that the impact of the licensing decision for the code in Wikifunctions is as big as you describe".
- If we have this process for extracting meaning from copyrighted Wikipedia text and converting it to uncopyrightable info, then when does our process get compared with anyone else doing the same thing to other media, like news, journalism, research, or novels? Can the same process scramble a Harry Potter book into a public domain approximation with renamed characters and changed situations? What will Disney's lawyers say when someone copies the Wikimedia community process and does approximate movie remakes of their content? I know there are lots of experiments out there with students and hobbyists, but when Wikipedia takes a position on this, then that seems like a milestone.
- Whatever the outcome of this, I want it to advocate for the community. The Wikimedia community will choose the option which advances the Wikimedia Mission. To make an informed choice, I wish that I better understood what corporate interests want here. You seem to be saying that corporate advocates have not yet taken a position. If corporations have, then I do not want anyone normalizing their position yet before the Wikimedia community has its own independent discussion on the matter.
- Questions:
- What percentage of Wikimedia readers will have a significantly different user experience as a result of this decision? I think the answer is ~100%
- What percentage of Wikimedia content will be modified by this process? I think the answer is ~100%
- Who is going to talk about what the Wikimedia community decides here? A lot of people, including outside the wiki community
- I will answer the questions on the main page with my opinion and position.
- What are the long-term objectives of the projects, and how can a copyright license support these objectives?
- The goal is a next-generation increase in the amount, quality, and production speed of Wikimedia content. Copyright licenses are inherently designed for the benefit of corporations and commercialization, and only the unusual activist licenses like those of Creative Commons can favor communities and public benefit. Almost no Wikimedia contributors make actionable claims on copyright with their contributions, and almost all Wikimedia community members would like to see increased use of free and open licensing if not government regulation compelling less corporate copyright protection for global community benefit. There are years of precedent of license-violating re-use of user contributions to Wikipedia, and none among the Wikimedia Foundation, Creative Commons, or existing community organizations have ever been able to defend content creators effectively from inappropriate content use through the Wikimedia platform. For the technical components which this project describes, unchecked inappropriate reuse is even easier and more likely. Copyright protection will not favor Wikimedia community members in this case but will favor corporate actors. I do not wish to adopt the same idealistic myths of user protection for functions and their output, because those myths favor corporations and not the community.
- Should the people involved in creating Abstract Content receive credit?
- No. When communities enter the copyright game they cannot compete with corporate players and commercialization, and business interests will overwhelm community contributions. The precedent to make here is not an internal Wikimedia community decision, but the demonstration that if Wikimedia projects take a position then that position is a societal norm which should apply even outside of the Wikimedia platform. Sooner or later corporations are going to take a position on this issue, and when they start suing each other and the general public, the courts are going to ask what the precedent is. If Wikimedia projects have already established a precedent which normalizes advocating for community and public benefit, then corporate interests will have much more difficulty in claiming that other positions are natural. The community has a lot to gain by normalizing its own claims and practices that AI generated media and as much of the related machinery as possible is ineligible for copyright. These claims are reasonable and sensible as well, even though soon corporate propaganda will claim otherwise in an attempt to capture and privatize the public commons.
- How valuable is it to preserve consistency and compatibility with the licenses on Wikipedia?
- It is only valuable if that past consistency supports Wikimedia community plans for the future.
- When Wikipedia was established the conversation on licensing was less developed. Activists are more experienced now and understand free and open media better; back then they were also under less corporate competition. The discussion now matters more and is more informed than past discussions.
- Licenses with Wikipedia have changed in the past. Originally Wikipedia used the GNU Free Documentation License (GFDL) for copyright. Somewhere along the way, the Wikimedia Foundation changed the copyright text from saying that Wikipedia text was available with either CC licenses or GFDL to "Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply". We used to say that we had to mention the GFDL as terms of the license, but now that license commitment is fulfilled by saying "additional terms may apply". The social context for change was that some people wanted to distance themselves from the GFDL for whatever reasons. This did not happen with a broad discussion or published legal opinions. It just happened because enough people wanted it and no one objected. Likewise, Abstract Wikipedia can do new things and break precedent also. A similar conversation is happening with a transition from CC 3.0 to 4.0; the 3.0 license does not describe updates while the 4.0 license does, and there is discussion about transitioning licenses somehow through some process, converting some 3.0 licenses to 4.0. Somehow interpretations change with culture and time and what seemed impossible can just change.
- Rules of Wikimedia projects exist to serve the community. The community does not exist to serve the rules. When making big decisions, we do not need to assume that all precedents are a divine consensus from infallible early commentators. There are some guiding principles we carry forth, like the fifth pillar which says "ignore all rules", and which we can apply to say that if the community wants a change then we can enact it.
- What are the long-term objectives of the projects, and how can a copyright license support these objectives?
- thanks Blue Rasberry (talk) 15:30, 24 November 2021 (UTC)
Alternate proposals
- Both CC0 and CC-BY-SA SHOULD be supported for Abstract Content. Regardless, Wikidata material about real-world entities, languages etc. MUST stay CC0 only
- It SHOULD be possible to publish Abstract Content and Output Content under either of CC0 and CC-BY-SA, depending on what the user chooses. If CC0 licensing is supported, content that is so licensed can and SHOULD be stored in Wikidata (subject to agreement by that project). One could expect that this would mostly involve either statements of fact that are not feasible for the current ("flat", non-compositional) Wikidata model, but where the choice of some peculiar expression or presentation is largely immaterial, or literal, verbatim "translations" of claims, quotations, texts etc. from pre-existing sources in the public domain.
- A dedicated Abstract Wikipedia site, probably loosely comparable to the current Wikimedia Commons, would then be used to work on CC-BY-SA content, including content that's imported from other Wikimedia projects. Regardless, CC-BY-SA-licensed formal content about real-world entities (including lexical and/or grammatical data about languages) MUST NOT be published on Wikidata and/or commingled with the existing CC0 material - this would make Wikidata licensing legally uncertain, which would destroy much of its current usefulness to outside reusers! Hupaleju (talk)
- If non-CC0 content is to be mixed together with CC-BY-SA content on Wikidata, it would indeed need to be very clearly separated, also technically. Agreed.
- Content that is either CC0 or CC BY-SA (or in fact, under any other or even no license) can be used with functions on Wikifunctions. Abstract Content can be published under a multitude of licenses. Agreed.
- One of the questions is, what should the Abstract Content of Abstract Wikipedia, that will be used in the Wikipedias, be published under? I am not in favor of having two different licenses for that, and I understand that is your proposal? --DVrandecic (WMF) (talk) 01:26, 30 November 2021 (UTC)
- To clarify my POV explicitly, in my view, if non-CC0 content-- including CC-BY-SA! -- is to be mixed together with existing Wikidata material, this ought not to occur "on" Wikidata at all but on a separate site with a distinct project identity. IOW, "Wikidata" itself ought to ensure that all of *its* material relating to the real world is still CC0 licensed, regardless of how it's encoded. Anything else would introduce confusion and serious legal uncertainty for reusers. A separate Wikimedia site could nonetheless freely *import* material at will from Wikidata itself, and new Wikibase federation mechanisms make this quite technically feasible.
- As to the matter of "what should the Abstract Content of Abstract Wikipedia, that will be used in the Wikipedias, be published under", the obvious answer to me, given that such Abstract Content will generally involve creative, copyrightable expression and presentation (plus of course the fact that much content will need to be imported from existing Wikipedias) is CC-BY-SA.
- However, in my view, it would be quite desirable to allow for CC0 'Abstract Content' for other purposes -- perhaps most obviously the previously-discussed "Abstract Descriptions" for Wikidata entities, but also allowing for enhancements to the data model of Wikidata itself. The key difference however is that such claims, while somewhat more complex than what Wikidata allows at present, will nonetheless be a matter of "sweat of the brow" effort to make facts about the real world machine-understandable under the *model* of Abstract Content, as opposed to arranging content in a highly original and expressive presentation to create compelling, high-quality encyclopedic text across languages. This is what justifies having two licenses for two radically different projects. --Hupaleju (talk) 14:03, 30 November 2021 (UTC)
Objections to the recommendations
- Object to CC0 for Abstract content/text output. A key point to me is that re-users should have to indicate provenance. If lots of voice assistants, info panels etc. all end up agreeing with each other, end users should be able to see that this is because they all draw on the same Wikimedia source, and not because the entire world is in agreement. Another key point to me is that re-users should be forced to use the same licence for derivative works rather than converting free (gratis and libre) volunteer work into proprietary works. --Andreas JN466 18:43, 24 November 2021 (UTC)
- I object to releasing code under the Apache License v2.0. There is a significant loss of value to the Wikimedia movement in generating functions that can be immediately proprietarized by someone else. A copyleft license like GPL should be used instead for all the same reasons we use CC-BY-SA as the default license for text. Furthermore, using the GPL is advantageous in that code which is licensed under any compatible license may be used under the GPL - this means we can reuse MIT, Apache 2, and plenty of other licensed-code if the code license is GPLv3 or later. That's a huge starting advantage for a project like this. Legoktm (talk) 06:28, 28 November 2021 (UTC)
- I would also like to object to Apache 2.0 as the initial default per Legoktm, although I am not opposed to allowing code to be licensed under Apache 2.0 so long as the GPL or a compatible copyleft license is the default provided in the interface when adding new code. Mahir256 (talk) 03:23, 29 November 2021 (UTC)
FAQs - Questions and answers
Some questions that showed up during the internal discussion and their answers.
- From a licensing perspective, does text generated with AW differ from text directly contributed by the community?
- That's exactly the question we have to answer with this discussion. Directly contributed text on the Wikipedias is published under a CC BY-SA license. Legal recommends that CC BY-SA would be an option for the Abstract Content and thus for the Output Content generated from the Abstract Content, so it may be the same if we decide so. But it could also be a different license, in particular CC0 or CC BY.
- Is there any way to ensure that a given language's community has the final word on what makes it into an article?
- Yes! Albeit not legally but technically: every Wikipedia language community will always have the opportunity to locally override Output Content coming from Abstract Wikipedia (or even to not enable it in the first place). The local communities retain the final decisions on the Wikipedia articles.
- Will the list of contributors to an Abstract Wikipedia article be accessible via a "History" tab within each Wikipedia page where it is used?
- It depends. There are (at least) two ways to create articles from Abstract Wikipedia: either by having individual function calls in the body of the article in the language Wikipedia, or by having the whole article created from the abstract content in Abstract Wikipedia. Regarding the first way, see below. In the second case: yes, we can link to the history of the abstract content easily.
- How will the "History" tab work if there is partial AW-content & partial local-content?
- The History tab does not currently include changes to imported content, be it through changes in templates, transcluded pages, media in Commons, etc. So if we wanted to do that, it would be a bigger change throughout the ecosystem. Recent Changes though could easily integrate such changes.
- What about via the API if someone is crawling or spidering Wikipedia?
- For content or history? For history, see above - we usually do not do that currently with any form of included content. For content, depends on the exact API, whether one is asking for the raw content (in which case, no), or for the rendered content (in which case, yes).
For Function Implementation, consider following Wikimedia Commons
Hi, with the "Function Implementation" section, I'd recommend taking a similar approach to Commons - see commons:Commons:Licensing - i.e., allow all free licenses, as long as they meet certain requirements. Commons isn't mentioned in this section, so I'm not sure if this has been overlooked or considered but dismissed, but this seems to be a good way to store multiple smaller works, since it maximises the number that can be stored without causing license problems. Probably there are issues with combining code with multiple licenses together, but that same thing happens when you create a compilation of Commons media, so presumably it's manageable. If there are specific issues with individual licenses (e.g., GFDL for images), then add exceptions for them like commons:Commons:Licensing#GNU_Free_Documentation_License. Moreover, have a community process that lets you decide which licenses are acceptable - including new licenses that might appear in the future. Thanks. Mike Peel (talk) 07:52, 23 November 2021 (UTC)
- Thanks for the comment, Mike! Legal indeed outlined that possibility, and we are thinking about it. My current thinking is to launch with only a single license, and then see how things work out - and then, once we have more data and can see whether there is a need for other licenses, to implement support for multiple licenses if that would help the project. But I would like to hear arguments for and against using multiple licenses, and also if that's a requirement pre-launch or if that can wait. --DVrandecic (WMF) (talk) 22:40, 23 November 2021 (UTC)
Why permissive and not copyleft?
Most of the recommendations make sense to me, but why is the Apache license suggested for the implementation and not a copyleft license, such as GPL or AGPL? Why is it good for the license to be permissive? Won't there be advantages for using a stricter copyleft license to protect the usage?
AGPL, in particular, was in the news recently. I'm not a lawyer, and I'm not sure whether such a thing is useful for WikiFunctions, but my intuition tells me that it's a good thing to consider.
At the very least, the choice of a permissive license must be substantiated and not just declared. Amir E. Aharoni (talk) 12:24, 23 November 2021 (UTC)
- Hi Amir! In the section on Software, legal explains that their recommendation follows current best practices for our software development. Most of our software work is published under less restrictive licenses than the GPL, with some notable exceptions (such as MediaWiki itself). The recommendation is to follow our current best practices. Note that this is not a declaration, but a recommendation with a discussion, which is what we are doing here right now. --DVrandecic (WMF) (talk) 22:40, 23 November 2021 (UTC)
- MediaWiki itself is kind of a big thing. Most MediaWiki extensions are GPL, too (although notably, WikiLambda is MIT; why?). The content of our projects is CC-BY-SA, which is also copyleft.
- The reason for publishing the iOS app under a non-copyleft license is Apple's App Store policies, which are not friendly to copyleft, but the iOS app is far from being the most important thing in our universe, and it doesn't represent all the best practices.
- So just saying that permissive licenses are our current "best practices" is quite unclear, and it's not a good enough explanation. Amir E. Aharoni (talk) 06:14, 25 November 2021 (UTC)
- @DVrandecic (WMF): I do not believe "Most of our software work is published under less restrictive licenses than the GPL" is correct. I believe the opposite, that most Wikimedia code is under a copyleft license like the GPL. As Amir says, saying MediaWiki is an exception is like saying most of the internet is under a restrictive license and Wikipedia is an exception, so we should use a restrictive license. In this analogy, what we care about is Wikipedia and MediaWiki - the other stuff is mostly tangential.
- Furthermore, WMF employment contracts designate GPLv2 or later as the "typical license" that code is released under. Code posted on Phabricator defaults to the GPL unless otherwise specified. I don't understand why we wouldn't continue using copyleft here. Legoktm (talk) 06:18, 28 November 2021 (UTC)
- @Legoktm: Sidenote: I'm going to revert your edits in the project page, because that part of the text was supplied by the legal team. However, we'll ask them to take a look at the suggested changes. Thanks for the input here and there! Quiddity (WMF) (talk) 23:47, 29 November 2021 (UTC)
- @Quiddity (WMF): ack. Would it be ok to add a "[disputed]" or similar tag and a pointer to this discussion then? Legoktm (talk) 02:13, 30 November 2021 (UTC)
- @Legoktm: Now re-added, edit approved. :) Quiddity (WMF) (talk) 17:49, 30 November 2021 (UTC)
- I would also find it more consistent to have a copyleft license for the function implementations. I understand from the previous answer that most of software work of the Wikimedia Foundation uses permissive licenses, but I'd argue MediaWiki itself is the exception that counts, and, most importantly, all the contributed content is currently copyleft. This means that the information remains free, even taken outside of Wikipedia and modified; and this is desirable. I don't think the function implementations should be handled differently. A copyfree license would mean that someone could modify the content, modify the function implementations, not provide the new implementation, effectively making the new content potentially very hard to use. I think the ideal license would be a dual (A)GPL + CC-BY-SA license, so we are sure about the compatibility, and we still allow re-using the functions for other things outside Wikipedia (maybe translated into another programming language), while guaranteeing that the modified functions will still be available. Copyleft is how Linux and many projects have thrived, including Wikipedia itself. I'm afraid a copyfree license for parts of the contributed content could weaken Wikipedia. The other licensing decisions otherwise look good to me --Raphael.jakse (talk) 07:33, 24 November 2021 (UTC)
License for other templates, modules, and gadgets
It's not exactly on the topic of WikiFunctions, but it's strongly related: Now that we're finally discussing the WikiFunctions license, can this be also an opportunity to discuss the license for the hundreds of thousands of templates, modules, and gadgets on the sites? They are much more similar to code than to content, but by default the same content license applies to them. Is it good? Perhaps we could give the editors a choice to apply GPL, Apache, BSD, CC0, CC-BY-SA or some other license to the templates, modules, and gadgets they write? It's probably possible to do it using code comments, edit summaries, or talk pages, but it would be nice to have something more structured, for example a dropdown next to the Publish button. Amir E. Aharoni (talk) 12:30, 23 November 2021 (UTC)
- Hi Amir! I think it would be great to have a discussion about the licensing of the Lua code published on the various projects, of the gadgets, and all the other code that is all over the projects. Personally, I am not sure whether CC BY-SA is a great choice for that, as it is a license that is not meant for source code. But I think that the discussion we are aiming for here right now is already complex enough as it is, and would invite that question to be tackled independently as I don’t see much of a dependency between these questions. --DVrandecic (WMF) (talk) 22:43, 23 November 2021 (UTC)
- I've replied at User_talk:Amire80#License_for_other_templates,_modules,_and_gadgets since it seems off-topic for this page. Legoktm (talk) 06:42, 28 November 2021 (UTC)
Please clarify which recommendation for Abstract content you would like people to support
The "overview and request" section says "We recommend to publish Abstract Content under the same license as the content of Wikipedia, i.e. CC BY-SA", but the two sections further down detailing the actual recommendations by the legal department and the development team say either CC0 or CC BY-SA might be good. The overview needs to be made consistent with the recommendations somehow, otherwise it won't be clear what people supported or commented on. --Andreas JN466 07:38, 24 November 2021 (UTC)
- I've edited the "Overview and request" section to make it consistent with the actual recommendations. See [1] --Andreas JN466 20:51, 24 November 2021 (UTC)
- Hi Andreas! I am sorry you found the text confusing. The overview stated "We recommend to publish Abstract Content under the same license as the content of Wikipedia, i.e. CC BY-SA." until you changed it. The recommendation by the development team also states "The development team further recommends to choose CC BY-SA for Abstract Content and Output Content.", and then follows up with a few paragraphs of text explaining why we prefer CC BY-SA. I have re-read the text, and whereas you claim that the development team recommends CC-0 for Abstract Content, I can not find such a sentence (and it would be wrong). Please let me know what led you to that reading so we can fix it. Because of that I am reverting your change to the summary. --DVrandecic (WMF) (talk) 22:19, 24 November 2021 (UTC)
- @DVrandecic (WMF): Well, then you won't revert this edit, correct? Because that literally said that the team recommends "to choose either CC0 or CC BY-SA for the Abstract Content and the Output Content." And if the wording after this edit of mine now indeed reflects your recommendation – which I would welcome – then you must also say that you differ from the recommendation of the legal department, because the legal department's recommendation is, and I quote, "Recommendation: Abstract content should be licensed under CC BY-SA or CC0."
- So, do you agree with the legal department that CC0 would be good, or do you disagree and recommend CC BY-SA only? Please be clear. Right now you're hedging, because the last paragraph of the team's recommendations brings in CC0 through the backdoor, where you say, "On the other hand, one could argue that by putting Abstract Content under CC0 we open the space for a larger amount of possible reuse in applications we cannot even imagine yet, nevermind properly decide the legal framework for. CC0 most certainly allows the most freedom in the reuse of the Abstract Content." So right now, the section is written in such a way that agreeing with you means agreeing that abstract content "should be CC BY-SA and not CC0, but could also be CC0". In other words, agreeing with you right now, as the section stands, gives you carte blanche to do whatever you like later on, and claim you had community support for your position. Regards, --Andreas JN466 22:44, 24 November 2021 (UTC)
- I am sorry that the development team's recommendation is still confusing to you. The first paragraph says that the development team agrees with the recommendation by legal, and then reiterates said recommendation - which is that the Abstract Content should be published under CC 0 or CC BY-SA.
- The third paragraph then reads "The development team further recommends to choose CC BY-SA for Abstract Content and Output Content." That does not mean we differ from the recommendation by legal, but we follow it and we further specify the option CC BY-SA over the option CC 0 (because legal's recommendation says to choose either one or the other). But I consider that recommendation to be fully aligned with legal's recommendation. We are just more specific by choosing one of the two options.
- Your edit though makes it sound as if the legal team has only recommended to use CC BY-SA, which it has not. It said that it is one of two options. I am afraid your edit is misrepresenting the recommendation by the legal team. I am rewriting that to make it clearer that this is the summary of legal.
- Since you are asking explicitly: I agree with the legal department that CC 0 would be a good option. I also think that CC BY-SA would be an even better option.
- I am not sure how to state more explicitly which license the development team recommends for Abstract Content than with the sentence "The development team ... recommends to choose CC BY-SA for Abstract Content ..." and by having that also stated in the overview as "We recommend to publish Abstract Content under ... CC BY-SA.". I think both sentences are rather clear. I wonder if anyone else is also confused about what our recommendation is.
- And yes, we list a few arguments for both licenses, in order to get the ball rolling. I think that both licenses would have interesting consequences, and we will implement either of these licenses, whatever the community chooses. --DVrandecic (WMF) (talk) 21:32, 25 November 2021 (UTC)
- Well, that wording is better. You now make clear in the first paragraph that you are summarising Legal's recommendation, whereas previously you presented what Legal said as (also) the development team's own recommendation. You said, right in the first paragraph:
- The development team recommends to follow the recommendations by Legal, i.e. to choose CC0 as the license for Function Signatures; Apache for Function Implementations (and to start with a single license, and only when we recognize the need for multiple licenses to extend Wikifunctions to support multiple licenses); to choose either CC0 or CC BY-SA for the Abstract Content and the Output Content.
- If you still do not see the problem, could you ask someone outside WMF to read that sentence and then tell you what they understand the dev team's recommendation for abstract content is, based on that sentence?
- And even though the wording is now better – thank you for that! – there is still a problem. What I have objected to, from the beginning, is that from your "Overview and request" it is not even apparent to the casual bystander that CC0 is an option here – an option that is prominently featured in both your recommendation and even more so in Legal's recommendation. As I've said before, when I first read the overview, I thought, "There is nothing to see here. CC BY-SA – yay!" And then, when I read the rest of the page the next day, incl. that sentence above, I could hardly believe my eyes. I wonder how many people have read your overview since your announcement and stopped there, thinking, as I did at first, that there is no controversy here, no need to even get involved in the discussion. (In fact, people on Facebook did say exactly that.)
- Are you okay with people being misled like that by your overview? Is it your intention to keep people like Ralf above, who have a fundamental issue with CC0, away from a discussion which might result in CC0 being adopted? If not, then I'd ask that you kindly rework your overview accordingly, so people get an accurate idea of the discussion's scope.
- Incidentally, it would also help to clarify who "We" is in the "Overview and request", given that you have recommendations from both Legal and the development team on the page. By the same token, it is not clear what "these recommendations" in the "Overview and request" actually refers to. Does it refer to the preceding paragraph? Does it refer to the recommendations from both Legal and the development team given further below? --Andreas JN466 23:29, 25 November 2021 (UTC)
- I have clarified the "we" and the "these recommendations" in the "Overview and request". I hope this is now better. -- DVrandecic (WMF) (talk) 00:43, 30 November 2021 (UTC)
- Yes, much better. Thank you for the updates. --Andreas JN466 00:00, 1 December 2021 (UTC)
We've been here before
I'm reminded of the early licence discussions for Wikidata – which, as we all know, ended up importing masses of data from CC BY-SA Wikipedia and republishing them under CC 0, even though this was categorically ruled out in the early stages:
Alexrk2, it is absolutely true that Wikidata under CC0 would not be allowed to import content from a Share-Alike data source. Wikidata does not plan to extract content out of Wikipedia at all. Wikidata will provide data that can be reused in the Wikipedias. And a CC0 source can be used by a Share-Alike project, be it either Wikipedia or OSM. But not the other way around. Do we agree on this understanding? --Denny Vrandečić (WMDE) (talk) 12:39, 4 July 2012 (UTC)
This past U-turn has left a sour taste for me and permanently undermined trust. It's left me with a feeling that the WMF generally wants to push CC 0 as the superior licence and that any statements to the contrary are mere lip service. So right now, my suspicion is that whatever anyone here says – we'll end up with CC 0 for Abstract content and output content sooner or later. This will remove traceability of content to the Wikimedia movement, while re-users will be free to make any derivative content proprietary. I find neither desirable.
Therefore I would like anyone who does not want CC 0 to be applied to Abstract content and output content to be very clear about it here. Then at least it can't be said later on that the entire community agreed with a recommendation that said CC 0 might be the best way to go – which is what both the WMF legal department's and the development team's recommendations are currently saying. --Andreas JN466 12:22, 24 November 2021 (UTC)
- I agree that anyone who does not want CC 0 to be applied to Abstract Content should state so on this page. Legal says that CC 0 is an option, and recommends to use either CC 0 or CC BY-SA. The development team further narrows down the recommendation to be CC BY-SA. --DVrandecic (WMF) (talk) 22:25, 24 November 2021 (UTC)
- It doesn't narrow it down at all. Even now, after this edit, the development team section starts by saying the team recommends to follow the recommendations by Legal and ends with a list of advantages CC0 would have over CC BY-SA. --Andreas JN466 23:39, 24 November 2021 (UTC)
- You say it doesn't narrow it down. The text says, summarized: "Legal recommends two options, CC BY-SA and CC 0. We recommend CC BY-SA.". For me that reads a lot like narrowing it down. -- DVrandecic (WMF) (talk) 21:36, 25 November 2021 (UTC)
- You say, summarised, "We recommend CC BY-SA, but one could argue that CC0 is more future-proof, and it most certainly allows the most freedom." That's about as narrow as a barn door. --Andreas JN466 23:29, 25 November 2021 (UTC)
- Yes, I would agree with your summary, although not with your judgement. -- DVrandecic (WMF) (talk) 00:44, 30 November 2021 (UTC)
Let's stay compatible, let's stay copyleft
I support a copyleft license for all code (GPL, very likely) and copyrightable elements (CC BY-SA), as it is more fair to volunteers and external contributors. Also, it's highly likely that, at least in the beginning, a prime use-case will be people moving content around between Wikipedia and Abstract Wiki. The licenses should not prevent us from doing that.--Strainu (talk) 19:35, 24 November 2021 (UTC)
- Thanks for the comment, Strainu! Just a quick note: your reasoning about Abstract Content and Wikipedia content to both be under the same license is sound. Regarding the code though, a copyleft license such as GPL would *not* allow us to move code from Wikifunctions to the Wikimedia projects. Just because both GPL and CC BY-SA are copyleft licenses does not mean that they are compatible and that code can be moved from one license to another. --DVrandecic (WMF) (talk) 00:28, 25 November 2021 (UTC)
- DVrandecic (WMF), what I meant was that code should be license-compatible with other code we use and content should be compatible with other content.--Strainu (talk) 14:50, 25 November 2021 (UTC)
Do we really have a choice?
(This refers to “Abstract Content” and “Output Content” as defined on the front page. “Function signatures” and “Function implementations” are ignored here. Disclaimer: IANAL.)
The notion of this discussion is that we can pretty much freely choose a license for Abstract Wikipedia, with the options being the license CC-by-sa and the license-waiver CC0. However, this is based on the premise that the works to be created are copyrightable at all—which I doubt to be the case. After reading the WMF position on copyright of AI/ML generated content, I can understand how one could argue that these works may be technically copyrightable under some conditions, but I am not sure whether this really translates 1:1 to this situation.
“Abstract content” is pretty much non-copyrightable factual data taken from CC0-Wikidata, but with an added copyrightable selection and order/arrangement. The latter two aspects seem theoretically copyrightable. However, a threshold of creativity needs to be passed here in order to have these copyrightable aspects in the “Abstract content”. I suspect that this will often not be the case, since both selection and order/arrangement seem to be pretty much pre-determined by data availability from Wikidata, and text render function availability from Wikifunctions; I also expect generation of “Abstract content” to be a highly schematic process (read: no creativity required), so that much of it will eventually be created with automated processes.
The textual form of the “Output content” will be pretty much determined by the text render functions used. For copyrighted text, it legally is usually not a problem to extract the information (pretty much its “abstract content”), and to rephrase it into another textual form containing the very same information in the very same order. When done properly, this does not infringe existing copyright of the original textual representation. I fail to see how this “Output content” should be copyrightable to an extent that users are effectively restricted to comply with CC-by-sa. I also do not understand how the authors of the “Abstract content” should be considered as authors here when the actual textual representation is much rather a result of work done by function authors at Wikifunctions.
In summary, choosing CC-by-sa might give us the illusion that the content here is copyrightable and protected under the terms of that license, when effectively this might not be the case for large and substantial amounts of the content if one was to litigate this in court. This choice would also make it difficult to reuse data, since the legal situation from a reuser perspective seems to be unnecessarily complicated, which often scares potential reusers away entirely.
I would also like to mention that this situation is not really new. People are still surprised that facts are being imported from CC-by-sa Wikipedias to CC0 Wikidata. However, the CC-by-sa license does not make this legally impossible at all. Another notable example of a problematic licensing situation is OSM with its ODbL copyleft license which tries to put up a fence around the project’s data, but fails to be an effective protection as desired by many OSM contributors. —MisterSynergy (talk) 01:14, 25 November 2021 (UTC)
- The WMF lawyers seem to have looked at this, and came up with CC BY-SA as an option. The ultimate output of Wikifunctions is text, much like Wikipedia. Why would CC BY-SA make reuse of these texts more onerous than use of Wikipedia texts? --Andreas JN466 01:34, 25 November 2021 (UTC)
- I am not sure how much WMF lawyers have actually looked at the details. It all seems to be based on this paper, but the outcome is not really transferred to this situation. IMO a key problem is that “Abstract content” seems to be only weakly copyrightable at best due to its nature as factual data and the pretty low level of creativity required to generate it.
CC-by-sa for potentially not copyrightable content would confuse reusers. Regular Wikipedia articles are overwhelmingly indeed copyrightable, thus CC-by-sa is a sane choice there. —MisterSynergy (talk) 10:29, 25 November 2021 (UTC)
- MisterSynergy As I understand it, abstract content means you compose sentences much as you do in ordinary language use. You have a database of knowledge about the world and its constituents and their possible relationships, and words for them to enable you to talk about them. Abstract content translates these statements into a kind of machine language, so they can then be more easily translated into any human language, using a limited vocabulary, but you are still fundamentally writing text, expressing ideas, etc., just as you do when you write in a human language. --Andreas JN466 10:46, 25 November 2021 (UTC)
- Here we are getting to the actual point. I would like to hear from WMF/Denny whether this question of “Abstract content” copyrightability has been evaluated explicitly, since my impression of the nature of “Abstract content” is a much simpler one. —MisterSynergy (talk) 10:52, 25 November 2021 (UTC)
- Good point. But it's not just that, actually. There is also a language translation process involved before you get to an output sentence like "Jupiter is the largest planet of the solar system". Each and every type of abstract content has to be assigned a human-made or human-reviewed translation into every human language to be served. There will be innumerable challenges based on the intricacies of languages' different grammatical structures – differences in terms of how grammatical gender is or isn't distinguished, case endings, verb endings, ergative languages etc. – all of which will require output checking and fine-tuning by humans. Translation is a creative act, even when assisted by AI; translations are copyrightable. --Andreas JN466 11:04, 25 November 2021 (UTC)
- Yes, but all of this is in scope of Wikifunctions, thus the code license applies here. The process of “Abstract content” generation is really not much else than a selection of facts to be included in the output, from all the facts available in Wikidata, and also its arrangement in the “Output content”. This process might principally include some copyrightable aspects as described above, but I am pretty sure that it is extremely schematic and can probably be automated to a large extent, e.g. by using some sort of generic “Abstract content” templates that fit for all data items of a given type (e.g. humans, cities, companies, etc.).
Generally, the difficult parts of “Abstract Wikipedia” are the data and lexicographical repository Wikidata (excessive to compile, we’re already 9 years into this), and Wikifunctions (complicated because it needs to generate the text). Licensing of both is pretty much undisputed. Any user can come and use these tools to generate their own texts, without using “Abstract content” from Abstract Wikipedia, but using information from Wikidata only. In some sense, the “Abstract content” is simply a demonstration how Wikifunctions can be used to generate text for a given data item and users are free to alter the output as much as they like. We can of course not require them to make any textual output of Wikifunctions functions available under a specific license. —MisterSynergy (talk) 13:17, 25 November 2021 (UTC)
- Have a look at Abstract_Wikipedia/Examples/Jupiter. This seems more than "extremely schematic" to me. I don't agree that the resulting text lacks any kind of creativity, or "Schöpfungshöhe", and is just like an alphabetical series of phone book entries that could not really have had any other meaningful form. --Andreas JN466 13:30, 25 November 2021 (UTC)
- The Development Team actually put that very nicely, when they say:
- Editors have a very fine-grained selection of which facts are being displayed and which are not. In Wikidata we strive for completeness over careful selection.
- Editors have a very fine-grained control of the order the facts are being displayed in, constituting narrative elements, which are not available in Wikidata.
- We expect the natural language generation to allow editors to express to some degree of emphasis and selection of wording.
- All of these point towards Abstract Content being more similar to text than to a collection of facts, and therefore we suggest that we follow the same license that we use for the text in Wikipedia, which is CC BY-SA.
- That Jupiter text is a good example of that. --Andreas JN466 23:55, 25 November 2021 (UTC)
- Hi @MisterSynergy:, I wanted to answer your question whether the legal team has evaluated the question of Abstract Content copyrightability explicitly: yes, they have. We discussed this at length. A bit more text below. -- DVrandecic (WMF) (talk) 00:58, 30 November 2021 (UTC)
- That’s a great question, and honestly, a very difficult one. In the end we will only have clarity given more legal precedent, which is currently lacking, or clearer legislation. Until then it will remain guesswork.
- Will every text generated for Abstract Wikipedia clearly pass the bar to be copyrightable? Probably not. Will some text generated for Abstract Wikipedia pass the constraint to be copyrightable? I very much hope so.
- Will this decision be cited as precedent for other projects generating text? Very likely. What kind of precedent do we want to be? What kind of precedent will be most conducive towards our mission to a world with more free knowledge? One could argue that a precedent of CC 0 could move other publishers to also choose CC 0, as we have seen with Wikidata. Others (I assume Andreas to be one of them) could argue that particularly for algorithmically manipulated text, a license such as CC BY-SA will strengthen transparency and thus would be a better precedent. I am split on that.
- Will we want to use text from the Wikipedias as a source to guide the creation of content for Abstract Wikipedia? Yes, it would be naive to assume that this won’t happen. Would having a different license for that be problematic for that goal? Maybe. It will take a considerable time to be able to catch up with the prose in the Wikipedia articles. One could easily argue that any content in Abstract Wikipedia will be a new creation from scratch, and not merely just a translation of an existing Wikipedia article. -- DVrandecic (WMF) (talk) 01:36, 30 November 2021 (UTC)
- You know, Denny, if the availability of free knowledge is the goal here, what's the problem with people accessing that knowledge on Wikipedia, where it will always BE available? And don't kid people: CC BY-SA will still result in global reuse, just as it does today. There's surely no need to "Flickr-wash" Wikipedia content from CC BY-SA to CC0 (which would "maybe" be problematic, you say?? Wow!) to ensure that. --Andreas JN466 07:53, 1 December 2021 (UTC)
Recommended reading: Heather Ford, The Rise of the Underdog
Heather published this as a chapter in the recent MIT Press Wikipedia@20 book (eds. Joseph Reagle and Jackie Koerner, open access).
As argued in that chapter, there are consequences for both end users and volunteers when "Wikipedia’s facts are now increasingly extracted without credit by artificial intelligence processes that consume its knowledge and present it as objective fact." --Andreas JN466 13:19, 25 November 2021 (UTC)
- I asked on the talk page of the Wikimedia Enterprise API whether it would be possible to hold a video conference with the customers of the Wikimedia Enterprise API. This could be a chance to understand how they currently process information from the Wikimedia projects. Naming the source of a statement is important and should, in my view, be done even where the license does not require it; it is also, in my view, a responsibility of the public that consumes content to think about the source. Consumers should check that a source is mentioned and talk to the responsible people if it is not. As I understand it, this is independent of the license of the content. As far as I know, some knowledge graphs, such as the Google Knowledge Graph, use a kind of text mining. I am not sure that a CC BY-SA license can stop them from doing that, so the information could reach users that way too. As far as I have read, there were discussions about an artificial intelligence system called Copilot, offered by GitHub, which proposes code snippets to users and is based on a large collection of code from free software projects and free texts. I read a German opinion article about that on Netzpolitik.org, so I am not completely sure whether the mentioned legal view reflects a wider consensus.--Hogü-456 (talk) 19:31, 25 November 2021 (UTC)
- Just wanted to say +1. -- DVrandecic (WMF) (talk) 00:48, 30 November 2021 (UTC)
- As Heather mentions, for a while, around 2013, Google dropped the Wikipedia hyperlink from the Knowledge Graph panel, just presenting the word "Wikipedia" in pale grey as attribution (here and here are some examples of what that looked like). The result was a significant drop in Wikipedia pageviews that caused alarm in the WMF (see fundraising and Knowledge Graph references on that page) and led to the decision to exceed the 2013/14 fundraising target, to prepare for future challenges. Eventually, the hyperlink returned, but this episode and the past troubles with Amazon Alexa illustrate that even with CC BY-SA in place, proper attribution is not guaranteed. Without it, the WMF and the volunteer community would be completely powerless. --Andreas JN466 00:29, 1 December 2021 (UTC)
Question about dates
The content page (overleaf) says, If no rough consensus can be reached, we will organize a formal vote. The mailing list announcement says that you hope to have a consensus by 20 December.
Could you reassure us please, DVrandecic (WMF) and Quiddity (WMF), that this potentially quite far-reaching matter won't be decided on a vote held four days before Christmas? And that if there is no consensus discernible by Christmas, we will have that vote sometime in January 2022, once everybody is back at their desks?
Everybody who's intending to celebrate Christmas this year would appreciate that, I'm sure. Thanks --Andreas JN466 17:19, 25 November 2021 (UTC)
- Yes, of course. Plus I'm re-titling this section to avoid anyone else being confused by it. Quiddity (WMF) (talk) 20:59, 25 November 2021 (UTC)
CC-BY?
As Wikinews uses CC-BY, it may be an idea to use a license compatible with it (as abstract content may be used in any Wikimedia wiki in the future). One preferred version:
- For content that only combines constructors (such as Abstract_Wikipedia/Examples/Jupiter), we license it as CC-BY.
- For content containing transitional text (i.e. text that is translated but not yet in abstract content, see d:Template:TranslateThis), that transitional text is still under CC-BY-SA. In many cases (abstract) articles will contain translatable text that is not yet in abstract form.
--GZWDer (talk) 16:06, 26 November 2021 (UTC)
80.111.67.25's comment about international law
IP 80.111.67.25 wrote this on the project page in the "Facts" section, but it is not part of the officially provided legal analysis so I'm moving it here. Quiddity (WMF) (talk) 18:50, 27 November 2021 (UTC)
- "It should be noted that this long-term project is effectively unprecedented. International Law is comprised of many treaties. Data Privacy Law is a relatively underdeveloped aspect of International Law and nothing other than regional Laws and Guidelines have been issued." — Preceding unsigned comment added by 80.111.67.25 (talk) 15:32, 26 November 2021 (UTC)
- The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.
Pinging previous participants. Thank you for your comments above and elsewhere! Please see the refreshed focus on two specific questions below, and add your signature/reasonings in the two sections. @Amire80, Bluerasberry, Eloquence, GZWDer, Hogü-456, Jayen466, Kasyap, Legoktm, Lucas Werkmeister, Mike Peel, Msz2001, Ralf Roletschek, Raphael.jakse, Sannita, SashiRolls, Smallbones, Strainu, Thadguidry, and Theklan:. Thanks! Quiddity (WMF) (talk) 03:05, 3 December 2021 (UTC)
Restructuring the discussion to move on
- (Note: Some additional discussion has been happening on Telegram / IRC [2] [3], on the mailing list, and some on Facebook. Feel free to take a look there too)
Given that these discussions have been going on for ~10 days, and in order to facilitate consensus finding, I am boldly proposing the following: to summarize the consensus so far and restructure the discussion in order to find a resolution on the open questions.
The full discussion is above, and like every summary this summary will be incomplete.
It seems that the remaining open questions are the following two:
- What license should be the (either default or only) license for code implementations? The two strongest contenders are GPL and Apache. Please let us know below whether you have a strong preference for GPL, for Apache, whether you’re OK with either, or none.
- What license should the Abstract Content for Abstract Wikipedia be licensed under? The two strongest contenders are CC0 and CC BY-SA. Please let us know below whether you have a strong preference for CC0, for CC BY-SA, whether you’re OK with either, or none.
This also means that at this point we assume that the general framework of the discussion is accepted, that is, that we may assume consensus on the following points:
- Everything is published under a free license.
- We are launching with only a single license for implementations. We *may* add the option to have other compatible open source licenses for software contributions after launch. This is a discussion for a later point.
- Textual content on Wikifunctions (i.e. documentations, project pages, talk pages, etc.) is all published under CC BY-SA. For the sake of consistency, we will use the 3.0 version of the CC BY-SA license at this point.
- Function Signatures and other structured objects (besides Abstract Content and Code Implementations) are published under CC0.
- Output Content has the same license as the input to the functions producing the Output Content. This means in particular that Output Content used for Wikipedia will be published under the same license as the license we choose for Open Question #2 above.
This is all mostly compatible with the previous recommendations (besides the open questions). I understand that there have been individual arguments that don’t fit exactly in how we are funneling this discussion at this point, and I apologize for that, but in order to facilitate the discussion and focus on the biggest points of contention, I propose to focus the discussion as suggested.
In order to support transparency, the entirety of the discussion will remain available. We will update the content page accordingly by the end of the week if no one yells. As we said before, we plan to close this discussion on December 20th. The plan is to write up the consensus the week before that, and see if that resonates as consensus. --DVrandecic (WMF) (talk) 18:40, 2 December 2021 (UTC)
- @DVrandecic (WMF): You say in the Overview (overleaf), Textual content on Wikifunctions will be published under CC BY-SA. Above you say, This means in particular that Output Content used for Wikipedia will be published under the same license as the license we choose for Open Question #2 above. Are these not contradictory? I thought textual content and output content (= text produced) are the same. --Andreas JN466 02:53, 4 December 2021 (UTC)
- @Jayen466: Thanks for asking for clarification. With "textual content" I mean handwritten text on Wikifunctions, e.g. documentation of functions, project pages, policies, user pages, talk pages, etc. With "Output Content" I mean the output of a function applied to Abstract Content. I hope that makes it a bit clearer. -- DVrandecic (WMF) (talk) 05:48, 4 December 2021 (UTC)
Open Questions
Function Implementations
Summary of the differences
The main difference is that the GPL is a copyleft license, whereas Apache is not. This means that the GPL requires that people who distribute any GPL-licensed software must also make their source code available under the GPL license. Apache is a more liberal license and allows for wider reuse with fewer restrictions. One advantage of the Apache license is that it clearly covers patent rights, while older versions of the GPL and other open source licenses do not mention patents. If we choose GPL, code from Wikifunctions can only be reused in projects published under the GPL or in internal projects which are not published. If we choose Apache, code from Wikifunctions can be used more widely.
!Votes about the Licence for Function Implementations
Reasonings and/or signatures are welcome.
- For Apache and against GPL
- Support I was dubious at the beginning, but after Denny's last intervention, I am now leaning more towards a freer license such as Apache, instead of GPL. Full disclosure: I will very likely join the AW team in January 2022; while this did not influence my position about the question, I wanted to go public with it for transparency's sake. --Sannita - not just another it.wiki sysop 17:10, 3 December 2021 (UTC)
- Support For software libraries (Function implementations) I think Apache will allow broader sharing and attraction to developers willing to give up their time to foster innovation. --Thadguidry (talk) 20:06, 3 December 2021 (UTC)
- ...
- For GPL and against Apache
- ...
- Either is fine, but I prefer Apache
- Support I share some of the thoughts Denny mentioned below and do not really see a need to use the more restrictive GPL instead of Apache. --Ameisenigel (talk) 20:37, 2 December 2021 (UTC)
- Support However, I could see LGPL being a reasonable middle-ground. As I understand it, it would essentially impose a copyleft obligation on re-users that distribute modified versions of functions, while permitting the use of the library of functions (or a subset thereof) to build non-copyleft applications.--Eloquence (talk) 06:15, 3 December 2021 (UTC)
- Either is fine, but I prefer GPL
- Support I suspect there may be situations where implementations of certain functions could be Apache-licensed while others can (and maybe certainly should) be kept GPLed. As an example (and a shameless plug), Ninai makes higher-level decisions about how to construct sentences based on the constructors one uses (do I use "was running" or "ran" in a particular past-tense context?), and calls functions in Udiron to actually create/manipulate syntax trees for those sentences (add the verb "was" between the subject and verb, make the verb a gerund). Perhaps the low-level functions (in this case, of Udiron) could be Apache-licensed, while the high-level functions (in this case, of Ninai) could be kept GPLed? Mahir256 (talk) 19:51, 2 December 2021 (UTC)
- I'd argue that the "viral", copyleft nature of the GPL itself would make it quite unreasonable to host both Apache- and GPL-licensed code as part of the same project site. Should Wikilambda support forms of cross-site federation in the future, as are now available in Wikibase, it would be rather straightforward to host a separate project for GPL code. --Hupaleju (talk) 00:57, 3 December 2021 (UTC)
- As I already wrote, I'm inclined to GPL, LGPL, or AGPL, but I'm not really rejecting Apache. What I am really asking for is an explanation: why would Apache provide an ideal level of permissive flexibility? Even more importantly, why is permissive flexibility desirable? Currently, it just states this, as if it's obvious, even though it isn't. If this explanation is good, I may change my mind. --Amir E. Aharoni (talk) 07:28, 3 December 2021 (UTC)
- Hi @Amire80:! I offer a lengthy explanation in the discussion section below. Let me know if you have still questions after that. -- DVrandecic (WMF) (talk) 17:16, 3 December 2021 (UTC)
- Replied there. Amir E. Aharoni (talk) 11:47, 5 December 2021 (UTC)
- Support. There are good reasons for both, it's more a political statement.--Strainu (talk) 08:28, 3 December 2021 (UTC)
- Neither is fine (please give a counter-proposal)
- CC0 From a Wikidata perspective, Wikifunctions is more than a backend for Abstract Wikipedia. It is likely to develop into a general inference engine that complements data stored in Wikidata. Wikidata and Wikifunctions will co-develop into mutually interdependent projects that cannot be used separately. Licensing Wikifunctions under the Apache License or even GPL therefore means effectively limiting Wikidata to the same license.
To give a concrete example, Wikidata currently stores extensive tables of lexeme forms. If inference rules for predictable forms are added to Wikifunctions, most of the predictable forms are going to be removed from Wikidata in order to ease maintenance and distribution. Once that's done, it will be impossible to fetch form tables from Wikidata without going through Wikifunctions and complying with Wikifunctions license. Wikidata already excludes more complex lexeme forms, which are going to be available only via Wikifunctions.
For this reason I suggest using the same license as Wikidata: CC0. — Robert Važan (talk) 15:03, 3 December 2021 (UTC)
- PS: As an alternative to licensing entire function implementations under CC0, I would suggest licensing executable code (without comments, full symbol names, etc.) under CC0. That would allow bundling of Wikifunctions extracts with corresponding Wikidata subsets under CC0. — Robert Važan (talk) 15:19, 3 December 2021 (UTC)
- Hello @Robert Važan: Note that Apache does *not* limit its reuse to the same license (it is not copyleft / viral / share-alike). Furthermore, Apache offers consideration regarding patent and liability which CC 0 does not. -- DVrandecic (WMF) (talk) 17:18, 3 December 2021 (UTC)
- @DVrandecic (WMF): IANAL, but AFAIK you cannot just wholesale relicense Apache code under MIT or CC0. This will get particularly problematic when table-like rule sets get moved forth and back between Wikidata and Wikifunctions. Mentioned lexeme inflection rules lie on such a boundary between data and code. ML models blending Wikidata and Wikifunctions content might be affected too. — Robert Važan (talk) 01:07, 4 December 2021 (UTC)
- That is correct - you can't just relicense it. But you can use Apache code in your own projects, no matter if that project is otherwise licensed under MIT, Apache, GPLv3, or any or no other license at all, and share and republish it together - which is not the case for the GPL. IANAL as well. -- DVrandecic (WMF) (talk) 01:40, 4 December 2021 (UTC)
Discussion about the Licence for Function Implementations
One of the people we discussed this question with is Luis Villa. Luis was Deputy General Counsel at the WMF, led the revision of the Mozilla Public License, was part of the team representing Google in the Google-Oracle lawsuit, served on the board of directors of the Open Source Initiative, as an invited expert at the Patents and Standards Interest Group at the World Wide Web Consortium, and on the Legal Working Group of OpenStreetMap. He is now co-founder at Tidelift.
Luis raised many good points, but one in particular stuck with me, because it made me pause and led me to think about licenses for open projects like ours differently: “Do we expect any structural features of the project to create share-alike (or anti-sharing) pressures? As a concrete example, the biggest pressure to participate in the Linux kernel is not the license; it is the immense velocity of Linux kernel development. This creates a strong engineering reason to work with the community—participate actively, or else your changes will get left behind. Similar pressure drives many corporate contributions to OpenStreetMap.”
I have seen such pressures in place with Wikidata, and the same pressures are also in place with Wikipedia. In both cases it is not the license that protects the project (in Wikidata’s case it is most obviously not the license), but it is the activity of the community, it is the fact that if you want to stay compatible with Wikidata, if you want to stay up to date with your knowledge, you can’t just take Wikipedia and start maintaining it yourself. A number of projects tried that, none of them with any noticeable impact.
I believe that the same will be true for Wikifunctions. I hope for Wikifunctions to become an active, lively project, that will constantly grow and develop and discover and extend to new horizons and new use cases. We will cover more and more languages, improve the constructors for abstract content, and discover how to reuse knowledge between languages in order to quickly gain coverage on new languages which are underrepresented on the Web.
Is there a benefit to our code being reused widely? Yes! The more people use it, the more mistakes and improvements will be found and will trickle back. I want users on StackOverflow and similar Websites to answer questions with links to Wikifunctions, and to create functions on Wikifunctions so they can answer those questions. I want them to copy the code, without having to think about the license, into their answers, their forums, their own projects, and so much more.
Would the GPL stop others from copying the code? Would it stop them, as was discussed above, from taking the natural language library, copying it wholesale into their project, and then making improvements that they do not give back? No, not at all. Even if it was GPL, they still could just copy the functions, use them to create the text they need in the languages they need, and just publish the text. The value for another organization will be in the output of these functions, not in the functions themselves: the fact that they can create text in many different languages. Even if the code was released under GPL, there is nothing that would compel such players to give back to the project. Or, as James Forrester put it much more succinctly: more restrictive licenses such as the GPL will not stop the actors we want to stop, but they may very well stop the actors we want to share with.
What will compel such actors to give back to the projects? If we actively grow the project, if we keep adding languages, if we extend the expressivity of our language. Then forking our code just doesn’t make sense, just as forking Wikidata or forking OpenStreetMap or forking Wikipedia does not make sense. It is not the license, as Luis points out correctly, it is other pressures.
A less liberal license will only make it more likely that we will be used more by big players with legal departments who can deal with the complexities of a five thousand word license. There are projects where the GPL may be a great choice - complex, monolithic software projects like GIMP, MySQL, or Blender which are in a space where the risk of being commercialized and closed in a product is a real risk. But for a library of functions which are not even an application?
According to Wikipedia, the GPL is incompatible with being distributed through the Apple and Mac app stores. Shall we really use a license that is incompatible with being distributed on such widely popular devices?
Don’t misconstrue me as an enemy of the GPL. It has its uses. I don’t think Wikifunctions is one. I don’t expect to change the mind of anyone who came here to advocate for copyleft licenses, but I hope to make it clear why I think a project such as Wikifunctions would benefit itself, and would be most beneficial to others, by adopting a less restrictive license such as Apache.
My recommendation is to vote for the Apache license. These are my thoughts on the question, and not the opinion of the WMF. --Denny (WMF) (talk)
- I want to note 2 things:
- With GPL, you don't have to compel anyone to do anything. If you like their derivative work, you're free to just re-incorporate it in the base software. I know there are ways around that and they're easier to implement for small chunks of code than for monolithic applications, but that's not a reason to just give up on that ability.
- You make a generalization around app stores. There is a single app store noted there. It has a significant market share, but it can't be described as "the most widely used devices". Android appstores, including the Google Play and the Huawei equivalent, do list GPL software. There are even dedicated OSS repos, such as en:F-Droid.--Strainu (talk) 10:42, 3 December 2021 (UTC)
- @Strainu: Thanks! The Wikipedia article was misleading me. I honestly was not aware that the Google Play store is compatible with the GPL. I fixed my statement accordingly. -- DVrandecic (WMF) (talk) 06:01, 4 December 2021 (UTC)
- I am sceptical of the idea that Copyleft licences favour Big Tech more than CC0. Google had CC BY-SA Freebase and chose to transfer its content to CC0 Wikidata. A player the size and wealth of Google can easily take CC0 material and create a proprietary black box derivative. (Also, I wouldn't necessarily rely on a Google lawyer for conscientious advice on what best serves our interests.) --Andreas JN466 18:34, 4 December 2021 (UTC)
- @Jayen466 with all due respect, your "Google lawyer" jab is pretty much a low blow, and I would really like you to strike it off from your intervention. Thanks. -- Sannita - not just another it.wiki sysop 15:46, 5 December 2021 (UTC)
- I think it is only prudent, when taking advice from professionals with a history of defending the commercial interests of for-profit companies that have a very substantial commercial interest in our work, to bear the possibility of conflicts of interest in mind. (There is nothing improper in professionals working for a company like Google. We all have to earn an income.) --Andreas JN466 16:08, 5 December 2021 (UTC)
- Sorry, but this explanation is theoretical, generic, and not very convincing.
- Here's a realistic, easy-to-imagine scenario:
- Wikimedia editors and developers (paid or volunteer) make functions that generate help documentation for MediaWiki features. They work well in English, French, Spanish, German, Polish, and Russian.
- A translation agency somewhere in the E.U. notices this, sets up its own MediaWiki instance with the WikiLambda extension and copies the functions from Wikifunctions. They write abstract content that generates multilingual user manuals for consumer electronics products. So far, so good: everyone is supposed to be happy about this.
- That agency hires developers and linguists who extend the renderers so that they are able to work in Hungarian, Finnish, Turkish, Albanian, and Maltese.
- It is subsequently bought by a software company, which extends the renderers to Arabic and Japanese, and starts selling the package as "the global technical documentation translator".
- A religious publisher buys a license for this software, develops renderers for twenty languages of Western Africa, and uses it to prepare its tracts in these languages.
- So we have software, which is modified and distributed: scenarios where viral licenses are supposed to be useful.
- Can Wikimedia force them to share the code that supports Hungarian, Finnish, Turkish, Albanian, Maltese, Arabic, Japanese, and languages of Western Africa? With a copyleft license, it can. With a permissive license, it cannot. After a couple of years, Wikimedians will perhaps develop renderers for Finnish, Hungarian, and Japanese. Other languages will probably have to wait for many more years until capable volunteers will make something for them, or to apply for grants from Wikimedia or other nonprofits.
- Now, you could say that if these companies couldn't sell it without having to give the code back to Wikimedia, maybe they wouldn't develop support for these other languages in the first place. Well, maybe, but there's also another scenario: They do make it, they do share the code with Wikimedia and the rest of the world, but they build their business model around developing abstract content that would be useful for commercial reusers.
- You could also say that they can avoid the virality by keeping the software on their own computers and offering text generation as a service. That's why I mentioned AGPL as a possibility: this license is perhaps better for online services. --Amir E. Aharoni (talk) 11:47, 5 December 2021 (UTC)
- @Amire80: I think your scenario is easy to imagine, but to be honest I don’t find it particularly convincing. Why would the software company in steps 2-4 offer to sell the software as off-the-shelf software? This would mean that its clients would need to train enough of its employees all about the proprietary set of constructors of that company and how to express their content in these constructors. The clients, such as the religious company in step 5, if they want to buy and use that software, would need to train a large enough team to avoid the bus factor within their company just to learn how to use that proprietary system. That’s an investment that I don’t see as particularly likely (how many companies employ in-house translators in the first place? For this team, each member of the team would likely be more expensive than a translator)
- In my opinion, a much more likely scenario is that a company would offer the service of full-fledged translation of natural language text. The offer would be “give us your text, and we will give you the text in the languages you pay for - but unlike our competition, we don’t increase our cost linearly with the number of languages, but every language beyond the first one is much cheaper”. And that company then uses their internal fork of Wikifunctions to create the content they sell.
- But in the scenario I describe, whether Wikifunctions uses Apache, GPL, AGPL, BSD, or any other license simply doesn’t matter. Since the running of the software is entirely internal to the service company, no free license could force them to give back their changes to the source code.
- I don’t see a scenario where the GPL would lead to a benefit, but I see many scenarios where the GPL would be discouraging reuse. E.g. the fact that you can’t use it in any app that one wants to disseminate, even for free, through the Apple AppStore. What is more likely? The scenario that you describe, or the scenario that a hobbyist would like to use a function within an App they want to share on the iPhone? -- DVrandecic (WMF) (talk) 22:48, 6 December 2021 (UTC)
- @DVrandecic (WMF): You're still missing the point I raised above. Follow the Commons approach, which has already been demonstrated to work well. Support multiple licenses as early as possible, don't restrict things to a single license where there are compatible ones out there. It just needs the license to be defined by a template like they are on Commons, which shouldn't be too difficult to implement. Thanks. Mike Peel (talk) 21:55, 5 December 2021 (UTC)
- Not sure whether this is really a good idea here. At Commons, there is usually one main creator/uploader of a file who can choose from a range of licenses. Most follow-up edits are either purely maintenance such as categorization and stuff, or minor modifications of the main work. Functions, however, are probably much more a collaborative effort where I cannot imagine how multiple licenses could work. Should the initial creator select a license and all others have to live with it? I don't think this works, just as it would not work for Wikipedia. —MisterSynergy (talk) 23:23, 5 December 2021 (UTC)
- @Mike Peel: The question is whether this is needed before launch. As I suggested above, let us start with a single license, and then, after launch, assess how and whether a multi-license model à la Commons is beneficial. This is a complex question (how would the different implementations play with each other? How do we display licensing information? How do we support, as MisterSynergy points out, the fact that different people might be working on a piece of code together? etc), and will require potentially a significant amount of implementation work. It would be great to postpone this discussion until we have a better understanding of how the system develops, and what the benefits of having multiple licenses would be. -- DVrandecic (WMF) (talk) 23:12, 6 December 2021 (UTC)
Abstract Content for Abstract Wikipedia
Summary of the pros and cons
The discussion has been mostly around whether Abstract Content is copyrightable at all, which is a prerequisite for CC BY-SA.
The main difference is that CC BY-SA is a copyleft license whereas CC0 is not. This means text incorporating CC BY-SA content must be attributed and must be shared under the same license. CC0 has no such requirements and allows for easier reuse.
Wikipedia content is under CC BY-SA (version 3.0, unported).
!Votes about the Licence for Abstract Content
Reasonings and/or signatures are welcome.
- For CC BY-SA and against CC 0
- Support This would be easier for reusing Wikipedia content on Abstract Wikipedia. --Ameisenigel (talk) 20:44, 2 December 2021 (UTC)
- Support as previously stated. I agree that CC-0 is appropriate for Abstract Descriptions on Wikidata.--Eloquence (talk) 06:11, 3 December 2021 (UTC)
- Support as described above, it's essential to have free flow between Abstract and other Wikipedias.--Strainu (talk) 08:30, 3 December 2021 (UTC)
- Support I am still specifically convinced (as I was in the first round) that, while the fact that Jupiter is a planet is not copyrightable, a construct that compares Jupiter to all other planets in the Solar System is sufficiently elaborate to make me think it would be more defendable with a CC BY-SA license than with CC0. In this, I totally agree with the recommendations from the dev team. Full disclosure: I will very likely join the AW team in January 2022; while this did not influence my position about the question, I wanted to go public with it for transparency's sake. --Sannita - not just another it.wiki sysop 17:13, 3 December 2021 (UTC)
- Support as default, possibly w/ a way to explicitly flag subsets of abstract content that seem CC0 (by necessity). I.e. if we decide that subsets are clearly not copyrightable, there should be a way to note that on the projects and not ask reusers to guess. –SJ talk 18:37, 3 December 2021 (UTC)
- Support per the reasoning expressed by Heather Ford here (end users can tell info comes from Wikimedia, plus problem-free re-use of Wikipedia content per Denny and CC licence rather than proprietary status for derivatives) --Andreas JN466 02:25, 4 December 2021 (UTC)
- Support The most notable thing we license as CC0 is Wikidata, which is like a database: large and complex in aggregate, but consisting of tiny statements that are hard to license even with a permissive license, let alone a viral one. Abstract Content is not similar to that. In form, abstract content is more like code: Being able to write in a human language is not enough to be able to write it; no one will ever be able to write abstract content without at least some understanding of programming. In function, and in name, abstract content is content. There are some small exceptions, but we usually don't license code or content as public domain, and that's the right thing to do.
As for the output, I am not a lawyer, but I doubt that it's as easy to enforce the "BY" and "SA" parts of CC-BY-SA with the pure output of the functions as it is with concrete content written by humans. But since this output is expected to be often mixed with concrete content, they should have the same license. --Amir E. Aharoni (talk) 11:00, 5 December 2021 (UTC)
- Support I believe that any content created by Abstract Wikipedia should be required to be sourced to Abstract Wikipedia in its future uses. As I alluded to above some concepts exist in some languages while being absent from others. Concepts already coded into wikidata do not yet exist in other languages and may never do. A convenient example is agender gender. Standardized algorithmic decisions regarding how this will be rendered for people (and possibly for fictional characters) would involve, it seems to me, a certain amount of sweat (and possibly raising) of the brows (would it be different for people and for fictional characters? Would the author's own pronoun use be authoritative, overriding standard Abstract Wikipedia usage for fictional characters? for people? etc.) This is only the tip of the iceberg, of course. The same verb in different languages may obviously have different verbal valences, logical entailments ... Responsibility and humility suggest that it's best to require signing one's work. (killing the author also kills authority)... SashiRolls (talk) 19:44, 6 December 2021 (UTC)
- For CC 0 and against CC BY-SA
- I am still not convinced that there is sufficient copyrightable aspect in Abstract Content to consider CC by-sa a viable option for it. Thus CC0. Unfortunately, the arguments brought forward for CC by-sa are mainly of non-legal nature, such as "we might not want to compete with (non-abstract) Wikipedias by choosing a less restrictive license", or "we want re-users to provide data provenance information by choosing a copyleft license, in order to enhance verifiability". MisterSynergy (talk) 20:22, 2 December 2021 (UTC)
- Support If the output should have the same license as the Abstract Content (which I really don't think is necessary), then I think it follows that the Abstract Content should be CC 0 since I don't think the output can get copyright. My explanation is in the discussion below. ♥Ainali talkcontributions 10:58, 4 December 2021 (UTC)
- Support I may be missing something, but CC BY-SA implies an author; if I understand correctly, the author is Wikifunctions, and it feels very weird to have a non-person as author (and also a bit weird to have a different licence for the function and the output of the said function). Legal advice would be nice here. Cheers, VIGNERON * discut. 21:08, 6 December 2021 (UTC)
- Either is fine, but I prefer CC BY-SA
- Support I think that some abstract content constructors by themselves could be CC0, but at a certain level of complexity (particularly when context is specified that would control their presentation) they might warrant CC-BY-SA licensing. For more elaboration, see the example I posted in the discussion below. Mahir256 (talk) 18:43, 3 December 2021 (UTC)
- Either is fine, but I prefer CC 0
- Neither is fine (please give a counter-proposal)
- The licence should be an automatic consequence of the licensing of the original content. This means that an original composition of CC0 content would be CC0. A modification of CC BY-SA content would be CC BY-SA. However, we should aim to move away from monolithic licensing at the article level. The abstract content should be understood (where it is the case) as a composition of modified content within which different subsisting copyright and licensing applies. This means that public domain content can be “translated” into abstract content and (effectively) remain within the public domain with CC0 licensing. Subject to WMF agreement, it would also allow the inclusion of NC content. This would be suppressed in the default BY-SA output content, just as BY-SA content would be suppressed in CC0 output content. In any event, where abstract or output content is licensed as CC BY-SA, we should avoid an absolute requirement for the “translation” to require attribution in addition to the attribution requirements arising from our use of the original content. CC BY-SA requires us (and subsequent reusers) to specify how the original content was modified; this should be sufficient attribution. --GrounderUK (talk) 10:48, 4 December 2021 (UTC)
- @GrounderUK: Yes, the license of the Output Content should be the same as the license of the Abstract Content, agree! The question here is not about the license of the output content in general, but rather how the Abstract Content for Abstract Wikipedia specifically should be licensed. -- DVrandecic (WMF) (talk) 23:07, 6 December 2021 (UTC)
- @DVrandecic (WMF): We agree on output content, but here I am referring only to abstract content for Abstract Wikipedia. Abstract content that is a derivative (“translation”) of public domain content can have CC BY-SA licensing only for the derivative. I oppose this because it applies restrictions on under-served language versions that do not apply to the well-served language version that uses (untranslated) content that is in the public domain. Of course, most Wikipedia articles are not wholly within the public domain, although they are required to indicate which parts are. Clearly, the BY-SA licensing does not (and cannot) apply to those parts. However, if we “translate” such an article into abstract content we would be compelled to license the whole “translation” as a BY-SA derivative. I propose, instead, to divide the original up according to its actually effective licensing and publish only the “translation” of BY-SA parts into abstract content as a BY-SA derivative. The “translation” of public domain parts of the original could then be published either as a BY-SA original or with a CC 0 waiver. I would oppose the BY-SA option (to repeat myself). Similar thinking might be applied to legitimate use of copyright material within the original article, but here I believe that we would not necessarily have the right to create the derivative. If we do, I would support BY-SA licensing for our “translation” of those parts.--GrounderUK (talk) 01:40, 7 December 2021 (UTC)
- As has been clarified previously, this discussion is only about how the Abstract Content in *Abstract Wikipedia itself* should be licensed! If one specifically intends to create a "verbatim", literal translation of existing public domain content into Abstract Content, I agree that this should be CC0 licensed but it should also not be part of Abstract Wikipedia itself. Rather, it should probably be included in Wikidata. --Hupaleju (talk) 10:14, 7 December 2021 (UTC)
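The per-part licensing proposed in this thread can be sketched in a few lines of Python. This is only an illustration of the mechanics, under loud assumptions: the `Fragment` type, the license labels, and the `ALLOWED_IN_OUTPUT` table are hypothetical names invented here, and the table simply encodes the suppression rule as stated in the proposal (BY-SA parts suppressed in CC0 output, NC parts suppressed in BY-SA output). It says nothing about what the licenses legally permit.

```python
from dataclasses import dataclass

# Hypothetical license labels, limited to those named in the proposal.
CC0, BY_SA, NC = "CC0", "CC BY-SA", "NC"

# Assumption taken from the proposal: which fragment licenses may
# appear in output that is published under a given license.
ALLOWED_IN_OUTPUT = {
    CC0: {CC0},
    BY_SA: {CC0, BY_SA},
}


@dataclass(frozen=True)
class Fragment:
    """One part of an article's abstract content with its own license."""
    text: str
    license: str


def select_fragments(fragments, output_license):
    """Keep only the fragments that may appear under output_license."""
    allowed = ALLOWED_IN_OUTPUT[output_license]
    return [f for f in fragments if f.license in allowed]


article = [
    Fragment("part derived from public domain content", CC0),
    Fragment("part derived from CC BY-SA content", BY_SA),
    Fragment("part derived from NC content", NC),
]

for output_license in (CC0, BY_SA):
    kept = [f.text for f in select_fragments(article, output_license)]
    print(output_license, "->", kept)
```

The point of the sketch is that the license becomes a property of each part rather than of the whole article, so the same abstract content can yield different output depending on the license requested.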
Discussion about the Licence for Abstract Content
My main argument for CC BY-SA for Abstract Content for Abstract Wikipedia is to allow for a smooth and unproblematic reuse of Wikipedia content to very directly inspire and guide the creation of Abstract Content without having to worry about a possible copyright infringement.
A second argument is that there is a certain risk when using CC 0: should Abstract Wikipedia be reasonably successful, it might actually create an alternative to Wikipedia proper that, even though still worse than Wikipedia, might become preferable due to the license. This unfair advantage might, in the long run, lead to wider availability of worse content. By keeping the license the same as Wikipedia we would not create an incentive to use worse content. This argument has developed from Luis’ question above and from Legal’s guiding question.
My recommendation is to use CC BY-SA for the Abstract Content for the Abstract Wikipedia. These are my thoughts on the question, and not the recommendation of the WMF. --Denny (WMF) (talk) 18:43, 2 December 2021 (UTC)
- If people are willing to choose worse content over having to comply with the terms of a CC BY-SA license, to me that is an argument against CC BY-SA. If it's a good license to choose, we shouldn't have to force people to use content which uses it. - Nikki (talk) 05:38, 3 December 2021 (UTC)
Some users above have raised the concern that Abstract Content, taken in general, might be uncopyrightable. I find this very hard to believe wrt. Abstract Content for original encyclopedic text (specifically, for Abstract Wikipedia), which is what's being discussed in this section. One should keep in mind that even the so-called "structure, sequence and organization" of a copyrighted work can itself be protected when it involves some sort of original creation, as opposed to being dictated by outside constraints or entirely customary. Moreover a typical, high-quality encyclopedic article will also contain a significant amount of creative and original expression beyond that - and one of the goals of Abstract Content is to allow for this sort of expressive judgment to be faithfully preserved even when working across languages. Thus, I see CC-BY-SA as the only viable license for a project with Abstract Wikipedia's scope.
At the same time, the sensible choice of CC-BY-SA for the language-independent "source" content of the Abstract Wikipedia should not foreclose the possibility of licensing some content in Abstract Text form as CC-0 within some other, very different project(s). I'd argue that this opportunity should be discussed very clearly since it will have major impacts on how these projects are organized for the foreseeable future. An Abstract Wikipedia that's licensed as CC-BY-SA cannot realistically be hosted "within" or "under" Wikidata without creating an undue burden for that project (it can of course use Wikibase and federate *with* Wikidata!); while it might be entirely proper for Wikidata to contain Abstract Text structures of its own, for its own purposes (including e.g. for Abstract Descriptions, Abstract Senses of lexical data, etc.), licensed under CC0 as with the existing structured content of Wikidata itself. --Hupaleju (talk) 23:38, 2 December 2021 (UTC)
- Yes, to make it very explicit (and I am happy with suggestions to make it more explicit, if you find a place where to say it): This is only about the Abstract Content for Abstract Wikipedia. For example, in particular, Abstract Content for Abstract Descriptions should and will be licensed under CC0, following how Descriptions in Wikidata are licensed now. The same for Abstract Glosses. Fully agreed!
- In fact, if someone wants to create Abstract Content and publish it without a free license, that will also be totally fine (but it wouldn't be hosted on Wikimedia projects, as we only host open licensed content). -- DVrandecic (WMF) (talk) 00:01, 3 December 2021 (UTC)
- If our Abstract Content would be licensed CC by-sa, someone could also create their Abstract Content and publish it under CC0. The Output Content might even be very similar to ours, depending how much (or little) protection it actually enjoys. —MisterSynergy (talk) 09:18, 3 December 2021 (UTC)
- Support for the recommendation. Mathematics belongs to all of humanity and access to it should not be restricted by legal terms; many function definitions and simple implementations will be overlapping with mathematical concepts and insights that nobody, not even Wikipedia, has a right to put their conditions on. More practically, more permissive licenses will simplify usage for all the people and small organisations that do not have a large corporate legal department to protect them. We would fail the open source community if we adopt a license that excludes a big share of modern OSS projects (as a developer, I have founded and maintained GPL and Apache Licensed projects, and would like to use AW in either case, which [to my legal understanding] only works if we choose a more permissive license). --Markus Krötzsch (talk) 07:19, 3 December 2021 (UTC)
- @Markus Krötzsch: Thank you a lot for your answer! Would you mind if I put / would you mind putting your vote on the specific options above? -- DVrandecic (WMF) (talk) 17:26, 3 December 2021 (UTC)
- I prefer licensing under CC 0. This makes it easier to reuse something without violating the license. Usually I learn something by looking at existing material on a topic, and then I take these examples and adapt them for my use case. If I need to collect the URLs of the pages I took them from and forget one, I violate the license. As I understand CC BY-SA, the license also applies to derivatives of a work. So I am not sure to what extent a user can publish probably copyrightable content, for which an implementation exists in Abstract Wikipedia, under another license if, for example, the Lexemes used in it are changed. So the question is for how long a work remains a derivative of something else, and whether CC BY-SA prevents external users from using Abstract Text in projects that are licensed under other licenses. I would like a legal investigation of this question since, as I understand it, there are different views about what the decision implies for other external users of the abstract text generation system.--Hogü-456 (talk) 18:11, 3 December 2021 (UTC)
To provide an example of my reasoning for my abstract content license vote (and I should note that, despite being de-0, I'm only using a German example here because I expect no one else to understand or care for a Bengali one), consider the constructor combination
Action(L(617591)["S2"], Agent(Speaker()), Hesternal(), Topic(Concept("Q187880")), Source(Instance(Concept("Q326653"), Definite())))
This by itself, yielding the base rendering into German ("Ich nahm gestern dem Buchhalter Excalibur weg.", using the default parameters of Ninai/Udiron) is likely worth keeping CC0, and permutations of the arguments after the first argument to "Action()"—Action(L(...), Topic(...), Hesternal(...), Source(...), Agent(...))
would be an example of such a permutation—are also likely worth keeping CC0 as well. When contextual modifications to components are specified in a wrapper around that combination, such as
With(
    # begin contextual modifications
    Apply("Action", Passive()),
    Apply("Action/Agent", Nosism()),
    Apply("Action/Hesternal", InitialEmphasis()),
    Apply("Action/Source/Instance/Definite", Emphasis()),
    # end contextual modifications
    Action(
        L(617591)["S2"],
        Hesternal(),
        Source(Instance(Concept("Q326653"), Definite())),
        Agent(Speaker()),
        Topic(Concept("Q187880"))))
which might yield the result "Gestern wurde Excalibur diesem Buchhalter von uns weggenommen." (possibly grammatically incorrect—I'm trying as best as I can), then I could see the resulting abstract content being CC-BY-SAable (with attribution to the person who decided to apply those contextual modifications). @MisterSynergy: for thoughts on this. Mahir256 (talk) 18:43, 3 December 2021 (UTC)
- Question is: what exactly is copyrightable in your example? This has unfortunately not really been addressed yet (by WMF/Denny).
Abstract Content is not automatically copyrightable just because it looks complex or lengthy, or is expensive to compile. As I said earlier (context: #Do we really have a choice?), the aspects of Abstract Content that can be copyrightable are selection and arrangement: you select facts/information (from Wikidata) to be included in the Output Content and arrange them in a way that the Output Content makes some sense; you also select functions (from Wikifunctions) to be used for the text generation, which is how one can vary expression. The Abstract Content itself, however, in spite of having some sort of complex form, is a technical notation that is basically a mapping of factual data as input parameters to functions; no creativity is required to compile the textual form of Abstract Content. In my opinion it has some conceptual similarity to, for instance, a JSON representation of a Wikidata item that is also nothing but a technical representation of factual data; JSON (and other) representations of factual data are not published under a copyleft license either, for good reasons.
While I do acknowledge that we could publish selection and arrangement of the Abstract Content under CC by-sa, I am convinced that this is not what anyone would expect—neither Abstract Content authors, nor reusers of Abstract Content/Output Content. Any license that requires some sort of action by the reuser would thus very likely lead to frustration on both ends.
Besides all of this, we need to consider that the difficult parts of Abstract Wikipedia are Wikidata (needs to be rich in data) and Wikifunctions (does the complex job of text rendering). Both are available to external users, thus anyone could generate their own Abstract Content and use or release it under any conditions. It is effectively not under our control which Output Content is being generated with this tool and how it is being used, and we should thus not deliberately try to control it if this does not work anyway. —MisterSynergy (talk) 19:33, 3 December 2021 (UTC)
- I would argue that even with that, the control of the form of the output is little to none. The form may change depending on languages, later edits etc. This means that what the person creating the abstract content has done is to express an idea. He or she has not expressed the form of the output yet, and it is neither totally under their control, nor will the person have knowledge of the possible outputs of all the languages it can be rendered in. And since copyright can only be given for a certain expression and not for an idea, the output, in my opinion, can not be copyrightable at all. The best we can do is to be clear about that and acknowledge the output as CC 0. ♥Ainali talkcontributions 19:32, 3 December 2021 (UTC)
- @Ainali: Compare it to literature. If an author writes a novel in English, then a Japanese or Russian translation may not contain a single letter written or controlled by the original author, yet there is no doubt that they have intellectual property rights – the novel is still their work, even if they share the copyright of the translation with a translator or publisher. If what you say were true, anyone could translate any book that's still under copyright into another language without seeking permission from the author. This is most definitely not the case. --Andreas JN466 02:41, 4 December 2021 (UTC)
- Not really, since there is no original form to compare with. As the abstract content is only expressing the idea of something rather than controlling the form, it is not comparable to regular translations. This is obvious since there may be near infinite ways to construct the abstract content (reordering the individual functions, writing functions in different programming languages, etc.) that will produce the same output. In literature, there is an expression of the idea that is being translated. In Abstract Wikipedia this would be important because if a word has synonyms, and the translations have synonyms in turn, it would generate copyright for all permutations, which is the opposite of getting copyright for the expression of the idea. Let's consider this pseudocode as an example (which in English perhaps could render as 'the sky is white as fog'):
comparison(determined,subject(Q527),characteristic(Q23444),object(Q37477))
- All these Swedish outputs would be correct:
- Himlen är vit som dimma.
- Himlen är vit som mist.
- Skyn är vit som dimma.
- Skyn är vit som mist.
- These are four different forms from the same idea, and this is just the tip of the iceberg in a very simple example. And of course in my code, reordering the subject, characteristic and object would (in a good implementation) result in exactly the same output, which shows that it is the idea that is expressed, and that the form of the Abstract Content is largely irrelevant and has little to no bearing on the output. ♥Ainali talkcontributions 08:19, 4 December 2021 (UTC)
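A minimal sketch of the point above, assuming a toy renderer: the item IDs are copied from the pseudocode, while `LEXICON_SV`, `comparison` and `render_sv` are hypothetical names and the synonym lists are hand-written for illustration, not looked up from Wikidata or taken from Wikifunctions. It only illustrates that argument order does not matter and that one piece of abstract content can yield several surface forms.

```python
from itertools import product

# Hypothetical lexicon: item IDs from the pseudocode mapped to Swedish synonyms.
# The word lists are illustrative only and are not fetched from Wikidata.
LEXICON_SV = {
    "Q527": ["Himlen", "Skyn"],    # subject, definite form
    "Q23444": ["vit"],             # characteristic
    "Q37477": ["dimma", "mist"],   # object
}


def comparison(**roles):
    """Build abstract content as an unordered set of (role, item) pairs."""
    return frozenset(roles.items())


def render_sv(content):
    """Return every Swedish sentence this toy renderer can produce."""
    roles = dict(content)
    subjects = LEXICON_SV[roles["subject"]]
    traits = LEXICON_SV[roles["characteristic"]]
    objects = LEXICON_SV[roles["object"]]
    return [f"{s} är {t} som {o}." for s, t, o in product(subjects, traits, objects)]


# Two differently ordered ways of writing the same abstract content.
a = comparison(subject="Q527", characteristic="Q23444", object="Q37477")
b = comparison(object="Q37477", subject="Q527", characteristic="Q23444")

outputs = render_sv(a)
for sentence in outputs:
    print(sentence)
```

In this sketch `a == b` holds even though the arguments were written in a different order, and `render_sv` returns the four sentences listed above.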
- What could be copyrightable though is the form of the Abstract Content. Which means that what actually is a translation of the form in Swedish is this:
jämförelse(bestämd,subjekt(Q527),egenskap(Q23444),objekt(Q37477))
- This is hugely different from generating the output. I hope that this shows that rendering the idea is not comparable to translating an expressed form. ♥Ainali talkcontributions 08:44, 4 December 2021 (UTC)
- @Ainali: This is getting too foggy for me ... I'm not sure I follow. Could you please apply your reasoning to a concrete example, Abstract_Wikipedia/Examples/Jupiter, the Simple English Wikipedia article? To me, it sounds like you're saying all the translations we might generate of this (or any other) CC BY-SA text via Wikifunctions/Abstract Wikipedia would be CC0. Is that it? --Andreas JN466 10:14, 4 December 2021 (UTC)
- Not really. What I am saying is that the generated output isn't translations as much as following instructions to construct a way to express an idea. This would be even more clear if the sentence were built up by calculating something, like "San Francisco is the 4th largest city in California". With a slightly more advanced function than the one in the example, the rank could be calculated on the fly through a query to Wikidata and the number itself wouldn't be in the Abstract Content at all, and might also change over time. How could the author of the Abstract Content get copyright for the output, the form the idea expresses, when the form changes over time without his or her knowledge or influence? ♥Ainali talkcontributions 10:27, 4 December 2021 (UTC)
- There is dynamic content in Wikipedia as well (age etc.). Can we agree that if we start with simple:Jupiter and end up with a version of that text in Swahili or Marathi for use in Amazon Alexa, Apple Siri or Google Assistant, then we've produced a translation and derivative work that's still CC BY-SA? --Andreas JN466 11:13, 4 December 2021 (UTC)
- No, I don't agree that this is translation. That example is deceptive in the sense that it looks like they are translations because the functions are too simple. Consider the San Francisco example instead to be generic for any city and administrative unit. The description of that could be: generates a sentence which ranks something compared to similar items in some location by some property. It would then behave something like (oversimplified pseudocode):
rank(this,property(P1082),location(administrative unit(this)))
- Depending on where it is used it could generate:
- Stockholm ist die größte Stadt im Stockholms län.
- San Francisco is the fourth largest city in California.
- Gimo är den andra största staden i Östhammars kommun.
- To me, it looks quite clear that these are not translations at all. ♥Ainali talkcontributions 11:35, 4 December 2021 (UTC)
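A rough sketch of the ranking idea, under loud assumptions: `POPULATION` is a stand-in for a live Wikidata query and its figures are placeholders, not real populations; `rank`, `render_en` and `ORDINALS_EN` are hypothetical names, and only an English renderer is shown. It illustrates that the ordinal never appears in the abstract content and that the sentence changes when only the data changes.

```python
# Placeholder figures standing in for a live Wikidata query (property P1082).
# They are NOT real populations; only their relative order matters here.
POPULATION = {
    "California": {
        "Los Angeles": 400,
        "San Diego": 300,
        "San Jose": 200,
        "San Francisco": 100,
    },
}

# Ordinal words for the toy English renderer (rank 1 is rendered as plain "largest").
ORDINALS_EN = {1: "", 2: "second ", 3: "third ", 4: "fourth "}


def rank(city, unit, data):
    """Compute the city's rank within its unit from the current data."""
    ordered = sorted(data[unit], key=data[unit].get, reverse=True)
    return ordered.index(city) + 1


def render_en(city, unit, data):
    """Render the ranking sentence; the ordinal is derived, never stored."""
    return f"{city} is the {ORDINALS_EN[rank(city, unit, data)]}largest city in {unit}."


before = render_en("San Francisco", "California", POPULATION)
print(before)  # San Francisco is the fourth largest city in California.

# Change only the (placeholder) data; the abstract call stays identical.
POPULATION["California"]["San Francisco"] = 250
after = render_en("San Francisco", "California", POPULATION)
print(after)   # San Francisco is the third largest city in California.
```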
- Using Abstract Wikipedia to create CC0-licensed language versions of CC BY-SA Wikipedia articles is licence laundering. --Andreas JN466 18:10, 4 December 2021 (UTC)
- I don’t think that an exact reproduction of regular Wikipedia content will be the norm. The Jupiter example is meant to demonstrate that the system will likely be able to generate text on a Simple Wikipedia level, and what Abstract Content will roughly look like. The reproduction of existing content is not a core element of the example. That said, Abstract Wikipedia articles will of course have some conceptual similarity to regular Wikipedia articles since both aim to be encyclopedic articles about a given topic. This form factor already defines much of the expected structure of the article, as well as the content to be incorporated. —MisterSynergy (talk) 20:24, 4 December 2021 (UTC)
- I agree. And perhaps we even need to be restrictive if someone tries to capture an article word for word at a length that reaches the threshold of originality. Most of all, it would be an incredibly inefficient way to use Abstract Wikipedia to try to write articles one at a time. The strength of it comes when you write something that covers all planets in the universe, all mammals or all association soccer players at a time, not one. ♥Ainali talkcontributions 20:37, 4 December 2021 (UTC)
- The fact that it should be possible (eventually) to recreate an existing Wikipedia article in Abstract Wikipedia indicates to me that the level of creativity of an Abstract Wikipedia article is basically no different than that of a conventional Wikipedia article. Which, I am happy to say, is also the development team's view. The question to me is how quickly machine translation will progress. Advances in that field may well outpace Abstract Wikipedia development. For example, if I stick the Simple English Wikipedia article on Jupiter in DeepL, I already get a perfect translation today. (Afterthought: One could actually re-purpose Simple English Wikipedia as a "Designed for machine translation" Wikipedia.) --Andreas JN466 22:06, 4 December 2021 (UTC)
- Well, most of the content will probably not reach the threshold of originality, or will not be copyrightable for the other reasons that I have already shown. And putting CC BY-SA on that would be copyfraud since there is no one holding the right to license it as such. So the fact that some content possibly could reach the technical level for copyright (although I am still not convinced that recreating an article that momentarily happens to generate the output of something existing in one language can hold copyright if the moving parts of it change) only means that we either have to have mixed licenses for the output or that what could be licensed as CC BY-SA will need to be CC 0 and that we have to fight content that cannot be licensed CC 0 the same way we fight copyright infringements on other projects. Machine translation does not have anything to do with Abstract Wikipedia or any of its parts, so I have no clue why you are bringing it up in this discussion. ♥Ainali talkcontributions 22:22, 4 December 2021 (UTC)
- We have established we disagree profoundly on the licensing ... let's leave it at that. As for machine translation, my understanding is that the whole purpose of Abstract Wikipedia is to write articles – using Wikidata items and simple grammatical structures – that will be easy to translate into languages whose Wikipedias are currently poorly developed. So if they lack an article on a topic, the relevant language translation of the Abstract Wikipedia article can be substituted. The point is to give readers articles in their language, and to give re-users (like voice assistants and search engines) basic material to use for customers speaking those languages. In this way, Alexa e.g. can learn to speak Kannada, or Marathi, or Swahili, etc. In other words, Abstract Wikipedia articles are "easy-to-translate articles written in an artificial language". Now in Simple English Wikipedia we already have "easy-to-translate articles" that today's free machine translation programs can do a pretty good job with. And as soon as machine translation of Simple English articles can serve a language like Marathi, or Swahili, there is a ready-made alternative source of content fulfilling the same purpose as Abstract Wikipedia. --Andreas JN466 22:37, 4 December 2021 (UTC)
- I think you have either misunderstood how Abstract Wikipedia will work, or you have a very different definition than me of what translation means. This will be generating text, not from text at all, but from functions. It will not be possible to drag and drop text and get output. You will have to carefully design the idea that you want to express, something that no machine translation is capable of doing today. So no existing text will be able to serve as a ready-made alternative. At most, it will be possible to look at, and be inspired by, Simple English Wikipedia because it has already cut away a lot of the text that is more about the form than the idea. But there will still be the need for a human in the loop to create the Abstract Content. ♥Ainali talkcontributions 23:04, 4 December 2021 (UTC)
- Yes, using Abstract Wikipedia to translate a Simple English article requires a human to write a code version of the article first (and to create the language tools to convert that code into a human language). Getting a machine translation from a Simple English Wikipedia article does not. I understand that the project isn't about recreating Simple English Wikipedia in Wikifunctions, but doing just that is one of the official examples given. I was merely speculating on when machine translation in the languages of interest will become good enough that you can simply write in English, complete with some dynamic content linked to Wikidata if you like, and get translations fit for purpose. (We don't have to continue that tangent here; feel free to contact me on my user talk page.) --Andreas JN466 12:24, 5 December 2021 (UTC)
- I'm thinking as I'm writing, but it feels like this is a tool-and-product situation, Wikifunctions being the tool and Abstract being the resulting product. If I make a sentence or a map with any tool, the way I use the tool to make the product is creative and copyrightable, but the result in itself is not, since everyone using the same recipe would get exactly the same result. For instance, on WikiCommons, if I use a proprietary tool (and almost all cameras are), it doesn't influence the copyright of the result (and this is why one can choose the license). The other way around, if I produce the same drawing with Adobe or Inkscape, again it doesn't influence the copyright of the resulting map. With the same data and the same Wikifunctions functions, the abstract content will always be the same. In that case, what would a CC BY licence mean, and who would be the author? No one, everyone, the functions? It seems a bit pointless to even search for an author here (like for a table made by a template on Wikipedia, where the creator of the template is not credited for the result on every page the template is used). Cheers, VIGNERON * discut. 21:23, 6 December 2021 (UTC)
(Add your discussion points / argument here)