Wikipedia:Wikipedia Signpost/2011-09-05/Opinion essay
The copyright crisis, and why we should care
Moonriddengirl has been a Wikipedian since the first half of 2007, becoming an administrator for the English Wikipedia later that year. In that capacity, she dedicates much of her volunteer time to dealing with copyright concerns at the English Wikipedia's copyright problems board and contributor copyright cleanup, attempting to implement Wikimedia's zero tolerance policy on copyright infringements. In addition, she works for the Wikimedia Foundation in community liaison. Below, Moonriddengirl outlines her view that all contributors need to pull together to manage copyright concerns on the English Wikipedia.
The views expressed are those of the author only. Other editors will often leave opposing views and potential corrections in the comments section. The Signpost welcomes proposals for op-eds. If you have one in mind, please leave a message at the opinion desk.
We have a copyright crisis. Wikipedia is full of copyright problems. How full, I don't know.
I do know that CorenSearchBot (before it became inoperable due to a catastrophic change in Yahoo's terms) routinely found several dozen new articles every day built on content copied from other websites. I know that every day more articles and images are tagged by human contributors for speedy deletion for copyright concerns or listed for the slower processes of the copyright problems board or possibly unfree files. I know that there are more tens of thousands of articles and images awaiting copyright review at WP:CCI than I want to tally; this is content placed by people we know have repeatedly violated copyright. Odds are good that a substantial portion of this content is a problem, too. In spite of policies prohibiting it—and in spite of prominent reminders of those policies on every edit page—more copyrighted content finds its way into our project every day.
Why it happens
People place copyrighted content on Wikipedia because they can, because it's easier to copy somebody else's words than write your own, because it's hard to resist using somebody else's picture when the only other alternative is that an article has no pictures at all. Some people do it accidentally, attempting to change content but not changing it enough. Some people do it defiantly, using Wikipedia as part of their own statements against copyright laws.
Most people do it with good intentions, I believe. I've talked to hundreds of people about this over the last few years. Few of them seem to be out to deliberately cause trouble, even the ones who wind up being blocked because we can't get them to stop. The fact is that many of them just don't see the harm, and some have trouble even understanding what the issue is.
In some cultures, copyright is no big deal—even reputable sources copy without obvious concern. (No kidding: I've seen books by evidently respected academicians that have baldly copied from Wikipedia without credit and government websites that have done the same.) In a way, it's not much of a deal to the international Internet culture we all share. People paste news articles into their blogs or appropriate copyrighted cartoon characters as their avatars all the time, without a thought as to whether the content is copyrighted and what that might mean.
Why we should care
This may be why even some of the contributors who don't cause the problems and who plainly do understand the concept of copyright simply don't think about whether or not it's happening here. Blatant violations may pass right in front of them, and they don't notice. They simply don't seem keyed in to the issue. It happens everywhere, and, after all, if a copyright holder objects, all we have to do is take it down.
While technically true, this is an attitude Wikipedia can't afford. For whatever reasons people place the content, and however we ourselves may feel about copyright, keeping it is not only potentially damaging to copyright holders, it's bad for us. It's bad for our reusers; it's bad for Wikipedia; it's bad for our volunteers.
I'm not going to discuss the question of whether intellectual property laws are a good thing or a bad thing. (Although as a published writer who receives small royalty checks every year, I have a certain interest in the question.) It's a passionately debated subject, and, in my opinion, it's not necessary to go into it to settle the important point. It's a simple matter of fact that we are subject to intellectual property laws, and we need to recognize how working within that reality is in our best interests. While we have the option to swiftly address copyright concerns by simply pulling material from publication—indeed, we have a legal obligation to have a designated agent to answer takedown notices sent to us by copyright holders and their representatives—our content reusers may not have the option of responding so simply. If a video documentarian uses images that were hosted on Wikipedia under the mistaken belief that the free license label on them is accurate, he may have to recut his documentary to remove them or replace them with something else. If a publisher places some of our featured articles on animals in a textbook, she may have to pull it from distribution.
This is a major problem. We like content reusers (if not all of them). We really do. We encourage them to do it—to use our material online, in books, newspapers, video documentaries; to use it and modify it whenever and however they like, so long as they follow the licensing terms. Indeed, the Wikimedia Foundation's mission is "to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally." We've made it as easy for them as we can. But how many times would a reuser encounter the trouble or expense of withdrawing problematic content before deciding to avoid our work? If the content we bill as "free" is not, we risk damage to our reputation and discouraging the global dissemination of our work.
Beyond that, I have personally observed the inconvenience and expense (at least in terms of time) to our volunteers when copyright problems created by others are encountered too late. "Too late" in this context would be after they have themselves engaged with the content. Too often, somebody creates an article or expands it with copyrighted content placed without permission of the copyright holder. Others come behind to improve the article, sometimes putting a great amount of time into polishing prose, locating sources, adding text. Their work is tainted, too. The time they've spent polishing copyrighted content is lost when that content must be removed. The hours they've put in could have been better spent building usable content or creating an article we can retain. Then there is the cost to their motivation. I've spoken multiple times to people in this situation who are heartsick and discouraged by the experience. I hate the thought that we've wasted their time, that we might lose them, because of a problem that was not promptly detected or resolved.
There's also a cost to the volunteers who create the problems in the first place. As I said, I believe most of these people are working in good faith. Those who have trouble grasping the issue may require more guidance than those who simply didn't think it mattered, but copyright problems can be corrected. If the issue is discovered early in a Wikipedian's career, we may be able to more easily clean up any outstanding issues and help them avoid creating more, enabling them to move forward as constructive and valuable contributors. If problems linger, more articles may be tainted, and the fallout may be greater in terms of both collateral damage to others and loss of the contributor themselves.
We need to care; we need to take action.
What we can do
While copyright cleanup can use all the active contributors it can get, you can help with the problem simply by being conscious of the potential so that you recognize copyright issues when they appear. Does an image look unlikely to be original to the uploader? Text too polished or disjointed in tone? Even if you don't feel that you can help with cleanup, you can tag a suspicious text or image as a copyright concern for others to evaluate. You can save reusers potential time and expense, save your fellow volunteers wasted effort, and perhaps keep a reparable contributor issue from devolving into an unsalvageable one. The simple act of identifying the problem is the first, crucial step to resolving it. Swift handling is the best service we can provide to our reusers, to the project and to our contributors (as well as, in my opinion, to the copyright holders). By recognizing the problem and resolving it when it first appears, we can keep it contained.
Further reading
- Wikipedia:Cv101: How to handle text-based copyright problems
- Wikipedia:Guide to image deletion: The somewhat more complex process of handling image issues
- Wikipedia:WikiProject Copyright Cleanup/How to clean copyright infringements: Suggestions for helping with identified copyright issues
- Previous Signpost coverage:
- "Talking copyright with WikiProject Copyright Cleanup" (WikiProject report, 6 December 2010)
- "Let's get serious about plagiarism" (Dispatches, 13 April 2009)
Discuss this story
Wikipedia is in the position of strength here. The search engines require Wikipedia a lot more than the reverse. We should consider blocking spidering of Wikipedia by Yahoo unless they allow Wikipedia via CorenSearchBot access to search returns for copy vio purposes. It wouldn't take long for Yahoo to be distressed at such a decision. Can we use Google to check the copy vios? Google makes big bucks out of Wikipedia by immediately accessing the updates. Is there a technical issue with using Google or some policy issue? Regards, SunCreator (talk) 01:58, 6 September 2011 (UTC)
I just wanted to point out that back on August 30th, I proposed a change to Special:NewPages to help us deal with copyvios while CorenSearchBot was down. The thread can still be found at Wikipedia talk:New pages patrol#Proposal of additional bullet point at top of Special:NewPages (while CorenSearchBot is down). Singularity42 (talk) 20:01, 7 September 2011 (UTC)
We make this even more complicated with a fair-use policy that is more restrictive than U.S. law, so someone who thinks they're OK (and would be elsewhere) is actually not. I have found it interesting that, in surveys of how many new accounts stick around to become members of the community, virtually none of those whose first edit was to create a page outside of article namespace have done so. Hmm ... what kind of new user starts by creating a non-article page? You got it ... someone uploading an image that they thought they could use. (It would be interesting to see how many of them did, indeed, upload third-party copyrighted images that wouldn't be justified under our policies.) Daniel Case (talk) 19:59, 9 September 2011 (UTC)
Yahoo and Google both permit automated queries (which is what Corenbot is/was). They charge for them, though; you can see those costs by following the links to the relevant terms of service mentioned here [1]. The cost wouldn't be minimal for the Foundation: Google charges $5 per 1,000 queries, for up to 10,000 queries per day [2]; Yahoo charges either 80 cents per 1,000 or 40 cents per 1,000 using a limited index and slower refresh (about 3 days) [3]. The total would be however many thousand new articles per day over all projects, times the number of queries per article (possibly one for each article sentence?). And I'd suppose other non-profits, including university research projects, would like cost exemptions, including those for copyvio searches, and have comparable claims. Novickas (talk) 01:19, 10 September 2011 (UTC)