
Verify license tags for custom licenses ("Another reason not mentioned above")
Closed, Resolved (Public)

Description

(Split off from T123127)

Files uploaded with a license typed into the "Another reason not mentioned above" field need to be checked. The simplest way is to check whether a template is present at all (that is, whether the text contains {{...}} brackets). A better way is to check whether the typed text transcludes {{License template tag}}, and to reject anything that does not. Here is an example of today's upload without a license.
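As a rough illustration of the "simplest way" described above, the bracket check could look like the following Python sketch. The function name and the regular expression are assumptions made for this example, not code from UploadWizard, and the pattern is slightly stricter than a bare search for '{{' and '}}' because it also requires a non-empty template name.

```python
import re

# Matches an innermost {{...}} pair whose body starts with a non-blank
# character other than a pipe, i.e. something that could be a template name.
TEMPLATE_RE = re.compile(r"\{\{\s*[^{}|\s][^{}]*\}\}")


def looks_like_template(wikitext: str) -> bool:
    """Return True if the text contains at least one plausible {{...}} call."""
    return TEMPLATE_RE.search(wikitext) is not None
```

A check like this only shows that some template is used. It cannot tell a license template from any other template, which is why the transclusion check is the better option.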

Event Timeline

matmarex renamed this task from Verify licenes tags for custom licenses ("Another reason not mentioned above") to Verify license tags for custom licenses ("Another reason not mentioned above"). Jul 20 2016, 3:12 PM

We used to have some form of this, but IIRC it was broken in some way and eventually removed. I can't find when it was added/removed. It's probably fine to try again with the more modern codebase that we have today.

About 50 files per day end up in https://commons.wikimedia.org/wiki/Category:New_uploads_without_a_license , which collects files uploaded without a license. The files are tagged by a bot which looks for files not transcluding the {{License template tag}} template. A better way would be to prevent users from uploading such files, or to guide them through the process of choosing a license template.

Change 309384 had a related patch set uploaded (by Matthias Mullie):
Verify license tags for custom licenses

https://gerrit.wikimedia.org/r/309384

We used to have some form of this, but IIRC it was broken in some way and eventually removed. I can't find when it was added/removed.

For future reference, this was done in ebdc2934701363521c97c6c803b821349a18d7b2 (https://gerrit.wikimedia.org/r/#/c/158589/). The old code checked that the template used is categorized in https://commons.wikimedia.org/wiki/Category:License_tags.

Change 309384 merged by jenkins-bot:
Verify license tags for custom licenses

https://gerrit.wikimedia.org/r/309384

Note that this is not done yet: a small configuration change is still required, because license validation is not enabled by default. Without that change, the patch only checks that any template at all is used in the custom license (it just looks for '{{' and '}}').

Change 312854 had a related patch set uploaded (by Bartosz Dziewoński):
mw.UploadWizardLicenseInput: Parse license templates as if they were used on a file page

https://gerrit.wikimedia.org/r/312854
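For readers unfamiliar with the approach named in the patch title, here is a minimal Python sketch of a transclusion check built on the MediaWiki action=parse API with prop=templates. It is not the UploadWizard code (which is JavaScript). The placeholder file title, the function names, and the assumed response shape (formatversion=2, with a fallback for the older "*" key) are assumptions for this example and should be verified against the API documentation.

```python
from typing import Any, Dict

REQUIRED_TEMPLATE = "Template:License template tag"


def build_parse_params(license_wikitext: str) -> Dict[str, str]:
    """Build query parameters that parse the text as if it were on a file page."""
    return {
        "action": "parse",
        "format": "json",
        "formatversion": "2",
        "contentmodel": "wikitext",
        "title": "File:Example.jpg",  # hypothetical placeholder title
        "text": license_wikitext,
        "prop": "templates",
    }


def transcludes_required(response: Dict[str, Any],
                         required: str = REQUIRED_TEMPLATE) -> bool:
    """Return True if the parse result lists the required template."""
    templates = response.get("parse", {}).get("templates", [])
    wanted = required.replace("_", " ")
    for entry in templates:
        # formatversion=2 uses "title"; the older format uses "*".
        title = entry.get("title") or entry.get("*") or ""
        if title.replace("_", " ") == wanted:
            return True
    return False
```

Passing a file title matters because some templates render differently depending on the namespace of the page they are used on.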

I tried to compare the differences in behavior between the old code and the new code. The following templates, which would be accepted per the old rule (categorized in https://commons.wikimedia.org/wiki/Category:License_tags or subcategory), will be rejected per the new rule (do not transclude https://commons.wikimedia.org/wiki/Template:License_template_tag):

  1. Template:ACCovers
  2. Template:AELG
  3. Template:Attribution-TRGov-Military-Coast Guard
  4. Template:BDA-Scan
  5. Template:Biblioteca Museu Víctor Balaguer-cooperation
  6. Template:Cc-non-compliant
  7. Template:CC-self
  8. Template:Cornelius Richter permission
  9. Template:Creative Commons copyright tags
  10. Template:Currency
  11. Template:Fredlyfish4 credit
  12. Template:Gary Lee Todd permission
  13. Template:Heiligenkreuz monastery permission
  14. Template:Heirs-license
  15. Template:Human body diagrams
  16. Template:Häggström diagrams
  17. Template:Images by H&S Medienservice
  18. Template:Images by Svend Buhl
  19. Template:License scope
  20. Template:Lio1962p
  21. Template:MAG permission
  22. Template:MickStephenson2
  23. Template:MikaVäisänen
  24. Template:No rights reserved
  25. Template:NoRightsReserved
  26. Template:NTNU Universitetsbiblioteket
  27. Template:Odia alphabet anime
  28. Template:Oerknor-license
  29. Template:Open Access Media Importer
  30. Template:Patrimoni.gencat permission
  31. Template:PD-Art license tags
  32. Template:PD-art-70-3d
  33. Template:PD-CERN-CMS
  34. Template:PD-Cuba-Other
  35. Template:PD-Cuba-photo
  36. Template:PD-ineligible license tags
  37. Template:PD-IRN
  38. Template:PD-magic
  39. Template:PD-pdsounds.org
  40. Template:PD-Peta
  41. Template:PD-subject
  42. Template:PD-US-1923-abroad
  43. Template:PD-US-statue
  44. Template:Pennsylvania Department of Transportation permission
  45. Template:Permission Arquitectura del Sol
  46. Template:Photos by Bagn Bygdesamling
  47. Template:Photos by Kulturhistorisk museum
  48. Template:PixiUnoCredit1
  49. Template:PossiblyPD
  50. Template:Rcat
  51. Template:Recitation-bot
  52. Template:Riksarkivet (Norway)-media
  53. Template:RodejongCredit
  54. Template:RodejongCredit1
  55. Template:RodejongCredit3
  56. Template:Unicode-expat
  57. Template:Vector-Images.com
  58. Template:Volare
  59. Template:WasMykola
  60. Template:WasMykolaCreate
  61. Template:WasMykolaMap
  62. Template:WEF
  63. Template:XGSC image
  64. Template:Yatrides

In some cases this is okay (e.g. Template:PD-magic is not intended to be a real license template), in some cases it looks like it's not and the template should be fixed (e.g. Template:AELG).

Someone should go through this list before we perform the configuration change to enable this.

(The list was generated with the following script:

.)
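The script itself is not reproduced above. Purely as an illustration, and not the script that produced the list, the comparison amounts to a set difference between templates accepted by the old rule and templates accepted by the new rule. The function name and the sample data below are made up for this sketch. The two input sets could presumably be collected with the MediaWiki API's list=categorymembers and list=embeddedin modules.

```python
from typing import List, Set


def newly_rejected(in_license_category: Set[str],
                   transcluding_tag: Set[str]) -> List[str]:
    """Templates accepted by the old rule but rejected by the new one."""
    return sorted(in_license_category - transcluding_tag)


# Hypothetical sample data, for illustration only.
old_rule = {"Template:Cc-by-sa-4.0", "Template:PD-magic", "Template:AELG"}
new_rule = {"Template:Cc-by-sa-4.0"}
rejected = newly_rejected(old_rule, new_rule)
```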

Change 312854 merged by jenkins-bot:
mw.UploadWizardLicenseInput: Parse license templates as if they were used on a file page

https://gerrit.wikimedia.org/r/312854

@matmarex I just found your post from Sep 26 2016, 5:46 PM. The list you produced contains templates which should not be considered copyright tags, despite how they are categorized. The test for the presence of {{License_template_tag}} is used by at least 2 daily queries, which populate Category:New_uploads_without_a_license and Category:Media_without_a_license:_needs_history_check. You can peek into the first category daily to see the files that your change would (hopefully) prevent from being uploaded.

By the way, an alternative to checking for {{License_template_tag}} is to look for the machine-readable markers in copyright templates, which are used by Media Viewer etc. All files missing them are automatically added to Category:Files_with_no_machine-readable_license by some software (not a template) which Commons users have no control over. We do not work with those much, since I do not think you can query for them in the database, but checking whether the provided template has those markers when fully expanded would be another way to tell copyright templates from other templates.
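A minimal Python sketch of that alternative follows. It assumes the marker is an HTML element carrying the CSS class `licensetpl` in the fully expanded output. That class name is an assumption about the Commons machine-readable data convention and should be verified before relying on it. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _MarkerFinder(HTMLParser):
    """Records whether any start tag carries the given CSS class."""

    def __init__(self, marker_class: str) -> None:
        super().__init__()
        self.marker_class = marker_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Compare whole class tokens, so "licensetpl_short" alone
            # does not count as "licensetpl".
            if name == "class" and value and self.marker_class in value.split():
                self.found = True


def has_license_marker(expanded_html: str,
                       marker_class: str = "licensetpl") -> bool:
    """Return True if the expanded template HTML contains the marker class."""
    finder = _MarkerFinder(marker_class)
    finder.feed(expanded_html)
    finder.close()
    return finder.found
```

The HTML to inspect would have to come from a full parse of the typed license text, so this check costs one parse request per upload, the same as the transclusion check.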

OK, I guess we should go ahead with the configuration change to enable this, then. Thanks.

Change 318518 had a related patch set uploaded (by Bartosz Dziewoński):
Verify license tags for custom license in Commons' UploadWizard

https://gerrit.wikimedia.org/r/318518

Change 318518 merged by jenkins-bot:
Verify license tags for custom license in Commons' UploadWizard

https://gerrit.wikimedia.org/r/318518

Mentioned in SAL (#wikimedia-operations) [2016-11-14T14:40:54Z] <zfilipin@tin> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:318518|Verify license tags for custom license in Commons UploadWizard (T140903)]] (duration: 00m 47s)

matmarex removed a project: Patch-For-Review.

This is live on Commons now.