Commons:Bots/Requests
If you want to run a bot on Commons, you must get permission first. To do so, file a request following the instructions below.
Please read Commons:Bots before making a request for bot permission.
I. Create a user account (while logged in to your normal account) and user page for the bot
On the bot's userpage, add {{Bot}}, which automatically adds the page to Category:Commons bots. Then add the following information to the bot's userpage (all this is mandatory):
II. Write your program code.
When you put a request at this page, you are expected to be ready for testing. If you are unsure and want to know if your intended bot job will be accepted, please seek community feedback at a suitable venue, e.g. Commons:Village pump.
III. Create your bot request:
Add your bot request to the list here:
IV. Test run
Please make a small test run (5–20 edits) to allow other users to review your bot's tasks. (Please do not put your bot in automatic mode until the request is approved!)
V. Waiting for approval.
You now need to wait for community approval. A bureaucrat will close the request and will also grant a bot flag, where necessary. Closed requests are moved to Commons:Bots/Archive.
Requests made on this page are automatically transcluded in Commons:Requests and votes for wider comment.
Requests for permission to run a bot
Before making a bot request, please read the new version of the Commons:Bots page. Read Commons:Bots#Information on bots and make sure you have added the required details to the bot's page. A good example can be found here.
When complete, pages listed here should be archived to Commons:Bots/Archive.
Any user may comment on the merits of the request to run a bot. Please give reasons, as that makes it easier for the closing bureaucrat. Read Commons:Bots before commenting.
Operator: Gzen92 (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)
Bot's tasks for which permission is being sought: File upload from gallica.bnf.fr
Automatic or manually assisted: Automatic.
Edit type (e.g. Continuous, daily, one time run): Occasionally.
Maximum edit rate (e.g. edits per minute): About 20.
Bot flag requested: (Y/N): Nothing.
Programming language(s):
PHP
After Commons:Bots/Requests/Gzen92Bot-4, I continued to upload files from Gallica (Category:Bibliothèque nationale de France), using the parameter "any fayes" = royalty free document (see API), without requesting bot authorization (the infobox is formatted the same).
But there is a problem with categories (see blocking Commons:Village pump#Obtuse bot created categories).
When to create a category? 2, 5, 10 files from the same id?
How to name the category? File name or only the id?
Without being able to answer this question automatically (there are a few million files available at the BnF), I will simply leave them in Category:Images from Gallica and Category:Files from Gallica needing categories (images).
See 10 files uploaded.
Gzen92 (talk) 17:20, 1 November 2024 (UTC)
- Discussion
- Some topical categorization should be determined for every file uploaded, in addition to merely adding a source category.
- Before any new file is uploaded, a cleanup plan for the thousands of categories created earlier (Commons:Village pump#Obtuse bot created categories) needs to be found and implemented (by the uploader or somebody else).
∞∞ Enhancing999 (talk) 08:11, 2 November 2024 (UTC)
- No problem for this action, if it is the target. Gzen92 (talk) 10:12, 2 November 2024 (UTC)
- Please outline your cleanup plan for the existing categories.
∞∞ Enhancing999 (talk) 11:38, 2 November 2024 (UTC)
- I don't think this is the place to talk about this but it is easy to take the files out of the subcategories of Category:Files from Gallica needing categories (images) and put them in Category:Images from Gallica and Category:Files from Gallica needing categories (images).
- The categories can be deleted easily (I don't have the bot permission to do that).
- But I still think it's good to group the files, even if the category name could be improved.
- Gzen92 (talk) 22:00, 2 November 2024 (UTC)
- As a bot operator, I'd expect you to fix problems you created before potentially creating more of them. If you can't present a plan for your past uploads, I don't think this request should be approved. Commons is not a place to dump uncategorized files.
∞∞ Enhancing999 (talk) 10:43, 3 November 2024 (UTC)
- Topical categorisation is not required, see Commons:Guide_to_batch_uploading#Categories. Also, before a "cleanup plan" has to be presented, clear arguments have to be presented as to why those existing categories are a problem. ~TheImaCow (talk) 22:04, 2 November 2024 (UTC)
- Which part are you referring to? Uploaders are required to add categories. Merely adding a user category isn't sufficient.
∞∞ Enhancing999 (talk) 10:43, 3 November 2024 (UTC)
- Section "Putting it into action" -> "every file you upload should have: a tracking/source category, Your files can have: topic categories". A source category + a {{Check categories}} (substituted with the "category needed" category here) is sufficient. ~TheImaCow (talk) 13:57, 3 November 2024 (UTC)
- If we agree that both categories are enough, should we create categories to group files or not? For example this category includes a whole book but the name is long. Gzen92 (talk) 21:44, 3 November 2024 (UTC)
- Does anyone have an opinion? With or without category creation? I would like to continue the job! Gzen92 (talk) 07:10, 7 November 2024 (UTC)
I don't understand what exactly is requested here. Please say again. --Krd 14:47, 7 November 2024 (UTC)
- I upload files and create a category as soon as there are two files in the Gallica folder. This creates a lot of categories: Category:Files from Gallica needing categories (images) holds 65,000 files and 7,200 categories, which contain 139,000 files (19 per category on average).
- Following the remarks at Commons:Village pump#Obtuse bot created categories, I could create categories only from 10 files upwards. Gzen92 (talk) 07:43, 8 November 2024 (UTC)
- I still cannot follow. Can you give an example? Krd 08:12, 8 November 2024 (UTC)
- I create a lot of categories with few files (for example Category:Hotel de Roquelaure - Juillet 1906 - photographie - Atget - btv1b10516512m). So a lot of categories in Category:Files from Gallica needing categories (images). Two solutions (I don't know): threshold to create a category (10 files?) or no category at all in Category:Files from Gallica needing categories (images). Gzen92 (talk) 08:22, 8 November 2024 (UTC)
- I think a bot request is not the right place to find a decision how to proceed. Perhaps this is better discussed as com:Village pump. Once there is a decision, we can perhaps discuss what is the best way to clean up, if required. (If I'm mistaken, please advise.) Krd 08:59, 8 November 2024 (UTC)
I am not sure if this is the right forum, but I am wondering how exactly the copyright status of the images is ascertained. For instance, this image has been uploaded as CC0 despite the linked source page containing:
Note(s) : Toute reproduction doit faire l'objet d'une autorisation préalable de(s) auteur(s), de ses (leur) ayants-droits ou de la société qui les représente
Reproduction : Numérisation effectuée par l'auteur d'une sélection de photographies argentiques
which to me sounds as if BNF does not own the right to the image in the first place. Could you elaborate? Felix QW (talk) 13:20, 11 November 2024 (UTC)
- Hello, there is a parameter in the API Gallica "public domain". But you are right, in some html pages it is indicated "specific conditions of use", I know how to exclude them for the future. Gzen92 (talk) 16:29, 11 November 2024 (UTC)
@Krd: After discussions :
Subcategory by year in Category:Files from Gallica needing categories (images), for example Category:Files from Gallica needing categories (images of 1930).
No category for 2 files, because these are often reverse sides (I will manually browse the categories by year to visually identify the reverse sides and move them to Category:Files from Gallica needing categories (images, reverse side)).
Category with 3 or more files, also category by year, for example Category:Files from Gallica needing categories (images of 1870).
See 24 edits.
Gzen92 (talk) 18:27, 22 November 2024 (UTC)
- Please comment on the current block of the bot. Krd 05:24, 5 December 2024 (UTC)
- After Commons:Bots/Requests/Gzen92Bot-4, I continued to upload files from Gallica (Category:Bibliothèque nationale de France), without requesting bot authorization (the infobox is formatted the same). There are a lot of files (> 100,000). Blocked to think about a classification solution. I will be more vigilant about the classification and will make a request when other cases of mass loading arise. Gzen92 (talk) 10:09, 5 December 2024 (UTC)
- I'm not sure if I can follow. Does that mean this request is withdrawn until further notice? Krd 10:23, 5 December 2024 (UTC)
- No, this request is valid. There was a discussion about how the bot works. In the future, if I have any doubts about the right way to do this, I will open a request. Please unblock the bot thanks. Gzen92 (talk) 11:11, 5 December 2024 (UTC)
- Ok, so please say in simple language what this job is exactly about. What exactly will be performed now if this request is approved? --Krd 11:51, 5 December 2024 (UTC)
- Import file from gallica.bnf.fr (API).
- Category with BNF number if 3 files or more.
- File/category in categories by date (fewer files at the root).
- At the end, visual search for "white" back faces to isolate them.
- But before starting the import, categorization of existing files/categories in Category:Files from Gallica needing categories (images) by date as indicated above.
- (sorry i use a translator) Gzen92 (talk) 12:56, 5 December 2024 (UTC)
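The categorization rule listed above can be sketched roughly as follows. This is only an illustration in Python, not the operator's PHP code: the function names, the `(file_name, bnf_id, year)` record layout and the `Gallica <id>` naming of the per-document category are assumptions. Only the 3-file threshold and the per-year category pattern come from this discussion.

```python
from collections import defaultdict

MIN_FILES_FOR_CATEGORY = 3  # threshold stated in the discussion


def year_category(year):
    # Per-year maintenance category, following the pattern quoted in the discussion.
    return f"Files from Gallica needing categories (images of {year})"


def plan_categories(files):
    """Decide where files and per-document categories should go.

    `files` is an iterable of (file_name, bnf_id, year) tuples; this layout
    is hypothetical and only serves the illustration.
    Returns (file_category, category_parent).
    """
    by_id = defaultdict(list)
    for name, bnf_id, year in files:
        by_id[bnf_id].append((name, year))

    file_category = {}
    category_parent = {}
    for bnf_id, members in by_id.items():
        if len(members) >= MIN_FILES_FOR_CATEGORY:
            # Enough files: they share one category named after the BnF id
            # (the naming scheme is an assumption), filed under the year.
            group = f"Gallica {bnf_id}"
            category_parent[group] = year_category(members[0][1])
            for name, _year in members:
                file_category[name] = group
        else:
            # Too few files: each file goes directly into its year category.
            for name, year in members:
                file_category[name] = year_category(year)
    return file_category, category_parent
```

Separating the plan from the actual edits means the result can be reviewed before anything is written to Commons.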
- I'd say please do the next (small) batch and please advise when done. Perhaps it's easier to see the edits. Krd 13:37, 5 December 2024 (UTC)
- Some modifications of files and categories. Gzen92 (talk) 15:37, 5 December 2024 (UTC)
- Please continue slowly. I think this can be approved if no objection arises. Krd 16:15, 5 December 2024 (UTC)
Categories are the least of the problems with that bot. Much more problematic is that the bot systematically inserts false claims of CC0 on almost everything it uploads. Sometimes, it falsely tags CC0 files that are public domain for some other reason, which is already bad behaviour but at least it's not too damaging. Often, it falsely tags CC0 files that are not free, which is much worse and potentially much more damaging. To evaluate the size of the problem so far, searching only for photos by the professional photographer Daniel Cande, it can be found that the bot uploaded thousands of non-free photos by that photographer. It can be expected that looking at more uploads of the bot, more copyvios of non-free works by other authors might be found. Mass uploads should not be initiated without diligently checking the status of the works. After being notified by another user, the operator of the bot opened a deletion request for 107 copyvios. That's ok, but that addresses less than 1% of the thousands of copyvios uploaded by the bot. IMHO, any new uploads by that bot should be entirely disallowed until the operator meets at least minimal conditions: 1) Gives convincing assurances that the bot will stop faking CC0 dedications on works that are not CC0. 2) Reviews all past uploads. 3) Takes quick action to ensure that all copyvios get deleted as soon as possible. 4) Corrects the status tags of files that may be kept. The complete cleanup of all past uploads should be done before new uploads are allowed. -- Asclepias (talk) 16:40, 8 December 2024 (UTC)
- Those are reasonable conditions. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 17:04, 8 December 2024 (UTC)
- Hello, https://api.bnf.fr/api-gallica-de-recherche, parameter "access", "fayes" = public domain. But yes, sometimes it's not the case. I have already modified the code to use the field "dc.rights" ("restricted use (convention BnF-ADM-xxxx-xxxxxx-xx)") in API returns. I can find all the false positives. I will answer on Commons:Help_desk#What_to_do_with_thousands_of_bot_copyvios_?, Gzen92 (talk) 07:25, 9 December 2024 (UTC)
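The filtering on the rights field mentioned in this reply could look roughly like the sketch below. It is an illustration in Python rather than the bot's PHP code, and it assumes that each API result has already been parsed into a dictionary whose `"dc.rights"` entry is a string or a list of strings. That layout and the function names are hypothetical. The marker text is the wording quoted in the reply.

```python
RESTRICTED_MARKER = "restricted use"  # wording quoted in the discussion


def is_restricted(record):
    """Return True if any rights statement of the record mentions restricted use.

    `record` is assumed to be a dict with an optional "dc.rights" entry that
    is either a single string or a list of strings (hypothetical layout).
    """
    rights = record.get("dc.rights", [])
    if isinstance(rights, str):
        rights = [rights]
    return any(RESTRICTED_MARKER in value.lower() for value in rights)


def split_by_rights(records):
    """Split records into (uploadable, rejected) according to is_restricted."""
    uploadable, rejected = [], []
    for record in records:
        (rejected if is_restricted(record) else uploadable).append(record)
    return uploadable, rejected
```

Records without any rights statement pass this check. Whether they should be uploaded is not settled in the discussion, so a stricter variant may be preferable.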
- Hello Gzen92, I am putting my few comments and suggestions in the box below because it is a bit long. -- Asclepias (talk) 20:23, 9 December 2024 (UTC)
Comments and suggestions in French
- In short: IMO, a bot should not be authorized to mass upload files unless its operator demonstrates sufficient knowledge of copyright notions and rules and diligence to apply them, especially when the items are from a repository such as Gallica with a complex mixture of free and non-free items. -- Asclepias (talk) 20:23, 9 December 2024 (UTC)
- I totally agree with what others have already said about the licensing issues with the bots uploads. It looks like the bot has been unblocked while still wrongly uploading files as Creative Commons CC0 1.0 when it shouldn't be. Or at least the bot should be adding the appropriate license for the country of origin along with Creative Commons license. Just because the Bibliothèque nationale de France released the images under a Creative Commons CC0 1.0 license on their end doesn't mean the images aren't or wouldn't still be copyrighted in the country of origin and/or shouldn't have the appropriate license for said country. It's not the job of other users to sift through thousands of images after they are uploaded to make sure the licenses are correct. --Adamant1 (talk) 08:52, 12 December 2024 (UTC)
- The bot has not uploaded anything since the unblocking. Gzen92 (talk) 13:56, 12 December 2024 (UTC)
- I'd agree that a bot owner should take care of any mess created by their bot. Krd 10:29, 12 December 2024 (UTC)
- I can do it, I have all the logs of the files created and the data from the Gallica API to make the necessary adjustments. At the moment, I don't know how to solve the problem, since the rights are complex to manage (Gallica's rights are not precise enough). Gzen92 (talk) 13:58, 12 December 2024 (UTC)
- Please consider to provide details somewhere, perhaps in the bot's user space. Krd 14:20, 12 December 2024 (UTC)
- @Gzen92: For those files with insufficient licensing info in the Gallica API, please check the Gallica website and either document the sufficient licensing info or request deletion. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 18:52, 12 December 2024 (UTC)
- For information, I have requested deletion of the files at Commons:Deletion requests/restricted use (convention BnF-ADM-xxxx-xxxxxx-xx) - files. I will make requests for the categories too; apparently that will be less problematic. Gzen92 (talk) 06:55, 16 December 2024 (UTC)
- There are authors which are in the public domain in this DR. Why adding them? Yann (talk) 08:59, 16 December 2024 (UTC)
- All these files are marked "restricted use (convention BnF-ADM-xxxx-xxxxxx-xx)" by the BnF. I did this automatically. But yes, it does create work for other people, and I thank them. Next, for the 2793 categories, I will look at the content one by one. They are often plays, shows or films, so it will be easier to know whether they are recent or not. Gzen92 (talk) 09:07, 16 December 2024 (UTC)
- Thanks, but this is a real bad way to create a DR. With a mix of PD and non-PD files, deletion can't be automated. The right way is to create a DR by author. This will also make undeletion easier when the time comes. Yann (talk) 09:27, 16 December 2024 (UTC)
- You are right, i will do better for the categories. Gzen92 (talk) 09:47, 16 December 2024 (UTC)
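Grouping the files by author before filing deletion requests, as suggested above, could be sketched like this. It is a minimal Python illustration: the `(file_name, author)` input layout, the function name and the `"unknown author"` fallback label are assumptions, not something stated in the discussion.

```python
from collections import defaultdict


def group_by_author(files):
    """Group file names by author so that one deletion request per author can be filed.

    `files` is an iterable of (file_name, author) pairs (hypothetical layout).
    Files without a known author are collected under a placeholder key.
    """
    groups = defaultdict(list)
    for file_name, author in files:
        groups[author or "unknown author"].append(file_name)
    # Sort authors and file names so the resulting lists are reproducible.
    return {author: sorted(names) for author, names in sorted(groups.items())}
```

Each resulting group can then be reviewed separately, which keeps public-domain authors out of requests that concern non-free works.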
FlaschBot1 (talk · contribs)
Operator: Fl.schmitt (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)
Bot's tasks for which permission is being sought: Add {{Information}} to Media missing infobox template. See exhaustive preparative discussion on Commons:Bots/Work_requests#Media_missing_infobox_template. The bot tries to put as much information as possible into SDC fields (author, source, captions, date), since {{Information}} uses those data as default.
Automatic or manually assisted: Manually assisted. The bot follows "divide and conquer" tactics. Since it seems to be impossible to apply one solution to > 300,000 media files lacking an infobox template, it will work on sets of files, usually defined by same author / creator (assuming that those files share sufficient similarities). The bot will be run multiple times on that set of files in different modes. First, analyze the file page content and try to categorize each of its components, without modifying any content on Commons. This step will be repeated (manually) as often as needed to adapt the categorization patterns, until a pattern set that fits for all file pages of the current set has been found. Now, a "dry-run" ("simulation") generates an overview of the "planned" modifications (see txt and SQLite analysis and simulation results for Category:Media missing infobox template (maps t1)). Only if this simulation result seems acceptable, the bot will run in "doit" mode to apply the "proposed" edits.
Edit type (e.g. Continuous, daily, one time run): Multiple times a week, but not daily.
Maximum edit rate (e.g. edits per minute): Maybe 5-6 per minute?
Bot flag requested: (Y/N): Y
Programming language(s): pywikibot
Fl.schmitt (talk) 21:56, 6 September 2024 (UTC)
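The three run modes described in the request (analysis, simulation, "doit") can be pictured with the following sketch. It is not the bot's actual code and does not use pywikibot: the `Mode` enum, the `run` function and the `classify`, `build_edit` and `save` callables are hypothetical stand-ins for the steps named in the request.

```python
from enum import Enum


class Mode(Enum):
    ANALYZE = "analyze"    # categorize page content only
    SIMULATE = "simulate"  # additionally compute the proposed edit
    DOIT = "doit"          # additionally apply the edit


def run(pages, classify, build_edit, save, mode):
    """Process pages in the given mode and return a report.

    pages:      dict mapping page title to current wikitext
    classify:   callable(text) -> categorized components (hypothetical)
    build_edit: callable(text, components) -> proposed new content (hypothetical)
    save:       callable(title, edit) that writes the edit (hypothetical)
    """
    report = []
    for title, text in pages.items():
        components = classify(text)
        if mode is Mode.ANALYZE:
            report.append((title, components))
            continue
        edit = build_edit(text, components)
        report.append((title, edit))
        if mode is Mode.DOIT:
            # Only this mode changes anything on the wiki.
            save(title, edit)
    return report
```

Because `save` is reached only in `Mode.DOIT`, the analysis and simulation passes can be repeated as often as needed without touching any page.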
- Discussion
- Please make test run. --EugeneZelenko (talk) 14:06, 7 September 2024 (UTC)
- Bot ran into abuse filter trying to set SDC date... see Commons_talk:Abuse_filter#Report_by_FlaschBot1. It seems that the bot is currently not allowed to modify SDC data. Here's a first text modification (which is incomplete lacking SDC source and date...): Revision #920662309. I will report back as soon as the abuse filter issue is solved. FlaschBot1 (talk) 17:40, 7 September 2024 (UTC)
- @EugeneZelenko: Hmm, SDC modifications seem to be generally forbidden for new users (and new bots...), according to Special:AbuseFilter/265. Is there a way to get things working nevertheless? FlaschBot1 (talk) 17:46, 7 September 2024 (UTC)
- Would it be ok to use my personal account for the test runs? Fl.schmitt (talk) 11:40, 8 September 2024 (UTC)
- Sorry, took some time for me to understand the problem - bot will reach autoconfirmed status on Sep 10, allowing SDC modifications. Once autoconfirmed, I will continue with test runs. Fl.schmitt (talk) 18:21, 8 September 2024 (UTC)
- @EugeneZelenko: test run done: Revision #922463275, Revision #922466275, Revision #922465199, Revision #922464360 and Revision #922461351 (last one - Map of Canton Sankt Gallen.png - has one additional edit by another bot). Map of Canton Schaffhausen.png is an example for adding a caption since a language-specific description template ({{Es}}) was detected. FlaschBot1 (talk) 16:14, 10 September 2024 (UTC)
- Source and author (Artist in original description) are duplicated in {{Information}}. EugeneZelenko (talk) 14:52, 11 September 2024 (UTC)
- @EugeneZelenko: yes, but I think this is better than removing parts of the original description. Removed parts are lost completely, for example using "artist" for the original uploader (which is IMO a nice, sympathetic way of referencing a person). That's an individual trait of the original description that's worth keeping. People shouldn't have to crawl the page history for this. The bot's general approach is to keep the original file description as intact as possible. Its main task is identifying SDC-relevant data, adding SDC and add {{Information}} for a uniform appearance. Keeping the original description is also some sort of "safety net" against the loss of relevant and important information about the file. Fl.schmitt (talk) 15:32, 11 September 2024 (UTC)
- I think this bot is ready for work now. Fl.schmitt (talk) 21:05, 22 September 2024 (UTC)
It sounds to me like mostly manual work, which should not be done with a bot account. Am I mistaken? --Krd 13:27, 26 September 2024 (UTC)
- I don't know if this sort of task should be done by a bot account. I assumed that doing such edits automatically on numerous files using a script is a typical task for a bot.
- Anyway, in my opinion it's mostly bot work, for example automatically checking the initial upload date of a file and using it as the latest value for SDC inception if there's no creation date available in the unstructured file description. That's work for a python script and not for a human being... Additionally, it's not feasible to edit both SDC and text content of a file page manually, setting multiple SDC values for a single property (e.g. creator). In short: the focus lies on the script-based application of rules that were created "intellectually".
- Of course, there's a "manual" part: checking and adapting the regex rules to identify/categorize the unstructured content, to detect patterns how e.g. creators are mentioned. Additionally, some files may require manual intervention if a rule would only apply to single files. But once the regex rules are defined, they can and should be applied to the complete input set. And that's a bot's task, isn't it? Fl.schmitt (talk) 15:46, 26 September 2024 (UTC)
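As an illustration of the rule-based approach described in this reply, the sketch below classifies lines of an unstructured description with regular expressions and falls back to the upload date when no creation date is found. The patterns, the function names and the returned dictionary layout are assumptions made for the example; the bot's real rules are adapted per file set and are not shown in this discussion.

```python
import re

# Hypothetical patterns; real rules would be tuned to each set of files.
PATTERNS = {
    "author": re.compile(r"^\s*(?:author|artist)\s*:\s*(?P<value>.+)$", re.I),
    "source": re.compile(r"^\s*source\s*:\s*(?P<value>.+)$", re.I),
    "date": re.compile(
        r"^\s*date\s*:\s*(?P<value>\d{4}(?:-\d{2}(?:-\d{2})?)?)\s*$", re.I
    ),
}


def classify_lines(wikitext):
    """Return (found, unmatched): recognized values per key and leftover lines."""
    found = {}
    unmatched = []
    for line in wikitext.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                found.setdefault(key, []).append(match.group("value").strip())
                break
        else:
            # No pattern matched: keep non-empty lines for manual review.
            if line.strip():
                unmatched.append(line)
    return found, unmatched


def inception(found, upload_date):
    """Use the stated creation date if present, else the upload date as latest bound."""
    dates = found.get("date")
    if dates:
        return {"value": dates[0], "kind": "exact"}
    return {"value": upload_date, "kind": "latest"}
```

Lines that match no pattern are returned separately, so they can stay in the description unchanged or be inspected manually.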
- You stated above it is working manually assisted, and I'd agree that fully automatically it's not even possible. E.g. in this: Special:Diff/922463275 edit the "source" is still a mess. Even if the bot is flagged, this edit IMO should not be done with a bot flag. (And perhaps it should not be done automatically at all if there is a risk of getting source and author information messed, because it may affect file attribution.) Different opinions welcome. Krd 19:04, 1 October 2024 (UTC)
- Hmm - that "mess" could be done better if there were clear guidelines documented on {{Information}} how to reference WP imports as source. Using {{Source}} seemed the best option to me. If it isn't, I would be glad about advice on how to do the "source" statement the correct way in such cases. But maybe it isn't worth the time, I've already wasted a lot of it trying to find a solution that looked viable. Fl.schmitt (talk) 19:27, 1 October 2024 (UTC)
- Yes, this is much needed, please get this bot to work to add the standardized well-established expectable Information template which can be queried or displayed e.g. in apps etc. However, it probably needs many tests and examples to make it add much information to these templates or add categories for manual maintenance where needed. --Prototyperspective (talk) 10:51, 3 October 2024 (UTC)
- I thought the bot would add the Information template to files that lack them. Sorry, I misunderstood. Prototyperspective (talk) 17:43, 11 October 2024 (UTC)
- @Prototyperspective: no, that's exactly the bot's task - adding {{Information}} to files that lack it, but in combination with SDC. Since {{Information}} uses some SDC values as default, there's IMHO no need and no use to save those values in a redundant way both in SDC and as template parameters. SDC can be queried, too. So, where's the misunderstanding? Fl.schmitt (talk) 19:18, 11 October 2024 (UTC)
- Okay thanks for explaining then it's basically what I thought it was. Such a bot is much needed. Also see Commons:Village pump/Technical#How to search fields of files' Information template? (I think using the insource search operator combined with regex could be the solution to it). Prototyperspective (talk) 21:34, 11 October 2024 (UTC)
Is the whitespace around the "int:filedesc" intended? This appears uncommon and badly readable to me. --Krd 09:55, 11 October 2024 (UTC)
- Good point - not sure why I've put it there. This is fixed now. Fl.schmitt (talk) 14:34, 11 October 2024 (UTC)
- Well, the outer blanks are common, I'd suggest to remove only the inner ones. Krd 14:38, 11 October 2024 (UTC)
I think the bot could be flagged but should edit without actually setting bot action for an extended slow test run. Do you agree? --Krd 09:55, 11 October 2024 (UTC)
- No problem, that's perfectly fine for me. Let's give it a try. Fl.schmitt (talk) 14:40, 11 October 2024 (UTC)
- Ok, please make a slow start. Krd 14:57, 11 October 2024 (UTC)
- Bot has done six additional edits (combination of SDC and page text update): Revision #938775515 / Revision #938776502 for Map of Canton Thurgau.png, Revision #938777401 / Revision #938778686 for Map of Canton Uri.png and Revision #938779544 / Revision #938780857 for File:Map of Canton Zug.png. It's exactly the same pattern as the previous test runs, except the modified headers. Preview for all pages that are currently in Category:Media missing infobox template (maps t1): simulate_result_61442.txt. That txt file shows the "proposed" SDC values to set (as JSON) and page text content to write (as wikitext). Please keep in mind that the page text (esp. the parameters of {{Information}}) is "incomplete" because it relies on the SDC values that are used as defaults. Fl.schmitt (talk) 17:32, 11 October 2024 (UTC)
- Please continue. Krd 05:01, 12 October 2024 (UTC)
- Ok - now the bot is running continuously, but with a minimum delay of five minutes between filepages. So, the bot will visit 12 file pages per hour. Fl.schmitt (talk) 07:05, 12 October 2024 (UTC)
- Update: had to fix an issue with captions - slowed bot down to 4 filepage visits per hour to keep better control. Fl.schmitt (talk) 18:14, 12 October 2024 (UTC)
- Please do not remove the empty line above Categories, like in this edit [1]. This is standard in almost all files and simply reflects what is done and increases readability. --Schlurcher (talk) 08:50, 1 November 2024 (UTC)
- Is the bot still running? Krd 14:44, 7 November 2024 (UTC)
- No, currently not. The first "batch" of files is finished, so I'll have to adapt the regex patterns to the next batch, which will take some time. I'll report back if the bot is ready to run again. Fl.schmitt (talk) 20:01, 7 November 2024 (UTC)
- What is the estimated time frame? Krd 04:00, 20 November 2024 (UTC)
- Maybe mid-December? The problem is that I can't simply start running the bot again, but have to adapt it to a new set of files. Anyway, I don't think there's a hurry for the bot's work, since most of the "target" files have been lacking {{Information}} for almost 20 years. Fl.schmitt (talk) 13:00, 20 November 2024 (UTC)
Operator: トトト (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)
Bot's tasks for which permission is being sought: Simple text replacement
Automatic or manually assisted: Automatic manually assisted
Edit type (e.g. Continuous, daily, one time run): One time run
Maximum edit rate (e.g. edits per minute): 6 edits per minute
Bot flag requested: (Y/N): Y
Programming language(s): Python Pywikibot
トトト (talk) 12:40, 30 August 2024 (UTC)
- Discussion
- An example of edits is this. The bot is aimed to assist replacing names of some parameters in the templates {{Japanc}}, {{Japanp}}, {{Aichic}}, etc which I am testing now. Affected files are not too many, but neither few, and by limiting edits within certain categories, it will be unlikely to give harm to unrelated files. --トトト (talk) 12:59, 30 August 2024 (UTC)
- Please make a test run. Krd 02:07, 2 September 2024 (UTC)
- I have tested with my account. And all the necessary edits related to the parameter name changes of {{Japanc}} and {{Japanp}} are completed for now. So I have tried another task today, adding region:JP_scale:5000 to geodata of files from Japan. This was initially tried from the Tototobot (talk · contribs) account, but an error message occurred and the task was terminated unexpectedly at PAWS. Was it perhaps because of the antispam filter? So I did it on my own account. --トトト (talk) 23:38, 2 September 2024 (UTC)
- Has done some edits successfully. Since it's so far away to compose my own script, right now it does only some replace.py tasks on Pywikibot framework. --トトト (talk) 19:45, 4 September 2024 (UTC)
If possible please put into the edit summary not only what is done but also why it is done. Please make another test run. --Krd 13:33, 26 September 2024 (UTC)
- This has nothing to do with this request, but let me allow some testimony here. Recently I have updated templates regarding the monthly rail transport of 47 prefectures of Japan (i.e. {{Railtransportmonth-KanagawaPref}} 1). One of the purposes of these edits is to add a new category Category:(month name) in rail transport in xxxx prefecture to every page using these templates. However, the wiki system didn't make this revision effective unless there are some kind of edits to these pages. It means that I had to delete the last indention from all these pages manually. So at first I thought, well, then how about using pywiki-bot for these tasks? I tried replace.py, replacing }}\s with }} for these pages (i.e. this page). This is a simple enough task for a novice code learner. But the sad thing is, pywiki-bot said that such a task is unnecessary, and didn't execute anything. Certainly it might be a meaningless edit from the point of view of contents-creation, but I shall say it is necessary for page maintenance, i.e., in order to activate the latest revision of templates. So I ended up using AWB executing simple replacement tasks for all the pages. By pushing the "Save" button a thousand times or so, these pages are to be re-defined. Letting pywiki-bot add user-template-generated harmless characters such as <!-- --> to these pages is another idea, but I didn't like it because it pollutes the edit history of these pages. So this is it. It will be good if my bot can execute auxiliary tasks to my manual edits, but since I am a novice in this field, currently I have no detailed plans how to use it. Thank you for reading this. --トトト (talk) 19:10, 26 September 2024 (UTC) --トトト (talk) 19:24, 26 September 2024 (UTC)
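For illustration only, the replacement pattern described above can be tried on a plain string with Python's `re` module. This sketch does not reproduce Pywikibot's behaviour and does not explain why the edit was skipped; the function name and the sample text are hypothetical.

```python
import re

# The pattern from the comment above: "}}" followed by one whitespace character.
BRACES_THEN_WHITESPACE = re.compile(r"\}\}\s")


def strip_whitespace_after_braces(wikitext):
    """Replace every '}}' + whitespace character with a bare '}}'.

    Note that this affects every occurrence in the text, not only the
    last one on the page.
    """
    return BRACES_THEN_WHITESPACE.sub("}}", wikitext)


sample = "{{Railtransportmonth-KanagawaPref}}\n"
result = strip_whitespace_after_braces(sample)
changed = result != sample  # the only difference is the trailing newline
```

Because the pattern matches anywhere, a page with several templates separated by spaces or newlines would lose those separators as well, so a stricter pattern anchored to the end of the text may be safer.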
- I have added explanation to User:Tototobot. Removing {{Geogroup}} from template-generated categories is an important task, especially after {{Geogroup}} is being implemented in the template itself. I will also try to improve the description of edit summaries. --トトト (talk) 22:27, 26 September 2024 (UTC)
- Sadly I cannot follow your previous answer. Please specify in more detail what are the intended tasks.
- Also, while Commons is multilingual, it appears as no good idea to me to edit Japanese related content with Dutch language edit summaries. Can this be improved? Krd 10:00, 11 October 2024 (UTC)
- Taking this category for example. While {{prefjp/name}}Prefmonth is seen, it is actually using {{NaganoPrefmonth}}. And before {{Geogroup}} was added in the template itself in this edit, {{Geogroup}} was sporadically used when a category was created by a volunteer user, which means some Nagano prefecture by month categories are with {{Geogroup}}, but others are not. So, after {{NaganoPrefmonth}} is revised, making a bot remove those manually inserted {{Geogroup}} from categories is necessary. With 50 or more such categories randomly existing, removing {{Geogroup}} one by one manually is simply a waste of time. Am I wrong? --トトト (talk) 17:58, 11 October 2024 (UTC)
- I will try to improve edit summaries. But generally you cannot force anybody to use English in commons, can you? Please also understand that up until recently, Dutch was the second language for most of Japanese people. --トトト (talk) 18:11, 11 October 2024 (UTC)
- I didn't know that Dutch is the second language for most of Japanese people, but does it make sense to use a second language? --Krd 07:53, 1 November 2024 (UTC)
@トトト: Please report current status. --Krd 03:59, 20 November 2024 (UTC)
- Did some tasks with the command pwb.py replace. I am getting used to this command. I have also tried pwb.py category remove on PAWS but failed for some reason, so I'm trying to figure out why. --トトト (talk) 09:37, 21 November 2024 (UTC)
@トトト: Please report current status. Please wait for approval before continuing at full speed. --Krd 06:49, 14 December 2024 (UTC)
- Has also successfully executed some tasks using the pwb.py category remove command. Edits with the summaries -(category name), ∵{{com:Overcat}} are done with this command. Other edits like this are done with pwb.py replace. Some edits do not match the edit summaries; this is because I tried multiple replacements at the same time. I should have been more careful. But all of the edits have been basically successful, and need no reverts as of now, I believe. 15:21, 14 December 2024 (UTC) --トトト (talk) 15:36, 14 December 2024 (UTC)
GogologoBot (talk · contribs)
Operator: MFossati (WMF) (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)
Bot's tasks for which permission is being sought: add the following structured data statement and qualifier to the file page of a new upload that is detected as a logo by this tool.
- statement: instance of (P31) logo (Q1886349)
- qualifier: determination method or standard (P459) machine learning (Q2539)
Automatic or manually assisted: automatic, supervised
Edit type (e.g. Continuous, daily, one time run): continuous
Maximum edit rate (e.g. edits per minute): it depends on the amount of image uploads and on the amount of images detected as a logo; about 2 edits per minute. Please note that this is an anecdotal estimate based on test edits. See also #c-MFossati_(WMF)-20241015104800-Krd-20241011094700
Bot flag requested: (Y/N): Y
Programming language(s): Python, Pywikibot
Source code: https://gitlab.wikimedia.org/toolforge-repos/gogologo
MFossati (WMF) (talk) 12:19, 24 July 2024 (UTC)
- Discussion
- I think it'll be a much better application for the bot if it could detect non-trivial logos or logos already deleted. --EugeneZelenko (talk) 14:41, 24 July 2024 (UTC)
- Wouldn't it be better to add them with a separate property? While I'm in favor of adding more such ways to identify images, I don't think it mixes well with other statements. This was attempted and finally discarded with "depicts" statement a while back. Please make sure these statements can also be searched with Special:Search. Enhancing999 (talk) 14:53, 1 August 2024 (UTC)
- Hey Enhancing999, thanks for your comment. Could you please provide any specific pointers to the previous attempt you mentioned? MFossati (WMF) (talk) 11:29, 12 August 2024 (UTC)
- Is this bot going to be used as "act once on new uploads", "act once on all existing files", "potentially act more than once on the same file", or what? Unless it only acts exactly once on any given file, what is to prevent it getting into an edit war if its edit is reverted or otherwise changed? - Jmabel ! talk 18:11, 1 August 2024 (UTC)
- Hi Jmabel, thanks for your question. The bot is expected to act once on new uploads. MFossati (WMF) (talk) 11:31, 12 August 2024 (UTC)
- Good. Is there any chance that the bot could also look at the wikitext for {{Own work}} and add a maintenance category (call it Category:Own work logo to checked) if it appears to be a logo and is claimed as "own work"? We see that combination a lot, and it is almost never true. And possibly something similar for a logo + any CC license, because that's usually false as well: we very rarely get a license for any logo that is above the threshold of originality. - Jmabel ! talk 15:15, 12 August 2024 (UTC)
- I agree that the ability to search for logos plus own work and/or CC licenses would make a lot of sense. I think this is something we can do by querying structured data. For instance, we can already run a query like this to look for own work files with CC BY-SA 4.0. As soon as the proposed logo statements get added, we can then insert a wdt:P31 wd:Q1886349 constraint in the query. MFossati (WMF) (talk) 09:50, 14 August 2024 (UTC)
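As a rough sketch of what such a constraint could look like, the snippet below builds a SPARQL query string in Python. It is not the linked query: the function name is hypothetical, the own-work and license patterns are left out, and the `wdt:`, `p:`, `ps:` and `pq:` prefixes are assumed to be predefined by the query service, following the usual Wikibase RDF conventions.

```python
def logo_query(machine_tagged_only=False, limit=100):
    """Build a SPARQL query string for files stated to be a logo.

    machine_tagged_only=True additionally requires the qualifier
    determination method (P459) = machine learning (Q2539) on the
    instance of (P31) statement.
    """
    if machine_tagged_only:
        patterns = [
            "?file p:P31 ?statement .",
            "?statement ps:P31 wd:Q1886349 .",
            "?statement pq:P459 wd:Q2539 .",
        ]
    else:
        patterns = ["?file wdt:P31 wd:Q1886349 ."]
    body = "\n  ".join(patterns)
    return f"SELECT ?file WHERE {{\n  {body}\n}}\nLIMIT {limit}"
```

Further triple patterns, for example for source or license, would be appended to `patterns` in the same way.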
- Comment As requested by the rules, we've test-run the bot on 100 uploads randomly sampled from uploads made between Aug 21 and today, and here are the results:
- 4 medias were deleted beforehand, so no edit
- 1 media was skipped (maximum retries attempted due to maxlag without success), so no edit
- 95 medias were successfully edited
- It seems that it successfully worked, but we'll wait for community review. Sannita (WMF) (talk) 15:34, 30 August 2024 (UTC)
- It appears each file is edited twice. Is that for technical reason, or can the edits be combined in any way? Krd 17:36, 30 August 2024 (UTC)
- Great point, Krd! It made me realize that the current code first adds the claim, then adds the qualifier, thus producing two edits. I've just tried that we can do the other way around. So - yes - we can indeed combine them into a single edit. I've updated the code accordingly. Thanks a lot, this is really helpful. MFossati (WMF) (talk) 14:16, 9 September 2024 (UTC)
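To make the single-edit idea concrete, here is a minimal sketch of one statement that already carries its qualifier, written in the Wikibase JSON shape as commonly documented. It is an assumption about the data layout, not the bot's actual code (see the linked source repository for that); the helper names are hypothetical, and submitting the payload through the API is not shown.

```python
import json


def item_snak(prop, qid):
    """Build a value snak pointing at a Wikidata item (hypothetical helper)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {
                "entity-type": "item",
                "numeric-id": int(qid[1:]),
                "id": qid,
            },
        },
    }


def logo_claim_payload():
    """One statement with its qualifier attached, so a single edit suffices."""
    claim = {
        "type": "statement",
        "rank": "normal",
        # instance of (P31): logo (Q1886349)
        "mainsnak": item_snak("P31", "Q1886349"),
        # determination method or standard (P459): machine learning (Q2539)
        "qualifiers": {"P459": [item_snak("P459", "Q2539")]},
    }
    return {"claims": [claim]}


payload_json = json.dumps(logo_claim_payload())
```

Sending the claim and the qualifier together is what would turn two revisions per file into one.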
- Can you use another property than P31 as suggested above? I think we should avoid a re-run of c-a t where WMF mostly ignored community input.
∞∞ Enhancing999 (talk) 17:52, 30 August 2024 (UTC)
- Hi @Krd and @Enhancing999, thanks for your feedback and sorry for the late reply, for some reason your replies did not appear in my notifications.
- While we wait for @MFossati (WMF) to be back in office for answering the first question, we are open to suggestion as to which property to use. @Enhancing999 do you already have one in mind? Sannita (WMF) (talk) 16:11, 5 September 2024 (UTC)
- You can create one ad hoc.
∞∞ Enhancing999 (talk) 17:16, 5 September 2024 (UTC)
- @Enhancing999 Sorry for the long answer, but I felt the need to clarify some things about the request.
- We need to start somewhere to see if the experiment is of some value to the moderators. This is an experiment within the first quarter OKR work for FY24/25 (WE2.3.1). We don't think a new property would work, especially because the property proposal request would likely be considered too specific in scope to be accepted by the Wikidata community, not without reasons.
- We can quickly and easily use an existing property, and see if it’s valuable. If not, we will rollback as quickly and easily. The property instance of (P31) seems like the best fit, because we think it’s specific and meaningful. More importantly, the property is indexed, thus enabling search queries both in Special:Search and in Special:MediaSearch. Furthermore, qualifiers are also indexed, so it will be possible for moderators to find media classified as a logo by this bot. You can either use a search query (example with Special:Search, example with Special:MediaSearch) or a SPARQL one to achieve it.
- If detecting and tagging incoming logos does not help with easier logo moderation, then our plan is to rollback our own edits at the end of the experiment. If it does help, then we’re planning to investigate other ways to store and query such data, as we are considering other experiments in the near future as suggested by the community. Sannita (WMF) (talk) 15:09, 9 September 2024 (UTC)
- Wikidata easily creates properties that are just meant to be used for Commons. This shouldn't take much time and compared to working speed of WMF (It's seven weeks since you asked for input), this shouldn't be an issue. Nothing prevents you from indexing this property as well.
- If you think a separate property won't work, it means that ultimately this wouldn't work using instance of (P31) either. I think such implementations need more attention than once every month.
- Given the massive community backlash WMF got from an ill-prepared, hastily implemented, not community feedback driven, likely costly previous experiment mixing machine contribution with our highly valued volunteer contributors, I think it's good to take good care this time, especially as a simple way was suggested already seven weeks ago.
∞∞ Enhancing999 (talk) 15:43, 9 September 2024 (UTC)
- @Enhancing999: unless there are a lot of false positives (and I don't think there are), the tagging of these as instance of (P31) : logo (Q1886349) seems at worst harmless. What would be the advantage of a distinct property? - Jmabel ! talk 04:45, 10 September 2024 (UTC)
- There are likely few false positive in the first test set as it's still followed, but last time, it became problematic when person at WMF developing it moved on to something else.
- Based on past experience, I guess you know what happens afterwards: you will have to wait 7 weeks for an acknowledgment, then you will be told to ask for a change in the next wishlist, and, even if everybody agrees with it, you will have to wait for the next annual plan to have it scheduled. Possibly somebody will then throw it out entirely, because they don't know how to fix it.
- In any case, the idea is to classify also images where there is a lower confidence in the automatism so review is necessary.
- Using two different properties allows users to easily switch between volunteer assessment and machine assessment, focus on volunteer assessment while excluding machine assessment if they happen to agree.
∞∞ Enhancing999 (talk) 11:12, 13 September 2024 (UTC)
- Is a coat of arms or a military unit insignia or a sports uniform a logo per the definition of a "logo"? --Krd 07:29, 13 September 2024 (UTC)
- @Krd: we're targeting images similar to Category:Logos, thus making a distinction between other classes such as Category:Coats_of_arms or Category:Sports_kit_templates. MFossati (WMF) (talk) 13:45, 13 September 2024 (UTC)
- In my personal opinion there are too many false positives. Krd 13:52, 13 September 2024 (UTC)
- Special:Permalink/923690458 has a gallery of images edited by the bot. Personally, I don't think false positives are an issue as such, at least when they are clearly distinguished from manual edits (see separate property above).
∞∞ Enhancing999 (talk) 14:08, 13 September 2024 (UTC)
- I agree that most of them are some kind of symbols or graphics, but I'd guess a third of them would not be put under Category:Logos, so "instance of logo" doesn't make much sense then. Am I mistaken? Krd 14:16, 13 September 2024 (UTC)
- It really depends what the logo people want to do with it. Today it's "logos", but it could be just any image type or topic. The confidence level of the classification can also evolve or be changed.
∞∞ Enhancing999 (talk) 14:29, 13 September 2024 (UTC)
- Come to think of it, maybe the statement with the new property should be qualified with the confidence level (for the classification of the image) and the program version being used (if not available, the current date).
∞∞ Enhancing999 (talk) 09:08, 17 September 2024 (UTC)
- That makes sense to me. Krd 11:46, 17 September 2024 (UTC)
- You read my mind: this is definitely something I wanted to propose. MFossati (WMF) (talk) 13:53, 17 September 2024 (UTC)
- I've just realized that the bot has made accidental edits that weren't meant to be there, sorry for that! I've manually reverted them. Please refer to the test edits. MFossati (WMF) (talk) 13:58, 17 September 2024 (UTC)
- Similar to Special:Permalink/923690458 can you do a gallery that shows all images that you consider valid test cases (ideally include the confidence level for the classification as a legend).
∞∞ Enhancing999 (talk) 09:21, 21 September 2024 (UTC)
- The (determination method or standard (P459), machine learning (Q2539)) qualifier distinguishes the bot's edits from manual ones. The queries mentioned here retrieve the bot's ones. You can compare these two queries: with qualifier (bot's edits) VS no qualifier (non-bot's edits). As a side note, nothing prevents us from trivially looking at the bot's contributions, too. MFossati (WMF) (talk) 13:46, 17 September 2024 (UTC)
- You can't distinguish them any more when someone thinks they are correct and also adds a P31. Or would they have to remove the qualifiers? And no, looking at individual files and/or edits is definitely not a solution. Please make sure the results can be viewed by querying both with search and on SDC portal (hopefully eventually open).
∞∞ Enhancing999 (talk) 09:16, 21 September 2024 (UTC)
- We could also create a Wikidata item for this bot and use it as the qualifier value, instead of machine learning (Q2539). MFossati (WMF) (talk) 14:01, 17 September 2024 (UTC)
- It's better to use a separate property and qualify that with the program version being used. A year or two later, one will otherwise have a hard time telling which version of the bot considered what by which threshold. I suggest we create two properties:
- "Commons machine image type"
- "Commons machine image subject"
- The second for later uses, if you want to try to determine a logo topic.
∞∞ Enhancing999 (talk) 09:20, 21 September 2024 (UTC)
- I'd like to highlight the main goal of this experimental bot, namely to help moderators find potentially problematic media. MFossati (WMF) (talk) 14:06, 17 September 2024 (UTC)
- What is the commitment of WMF to maintain this going forward? How much time will you spend maintaining it in the next months each week? Or will it be discontinued after a month?
∞∞ Enhancing999 (talk) 09:23, 21 September 2024 (UTC)
- We are committed to maintaining this bot for as long as it needs to be. As already mentioned, this is one of our priorities for the year, and definitely won’t be dropped after one month. On the other hand - after careful consideration with the team - we won’t be pursuing the path of creating new Wikidata properties, nor adding the confidence score as structured data, as part of this work of identifying and providing a way for easier moderation of logos.
- While we agree that probabilistic statements supported by confidence scores are a very relevant topic, to the best of our knowledge no available Wikidata property can express so yet, and we see the need for a cross-community broad discussion that is outside of this experiment’s scope. If no consensus is reached on this bot request, the alternative is that we periodically release lists of potential logos to be considered (this time with confidence score), like we recently did. Sannita (WMF) (talk) 13:42, 25 September 2024 (UTC)
- Creating appropriate properties is a fairly straightforward process. As you seem to have some issues with having these created for that data, I think the dataset approach is preferable.
- It also won't leave us with data the community needs to clean up next year, once the experiment has ended, as last time.
∞∞ Enhancing999 (talk) 12:20, 29 September 2024 (UTC)
- Do we have moderators who use the output of the bot for anything? I think it hasn't been outlined above, so I'm still trying without offense to understand who is in need of that, or if it may be a solution looking for a problem. Krd 13:36, 26 September 2024 (UTC)
- Hi @Krd, sorry for the late reply, but notifications aren't working on this page for some reason. As of now, as far as we know no one is using the output of the bot, mostly because it has not been approved yet and we are waiting on approval to fully resume our work. But the reactions on the admins' noticeboard to our dataset about potential logos seems to show that our work is effectively useful to identify potentially problematic logos, and can let admins and moderators focus on a narrower set of images, instead of relying on reports on last uploads. Plus, as we already stated, this proposal comes from several discussions and user interviews we had in the past months with the community where the need for machine detection tools was raised, so it is a solution to a problem that the community raised. Sannita (WMF) (talk) 13:54, 3 October 2024 (UTC)
- I think users would prefer categories, but dataset seem to work too. Given the problems with getting the statements right and the closed nature of SDC statements, I think it's preferable to pursue the two other ways (dataset, categories).
∞∞ Enhancing999 (talk) 16:56, 9 October 2024 (UTC)
- Looks like a great way to reduce mod/admin maintenance workload and reduce the number of copyvios on WMC. Please extend it or create similar bots to also detect other copyvios as proposed here or similar. Thanks for developing it, it seems very useful! --Prototyperspective (talk) 11:37, 21 September 2024 (UTC)
- Hi @Prototyperspective, thanks for your positive feedback and suggestions (and also sorry for the late reply, but notifications aren't working properly on this page). We do think this work is valuable and we heard good feedback. Before engaging in new bot requests, though, we think it would be better to close the current request with enough consensus to go on. We are going to do another experiment for automated detection in the next months as part of our planned work, but we also don't want to operate without or against community consensus. For now, as the current bot request has not been approved, you can access logos identified by the logo detection through exported datasets. Sannita (WMF) (talk) 13:56, 3 October 2024 (UTC)
@MFossati (WMF): You didn't specify the edit rate in the request. What do you expect the daily edit count to be, and how many edits will be required to classify existing files? Does it make sense to start with one edit per minute for an extended test run? --Krd 09:47, 11 October 2024 (UTC)
What do you expect the daily edit count to be
- The test edits indicate 1 edit every 33 seconds on average, with an estimated daily count of 2,600 circa.
how many edits will be required to classify existing files
- Please note that the currently requested task only accounts for new uploads. We may consider scaling the bot up to existing files in a subsequent request, if that's useful for the community (broader discussion needed). However, we can't compute the total amount of edits beforehand, because we'll have to run (or dry-run at least) the classifier over existing files first.
Does it make sense to start with one edit per minute for an extended test run?
- I think it would be reasonable to stick to the average edit rate to do so, i.e., 2 edits per minute. MFossati (WMF) (talk) 10:48, 15 October 2024 (UTC)
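The figures in this reply can be checked with a few lines of arithmetic. The sketch below only restates the numbers given above (one edit every 33 seconds on average); the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_edits(seconds_per_edit):
    """Edits per day if one edit is made every seconds_per_edit seconds."""
    return SECONDS_PER_DAY / seconds_per_edit


def edits_per_minute(seconds_per_edit):
    """Edits per minute for the same average interval."""
    return 60 / seconds_per_edit


# One edit every 33 seconds, as reported from the test edits.
per_day = daily_edits(33)          # a little over 2,600
per_minute = edits_per_minute(33)  # a little under 2
```

This is consistent with the estimate of roughly 2,600 edits per day and the proposed rate of 2 edits per minute.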
- Please put it live as suggested, as extended test run. Krd 10:56, 15 October 2024 (UTC)
- Done The bot is running and will edit 3,833 recent uploads, detected as logos between Aug 21 and today. See also the initial test. MFossati (WMF) (talk) 15:02, 16 October 2024 (UTC)
Please do not manually edit with the bot account. --Krd 07:18, 20 October 2024 (UTC)
- I addressed a user request and also noticed incidental duplicate statements, so I took care of them. The affected pages were edited through pywikibot code, so no manual edits happened. Not sure why those edits got the manual revert tag, though. MFossati (WMF) (talk) 09:49, 21 October 2024 (UTC)
- Special:Diff/945223918 is a manual edit, isn't it? Krd 10:15, 21 October 2024 (UTC)
- That wasn't intended: I forgot to log out of the bot account and log in to my user account before signing that edit. I fixed the signature one minute later, see Special:Diff/945224965. MFossati (WMF) (talk) 13:37, 21 October 2024 (UTC)
- The extended run of the bot has finished: ~3,800 images have been edited and identified as logos. You can take a look at the bot's contributions to evaluate them. What do you think of the results? Do you think the bot could now be allowed to run? Sannita (WMF) (talk) 16:56, 24 October 2024 (UTC)
- First thought is that all of the "BSicon" stuff is wrong, none of those are logos. - Jmabel ! talk 19:41, 24 October 2024 (UTC)
- Picking 10 others at random:
- File:MEFA Logo.svg already appropriately categorized as a logo, clearly below TOO.
- File:Bridger Aerospace logo.svg Like the previous one. I did add Category:Logos of companies of the United States because uploader hadn't thought to say what country.
- File:Club América de Palpa.png might be problematic; claimed as "own work" (which might be true for the particular PNG); roughly in the neighborhood of TOO, I don't know rules for Peru, someone might want to take a look at this. Needs categories, in any case.
- File:Avelonia Logo 2024.png similar case to the last, although in this case I can't even quickly tell what country.
- File:2gotravellogo.png needs categories, from Philippines, appears to be below TOO.
- File:CTree RootNet Bubble blue rev 05 interim 600dpi.png. No idea what this is, needs cats, might be a logo or not, doesn't say what country, below TOO almost anywhere.
- File:BR Verkehr Variante.svg appropriately categorized, clearly below TOO, unproblematic.
- File:Auliq Records wordmark.png clearly below TOO, heavily but poorly categorized, already nominated for deletion as out of scope (on which I have no opinion).
- File:Filebrowser - banner.svg already correctly identified as a free software logo, the claimed license is indeed granted at specified source (but might be below TOO anyway).
- File:ALoSeguro.svg not clearly a logo (just text form of a slogan) though categorized as such, clearly below TOO, appropriately categorized.
- So the usefulness of this is supposed to be …? - Jmabel ! talk 20:04, 24 October 2024 (UTC)
- Thanks for your proactive feedback! I think it really follows the direction of our long-term goal, namely to help moderators find problematic media.
- MFossati (WMF) (talk) 14:15, 25 October 2024 (UTC)
- @MFossati (WMF): I'm an admin, and based on my sample above, I don't see how this is any more help in finding problematic media than looking at any random collection of newly uploaded files. Can you clarify how this is supposed to be more useful in finding problematic files than just a random selection of recent uploads? - Jmabel ! talk 18:45, 25 October 2024 (UTC)
- @Jmabel Thanks for your feedback. Marco is currently out of office, so I'll try to take a stab at it: we're trying to narrow down the number of potentially problematic files by highlighting only the potential logos among the quite high number of uploads made on Commons every day.
- We have already received some positive feedback about our work in the last six months, and according to our findings our experimental tool can detect ~47% more files that are not correctly categorized as logos or are not correctly pointed out as such. This does not replace the need for a human eye to evaluate them, but it should at least help said human eye find what they need to find, without resorting to checking a "random collection of newly uploaded files". This is also what we're trying to do with our datasets published on the Admins' noticeboard.
- Anyway, if this bot request is just not useful enough, we'll stick to publishing datasets for you admins to consider. Sannita (WMF) (talk) 19:02, 29 October 2024 (UTC)
- Is there any "control" for this experiment? E.g. any pre-bot baseline on some task or tasks that we can measure against to see whether the bot is actually helping anything happen any better than before? - Jmabel ! talk 21:18, 29 October 2024 (UTC)
- @MFossati (WMF): ? --Krd 05:21, 5 December 2024 (UTC)
- Given a monthly dataset such as November 2024, we compute the gain as follows: $g = \frac{n - (d + p + c)}{n}$, where $n$ is the total amount of input images detected as a logo, $d$ is the amount of input images that were deleted, $p$ is the amount of input images with Template:PD-textlogo, and $c$ is the amount of input images with at least one category that has an occurrence of "logo". We consider $d + p + c$ to be the human curation baseline that matches uploads detected as a logo. $g$ represents the percentage of potential logos that hasn't undergone human curation. For instance, November 2024 has $g = \ldots$. MFossati (WMF) (talk) 15:53, 6 December 2024 (UTC)
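The gain computation described in this reply can be sketched in Python. This is a minimal illustration under stated assumptions, not the team's actual code: it assumes the gain is the share of detected logos falling outside the human curation baseline, and that the baseline is the plain sum of the three counts (deleted, tagged with Template:PD-textlogo, in a category containing "logo"), which may overcount if an image falls into more than one group. The function name and the sample numbers are hypothetical and do not come from any published dataset.

```python
def logo_gain(detected: int, deleted: int, pd_textlogo: int, logo_category: int) -> float:
    """Share of detected logos not yet covered by human curation.

    Assumption: the baseline is the simple sum of the three counts.
    """
    if detected <= 0:
        raise ValueError("detected must be a positive count")
    baseline = deleted + pd_textlogo + logo_category
    return (detected - baseline) / detected


if __name__ == "__main__":
    # Hypothetical counts for one monthly dataset (illustration only).
    gain = logo_gain(detected=1000, deleted=100, pd_textlogo=150, logo_category=250)
    print(f"{gain:.0%}")
```

A gain near 0 would mean almost every detected logo had already been curated by humans; a gain near 1 would mean almost none had.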
Please summarize. Have all objections been resolved? What is the conclusion? --Krd 06:52, 14 December 2024 (UTC)