WebCiteBOT
Archive: 2009-2011
What's up with this bot?
According to Special:Contributions/WebCiteBOT, this bot has not been working since 2009-11-26. Why isn't it active? I think it is a very much needed bot. Toshio Yamaguchi (talk) 07:44, 2 February 2011 (UTC)
- User:Hydroxonium already invited me to participate in that task force and I have already expressed my thoughts on the project's importance there. Nevertheless, thank you for pointing me to that page. It's much appreciated. Toshio Yamaguchi (talk) 13:34, 5 February 2011 (UTC)
I have returned to Wikipedia after 2 years off. If someone could kindly update me as to whether this bot is still needed or has been replaced, I would appreciate it. --ThaddeusB (talk) 04:48, 14 February 2012 (UTC)
- Should check into that SoC archiving extension; AFAIK it is not used anywhere (much less En) and so the bot is still needed. --Gwern (contribs) 04:49 14 February 2012 (GMT)
- Thanks for quick reply. Looks like mw:Extension:ArchiveLinks is not yet operational, and possibly not being worked on anymore. I found some discussion on enwiki from January of this year indicating that no progress had been made on either a WebCiteBOT2 or the ArchiveLinks extension. As such, I will try to get my bot relaunched within the next week. --ThaddeusB (talk) 05:44, 14 February 2012 (UTC)
- There was also this BRFA, which however expired. Toshio Yamaguchi (talk) 08:03, 14 February 2012 (UTC)
I've restarted the additions monitor (still works) and emptied out the pending database (no point trying to archive links that were added a year ago). It will be 48 hours before I can test the editing part of the bot since there is a 48 hour delay between adding and archiving as per the original BRFA. I also updated the stats page. --ThaddeusB (talk) 03:51, 15 February 2012 (UTC)
- As promised, I have fixed up the code to deal with the (minor) changes in the Wikipedia API/WebCite that have taken place over the last 2 years. It made its first test edits today (limited to 10 links) which were successful (log). A couple related links failed because of a problem on WebCite's end, which I will pass on to them. A more extensive test will take place tomorrow. --ThaddeusB (talk) 04:59, 17 February 2012 (UTC)
- A larger test (30 links) was done this afternoon. It revealed a few minor bugs that I've now fixed. Next test will start shortly. --ThaddeusB (talk) 01:12, 18 February 2012 (UTC)
- Test three (60 links) completed. One minor bug was found and corrected. Behavior changed so that pages with "season" in title do not have links like [1] in text changed to refs. Normally these links are poorly formatted refs, but in sports articles they are sometimes links to game recaps found in results tables. Adding an exclusion for "season" articles should make the bot avoid this (slightly) undesirable change. --ThaddeusB (talk) 23:52, 18 February 2012 (UTC)
- Two more tests (70 total links) completed. (A smaller one to make sure changes I made didn't break anything and then a larger one). No bugs found. --ThaddeusB (talk) 00:32, 20 February 2012 (UTC)
- The biggest test to date (100 links) was performed. A small issue was found - the IRC link reporting bot unencodes URL encoded links for some reason, which means my bot reports them as removed when it checks the wikitext. A workaround would be difficult, so for now I'll just let the bot skip those pages. (Looks like ~0.2-0.5% of all links are affected by the bug.) No other problems. --ThaddeusB (talk) 03:39, 21 February 2012 (UTC)
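For readers unfamiliar with the URL-encoding issue mentioned in the 100-link test, here is a minimal sketch of why a percent-encoded link can look "removed". It is not the bot's code: the function name `link_in_wikitext` and the sample URL are made up for illustration, and comparing decoded text is only one possible approach, which may well not be practical for the bot.

```python
from urllib.parse import unquote


def link_in_wikitext(reported_url: str, wikitext: str) -> bool:
    """Hypothetical helper: compare both sides in percent-decoded form."""
    return unquote(reported_url) in unquote(wikitext)


# The article stores the link percent-encoded ...
wikitext = "<ref>[http://example.com/caf%C3%A9 Menu]</ref>"
# ... but the link arrives already decoded from the reporting feed.
reported = "http://example.com/caf\u00e9"

# A plain substring check misses the link, so it looks removed.
naive_found = reported in wikitext

# Decoding both sides first makes the two forms comparable.
decoded_found = link_in_wikitext(reported, wikitext)

print(naive_found, decoded_found)
```

Note that decoding is lossy (for example, `%2F` and `/` become indistinguishable), which is one reason a robust workaround could be harder than this sketch suggests.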
I am curious as to how this bot will react to the presence of the {{Query web archive}} template in a citation. Can a test be performed soon? – Allen4names 09:34, 22 February 2012 (UTC)
- I was unaware of the existence of this template until just now, so thanks for bringing it to my attention... Whenever possible, the bot creates an archive specific to the date (or technically slightly after) when the reference was added to the article, whereas the template just allows people to search to see if one was made. As such, I see two reasonable alternatives: 1) do nothing if the template is present. 2) behave as normal & remove the template if a successful archive is made.
- What are your thoughts on the best way to proceed? --ThaddeusB (talk) 18:45, 22 February 2012 (UTC)
- I would prefer option 1. The template links to the Wayback Machine and Wikiwix archive services and WebCite has been down a few times. If it is unnecessary to make a change, why do so? – Allen4names 20:47, 22 February 2012 (UTC)
- Well, the two aren't quite the same thing. The {{Query web archive}} is saying "if this link is dead, try these" whereas the archive link inside the {{cite xxx}} is (in theory) saying "this is what the page looked like when used as reference for WP:V purposes." In any case, it is a very rare situation. I have made the change to simply skip these cases, as you suggested, and tested it locally on sample text. --ThaddeusB (talk) 22:24, 23 February 2012 (UTC)
- Okay with me. Thank you for your time. – Allen4names 04:43, 24 February 2012 (UTC)
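As an illustration of "option 1" from the thread above (do nothing if the template is present), a check along these lines would be enough. This is a sketch under assumptions, not the bot's actual regex: the pattern, the function name `should_skip`, and the sample citations are all hypothetical.

```python
import re

# Assumed pattern: matches "{{Query web archive" in any letter case,
# with spaces or underscores between the words.
QUERY_ARCHIVE_RE = re.compile(r"\{\{\s*query[ _]web[ _]archive\b", re.IGNORECASE)


def should_skip(ref_text: str) -> bool:
    """Return True if the citation already carries {{Query web archive}}."""
    return QUERY_ARCHIVE_RE.search(ref_text) is not None


with_template = (
    "<ref>{{cite web |url=http://example.com |title=Example}} "
    "{{Query web archive |url=http://example.com}}</ref>"
)
without_template = "<ref>{{cite web |url=http://example.com |title=Example}}</ref>"

print(should_skip(with_template), should_skip(without_template))
```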
Latest trial (100 links) complete without significant issue. I made a few minor improvements to the regex logic, including the suggestion by Allen4names above, and will start the next test shortly. --ThaddeusB (talk) 22:24, 23 February 2012 (UTC)
- Another 130 links complete. One minor issue that caused the bot to skip a few things it didn't need to was fixed. --ThaddeusB (talk) 02:11, 25 February 2012 (UTC)
- Another 100 links complete. Only one very minor change to the bot was made. --ThaddeusB (talk) 18:27, 26 February 2012 (UTC)
- 200 more processed without any issues. Getting close to "going live" (unsupervised). --ThaddeusB (talk) 03:30, 27 February 2012 (UTC)
A new feature - support for varying date formats - has been added. The bot will now use "January 01, 1900" if {{use mdy dates}} is present and "01 January, 1900" if {{use dmy dates}} is present. If neither is present (>95% of all articles), the bot will continue to use 1900-01-01. These templates didn't even exist at the time the bot was originally programmed. --ThaddeusB (talk) 05:41, 10 March 2012 (UTC)
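The date-format rule described above can be sketched roughly as follows. The output strings follow the three formats quoted in the comment; everything else (the regexes, the function name `format_archive_date`, the month table) is an assumption made for illustration and not the bot's source.

```python
import re
from datetime import date

# Assumed detection patterns for the two templates named above.
MDY_RE = re.compile(r"\{\{\s*use mdy dates\b", re.IGNORECASE)
DMY_RE = re.compile(r"\{\{\s*use dmy dates\b", re.IGNORECASE)

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_archive_date(d: date, wikitext: str) -> str:
    """Pick a date format based on which template the article contains."""
    month = MONTHS[d.month - 1]
    if MDY_RE.search(wikitext):
        return f"{month} {d.day:02d}, {d.year}"
    if DMY_RE.search(wikitext):
        return f"{d.day:02d} {month}, {d.year}"
    # Neither template present: fall back to the ISO-style date.
    return d.isoformat()


sample = date(1900, 1, 1)
print(format_archive_date(sample, "{{use mdy dates}} Article text"))
print(format_archive_date(sample, "{{Use dmy dates|date=March 2012}} Article text"))
print(format_archive_date(sample, "Article text with no date template"))
```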
So... is it operational? How can I direct the bot to a specific article that needs a lot of web archiving? Thank you, this is a great project! (talk) user:Al83tito 22:55, 4 April 2013 (UTC)
- No contributions since 12 March 2012, not sure why not. LeadSongDog come howl! 04:15, 5 April 2013 (UTC)
- I'm looking into restarting the bot this weekend. Barring anything unexpected happening, it should be editing again within the next few days. --ThaddeusB (talk) 05:13, 16 June 2013 (UTC)
- Test runs have begun. No problems have been detected so far. --ThaddeusB (talk) 05:41, 20 June 2013 (UTC)
- Great news Thaddeus! Question, though: given the instability that WebCite seems to be going through, have you considered trying to submit to both them *and* Archive.is going forward? Might be good in case WebCite gets shut down. — Huntster (t @ c) 10:08, 20 June 2013 (UTC)
- WebCite claims the old archives will stay online even if they stop accepting new ones... This is the first time I've heard of archive.is. Do you know if anyone has contacted them about high volume archiving from Wikipedia? I wouldn't want to just start sending them a bunch of traffic w/o them approving of it first. --ThaddeusB-public (talk) 16:15, 20 June 2013 (UTC)
- I don't know that anything has been said to Archive.is regarding en.wiki, but I do know it is in wide use (several thousands of pages) on each of the French, German and Italian wikis. Given that about half the time I go to manually archive something there, it already exists, I have to suspect there is some automation happening.
- If you weren't aware, there was a discussion on Meta regarding the WMF allocating funds to support WebCite, but I don't believe it got anywhere. However, Archive.is was mentioned http://meta.wikimedia.org/wiki/WebCite#archive.is , as was another archiving service in heavy use with the French, wikiwix.com. Both sections are next to each other. Not much to read, but maybe of interest nonetheless. I've only just started using Archive.is and have never used Wikiwix, so I cannot say which seems to be better. — Huntster (t @ c) 21:40, 20 June 2013 (UTC)
Bare links
There has been previous commotion about changing bare links to citation templates, instead recommending {{Wayback}} or {{Webcite}}. Just a heads up in case that ever comes up again as I noticed the bot made a citation from a bare external url. — HELLKNOWZ ▎TALK 21:58, 26 February 2012 (UTC)
- It only does this when the ref consists only of a link. If someone placed a ref containing actual citation info but not using a cite template, it simply adds "(Archived 2012-01-01)". Handling things this way was decided by community consensus at the BRFA. Additionally, a bare link is not a proper citation. Per WP:Citing sources "It is therefore considered helpful... to improve existing citations by adding missing information (for example, replacing bare URLs with full bibliographic citations)" --ThaddeusB (talk) 22:46, 26 February 2012 (UTC)
- Well, no problem by me really. Just saying, it's not uncommon to have arguments over CITEVAR. It's recommended by CITE, but it also says not to change the format. It's easier to move the {{Wayback}} into citations later (in fact my bot does that) than have arguments over citation styles. Just my 2c on what I would do. — HELLKNOWZ ▎TALK 23:17, 26 February 2012 (UTC)
- The bot does respect the existing format as best it can by not changing things. For example, if it encounters something like <ref>John Smith, "Cool Article" on [http://example.com Nice Website].</ref> it leaves that intact and adds "([http://webcitation.org Archived] 2012-02-26.)" However, a bare URL is not a valid citation style as it leaves out crucial information. Thus, in these cases, the bot is perfectly justified in using a {{cite xxx}} template. This was actually discussed extensively at the BRFA (admittedly 2 years ago, but the same reasoning for the consensus is still valid). --ThaddeusB (talk) 00:40, 27 February 2012 (UTC)
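The two behaviours discussed in this thread (a ref that is only a link becomes a cite template; a ref with its own formatting is left intact and gets an "Archived" note) might look something like the sketch below. It is a simplification under assumptions: the regex, the function name `add_archive`, and the placeholder archive URL are hypothetical, and a real citation would also need a title, which is omitted here.

```python
import re

# Assumed definition of a "bare" ref: nothing but a URL, optionally in brackets.
BARE_REF_RE = re.compile(r"<ref>\s*\[?(https?://[^\s\]<]+)\]?\s*</ref>")


def add_archive(ref: str, archive_url: str, archive_date: str) -> str:
    """Add an archive link to a single <ref>...</ref> string."""
    match = BARE_REF_RE.fullmatch(ref)
    if match:
        # Bare URL: wrap it in a citation template (title omitted in this sketch).
        return (
            "<ref>{{cite web |url=%s |archiveurl=%s |archivedate=%s}}</ref>"
            % (match.group(1), archive_url, archive_date)
        )
    # Hand-formatted citation: keep it as written and append a note.
    note = " ([%s Archived] %s.)</ref>" % (archive_url, archive_date)
    return ref.replace("</ref>", note, 1)


archive = "http://www.webcitation.org/EXAMPLE"  # placeholder, not a real archive ID
bare = "<ref>http://example.com/page</ref>"
formatted = '<ref>John Smith, "Cool Article" on [http://example.com Nice Website].</ref>'

print(add_archive(bare, archive, "2012-02-26"))
print(add_archive(formatted, archive, "2012-02-26"))
```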
Retrieval dates
If a reference links to an archived copy of a webpage, there's no need to provide a retrieval date. Would it be possible for this bot to automatically remove the "accessdate" parameter when it adds an archive link? DoctorKubla (talk) 10:11, 23 July 2013 (UTC)
- From a technical standpoint, the change would be easy. However, I would first need to see some evidence that there is consensus to remove it. Has it been discussed somewhere? --ThaddeusB-public (talk) 17:55, 31 July 2013 (UTC)
- Actually, I forgot to consider that the bot can't know whether the archived copy of a webpage contains the same information as the version originally cited, so a retrieval date would still be necessary. So yeah, never mind. DoctorKubla (talk) 07:16, 1 August 2013 (UTC)
Bot Help
Hi, I do not know programming but I want to control a bot. What can I do? Can you create a bot for me? A bot that does any work will do for me. Thanks and regards --VarunFEB2003 (talk) 09:13, 9 June 2016 (UTC)
archiveurl date
I don't know if the bot is still operational, but I noticed it sometimes adds the wrong archiveurl date, off by a day. Likely this is being caused by not converting using GMT after obtaining the Unix time from the base-62 code. An example here: it should be 2009-08-22, not 21, as can be confirmed at the WebCite link. My bot WaybackMedic recently added a date mismatch check and is finding a ton, which is how I came upon it. -- GreenC 01:12, 9 February 2017 (UTC)
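To illustrate the suspected cause, the sketch below converts one Unix timestamp to a calendar date twice, once in UTC and once in a zone five hours behind. The timestamp and the UTC-5 offset are made-up values chosen to reproduce a 22-versus-21 mismatch; they are not taken from the actual archive, and the base-62 decoding step is left out because its details are not given in this discussion.

```python
from datetime import datetime, timedelta, timezone


def archive_date(unix_seconds: int, tz: timezone = timezone.utc) -> str:
    """Convert a Unix timestamp to an ISO date in the given time zone."""
    return datetime.fromtimestamp(unix_seconds, tz=tz).date().isoformat()


# Hypothetical timestamp: 2009-08-22 02:00:00 UTC.
ts = 1250906400

# Converting in UTC gives the date the archive itself would show.
utc_date = archive_date(ts)

# Converting in a zone behind UTC (here an assumed UTC-5) lands on the previous day.
local_date = archive_date(ts, timezone(timedelta(hours=-5)))

print(utc_date, local_date)
```

Passing an explicit `tz` to `datetime.fromtimestamp` avoids depending on the time zone of whatever machine the bot runs on.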