
Disambiguation Bot / Rambot data

I have written a disambiguation bot that looks for very specific text in pages and then changes that text. The current plan is to disambiguate [[Hispanic]], which has over 30,000 pages linking to it (the vast majority of those are from the data put in by User:Rambot). I was using solve_disambiguation.py, but even that is just way too slow and tedious. So I wrote my own bot, and registered User:KevinBot to run it.

The way the bot works is as follows:

  1. Gets the pages that link to [[Hispanic]].
  2. Gets the text of each of these pages.
  3. Searches through the text for "% of the population are [[Hispanic]] or [[Latino]] of any race."
  4. If the text does not exist it does nothing to the page.
  5. If the text does exist it changes it to "% of the population are [[Hispanic American|Hispanic]] or [[Latino]] of any race."
  6. It submits the page back as a minor edit with a summary that is TBD.

Also, the bot has more than one throttle so I can slow it down to whatever threshold is deemed appropriate.
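The substitution step described above can be sketched as follows. This is an illustrative reconstruction, not KevinBot's actual source (which has not been posted); the phrase and its replacement come from the description, and the throttle value is an arbitrary example.

```python
import time

# Illustrative sketch of steps 3-5 above.
OLD = "% of the population are [[Hispanic]] or [[Latino]] of any race."
NEW = "% of the population are [[Hispanic American|Hispanic]] or [[Latino]] of any race."

def fix_page(text, throttle_seconds=0):
    """Return the edited page text, or None if the phrase is absent (step 4)."""
    if OLD not in text:
        return None  # step 4: nothing to do, leave the page alone
    if throttle_seconds:
        time.sleep(throttle_seconds)  # configurable per-edit throttle
    return text.replace(OLD, NEW)  # step 5: targeted substitution
```

Returning `None` for non-matching pages means the bot never submits an edit unless the exact sentence was found.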

Note: this is a custom bot and is not related in any way to the python bots.

I am not yet done testing the bot, but I thought I'd throw it up here for consensus so that when I am finished testing I can get right to running it.

Kevin Rector 05:05, Jul 27, 2004 (UTC)

Some time ago, we discussed changes needed to the [[Asia]]n references included in most of the same articles (by Rambot). They are even more in need of fixing, as they lead not just to a disambiguation page but to the wrong article, and we can't really make Asia into a disambiguation page.
I was going to try to fix them with the py-bot, but it's obviously preferable if you'd include them in a series of changes you will be doing with your bot, as the number of articles to be covered is very high.
So if there is a possibility that you include a change to [[Asia]]n in your edits, that would be great.
-- User:Docu
I can do that, but I'd need to know what to change it to. I would suppose that it should be changed to [[Asian American|Asian]]. Is that what it should point to? Kevin Rector 13:01, Jul 28, 2004 (UTC)
Yes, though I can't locate the previous discussion. Personally, I'd prefer to use redirects such as [[Asian (US census)|Asian]], but [[Asian American|Asian]] is much better than the present solution. -- User:Docu

I agree with Docu on changing [[Asia]]n and thanks to Kevin for stepping up to fix this. I'd prefer [[Asian (U.S. Census)|Asian]] (note the full stops and capitalization) over [[Asian American|Asian]] because Rambot was changing all the links to point to [[Race (U.S. Census)]] before he vanished without finishing. I personally think linking all of these racial labels to the same place is confusing, so the links should direct to [[Asian American|Asian]], [[Hispanic American|Hispanic]], etc., but using [[Asian (U.S. Census)|Asian]] allows us to change this linkage to [[Race (U.S. Census)]] should the consensus change. We should also link to [[White (U.S. Census)|White]] and [[Pacific Islander (U.S. Census)]] and change "African American" to "Black and African American" (per census wording). --Jiang 05:23, 29 Jul 2004 (UTC)

I didn't even know that there was a Race (U.S. Census) article until I read these posts. I like the idea of making all the races point to this one article, which will explain clearly and concisely what the census data means for all the races. If we need to break it down any more from there, we can. [[Race (U.S. Census)|Hispanic]] and [[Race (U.S. Census)|Asian]] and [[Race (U.S. Census)|White]] really work well for me. If consensus changes I can easily run the bot to change it to [[Asian American|Asian]] or [[Hispanic American|Hispanic]]. Kevin Rector 04:17, Jul 30, 2004 (UTC)

the problem with using [[Race (U.S. Census)|Hispanic]] and [[Race (U.S. Census)|Asian]] and [[Race (U.S. Census)|White]] is that if we want it to link to the race articles instead, then we will need your bot to run the thousands of changes. It is much easier to have [[Hispanic (U.S. Census)|Hispanic]] and [[Asian (U.S. Census)|Asian]] and [[White (U.S. Census)|White]] and have [[Hispanic (U.S. Census)]] and [[Asian (U.S. Census)]] and [[White (U.S. Census)]] all redirect to [[Race (U.S. Census)]]. This way, it reduces down to a matter of changing the redirects rather than running a bot on thousands of articles. --Jiang 06:06, 30 Jul 2004 (UTC)

Some counts: there are 32'010 links to Hispanic, 33'905 to Asia and already 4'869 to Race (U.S. census). At the rate of 6 per minute, one can edit approx. 8'640 articles per day. If I recall correctly, there are at least 25'000 references that should be changed. -- User:Docu

That's a good point about using redirects. I like it. That's what we should do. Also, I've finished testing my bot, and it seems to be working really well. I'm going to run it on 10 articles and see how it fares. I'll post the list of articles edited on User:KevinBot. That way we can check them to make sure there isn't anything catastrophic that needs to be repaired before we mark it as a bot and let it loose. Kevin Rector 20:26, Jul 30, 2004 (UTC)

Ok, the bot has run a bit on a limited basis to see how well it works, and it's working like a charm. So whoever it is that can mark bots, please mark User:KevinBot as a bot. Thanks. Kevin Rector 02:47, Aug 3, 2004 (UTC)

KevinBot is now marked as a bot. Angela. 22:18, Aug 4, 2004 (UTC)


Request for permission to run a warnfile

Sauðkindin (The Jumbuch) has been running as the interwiki bot on is: for some time now. As of July 28, 2004 it has accumulated 93,314 interwiki links in 6,093 articles that need to be updated, 1,505 of them in 205 articles on the English Wikipedia.

What I want is permission to run the following command, say, every two weeks on the English Wikipedia:

python interwiki.py -warnfile:warnfile_en.log

This will check whether the interwiki links to is: should be updated and, if so, proceed to do so. It will be very low traffic; the backlog is only so large now because there has previously not been any interwiki bot running on is:, and once the links on en: are updated they will also be shared around the rest of the languages. -- Ævar Arnfjörð Bjarmason 04:33, 2004 Jul 28 (UTC)
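A fortnightly run like this would typically be scheduled with cron. The entry below is a hypothetical example: only the command itself comes from the request above; the timing and the install path are assumptions.

```shell
# Hypothetical crontab entry: 04:00 on the 1st and 15th of each month,
# i.e. roughly every two weeks. The path is an assumed install location.
0 4 1,15 * * cd ~/pywikipediabot && python interwiki.py -warnfile:warnfile_en.log
```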

Do you have a separate account for this to run on? Angela.

Bot to import FCC stations

I'd like to write a bot to import the FCC's list of US broadcast stations including FM, AM, and Television. Nothing's been written yet but I wanted to make sure this was okay to do before I bothered to do any work. If someone else wants to do it, that's great. I'm willing to do it but I don't want to duplicate effort. Not sure whether I'd start from scratch or use the python bot.

Posted a request on the pump yesterday but I didn't get any replies, so I thought I'd bring it here. Following is my comment from the pump. Rhobite 15:43, Jul 29, 2004 (UTC)

The lists of radio stations in the US are a little jumbled, e.g. List of radio stations in Massachusetts, List of radio stations in Ohio, List of radio stations in Oregon. Each state's page is formatted differently and contains different kinds of information. There are thousands of licensed stations in the US.
One of the few things the FCC's done right is publishing downloadable data [1] [2]. You can get a list of all the stations in the US. Thoughts on importing this data into Wikipedia? There are something like 8500 FM stations listed, so I'm not sure if that list includes defunct or trivial stations. Anyway we could filter by certain criteria like wattage, I'd have to do more research to find out possible filters. I could write the bot to do this but it might take a while given my schedule. Rhobite 04:26, Jul 28, 2004 (UTC)
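If a wattage filter turns out to be useful, the import could sieve the FCC dump before generating any articles. The sketch below is hypothetical: the field names (`call_sign`, `service`, `watts`) and the pipe-delimited layout are placeholder assumptions, not the actual FCC schema.

```python
import csv
import io

def filter_stations(raw, min_watts=100):
    """Yield call signs of stations at or above a wattage threshold.

    `raw` is assumed to be a pipe-delimited dump with a header row;
    the column names here are illustrative, not the real FCC fields.
    """
    reader = csv.DictReader(io.StringIO(raw), delimiter="|")
    for row in reader:
        if float(row["watts"]) >= min_watts:
            yield row["call_sign"]

# Tiny made-up sample in the assumed layout:
sample = "call_sign|service|watts\nWAAA-FM|FM|50000\nWBBB-FM|FM|10\n"
```

Whatever criteria are chosen (wattage, license status, and so on) would slot in as extra conditions in the loop.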
8500 is not a large number for the entire United States, considering that many major cities have dozens of radio stations, and not all radio stations are limited to major cities. I'd say, put them all in. Derrick Coetzee 16:43, 29 Jul 2004 (UTC)
I'd like to see it, myself, and hope you could find a way to do the same thing with television stations. Rhymeless 19:53, 3 Aug 2004 (UTC)


BGoldenberg running disambiguate bot

I would like to run the disambiguation bot from pywikipediabot. It is possible I would later run other bots, almost certainly user controlled bots. For instance I think it would be great if there was a bot that could easily be used for categorizing groups of articles, something I already do; the bot would just speed things up. (It looks like replace.py might satisfy this, but for the time being I am mainly interested in disambiguation).

I am a bit unfamiliar with the bot regulation system. I know that for major, non-interactive bots there is a period during which they are expected to run at a very reduced rate, but it seems that disambiguation bots, at least, don't receive nearly as much scrutiny as others. However, I would be grateful if someone explained the proper guidelines, so that I don't cause trouble and frustration to others. I will be running the bot as User:BenjBot and have basically figured out all the scripts; I am really just waiting for the go-ahead.

Thanks -- Benjamin Goldenberg 06:33, 5 Aug 2004 (UTC)

I would submit that solve_disambiguation.py is not really a bot but rather an alternative user interface. A bot is something that runs automatically, but solve_disambiguation.py doesn't do anything automatically. So I would say go ahead and run it (that's my opinion, and I'm not an authority on the subject). However, until and unless you get User:BenjBot marked as a bot, I would just run it as User:Bgoldenberg if I were you. Just an FYI, I had to go and find someone to mark User:KevinBot as a bot, as putting it on this page was not enough. If you too want to be proactive, I went to User_talk:Angela and she marked it that day. Kevin Rector 14:24, Aug 5, 2004 (UTC)
That was basically the logic I had thought of, but I just wanted to make sure I didn't cause problems for other people. I think I will start using solve_disambiguation.py under User:Bgoldenberg at a slower rate, just to make sure I don't clutter the recent pages list. And then hopefully soon, someone will mark User:BenjBot as a bot, and I can start using it more efficiently. - Benjamin Goldenberg 15:27, 5 Aug 2004 (UTC)
Once the discussion has been here for a week, you can request the bot flag at meta:requests for permissions. Angela. 16:39, Aug 17, 2004 (UTC)
Well that's a useful bit of information. Kevin Rector 16:47, Aug 17, 2004 (UTC)
BenjBot is now marked as a bot. Angela. 19:34, Sep 19, 2004 (UTC)

Discussion pages

Bots must not make modifications to comments signed by individuals. Indeed, it would be better to remove a comment entirely than to attribute text to an individual that they did not create. Beyond that, changes on discussion pages can destroy discussions that are premised, for instance, on the peculiarities of linking or of disambiguation pages. Please make this change to the Project Page. - Centrx 21:15, 5 Aug 2004 (UTC)

I disagree to some extent. If discussion pages have links requiring disambiguation and the proper use for the link is easily identified, I don't see a problem of a bot disambiguating the link. While technically, the user's comment has been changed as far as what's actually in the page contents, the meaning has not. However, if the link cannot be clearly disambiguated or the user's comment specifically discusses a disambiguation issue then the link in the comment should not be disambiguated by a bot or manually. I have come across several talk pages that discuss how English or French is disambiguated. In this case, I have not disambiguated the link. I don't know of any bots that automatically disambiguate links. AFAIK, people are using bots to assist them with disambiguation (e.g. solve_disambiguation.py for pywikipediabot) but still actually require a human to decide on how a link is disambiguated. The bot just helps reduce the editing time required to disambiguate a link. If there are several hundred links to do, this makes a big difference. Making a policy that a bot cannot disambiguate links on talk pages is easily defeated by doing it manually. RedWolf 08:19, Aug 7, 2004 (UTC)
The text of what one writes is their text, signed by them. It would also not be appropriate to do a copyedit of one's comments, even if it does not look like it changes the meaning. Indeed, it is confusing when a person edits their own comments if that edit is out of the order of the discussion. As for your main point: in that case, there should be a policy that all modifications of discussion pages be attended to by a real person. - Centrx 17:07, 7 Aug 2004 (UTC)
I can't speak for anyone else, but my bot KevinBot doesn't edit any user pages or talk pages. Personally I think that should be the standard: bots should not change talk or user pages. Having said that, as I mentioned earlier above, I don't really consider solve_disambiguation.py to be a bot so much as an alternate user interface. Kevin Rector 23:14, Aug 7, 2004 (UTC)
Well, if Wiki policy (although I have not found anything yet on the issue) says that links on talk pages should not be disambiguated, so be it. However, over time a number of pages are going to have hundreds of links from talk pages, thus increasing the effort of keeping up with the disambiguation of these pages. In the interim, I have modified my local copy of solve_disambiguation.py to ignore talk pages. RedWolf 05:51, Aug 10, 2004 (UTC)
Well, I have no idea what "policy" is, I'm just saying that I'm not personally changing talk and user pages with bots. I have no problem making changes in talk pages per se (especially when it comes to disambiguation). However, I do have some issues with disambiguating in user pages. If I want to disambiguate a user page, I'll leave a message on their user talk page. Kevin Rector 13:30, Aug 10, 2004 (UTC)

Hi. I'm interested in creating a 'bot for detecting link rot. As I see it at present, I'd get the bot to download a random page once per suitable time period. The bot would then extract external links from the page, and check the pages pointed to by these links to see if they are still there. Links which remain inaccessible for (say) a number of days would then be listed on a web page. Humans could then occasionally check a page (on my server) to find a list of dead links, and the wikipedia pages that they're on, and could go and have a look.

Comments? If I did this, I would write the program myself and host the bot here (University of Westminster, UK).

Ross-c 15:29, 17 Aug 2004 (UTC)
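The first stage of the proposal above (extracting external links from a page) might look something like this. It is a sketch only: the regex is a simplification rather than a full wikitext parser, and the HTTP-checking and multi-day retry stages are deliberately left out.

```python
import re

# Matches bare and bracketed external links in wikitext; a simplification,
# not a full wikitext parser.
EXTERNAL_LINK = re.compile(r'\[?(https?://[^\s\]<>"]+)')

def extract_external_links(wikitext):
    """Return the distinct external URLs found in a chunk of wikitext."""
    seen = []
    for url in EXTERNAL_LINK.findall(wikitext):
        if url not in seen:
            seen.append(url)
    return seen
```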

It's certainly worth doing, although I think you'd need a degree of manual verification of each apparently successful link (that is, you couldn't say a successful HTTP request necessarily meant the exlink was still valid). One thing you don't need to do is to check against an online copy of wikipedia. It'll be much faster, easier, and less server-mangling, if you download a copy of mediawiki and a recent cur database drop, and run the bot on an offline copy. You also don't have to write all of the bot yourself - there's already a python framework for bots which interact with wikipedia. I believe this is what User:Topbanana uses for compiling Wikipedia:Offline reports. -- Finlay McWalter | Talk 20:54, 17 Aug 2004 (UTC)
The offline reports are generated from periodic database dumps (see [3]). External link validation sounds like a worthy project - if helpful I can generate a list of all external links for you (suggest a comma-separated list of "article title,external link", one per line?). - TB 08:10, Aug 18, 2004 (UTC)


Hmmm. I see the point of using an offline database rather than downloading from wikipedia. A list of links would be most of the work, and the rest of the programming would be easy. For the kind of thing I'm thinking of, it'd probably be easier for me to write a bot from scratch rather than use the framework. However, I'm worried about how much space all the data would take up; I was thinking of running this on one of my servers that only has a few gigs of free space. The list of links would surely not be that big, no? I was thinking of doing verification based on the HTTP error code (using wget so that temporarily moved links are resolved). More careful verification of the contents of the page the link points at would wait for version 2.0. - Ross-c 20:41, 18 Aug 2004 (UTC)
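The "inaccessible for a number of days" rule in the original proposal reduces to tracking consecutive failures per URL between daily runs. Below is a minimal sketch; the threshold of 3 runs is an arbitrary example, and the actual HTTP check (via wget or otherwise) is assumed to happen elsewhere and feed in the per-URL results.

```python
def update_failures(failures, results, threshold=3):
    """Update per-URL consecutive-failure counts and report dead links.

    failures: {url: consecutive failed checks so far} (persisted between runs)
    results:  {url: True if the link was reachable on this run}
    Returns the URLs that have now failed `threshold` runs in a row.
    """
    dead = []
    for url, ok in results.items():
        if ok:
            failures[url] = 0  # a success resets the streak
        else:
            failures[url] = failures.get(url, 0) + 1
            if failures[url] >= threshold:
                dead.append(url)
    return dead
```

Persisting the `failures` dict between daily runs (a flat file would do) is what turns a transient outage into a non-report and a genuinely dead link into a listed one.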

NohatBot

Does anyone have a link to the discussion of this bot? I couldn't find it. anthony (see warning) 03:09, 30 Aug 2004 (UTC)

My bot was created several months ago and was given a bot flag after discussions with Brion Vibber on IRC. It is used almost exclusively for uploading batches of files that I don't feel like uploading manually. I just added it to this page today because that is the first time I'd run across this page. Nohat 05:59, 30 Aug 2004 (UTC)
Sounds fine to me. anthony (see warning) 11:06, 30 Aug 2004 (UTC)

Janna

Information on what Janna is currently doing will be kept on User:Janna. anthony (see warning)


Bot(?) making incorrect changes

It appears that a bot from IP 209.90.162.1 is making numerous incorrect changes to the encyclopedia. The kind of change I have noticed is linking instances of "chemical" to "chemical compound" indiscriminately. In many, nay MOST, cases, these changes are patently false and it is becoming a pain for me to go through and revert them. - Centrx 23:23, 16 Sep 2004 (UTC)

It appears to have stopped (see Special:Contributions/209.90.162.1). It's not necessarily a bot. Maybe Chemical shouldn't redirect. -- User:Docu
I am, at this very moment, making chemical a disambiguation page. - Centrx 00:16, 17 Sep 2004 (UTC)


Darbot registration

I would like to get permission to run a user-controlled pywikipediabot to make the spelling of science and chemistry articles consistent with the IUPAC nomenclature rules. It would only change articles I specifically told it to, and would only make changes that I would make anyway.

I have registered the account Darbot for this task should my request be approved.

Darrien 05:27, 2004 Sep 17 (UTC)

There doesn't seem to be consensus on these changes being made at IUPAC name vs. common name. I feel that more people should be informed about the change before a bot is run on this. Angela. 00:38, Sep 25, 2004 (UTC)
Most of the discussion there focuses on capitalisation and complex organic compounds. I have no intention of changing Pagodane to Undecacyclo[9.9.0.01,5.02,12.02,18.03,7.06,10.08,12.011,15.013,17.016,20]eicosane. I only want to change archaic names, such as:
  • Ferrous -> Iron (II)
  • Ferric -> Iron (III)
  • Sulphur -> Sulfur
  • Aluminum -> Aluminium
  • Oil of Vitriol, disodium salt -> Sodium sulfate
and the like.
I have made dozens, if not hundreds, of these changes by hand, and (to the best of my knowledge) have not been reverted once. There are a lot of redirects [4] [5] that I would like to change, but simply don't have time to do by hand.
As I said above, it will be given a list of articles to change, so it won't accidentally change "Sulphur Springs, Anytown" to "Sulfur Springs, Anytown".
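The replacement table above, restricted to a hand-picked article list, could be sketched like so. This is illustrative only: the pattern list is a subset of the changes proposed, and the whole-word anchors are an assumption about how the bot would avoid touching words embedded in other names.

```python
import re

# Whole-word patterns for the archaic -> IUPAC renames listed above.
REPLACEMENTS = [
    (re.compile(r"\bFerrous\b"), "Iron (II)"),
    (re.compile(r"\bFerric\b"), "Iron (III)"),
    (re.compile(r"\bSulphur\b"), "Sulfur"),
    (re.compile(r"\bAluminum\b"), "Aluminium"),
]

def modernise(text):
    """Apply the rename table to one hand-picked article's text."""
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text
```

Because the article list is hand-fed, a page like "Sulphur Springs, Anytown" is simply never passed to the bot in the first place.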
I strongly oppose switching words to the American spelling. - SimonP 06:05, Sep 25, 2004 (UTC)
It's not American spelling. The International Union of Pure and Applied Chemistry has declared that "sulfur" is the preferred spelling, which is why British scientists are beginning to use "sulfur". It was also decided that "aluminium" and "caesium" are the preferred spellings as well. So if you want to shallowly look at this as an American-British difference, you've got two out of three. Also, did you miss it when I wrote "Aluminum -> Aluminium" above?
Darrien 06:35, 2004 Sep 25 (UTC)
As an American, I agree with Simon. I also object to changing Ferrous and Ferric to Iron. RickK 06:24, Sep 25, 2004 (UTC)
Why is that?
Darrien 06:35, 2004 Sep 25 (UTC)

As a chemist, I think this is a good idea. We're an encyclopedia of chemistry; we should use the proper names for chemicals, compounds, ions, etc. Gentgeen 07:19, 25 Sep 2004 (UTC)

And I always thought we were an encyclopedia for general users. - SimonP 16:17, Sep 25, 2004 (UTC)
Whatever we are, we are international. If Wikipedia already accepts an IUPAC term, then I think it is a good idea. Bobblewik  (talk) 17:21, 25 Sep 2004 (UTC)

I object. IUPAC doesn't determine our spelling. anthony (see warning) 21:12, 25 Sep 2004 (UTC)

No, but the wikipedia Manual of Style does [6]:
"In articles about chemicals and chemistry, use IUPAC names for chemicals wherever possible, except in article titles, where the common name should be used if different, followed by mention of the IUPAC name."
Darrien 08:02, 2004 Sep 26 (UTC)
The manual of style refers to names not spelling. - SimonP 16:47, Sep 26, 2004 (UTC)
The spelling of a name is part of the name.
Darrien 10:39, 2004 Sep 28 (UTC)
Part of the debate here is moving from "common names" like ferrous chloride to IUPAC names like iron (II) chloride. The Manual of Style seems to say that it should stay at ferrous. Although you could argue that these are traditional or archaic names and not "common names". I don't mind the articles being moved - as long as the old names are mentioned in the article and left as redirects. Rmhermen 18:49, Sep 26, 2004 (UTC)
Fair enough. Since there appears to be consensus for this manual of style entry (even though I disagree with it), I withdraw my objection to this bot. anthony (see warning) 13:26, 29 Sep 2004 (UTC)

This is a good idea, and seems like it will be run in a sensible manner. (I would also not want to see strange IUPAC names as article titles of chemicals everyone knows by a more common name). I might even go a step further though and say the IUPAC name should be somewhere in articles with common names, and they often are. See Caffeine. I don't know what possible objection there would be to changing the names from a random hodge-podge of spelling and previous deprecated standards to the current international standard which is used and accepted by chemists around the world. If someone is willing to do this major undertaking I think we should be appreciative. - [[User:Cohesion|cohesion ]] 19:11, Sep 26, 2004 (UTC)

Good idea. That follows the current Wikipedia manual of style, since the IUPAC names are names with particular spellings and the manual of style says to use them. With other spellings, they would not be those names. I agree with the purpose of this bot. It's silly not to use standardized scientific names and spellings in articles on chemicals and chemistry, whatever one uses elsewhere. But it would indeed be useful and best and, indeed, necessary to also sometimes mention the older names (and to always do so in articles on the particular substances), to make clear to non-chemists that traditional and new terminology are talking about the same things. It is very wrong for an encyclopedia for general users not to use standardized scientific terminology in its scientific articles where there is standardization, except in special circumstances. Otherwise the encyclopedia is providing increasingly outdated terminology to the users. And different terminology in different articles produces confusion. I'm not a chemist, but in reading something to do with chemistry, I would like to see things spelled out using proper, standardized terminology. Equally, it is wrong not to provide equations between varying terminology that a general reader is also likely to come across, so that I can tell that an article in Wikipedia is talking about exactly the same thing as a book from 1950, presuming that they are talking about exactly the same thing, or that an article is talking about the same thing as a book released in 2004 if there are alternate conventions still in popular use and scientific use. Jallan 19:29, 26 Sep 2004 (UTC)

Sounds good to me. Thue | talk 19:49, 26 Sep 2004 (UTC)

Makes a lot of sense to me. The changeover from ferrous & ferric to iron (II) & iron (III), etc., started at least 30 years ago, and has nothing to do with American vs British spelling. The IUPAC system is consistent and easier to understand than the historical accidents it replaces. I'm quite puzzled by the opposition to the proposed name changes. Wile E. Heresiarch 03:10, 27 Sep 2004 (UTC)

Go for it. The new names are the overwhelming usage in chemical practice, and anyone who went to school or took their degree in the past 30 years will have used them unless they were deliberately taught "old style" names from outdated textbooks. Still, we should preserve a mention of the old names in a chemical's own article, even after we have updated the main title to IUPAC convention, just as we should keep a mention of old alchemical names. And, of course, no renaming caffeine to 3,7-dihydro-1,3,7-trimethyl-1H-purine-2,6-dione. -- The Anome 11:44, 27 Sep 2004 (UTC)

This would be helpful to the Wikipedia as a whole. I'm for it. --131.91.238.38 00:09, 9 Oct 2004 (UTC)

I am strongly opposed. See Talk:Global warming and Wikipedia talk:Manual of Style (William M. Connolley 17:31, 14 Oct 2004 (UTC)). Sulphate should not be replaced with sulfate, or any other americanisations, outside the chemistry articles.

Why do you consider "sulfate" to be an Americanization when the IUPAC has declared it to be the official spelling? Also, why do you "strongly oppose" this change on the grounds that you don't want it to change non-chemistry related articles, when I've specifically said that it will only be used on chemistry related articles?
Darrien 21:16, 2004 Oct 15 (UTC)
(William M. Connolley 21:28, 15 Oct 2004 (UTC)) Because you said at the top the spelling of science and chemistry articles. If you will rephrase this, I will reconsider.
Sorry, I thought you meant that you didn't want this bot changing all articles, which it isn't. I intend to run it on chemistry and science related articles, all of which would be handpicked by myself.
(William M. Connolley 22:03, 15 Oct 2004 (UTC)) Then my (strong) objection remains: non-chemistry science shouldn't be touched in this way. There is no policy to support this, and policy against.
I'm still curious as to why you consider "sulfur" to be an Americanization.
Darrien 21:55, 2004 Oct 15 (UTC)
Because it is? What makes you think otherwise? IUPAC is essentially recommending Americanisation - the fact that it comes through IUPAC doesn't affect that.
So the IUPAC recommending "sulfur" makes it an Americanization "just because"? You seem to think that because the spelling "sulfur" is used in America, it is an Americanism regardless of how well accepted, etymologically correct, or standardized it becomes?
Darrien 01:09, 2004 Oct 16 (UTC)
(William M. Connolley 09:20, 16 Oct 2004 (UTC)) Yes
We should be adhering to international standards. IUPAC should be no exclusion. Perhaps mention once could be made of alternate spellings, if demand requires it. Dysprosia 08:45, 19 Oct 2004 (UTC)

(note: the following including the alteration of Dysprosia's comment, was added by Mr. Jones. comment replaced above by sannse (talk) 09:57, 5 Nov 2004 (UTC) I'll pop my name by my interpolation too. Mr. Jones 21:21, 5 Nov 2004 (UTC))

The use of the word 'color' in the CSS2 spec property names does not make it any less of an Americanism. Also, when referring to style="color:red" a resident of the UK would still say that the element is coloured red.
I object to this bot in this respect. There is no practical need for forcing one spelling or the other. The articles will be processed by people, not machines. Not that it would be a great task for a program to treat both versions as the same. (BTW, computers run programs, the TV shows programmes). Mr. Jones 15:01, 4 Nov 2004 (UTC)
We should be adhering to international standards. IUPAC should be no exclusion. Perhaps mention once (one mention? Mr. Jones) could be made of alternate spellings, if demand requires it. Dysprosia 08:45, 19 Oct 2004 (UTC)
So we should use ISO dates, then? That is silly. We shouldn't doggedly adhere to standards just because they exist. They're often contradictory, for a start. Even if they aren't, they're not necessarily sensible. We should look at NPOV first, necessity second, and standards for their own sake third or even less. Mr. Jones 15:01, 4 Nov 2004 (UTC)

(note: end - sannse (talk) 09:57, 5 Nov 2004 (UTC))

Testing

I am going to start testing this bot soon, it's sat in discussion long enough. General community consensus is that it's a good idea and most of the objections have been from people who falsely believe that the purpose of this bot is to "Americanize" articles. Darrien 10:12, 2004 Nov 5 (UTC)

My objections go beyond that. I fear we'll be losing information. Can't your bot keep traditional and non-American chemical terms, but add the ones from the standard as well? Mr. Jones 21:21, 5 Nov 2004 (UTC)
Yes, and depending on the circumstances, it most likely will. It's going to be more of a macro than a bot, in that it will be hand-fed a list of articles and what it should change, as opposed to actively going out and blindly running a search and replace on its own. I'm using it because I don't have enough time to change all the articles to conform to that section of the Wikipedia:Manual of Style, nor do I want to risk getting RSI from making the changes by hand if I do get the time.
Darrien 21:48, 2004 Nov 6 (UTC)
The standardisation is all very well, but just as with imperial and metric real people in the real world use a mixture of both in different contexts. Mr. Jones 21:21, 5 Nov 2004 (UTC)
Which is why I will evaluate each article on a case by case basis before feeding it to the bot.
Darrien 21:48, 2004 Nov 6 (UTC)
Is the source for your bot on the web anywhere? Mr. Jones 21:21, 5 Nov 2004 (UTC)
I'm using the stock pywikipediabot from http://sourceforge.net/projects/pywikipediabot/. Darrien 21:48, 2004 Nov 6 (UTC)
Sorry? You've based it on the pywikipediabot, but what changes have you made?
None. I said I was using the stock pywikipediabot. Specifically "replace.py".
Darrien 04:04, 2004 Nov 10 (UTC)
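For reference, a stock replace.py run of the kind described is invoked roughly as below. The exact flags varied between pywikipediabot versions, so the -file: option and the positional old/new arguments here are an assumption; check "python replace.py -help" against your copy.

```shell
# Hypothetical invocation: apply one replacement to a hand-picked list of
# articles (one title per line in chemistry_articles.txt).
python replace.py -file:chemistry_articles.txt "Sulphur" "Sulfur"
```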
Could you supply a patch file? Mr. Jones 22:24, 9 Nov 2004 (UTC)
I'll help you patch it if you like. I'm not simply being obstructive. Take a look at my edit history. That's the opposite of where I'm coming from. Mr. Jones 21:21, 5 Nov 2004 (UTC)
I never meant to imply otherwise.
That's good. May I review your code to see what it will do? Mr. Jones 22:24, 9 Nov 2004 (UTC)
It's at the same link I provided above, http://sourceforge.net/projects/pywikipediabot/
Darrien 04:04, 2004 Nov 10 (UTC)
Darrien 21:48, 2004 Nov 6 (UTC)
object (William M. Connolley 15:17, 6 Nov 2004 (UTC)) You have no consensus to do this outside of pure chemistry articles, if you have it there.
Have you actually counted the votes? I count 11 for, 4 against, one undecided. That sounds like consensus to me.
(William M. Connolley 22:02, 6 Nov 2004 (UTC)) How do you achieve that count?
By counting. Gentgeen, anthony, cohesion, Jallan, Thue, Wile E. Heresiarch, The Anome, 131.91.238.38, Dysprosia, Bobblewik and Rmhermen all seem to agree, while SimonP, yourself (William M. Connolley), Mr. Jones and RickK oppose it, with Angela undecided. You (William M. Connolley) and SimonP oppose it based on the false assumption that I'm trying to "Americanize" the articles. How did you count it?
Darrien 02:30, 2004 Nov 7 (UTC)
(William M. Connolley 14:22, 8 Nov 2004 (UTC)) Anthony isn't supporting - he only withdrew his objection. And you can't count anons in support.
Darrien 21:48, 2004 Nov 6 (UTC)

Objection time

I've found two recent cases in which, after a week of no comments at all on their requests, people began running bots. Both of these bots have caused some amount of controversy. (In one case, because people listed objections before the bot was run, but after a week had passed. In the other, because the bot started making bad edits.) What do people think of amending the rules to make clear that someone, at least, needs to say "OK, run it" before a bot is run? Snowspinner 20:35, Sep 26, 2004 (UTC)

I think we need less bureaucracy here, not more. If a bot is going to do something useful and harmless, why wait a whole week? One or two days without objection should be enough. After that: "Sysops should block bots, without hesitation, if they are unapproved, doing something the operator didn't say they would do, messing up articles or editing too rapidly". anthony (see warning) 13:38, 29 Sep 2004 (UTC)

This bot was requested on June 27th by User:Docu. The request was not particularly detailed - it only noted that it wanted to run the pywikibot. No one commented on it one way or another. After a week, it was added to the list of bots, where its intended purposes were finally declared.

The bot is currently being used to categorize a bunch of biographical articles by year of birth and death. These categorizations do not have consensus, are against the policy at Wikipedia:Categories, lists, and series boxes, and people are finding errors in them.

For now, I'm blocking the bot as unapproved to do what it is doing and as messing up articles, but I wanted to start a discussion here about the bot so that it can, hopefully, get reapproved with some description of what it's actually supposed to be doing. Snowspinner 20:35, Sep 26, 2004 (UTC)

The categories by year of birth and year of death have been listed on Categories for deletion and debated. Finally, they were kept; see Wikipedia talk:People by year/Delete. These categories aren't of much use unless they are fairly complete, and D6 is quite a way ahead on this.
As the bot is being used to assign categories that we want to have in Wikipedia, the use I made of it is in accordance with the original intent.
The bot uses lists from Wikipedia and information available in Wikipedia to assign those categories. Many have been added based on List of people by name, lists that occasionally contain inexact information. This information can now be, and is being, corrected much more easily in the article, by myself or by others. The article as such is left intact, except that stray interwiki links are moved to the end of the article.
More recent additions are based on manually checked lists, but obviously there is an occasional slip in the data being input, a slip that could happen if the categories were assigned manually as well. -- User:Docu
    • This bot was adding the categories as far back as Sep 12, before the deletion debate was done. There have been several objections to the category scheme itself. The user has been asked repeatedly to refrain from running it to populate these categories until discussion reaches consensus, and his methodology has received scrutiny. Simply saying that the categories survived WP:CFD does not imply that a bot should be changing thousands of articles. -- Netoholic @ 21:08, 2004 Sep 26 (UTC)
      • The chronology was the other way round: it ran before the categories were listed for deletion. -- User:Docu
        • If anyone wants to go about 15000 edits into the bot's contrib list, they will see those categories being implemented. I myself requested that you desist on the 12th during the time that the CFD discussion was happening. -- Netoholic @ 22:11, 2004 Sep 26 (UTC)

That a number of people are still complaining about the categories makes me wary of their expansion. As I've said, they don't seem to fit in with Wikipedia:Categories, lists, and series boxes. But that's OK - it's just that the bot's approval was more than a little vague. Snowspinner 21:39, Sep 26, 2004 (UTC)

A bit of care with this bot please - it added [[Category:1981 births]] to Owen Hargreaves over 3 days after this article had a copyvio notice slapped on it. Articles with copyvio notices should not be edited at all. -- Arwel 14:28, 2 Oct 2004 (UTC)

The category was indeed added based on my summary check of the article in the cur table available for download. The page later showed up on last week's check for sortkeys (the category Possible_copyright_violations sorts by title rather than surname). As I figured it's not a problem to leave the article categorized by title, I didn't change it (either the page would get restored or deleted). -- User:Docu

Snowbot

I would like to run the Pywikipediabot as Snowbot. The only purpose of this bot will be to handle templates for deletion. As it stands, if a widely used template gets deleted, I have to remove it manually from pages, which can take upwards of an hour. Snowspinner 21:49, Sep 26, 2004 (UTC)

I object to this user running a bot. [7] -- Netoholic @ 22:01, 2004 Sep 26 (UTC)
I'm shocked. What are your objections? Snowspinner 22:03, Sep 26, 2004 (UTC)
I don't trust you to only use the bot for the stated purpose. [8] -- Netoholic @ 22:09, 2004 Sep 26 (UTC)
What about my actions makes you think I would use the bot for some other purpose, and what other purpose do you think I would use it for? Snowspinner 22:11, Sep 26, 2004 (UTC)
The bot policy does not have any guidelines for what a valid objection is, but, as one of the concerns about bots is clearly vandalbots, character seems a logical objection. [9] -- Netoholic @ 22:13, 2004 Sep 26 (UTC)
That's fine, and I think it is a valid objection. But if you're going to make an attack on my character, you should be responsible enough to back it up with what about my character you object to. Snowspinner 22:14, Sep 26, 2004 (UTC)
Snowspinner has shown a manifest arrogance in his activities on Wikipedia, including explicitly stating he does not, as sysop, consider himself bound by the Policies and Guidelines the rest of Wikipedia has agreed to follow -- and that he agreed to follow when he was voted a sysop. He has engaged in numerous edit wars over Policy pages in attempts to change Policy according to his wishes, including removing votes opposing his own vote.
While removing Templates may be onerous, it is my opinion that, having abused his position and having abused the trust reposed in him as a sysop, Snowspinner cannot be trusted to run a 'bot. -- orthogonal 00:00, 27 Sep 2004 (UTC)
This is, like almost everything orthogonal has said in recent memory, a bald-faced lie. Snowspinner 00:26, Sep 27, 2004 (UTC)
I wish you had not written that, Snowspinner, as it rather requires me to be explicit. At the risk of embarrassing you, here's where you agreed that my "bald-faced lie" was in fact an accurate summary of your actions:
Snowspinner wrote: "My view is simple and accurately summarized by orthogonal. I believe that policy exists that is not written, and that the mere failure of a policy to gather community consent (As opposed to actively gathering community rejection) does not mean that it is not policy." (from Snowspinner's comment at [10])
"I should specify - I agree that I blocked Robert for a reason that is not explicitly allowed under policy, and that I did so knowingly." (from Snowspinner's comment at[11])
Details of Snowspinner's edit warring can be found here: [[12]].
Evidence of his removal of votes can be found here: [13]. Please note that Snowspinner removed opposition votes as being "too late", but did not remove supporting votes which were even later.
That Snowspinner apparently forgets that he did these things, within the last month (as I won't suggest he remembers them and is lying about them), further argues that he is incompetent to run a bot -- or to make any other important decisions for Wikipedia.
Persons wishing to examine Snowspinner's record more closely are invited to see User:Orthogonal/Snowspinner Time-line and its discussion page at User talk:Orthogonal/Snowspinner Time-line. -- orthogonal 00:46, 27 Sep 2004 (UTC)
I have no objections to Snowspinner running a bot, provided he makes sure it's well-tested before letting it loose. I'm sad to see this petty bickering continue. -- Cyrius| 00:49, 27 Sep 2004 (UTC)

I withdraw the request, as I'm going on an indefinite wikibreak due to the continual harassment of Netoholic and orthogonal. Snowspinner 01:05, Sep 27, 2004 (UTC)

  • I object to that characterization. My only actions have been to defend objections to my bot made by you on this page. "Harassment" has a specific and strongly negative meaning, and you should watch your words. Try to get along with people. I've made attempts with you before, both on WP and on IRC, and I hope that you'll try to give effort in that regard. -- Netoholic @ 01:21, 2004 Sep 27 (UTC)
    • I think harassment adequately covers calling me a "fuck" repeatedly in IRC, yes. Snowspinner 01:26, Sep 27, 2004 (UTC)
  • Snowspinner has, in his RFC against me, given evidence of the alleged "harassment". It is notable that no one endorsed his view on the RfC. -- orthogonal 01:24, 27 Sep 2004 (UTC)
  • Support bot. [[User:Neutrality|Neutrality (talk)]] 16:55, Sep 28, 2004 (UTC)

I object to this bot. Widely used templates should be hard to delete. The only time I could see a use for automated destruction of a template is when the template was created through automated means. It shouldn't be easier to destroy than to create. I am also hesitant about letting Snowspinner run this bot unilaterally. Looking at templates for deletion, he seems to get into heated arguments in favor of deleting certain templates, and I don't trust him to understand when a lack of dissent comes from the fact that not many people are aware of the discussion. A template which is widely used and was not created through automated means should be strongly presumed not to have a consensus for removal, despite the fact that no one came to its rescue on TfD (which, unlike VfD, is not very well advertised). anthony (see warning)

Topjabot

I would like to run pywikipediabot with User:Topjabot for various tasks: solving disambiguations, copying images to Commons and changing tables to wiki-syntax. I won't use fully-automated scripts, so there is no risk of some sort of malfunction leading to massive damage. Gerritholl 16:34, 30 Sep 2004 (UTC)

I support this, as it's non-automated. I'm not even sure if we need to list non-automated bots here. anthony (see warning) 00:08, 1 Oct 2004 (UTC)
I was blocked for a short period of time when I used it before I had permission, that's why I ask permission now ;-). Gerritholl 10:47, 1 Oct 2004 (UTC)
Marked as a bot on en. Did you need it marked on the Commons as well? Angela. 09:07, Oct 8, 2004 (UTC)
I didn't, but I'd like to have it marked as such now. I found some sites which may have free images (I will, of course, first ask the owner to be sure). Gerritholl 09:48, 23 Oct 2004 (UTC)
Andre's marked it as a bot for the commons. However, image uploads by bots are not currently hidden from recent changes, so it won't have any effect. Angela. 11:14, Oct 27, 2004 (UTC)


Pywikipediabot category.py

Is the module category.py [14] of pywikipediabot suitable for use on Wikipedia? Can it be run in automatic mode if it uses a reasonable list of articles to add specific categories?

A bug currently causes it occasionally to add a duplicate category if there is an existing category with a different (generally incomplete) sortkey. If the list of articles is filtered against the latest available version of en_categorylinks_table.sql.gz, I would estimate these cases at less than 0.2%.

If the module is used in manual mode (confirmation of each addition), I assume it's not considered a bot subject to registration, even if this is likely to increase the effective number of categories added during the time it's used. -- User:Docu
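For readers unfamiliar with the filtering step Docu describes, pre-screening a candidate list against the categorylinks dump might look roughly like this. The row format, names, and parsing are illustrative assumptions, not the actual pywikipediabot code or the real SQL dump layout:

```python
import re

def existing_members(sql_dump_text, category):
    """Collect page IDs already in `category` from a (hypothetical,
    simplified) categorylinks dump where each row looks like
    (page_id,'Category_name','sortkey')."""
    members = set()
    for page_id, cat, _sortkey in re.findall(
            r"\((\d+),'([^']*)','([^']*)'\)", sql_dump_text):
        if cat == category:
            members.add(int(page_id))
    return members

def filter_candidates(candidates, sql_dump_text, category):
    """Drop pages that already carry the category, regardless of
    sortkey, so the bot never adds a duplicate."""
    done = existing_members(sql_dump_text, category)
    return [(pid, title) for pid, title in candidates if pid not in done]
```

Filtering this way sidesteps the duplicate-category bug entirely, since a page that already has the category (under any sortkey) is never fed to the bot.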


There is now a fix for the categories added to biography articles: People by year/Reports/Multiple cats/SQL, which makes it possible to identify articles with multiple birth and death categories and fix them. -- User:Docu

Wikicrawl

I have written a bot that checks the consistency of language links across all languages. The WWW user interface is at [15]. It does not change anything automatically, and it just goes through each referenced page once. I'd like to get permission to keep this service on my web page (or on any other site, such as Wikipedia - it is free). It is standalone Python code. More information behind the link. -- User:Etu
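In outline, the consistency check Etu describes amounts to verifying that every language link is reciprocated by the target page. A minimal sketch (the data structure and function name here are hypothetical, not Wikicrawl's actual code):

```python
def find_asymmetric_links(interwiki):
    """Given a mapping {(lang, title): {other_lang: other_title, ...}},
    report every language link whose target does not link back to the
    source page."""
    problems = []
    for (lang, title), links in interwiki.items():
        for lang2, title2 in links.items():
            back_links = interwiki.get((lang2, title2), {})
            if back_links.get(lang) != title:
                problems.append(((lang, title), (lang2, title2)))
    return problems
```

A report like this changes nothing by itself; like Wikicrawl, it only flags inconsistencies for a human to resolve.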

LinkBot

Nickj is seeking the approval of Wikipedia talk:Bots for a small semi-automated trial run of the link suggester on 100 or fewer pages. Please see this page for a detailed description of what this script is, and more info on what it will do and what it will not do. -- Nickj 07:45, 18 Oct 2004 (UTC)

I support this tool — it appears to be very well thought-out. Derrick Coetzee 13:30, 19 Oct 2004 (UTC)
Me too. Nickj, you mention that it has been run and the results manually pasted on a talk page -- could you point to a couple? Just curious... Tuf-Kat 20:03, Oct 23, 2004 (UTC)
No problem, here are a few examples: Talk:Snowy Mountains Scheme, Talk:William Charles Wentworth, Talk:Peter Reith, Talk:Miss Australia, Talk:Federal Court of Australia, Talk:Ash Wednesday fires, Talk:Agriculture in Australia. -- Nickj 05:44, 25 Oct 2004 (UTC)

LinkBot is now marked as a bot following a request at m:Requests for permissions. Angela. 23:58, Nov 24, 2004 (UTC)

The LinkBot has just uploaded suggestions to exactly 100 pages - you can get a full list of those here. I'd like to do a further trial run tomorrow - would 1,000 pages be acceptable? All the best, -- Nickj 11:53, 1 Dec 2004 (UTC)

bot request

Is there, or could there be written, a bot to fix brackets? See User:Nickj/Wiki_Syntax/Index -- it would probably need a human to click yes/no on each fix, since some might not be wrong. Dunc| 12:35, 3 Nov 2004 (UTC)

I don't know if there is, but I've used the rambot to perform spell checking. It requires me to verify every single change to ensure that I never change anything incorrectly. Such "bots" are not bots in the traditional sense. They are just a computer tool to optimize the process of manually looking for those changes. I have no problem with a bot that asks yes/no and provides you with the appropriate information to make the best choice! -- Ram-Man 04:36, Nov 10, 2004 (UTC)

rambot

For what it's worth, I plan to resume the rambot's tasks, possibly sometime in the next couple of days. This is not a departure from previous actions, but I thought I should at least mention it for those who care. See "rambot" for some of the things that will be performed. The tasks represent requests for changes that are months and months overdue. -- Ram-Man 13:00, Nov 8, 2004 (UTC)

Once the rambot is unblocked and the discussions on server load below are fully and completely hashed out, I plan to run a bunch of various tasks. In terms of cities and counties, I plan to add UN/LOCODEs to the cities that have them. This will include setting up shortcuts in bulk to facilitate easy usage (e.g. UN/LOCODE:USLAX for Los Angeles, California). Simultaneously with this I will be adding Template:Mapit-US-cityscale templates to the external links section of every city that has GPS coordinates listed. This covers thousands of cities. It will add automatic links to street maps, satellite photos, and a topographical map of the location in question, a terribly useful thing to have. (See the example in Cleveland, Ohio). In addition to the tasks above, I also plan to add a short two- or three-sentence request to the talk page of every user who has not been asked about multi-licensing. This latter option may not happen, depending on whether or not I can get developer help to do it directly with the database to eliminate server load and the need to use a bot. See User:rambot for any additional information (as always). Ram-Man (comment) (talk)[[]] 13:29, Dec 16, 2004 (UTC)
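As an illustration of the Mapit task described above, appending a template under each city's external links heading might be sketched as follows. The template parameter order and the function name are assumptions for illustration, not the rambot's actual code:

```python
import re

def add_mapit_template(wikitext, lat, lon):
    """Append a (hypothetical) Mapit-US-cityscale template to the
    External links section, creating the section if it is missing."""
    template = "{{Mapit-US-cityscale|%s|%s}}" % (lat, lon)
    if template in wikitext:
        return wikitext  # idempotent: never add the template twice
    m = re.search(r"^==\s*External links\s*==\s*$", wikitext, re.MULTILINE)
    if m:
        end = m.end()
        return wikitext[:end] + "\n* " + template + wikitext[end:]
    # No External links section yet: add one at the bottom
    return wikitext.rstrip() + "\n\n== External links ==\n* " + template + "\n"
```

Making the edit idempotent matters for a bot run over thousands of articles, since re-running after an interruption must not duplicate links.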

Please, please do not add any more messages to user talk pages in any automated fashion until the question of whether it is right to do this has been settled explicitly. Thanks. PRiis 17:02, 16 Dec 2004 (UTC)

Well I am placing this message here because of previous requests to make my intentions known on this page. I am asking for explicit permission. Discuss and vote away, although I fear that only those who are opposed to my plan are actually paying any attention to me. If we do vote here, we should count all of those users that have already given their support to me, since spamming them and asking for them to vote again would hardly be appropriate. Ram-Man (comment) (talk)[[]] 17:26, Dec 16, 2004 (UTC)

Since when do we vote on these things? I think if PRiis objects to your bot doing this, then you shouldn't do it. I'd object myself, except for the fact that you don't intend to message me (and I've already specifically asked you to never use a bot to edit my talk page), so I don't feel I have the standing to object in this particular case. anthony 警告 17:56, 16 Dec 2004 (UTC)

One word: Approval. More words: Read this page (and others too), as it is common practice to vote on bot proposals. Ram-Man (comment) (talk)[[]] 18:06, Dec 16, 2004 (UTC)

I suppose it is unclear what level of approval is meant by "approval". I always assumed it meant consensus (though I just looked and it mentions "rough consensus", whatever that means, but also implies that there should be no objections before starting a bot). I've looked over this page, and I don't see any voting going on. What level of approval do you feel we need? anthony 警告 21:21, 16 Dec 2004 (UTC)
I've looked into this a bit more. I've found the original change, and the discussion leading up to it. Still not clear what's meant by "approval" and "rough consensus". I always assumed it meant if people object to it we shouldn't be using a bot to do it. anthony 警告 21:30, 16 Dec 2004 (UTC)

This entire bot policy is mostly the design of a very small group of people. The limitation on how fast a bot operates was discussed on IRC, and therefore we have no way of knowing who or what came up with it. IIRC, I was the one who originally added the 3-point policy, which was later turned into a 4-point policy when "approved" was added. Everyone has since almost religiously followed both forms of the policy, and it has worked quite well as a set of general guidelines. The problem with the other specific rules is that they are quite inflexible and not strictly followed by bot owners, particularly the speed restrictions. Part of this is because we generally trust a bot after it has proven itself, but a lot has to do with the relatively small number of people who frequent this page. It is quite hard to get an adequate consensus, as it is typically biased either in favor of bots or against them, depending. Ram-Man (comment) (talk)[[]] 21:59, Dec 16, 2004 (UTC)

This is bigger than the technical aspects of bot policy. I think what we need is a broad consensus one way or another on whether it is a good idea to allow people to leave automated bulk messages on user talk pages. Personally, I would vote no for the following reasons: 1) It will quickly make talk pages considerably less useful as they fill up with junk (as we've seen with other channels). 2) It will give an inordinately powerful voice to those who run robots. I think this should be resolved among the broader community as a matter of policy. I don't see why the next mass posting run can't wait until this is settled. What's the urgency? PRiis 23:38, 16 Dec 2004 (UTC)
There is no urgency, and the matter is waiting on a resolution, which I imagine should be proposed at Wikipedia:Spam for all parties who care. There are other things to work on that take priority anyway. Nevertheless, as a matter of policy and also by request, I must state bot activity proposals here for review, whether or not they are ever actually enacted; I trust you understand that. Ram-Man (comment) (talk)[[]] 23:57, Dec 16, 2004 (UTC)
OK, thank you. I'll bring it up at at Wikipedia:Spam. PRiis 00:17, 17 Dec 2004 (UTC)
There doesn't seem to be a lot of interest in a specific policy change. It also seems that the last incident was an aberration, since it wasn't specifically discussed beforehand on the bots page, and in future such things will be checked out first. So I won't stand in the way of whatever is planned next--I don't want to be an obstructionist. If a lot of people come to decide it's a big problem, I'm sure it'll be revisited. Thank you for taking this seriously, though. PRiis 17:58, 21 Dec 2004 (UTC)

Rambot clarification

I need to clarify what I want to do with the rambot. After using the rambot to do reads and discover which users have edited rambot articles, I then plan to take that list and ask all of the people on it that I have not already asked. To do so, I would use the bot to add a short message which would link to a page with more detailed information. I would do somewhat small batches of users at a time, something like 50-100 users, and then wait for responses. The reason for this is that if I go too fast, not only does it look like mass-quick-spam, but it also causes too many users to respond at once. When users have stopped responding and their questions have been answered, I will do another batch of users, as appropriate, until the task is completed. This behavior will be significantly different from the previous action of doing about 1,000 people in a very short time with a very large spam-like message. The main purpose is to save me the time of having to manually open each talk page, click on "Post a comment", and finally copy and paste the message before saving it. I figure it will save me at least 10-30 minutes per batch, so I can do other things at the same time. There has been a suggestion to try to get help from a database administrator to alleviate any potential strain on the server. This would probably require adding the message to all of the user accounts at once, but would not require using the bot. If anyone would rather have this option, mention it. If neither option is acceptable, I will perform the action manually. Does anyone have any objections to this new behavior, and if so which category do they fit in: (1) Problems with posting a short message on multi-licensing? (2) Problems with a bot posting messages on user talk pages? (3) Other concerns? Ram-Man (comment) (talk)[[]] 19:14, Dec 16, 2004 (UTC)
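The batching schedule described above can be sketched in a few lines. Everything here (names, the 75-user default, the wait callback) is illustrative, not the rambot's actual code:

```python
def make_batches(users, batch_size=75):
    """Split the recipient list into small batches so replies arrive
    at a manageable rate (50-100 users per batch is suggested)."""
    return [users[i:i + batch_size] for i in range(0, len(users), batch_size)]

def run_batches(batches, post_message, wait):
    """Post to one batch, then wait (e.g. ~24 hours) before starting
    the next, so each batch's questions can be answered first."""
    for i, batch in enumerate(batches):
        for user in batch:
            post_message(user)
        if i < len(batches) - 1:
            wait()  # pause until replies slow to a trickle
```

Injecting `post_message` and `wait` as callbacks keeps the scheduling logic separate from the actual wiki edits, which also makes a dry run trivial.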

Since I commented above I'd like to mention that I find this much more reasonable. I'm assuming you'll wait at least 5-10 hours between batches? anthony 警告 21:32, 16 Dec 2004 (UTC)
Yes, that sounds like a reasonable amount of time, although I may wait 24 hours to make sure more active users have a chance to respond (and so I can sleep). After that time, responses usually slow to a manageable trickle. If this doesn't work out for some reason, I can always repost the request here. Ram-Man (comment) (talk)[[]]

transactions per minute

Having recently started the rambot, it comes as little surprise that someone is unhappy. I've gotten a complaint from User:Docu stating that I have been violating the 6 transactions per minute rule specified on the Bots page, which is true. As requested, I am bringing it up here so that the "rule" might be "changed". For the record, the bot does not violate any of harmless, server hog, useful, or approved. Since the transaction rule was at best discussed in IRC conversations that are not logged, I have no way of knowing the precedent for it. So I will try my best. There were originally three compelling reasons to limit bot transactions: 1) server load, 2) Recent Changes was too cluttered otherwise, and 3) to allow time for other users to verify the changes made by the bot. #1 never existed, as the bot's edits represent a tiny fraction of the total server load. #2 was fixed by the implementation of the bot flag, which was implemented BECAUSE of the rambot. #3 applies to those bots with small data sets or data sets that vary greatly. The rambot's data set is so large that adding delay makes for an unmanageable amount of time to complete even very simple tasks. When I move to the cities (from counties), the amount of time will increase by more than 10 times. What I am saying is that no one is going to check 2,000, let alone 35,000 articles, so #3 is not compelling. What does not change is that people can randomly sample the data (as I do when constantly monitoring the bot run). And the user's contributions can easily be checked. If there were a lot of variability in the changes, then there MAY be a reason to check more of the entries, but still no reason to slow down. Wikipedia is not about strict rules but an ever-changing environment that adapts. In fact, aside from NPOV, there are not many hard and fast rules at all. I will make all efforts to ensure accuracy, but what we're talking about is the difference between a week of editing vs. 3 weeks of editing.
Maybe 2 weeks of time is meaningless to a lot of people, but not to me. If it makes everyone feel better, I can run three bots from three different IP addresses on three different data sets, and that would technically not violate any of the rules. The point being that the rule makes no sense out of context. I'm not even suggesting that we change the guideline. As a general guideline it makes perfect sense, but as we mention at the start of Wikiprojects, these things are what a group of users got together to work on, and they are not hard and fast rules. -- Ram-Man 03:30, Nov 10, 2004 (UTC)

As one of several users (Angela, Guanaco and Tim Starling come to mind) who expanded those guidelines this spring, I would have no problem with your going faster. However, that's at best a personal opinion. And yes, some of the discussion happened on IRC. However, I had to repartition since then, so I don't have logs. Pakaran (ark a pan) 03:55, 10 Nov 2004 (UTC)
If there is no technical problem, I wouldn't mind D6 going faster either. -- User:Docu
The guidelines already make sure that a bot performs a subset of changes at a very slow speed to allow everyone to verify that it is working. After that point it is permitted to move to faster speeds. The rambot does not perform any pipelining, but it could be modified to perform multiple changes in parallel. To date I have not done this, but I have considered it: I don't know, technically speaking, how fast is too fast, but doing things in series is never going to cause a performance hit. I've arbitrarily drawn the line at doing things in parallel, but I suppose a reasonable amount of parallelization couldn't hurt performance. -- Ram-Man 04:23, Nov 10, 2004 (UTC)
6 transactions per minute?! That's all that's currently allowed for bots that have proved themselves? That's crazy. That is absolutely nuts. I'm still fine-tuning the link suggester, and intending to do a number of trial runs (each run a bit larger than the last), so as to have a progressive phase-in and give time for feedback to be incorporated. However, in a full test run there were 210931 pages with suggested links. Now, at 6 transactions per minute running non-stop that'll take 35155 minutes = 585 hours = 24 days to upload. That's nearly a month! Now, I don't have any problem at all with going slowly at first whilst something isn't proven, but going so slowly after that's already happened just seems silly. Surely there needs to be some latitude given to bots that are proven, and that the vast majority of people don't have any objection to, and that would actually benefit from being able to go faster (e.g. currently RamBot and D6, and hopefully at some point in the future the link suggester). - Nickj 05:04, 10 Nov 2004 (UTC)
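Nickj's figures above check out with a quick calculation (the function here is purely illustrative):

```python
def upload_time_days(pages, edits_per_minute):
    """Days of continuous running needed to make one edit per page."""
    minutes = pages / edits_per_minute
    return minutes / 60 / 24

# 210931 pages at the 6-per-minute limit: roughly 24 days non-stop
assert round(upload_time_days(210931, 6), 1) == 24.4
# at the ~15 edits/minute Ram-Man reports without delays: under 10 days
assert upload_time_days(210931, 15) < 10
```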
I don't see the point with adding these link suggestions to the talk pages. This information would be more useful as maybe 100-200 pages organized by topic and linking to the article pages. Additionally, you could offer the actual database of suggestions for those of us who would know what to do with such a thing. anthony 警告 13:30, 11 Nov 2004 (UTC)
Such a short comment requiring such a long answer ;-) Here goes!
The idea of adding suggestions to the talk pages is that:
  1. They're suggestions - a user may not like some of them - and so the actual article should never be automatically modified. People feel strongly about this, and I agree with them.
  2. By adding suggestions to the talk page, those suggestions are automatically associated with the page, and visible to anyone watching the page, all without the problems inherent in modifying the page.
  3. Currently the link suggester shows suggested links from a page, but I also want to show suggested links to a page (but only if there are outgoing links as well).
The problems with listing suggestions separately as a big series of list pages are that:
  1. Point 2 above is lost.
  2. It would be a huge number of pages, much much more than 100 or 200. With 210931 pages with suggestions, at around 3.8 suggestions on average per page, that's over 800,000 suggestions in total. My experience with exactly this kind of process of making lists of changes (creating the data for the Wiki Syntax Project) has shown that 140 suggestions per page (including a link to the page, the word that can be linked, and the context of the change) is the perfect size to keep the page just under the 32K suggested maximum page size. 800000 / 140 means there would be 5715 pages.
  3. Then people have to process those 5715 pages. As it happens, I've also had experience with asking people for their help in processing changes listed on pages (again as part of the Wiki Syntax Project), and it has taken a lot of effort by a lot of people to get through just 26 pages of suggested changes in around a week. At that rate of progress, with the same consistent and concerted effort (which would probably be very hard to do over such a long-haul project), it would take around 220 weeks to process all of the suggestions, which equals 4.2 years.
  4. Point 3 above is lost unless you have twice as many pages (a list to, and a list from), which would be 11430 pages.
  5. Over such a long period of time, with the rapid pace of change in the Wikipedia, the data would age very rapidly, and quickly become irrelevant to the content that was actually on the pages at the time a human got around to looking at the suggestion.
  6. In other words, the most viable approach appears to be to distribute the problem of processing the links out to the page authors / maintainers by putting those suggestions on the talk pages (and this act of distributing the problem is one of the most fundamental reasons why the Wikipedia is so successful). Of course, page authors can ignore the suggestions (personally, I wouldn't, if I thought that they were good suggestions, and if I cared about the page) - and with the tests, sometimes people ignore the suggestions, and sometimes they use them. Of course if you ignore the suggestions, then you're no worse off than you were before.
Re: Making the database available. I have some very paranoid questions about this:
  1. What are you going to do with it that's different from what I'm going to do with it?
  2. If you've got a good idea, let me know, I'm perfectly happy to give credit where credit is due.
  3. If you do something that doesn't work, or that causes problems, how can I be sure that this won't reflect badly in any way on the link suggester project?
I know those questions seem paranoid, but there's a very good reason for me to feel paranoid about this data: People can be extremely suspicious of and prejudiced against bots, and it would really annoy me to have made an effort, and then have it tarnished or the project blocked based on what someone else did using data generated by the link suggester. All the best, Nickj 23:35, 11 Nov 2004 (UTC)
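The page-count and effort estimates in Nickj's points 2 and 3 follow from simple division, as a few illustrative lines confirm:

```python
import math

def pages_needed(total_suggestions, per_page=140):
    """List pages needed at ~140 suggestions per 32K page."""
    return math.ceil(total_suggestions / per_page)

def weeks_to_process(pages, pages_per_week=26):
    """Weeks at the Wiki Syntax Project's observed rate of ~26 pages/week."""
    return pages / pages_per_week

p = pages_needed(800000)                    # ~5715 list pages
assert p == 5715
assert round(weeks_to_process(p) / 52, 1) == 4.2   # roughly 4.2 years
```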
Thanks for the answer, sorry I didn't respond much sooner. Makes a lot of sense, you've obviously thought this out.
Regarding what I'd do with the database, I'm really not sure, but I should note that I currently run a Wikipedia mirror/fork. Maybe I could find some patterns and apply a subset of it directly to my copy of Wikipedia. Maybe I could make a 3D image of the suggested links. Maybe I could give statistics on the articles with the highest/lowest % of links/suggested links. Maybe I could sort the list by number of links to/from a certain term, with or without filtering by type of link (a person, a place, etc). I wouldn't use it to run a bot on Wikipedia, if that's your concern. anthony 警告 21:44, 16 Dec 2004 (UTC)
Wow, someone with a data set larger than mine! Woohoo. I only have 35,000 - 40,000 articles to check :) By "Transactions" I was referring to "edits". For every one edit the rambot does, it requires a number of reads. But it seems like no one cares if we increase or eliminate the speed limit for proven bots. I suppose not even 120 edits per minute would be a problem, but I don't know what the maximum would be. Without delaying, I only average about 15 edits per minute. -- Ram-Man 12:43, Nov 10, 2004 (UTC)
So when the system is slow, you still want to ram through your automated edits while a human has to wait over a minute for a page save due to system response? Don't you think humans should get better response times than bots, especially fully automated bots? When the system has good response times, I don't see a problem with allowing bots more leeway in how much load they put on the servers, but when the site slows down, system response should give precedence to humans rather than to bots. Fully automated bots should, for the most part, only run during excess capacity times IMHO. Perhaps a system load monitor could auto-throttle bot-flagged accounts when a certain threshold is reached? RedWolf 08:03, Nov 11, 2004 (UTC)
I'm not particularly suggesting that a bot should try to hog the system when it is slow. In fact, that would violate the server hog principle and be in violation of policy. The fact of the matter is that even when Wikipedia is running slow, developers have told us that serial bot edits are negligible and do not constitute a server hog. The perception that they do is a common but false one: no developer to date has said otherwise, and the hardware has VASTLY improved since that statement was made. In fact, when Wikipedia is slow, the rambot slows down as well. Nevertheless, if it could be proven that bots were causing a slowdown, a system load monitor/auto-throttle of some sort would be great and I would wholeheartedly support that. I haven't done it yet, but I was going to make a page to control the rambot's settings that it would periodically monitor, so that its speed could be adjusted on the fly from Wikipedia itself. In the worst case, however, a bot can be temporarily banned if it causes trouble. -- Ram-Man 13:01, Nov 11, 2004 (UTC)
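In outline, the idea of controlling a bot's speed from an on-wiki settings page could look something like this sketch. The "delay=" setting format and the fetch/save helper functions are hypothetical, not anything the rambot actually implemented:

```python
import re
import time

def parse_delay(control_page_text, default=10.0):
    """Extract an edit delay (seconds) from a control page's wikitext.

    Expects a line like 'delay=5' somewhere on the page; falls back to
    the default if the page is malformed.
    """
    match = re.search(r"^\s*delay\s*=\s*(\d+(?:\.\d+)?)",
                      control_page_text, re.MULTILINE)
    if not match:
        return default
    delay = float(match.group(1))
    # Never let a vandalised control page disable the throttle entirely.
    return max(delay, 1.0)

def throttled_edits(pages, fetch_control_text, save_page):
    """Save each page, re-reading the control page periodically."""
    delay = parse_delay(fetch_control_text())
    for i, page in enumerate(pages):
        save_page(page)
        time.sleep(delay)
        if i % 20 == 19:  # refresh settings every 20 edits
            delay = parse_delay(fetch_control_text())
```

The point of the periodic re-read is that any editor (or admin) could adjust the bot's speed mid-run without the operator being at the keyboard.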

My only concern with changing the rule, and this doesn't really apply to Rambot, is that going over 6 transactions per minute makes your changes really hard to revert. Thus we start getting into a technocracy where whoever has the faster bot wins. Yes, there is an approval process for bots, but there really isn't that much interest in it. I'd favor approving an increase in the speed limit for rambot, for this particular run. I'd also favor allowing others to receive an increase for a specific run which is extremely well defined. But I think this should require a specific proposal which is approved by at least say 10 people and after the bot has already run in slow mode for a day or so. As for the speed, 120/minute sounds a little high. I'd want some input from a few developers before going over 60/minute or so (which I believe is the read-only speed limit in the robots.txt file). The details could be worked out, but that's my suggestion rather than eliminating the rule. anthony 警告 13:27, 11 Nov 2004 (UTC)

Making your changes hard to revert doesn't happen at 6 transactions/minute. Not speaking for other sysops, but I use Firefox under Linux, and I can middle-click pretty darned fast. Anyhow, isn't there a limit on how many pages any IP can hit, around 1 per 2 seconds? Pakaran (ark a pan) 18:16, 12 Nov 2004 (UTC)
6/minute may not be the exact cutoff, but as a bot can run 24 hours a day I think it's quite enough before we should require consensus for a specific run. I'd say 8,600 a day is probably too high. I'd prefer capping bot edits at more like 1,000 per day unless there is a clear consensus for a well-defined run. If the per day limit were set at 1000 we could up the per minute limit. Maybe 60/minute off-peak? I dunno, whatever Jamesday wants for the per minute limit :). anthony 警告 05:29, 15 Dec 2004 (UTC)

BTW, looking at this page I don't think D6 is yet a good candidate for having the speed limit raised. anthony 警告 13:33, 11 Nov 2004 (UTC)

Two more comments on rambot. I'm hesitant about adding "WikiProject boilerplating to the talk page of the state's cities." I don't think it's useful, and I think it's harmful, as it suggests that there is discussion on the talk page when it's really just a spam link to someone's project. Secondly, I'd like to see the details of the "automatic disambiguation". This is a very hard thing to do right. anthony 警告 13:36, 11 Nov 2004 (UTC)

First, I have not done any wikiproject boilerplating partially because some people don't like the idea. It has already been done for the U.S. County articles, but I stopped at that. Nevertheless, it IS just the talk page and I don't believe it is spam, as Wikiprojects are quite optional and used all over the place. I wouldn't mind renewing a discussion about this, but I don't know where to bring it up. It wouldn't be hard to make a simple Wikiproject template that is short and concise and doesn't take up too much space. -- Ram-Man 14:55, Nov 11, 2004 (UTC)
Perhaps I used a bit too harsh a term calling it "spam", but at least one problem I have is that it makes the "discussion" link blue, which makes me think there's a discussion, but it's just a link to a project which I already have heard about many times. Personally I wouldn't have a problem if this were only done to discussion pages which already existed, but I don't know how useful you would find that. The best place to start the discussion would probably be here, but since people have already objected I think it's your responsibility to either show that the objectors have changed their mind or show that there is such overwhelming support for this that we should ignore the objections. anthony 警告 17:38, 11 Nov 2004 (UTC)
Second, "automatic disambiguation" is vague, but much of this has already been done. Some cities exist at "City, State", but the "City" article may not exist, so it could be a redirect to "City, State". But maybe there is more than one "City" with the same name, so the "City" article (if it does not exist) should be a disambiguation page listing all of the cities with that name. The process can be taken a step up too: "City, State" may be disambiguated to "City, County 1, State" and "City, County 2, State", etc., if the page does not already exist. In the meantime, some of this has already been done, but I'm sure some have been missed. Does this help to explain it? I'm not ready to perform this kind of task yet. Oh, and it used to include the "City, ST" -> "City, State" redirects too, but most of these have been done as well. -- Ram-Man 14:55, Nov 11, 2004 (UTC)
Ah, yes, this sounds reasonable. I thought you were trying to automatically disambiguate links or something. You might want to phrase "Disambiguate duplicate pages like..." as "Create disambiguation pages for articles like..." anthony 警告 17:38, 11 Nov 2004 (UTC)

With regard to the bot speed limit, the best solution to hundreds or thousands of bad bot edits is to do another bot run to clean up. In the very worst case, a generic "revert bot" can just undo all the bad bot's recent edits. With my list of 50,000 changes, what I did was run the first dozen or so, wait a day, check for comments, and only when everything was settled, let it run unattended. It has taken days or weeks for people to find some systematic errors, which I will be fixing myself with some cleanup runs. I don't think the speed limit really helped much. It just means that I have to check my bot every day to make sure it's still running, which is time I could be spending readying subsequent runs or editing articles. Sometimes it auto-detects systematic errors, and the speed limit actually creates a delay in me noticing that. Articles also get edited during the course of the run, which can cause some inconsistencies, above and beyond the fact that a long run leaves some articles one way and others a different way, for a longer period of time than a short run. It's also something of a waste of time for human editors to be running after a bot to fix bad edits as they happen, when bad edits could be reverted en masse automatically. So I don't think having humans be able to keep up with the actual editing process is necessarily a good reason for a speed limit. If something has gone wrong, it should be just as easy to deal with after the fact and while it's in progress.

Perhaps an official, community-approved "revert bot", which could be deployed on short notice, would make sure that's the case.
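The core decision such a revert bot would have to make is which revision to restore. A minimal sketch of that logic (the history format and names here are hypothetical; a real bot would fetch page histories and submit the restores itself):

```python
def revision_to_restore(history, bad_bot, cutoff):
    """Given a page history (newest first) as (timestamp, user, text)
    tuples, return the text to restore, or None if no revert is needed.

    A revert is needed only if the current top revision was made by the
    bad bot at or after the cutoff timestamp; we then restore the newest
    revision that is not one of the bad bot's post-cutoff edits.
    """
    if not history:
        return None
    top_ts, top_user, _ = history[0]
    if top_user != bad_bot or top_ts < cutoff:
        return None  # someone else edited since, or the edit predates the run
    for ts, user, text in history:
        if user != bad_bot or ts < cutoff:
            return text
    return None  # the bot created the page; there is nothing to restore
```

Skipping pages whose top revision is no longer the bad bot's avoids clobbering human edits made after the bad run, which is the main way a naive mass revert can itself do damage.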

In short, I think restricting bots to sequential edits is sufficient, as long as the number of people running bots (and the number of bots per person) is small compared to the Wikipedia population (or, more directly, server capacity). My bot automatically stops editing if Wikipedia takes too long to respond, on the general theory that load is probably too high, or that something else has gone wrong. But if even immediate sequential edits have a negligible performance impact, raising the speed limit will probably improve human productivity. Bot authors need to supervise their bots less, and human editors will not be making changes that a bot was going to get around to anyway, or that a slow-running bot will later come by and obsolete.

It should be a stated rule that bot owners shall not knowingly make ongoing, conflicting edits with one another. We have the three-revert rule to prevent humans from operating on the "fastest mouse wins" principle. Given the potentially large number of articles involved, I think there's good reason to have a one-revert rule for bots. The idea is that if a bot owner would like to change or revert what another bot owner has done, they should get community approval in the appropriate forum. Which they really should be getting anyway, but perhaps an explicit rule would make people more comfortable. I think this issue is rather orthogonal to the speed limit issue. -- Beland 06:34, 17 Nov 2004 (UTC)

I agree with almost everything just said. I must say that many times errors have been found in rambot articles, sometimes months later. A simple cleanup bot run has worked every time. Most decent bot owners are upstanding and will take full responsibility for any errors and clean them up themselves. -- Ram-Man 14:19, Nov 17, 2004 (UTC)

I have stayed out of this conversation for the most part. However, I would like to note that I agree with User:Ram-Man and User:Beland. The reality is that there are actually very few of us people running actual bots. I think that for the most part we respect the other people and there is little chance of a turf fight. For instance, I was thinking about doing another run of User:KevinBot on the Rambot articles, but when I learned that the Rambot had been dusted off, I shelved the idea. I would also like to note that if a bot does make mistakes, I agree that a subsequent bot run can clean it up, and that for the most part the bot authors are fairly well upstanding types and will clean up any unintended messes. I also agree that the throttle rule for bots is obsolete and should be done away with unless it can be proven that the bots are slowing the servers down by something more than a negligible amount. Kevin Rector 04:11, Nov 25, 2004 (UTC)

Ram-Man, you've severely understated the load issue from bots. It is not a trivial portion of the load at the limiting rates given in the current policy. The site currently sees about 25 write "queries" per second average, perhaps twice that at peak times; about 250,000 edits/moves/whatever per week total for the English language Wikipedia. Limiting write rate, no reads at all, is perhaps 100-150 write queries per second for the main service database servers (depends on the operation; edit saves are more costly than many). Limiting write rate for one of the backup slaves is about 25-35 writes per second, and for another a bit higher. One edit save involves anywhere from about 6 to thousands of write queries (relatively few are that costly) immediately and perhaps 5-10 later when the first reader comes along. Call it about 8 per save on average for immediate effect. One bot at the current limit of 6 per minute for 8 hours a day can do 20,160 edits, about 10% of the total for the English language. Odds are that those will be happening at the busiest times for the site and will have a greater negative effect than the count implies because of their timing. Each of those changes also flushes the changed page from the site caches, a significant Apache web server and Squid cache server load factor.

You'll need to find a better way to do this, one which, at a minimum, can be run at off peak times. Jamesday 04:51, 13 Dec 2004 (UTC)

Unfortunately, I missed this comment in the entire document! Oh well, now I've seen it. BTW, 6/minute * 60 minutes/hour * 8 hours/day = 2,880/day, not 20,160; that figure is a week's worth. That's a long time to do that many edits. The last few bot runs would have taken a substantially longer period of time because they consisted of 60,000 or so edits doing a number of tasks. That's three weeks of doing nothing but the bot for 8 hours a day. I just can't dedicate that kind of time and expect to get things done, but I suppose if there is a physical reason I can't, there is nothing I can do about it. But of course all this assumes ideal circumstances. In reality Wikipedia slows and speeds up, and during the time I edit (quite probably peak time) quite often the bot is only able to edit at 5 or 6 per minute, sometimes slower. When Wikipedia is more responsive, the bot naturally is able to edit faster, just as any regular user would be. Thank you at least for explaining this situation to me! But I hope you see my dilemma, as I work with a very large set of articles (about 35,000). But even in the worst case scenario, the bot represents at most 2% of maximum writes (assuming 100), and more often it would be less than 0.5%, as I rarely get more than 1 edit every 2 seconds without throttling. Of course I don't know how the read activity skews the results. I suppose the only way to speed this up is better hardware? Ram-Man (comment) (talk) 18:08, Dec 13, 2004 (UTC)

About 250,000 changes per week for en wikipedia. 20,160 per week for one bot running 8 hours a day at the current policy limit of 6 per minute is 8% of that. Run the bot 24 hours a day and it's 24% for one bot. Run 8 hours a day at 2 seconds per edit and it's 100,800, or 40% of the weekly edit count. At present there are 19 accounts with the bot flag, and there were 234,343 operations in the preceding 7 days. The actual number of those edits performed by bot-flagged accounts was 10,524 or 4.5%, broken down as follows:
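The percentages quoted above can be checked with quick arithmetic (figures are the approximate ones from this discussion):

```python
# Weekly edit totals implied by the rates discussed above.
WEEKLY_EDITS = 250_000  # approximate en: edits per week at the time

per_week_8h = 6 * 60 * 8 * 7    # 6 edits/min, 8 h/day, 7 days
per_week_24h = 6 * 60 * 24 * 7  # 6 edits/min, around the clock
per_week_2s = 30 * 60 * 8 * 7   # one edit every 2 s, 8 h/day

print(per_week_8h, round(100 * per_week_8h / WEEKLY_EDITS))    # 20160 8
print(per_week_24h, round(100 * per_week_24h / WEEKLY_EDITS))  # 60480 24
print(per_week_2s, round(100 * per_week_2s / WEEKLY_EDITS))    # 100800 40
```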

+----------------+------------+
| count(rc_user) | user_name  |
+----------------+------------+
|           1634 | Rambot     |
|              9 | Robbot     |
|            279 | Guanabot   |
|           8065 | CanisRufus |
|            121 | Janna      |
|            416 | Pearle     |
+----------------+------------+

For about 10 hours a day the systems are operating within about 15% of the highest load typically seen on any given day (based on the Squid stats linked from Meta:Wikimedia servers). It seems unlikely that a bot running at a significant rate when the system is within 15% of peak Monday load (the busiest day, typically) isn't doing harm. Load decreases steadily during the week, and by Friday the load has dropped substantially. Saturday is generally pretty quiet, an ideal day for bots to run: peaks may be as low as 700 requests per second, while without Apache CPU limiting slowing things down, Monday can peak at over 1,100 requests per second. Of those requests, about 78% are served from Squid cache. How did the bot operators do at avoiding peak times? Here are the figures for Monday 1300-2300, the times when the system was within 15% of max load:

+----------------+-----------+
| count(rc_user) | user_name |
+----------------+-----------+
|              5 | Janna     |
|             27 | Pearle    |
+----------------+-----------+

And for the whole week:

+----------------+------------+
| count(rc_user) | user_name  |
+----------------+------------+
|            159 | Rambot     |
|              1 | Robbot     |
|              8 | Guanabot   |
|           1214 | CanisRufus |
|             30 | Pearle     |
+----------------+------------+

Through chance or design the bot operators did avoid the worst response time period on the busiest day of the week but the bots in use didn't avoid the busiest times on the rest of the week.

One well and visibly monitored factor is Apache web server CPU load. The Ganglia stats linked from the servers page will tell you the percentage of CPU use on those servers. If that CPU use is 90%, response time is very significantly affected; the unofficial "buy more" threshold is set at 60% at peak times, and the maximum comfort level is about 85% for load which can be shed easily. As you can tell (if the charts are up again after last night's work), now is usually a bad time to be running bots.

The 25-35 updates per second database slave is no longer a factor. It's now the primary upload/download server for the site. The new slowest is about 40-60 per second, writing to a pair of 250GB 7200RPM SATA drives in RAID 0 with about 300MB of RAM allocated to database duty.

Avoiding the times I've mentioned here is a good way not to be noticed. Jamesday 13:12, 14 Dec 2004 (UTC)

The current CPU load and other factors affecting system load should be periodically written (every 10-15 minutes perhaps) to a file in a well-known location accessible to bots, similar to robots.txt. The bots could then monitor this file as they run, reducing activity as system load increases and increasing activity as it decreases. At a certain threshold, perhaps 90% of load, bots could even put themselves on hold until load backs off below the threshold. If this option is undertaken, the pywikipediabot core code should be updated accordingly. RedWolf 05:57, Dec 15, 2004 (UTC)
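Client-side, the throttling half of that proposal is straightforward. A sketch of one possible policy (the load file, its format, and the exact thresholds here are all hypothetical; no such file existed at the time):

```python
def throttle_delay(load_fraction, base_delay=2.0, hold_threshold=0.9):
    """Map a published server load fraction (0.0-1.0) to an inter-edit
    delay in seconds.

    Below 50% load, run at the base delay; between 50% and the hold
    threshold, stretch the delay linearly up to 10x the base; at or
    above the threshold, return None to signal the bot to pause.
    """
    if load_fraction >= hold_threshold:
        return None  # hold until load backs off below the threshold
    if load_fraction < 0.5:
        return base_delay
    # Linear ramp from 1x at 50% load to 10x just under the threshold.
    scale = 1 + 9 * (load_fraction - 0.5) / (hold_threshold - 0.5)
    return base_delay * scale
```

A bot's main loop would re-read the published load value every few minutes and sleep for the returned delay between edits, pausing entirely when `None` is returned.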
I agree, the ideal solution would be to have a bot be able to monitor server load in some fashion. However, I would have no problem running the bot on Fridays and Saturdays exclusively until such a time as hardware permits. Ram-Man (comment) (talk) 19:50, Dec 16, 2004 (UTC)

Resolution

So it sounds like the concern that sequential edits with no delay will cause excessive server load has been addressed by the fact that our current hardware makes this kind of load negligible. Concerns about runaway bots making bad or conflicting changes can be accommodated with a simple rule and the ability to call out a "revert bot" after community approval, respectively.

Therefore, I propose changing the policy from:

  1. Bots should wait for 30-60 seconds between edits until it is accepted the bot is OK, afterwards waiting at least 10 seconds between edits after a steward has marked them as a bot

to:

  1. Unmarked bots (those not approved by Wikipedia talk:Bots) must wait 30-60 seconds between edits. This is partly to allow human editors time to notice any bad edits before too many are made.
  2. Marked bots (those approved by Wikipedia talk:Bots and marked as a bot by a [[m:stewards|steward]]) may make sequential edits without delay but may not parallelize requests. This is to prevent excessive server load.
  3. Bot operators sometimes make mistakes or unpopular decisions. A large number of "bad" edits can easily be made faster and more conveniently than human editors can revert them. If you discover a bot making bad edits, the first step is to leave a personal message for the owner of the bot on their user page. Most bot owners are responsible and will stop damage in progress and usually clean up their own messes. If a bot owner is unable or unwilling to undo a large series of "bad" edits, please post a request for discussion and assistance on Wikipedia talk:Bots. At worst, someone will have to run a "revert bot" to undo all of the "problem" bot's edits back to a certain point in time. There's certainly no need to waste human editor time doing a manual cleanup of a bot's mess when an automated solution is available.
  4. Bots should not revert the work of other bots without consent of the original bot operator or community approval. If you are planning on reverting the work of a bot, post a note on Wikipedia talk:Bots. Please reference any previous community discussion (like on a category or article talk page) which justifies the revert. This is a "one-revert rule" [shouldn't that be "zero-revert rule"?] for bots instead of the usual "three-revert rule".
  5. If you see bots making conflicting edits, notify the bot owners and Wikipedia talk:Bots. In this situation, bot owners should stop their bots and discuss the conflict with each other and/or the affected community.
  6. It's OK for bots to build on one another's work. If you are planning on doing this, you should drop a note or a pointer on Wikipedia talk:Bots for two reasons. One, this page should be getting notification about all novel bot projects. Two, the original bot operator may have constructive comments or may even be able to accomplish the same or better goal more easily with their own bot.
I'd support this, if this is brought to a vote. As for parallelization of requests, I'd venture that it might be allowable within limits, e.g. a bot running off a home DSL line with limited upstream bandwidth is unlikely to cause too much trouble even if it's using 2 or 3 threads. Pakaran (ark a pan) 01:07, 5 Dec 2004 (UTC)
The limiting factor is how fast disk heads can move, not the connection speed. That's something under 200 seeks per second per disk and a page change is likely to involve many of them (though caching tries to reduce the effect). It is not that hard for a dialup line to do a significant DOS on the site until it's been firewalled. Jamesday
In theory the only immediately required disk writes would be to the journal so the seeks would be rather small, but who knows if this is true in practice. anthony 警告 05:32, 15 Dec 2004 (UTC)

Adoption of resolution

If there are no objections by the beginning of 11 Dec 2004, let's declare the above resolution as policy. If we don't have complete consensus, then we can start tallying up votes one way or the other, and consider amendments as needed. -- Beland 04:18, 6 Dec 2004 (UTC)

Debate extended to the beginning of 18 Dec 2004 to allow people from the village pump to wander over. (Posting a note there now.) -- Beland 00:04, 13 Dec 2004 (UTC)

Thanks for that note at the pump. Jamesday 04:51, 13 Dec 2004 (UTC)
  • Works for me, should we put some sort of message on the pump so that people not so connected to the conversation could have some input? Or should we assume that all the people who care about bots would be watching this page? I don't know just asking. Kevin Rector 19:22, Dec 6, 2004 (UTC)
  • I obviously approve Ram-Man (comment) (talk) 13:07, Dec 9, 2004 (UTC)
    • Change to Object while the transaction speed issue is hashed out.
  • Object, if only to keep it from being "automatically" adopted. I'd like the system administrators/developers to comment on this proposal. -- Netoholic @ 17:50, 2004 Dec 9 (UTC)
  • tallying up votes → Support. -- Nickj 22:42, 9 Dec 2004 (UTC)
  • Support – There does not seem to be any real reason for the 6 edits/minute rule, after reading the above. – ABCD 01:15, 10 Dec 2004 (UTC)
  • For the record, I support. -- Beland 00:04, 13 Dec 2004 (UTC)
  • The Wikimedia robots.txt file specifies a rate of one operation per second at most. That limit is not low enough to prevent disruption from bots which write; it's a read limit. Bots not obeying the robots.txt limits should expect to be firewalled if noticed... because if noticed, it's likely to be because they are causing disruption. Jamesday 04:51, 13 Dec 2004 (UTC)
  • I object. Requiring community approval to revert the changes made by a bot is unacceptable. The only bots making massive numbers of changes should be those whose specific actions were approved by a consensus. I'd favor lifting the per minute restrictions if we added a relatively low per-day restriction. This per-day restriction could be waived for specific well defined runs, but each and every run should require a separate waiver of the restriction. Wikipedia should not be a technocracy. anthony 警告 05:24, 15 Dec 2004 (UTC)

Sandbot

I am proposing a 'sandbot', a simple robot which would purge the contents of the sandbox automatically every six hours. It could also be used to re-paste the sandbox header code back into the 'box if it gets deleted by accident or whatever. -Litefantastic 15:51, 11 Nov 2004 (UTC)

It sounds interesting, especially the part about the sandbox header code. However, what if someone made an edit one second before the six hours expire? Their change would be lost immediately. I'd prefer it if the bot would only run once the box has been idle for a certain period of time, determined by checking the page history. Thus, once the six hours are up and the sandbox has been idle for 5 or 10 minutes, the revert can be performed. -- Ram-Man 16:46, Nov 11, 2004 (UTC)

Agree with Ram-Man about using idle time in addition to the six hour limit. I'd even support up to every hour, I suppose. I've noticed the sandbox message isn't re-added very often any more. I thought we had abandoned it. anthony 警告 17:42, 11 Nov 2004 (UTC)

Okay, so it would be set for a delay period of... how about five minutes of inactivity? Or ten? -Litefantastic 18:15, 11 Nov 2004 (UTC)
Only purge the contents rarely, after long idle periods, but write a bot to re-add the header if needed by checking every few minutes. Cool Hand Luke 08:47, 10 Dec 2004 (UTC)
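The purge condition discussed in this thread (a regular interval plus an idle check, so an edit made just before the deadline isn't wiped out from under its author) could be sketched as follows; the function and parameter names are hypothetical:

```python
from datetime import datetime, timedelta

def should_purge(last_purge, last_edit, now,
                 interval=timedelta(hours=6),
                 idle=timedelta(minutes=10)):
    """Return True when the sandbox is due for a purge: the regular
    interval has elapsed since the last purge AND nobody has edited
    the page within the idle window."""
    return (now - last_purge) >= interval and (now - last_edit) >= idle
```

A sandbot would call this every few minutes from its polling loop, using timestamps taken from the page history, and perform the revert only when it returns True.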

Why not delete certain rude words (i.e. f***, n*****) too? Bart133 18:33, 15 Jan 2005 (UTC)

Only if we're going to censor the rest of Wikipedia as well. --Carnildo 00:55, 16 Jan 2005 (UTC)

Tkbot

I would like to run pywikipediabot's interwiki.py (unmodified code) manually as Tkbot, for the purpose of cleaning up interwiki links on articles I've created or edited. —Tkinias 04:49, 5 Dec 2004 (UTC)

Bot for adding "missing" Redirect and Disambig pages - should it be done?

I've been thinking about a way of creating some "missing" redirects and disambig pages. There's some info about it here, but the short version is that people sometimes go out of their way to link a bit of text to something other than the straight text itself, i.e. the link target and the link label differ, the label currently has no page of its own, but the target does. Using that information, we can add redirects (where all the links agree) or disambiguation pages (where they don't).
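The classification step described above can be sketched in a few lines. This is only an illustration of the rule "all targets agree means redirect, otherwise disambig"; the actual harvesting of piped links from the database is a separate (and larger) job:

```python
from collections import defaultdict

def classify_labels(piped_links):
    """Given (label, target) pairs harvested from piped links whose
    label has no page of its own, propose redirects and disambig pages.

    If every use of a label points at the same target, propose a
    redirect; if uses disagree, propose a disambiguation page listing
    the distinct targets.
    """
    targets_by_label = defaultdict(set)
    for label, target in piped_links:
        targets_by_label[label].add(target)
    redirects = {label: next(iter(targets))
                 for label, targets in targets_by_label.items()
                 if len(targets) == 1}
    disambigs = {label: sorted(targets)
                 for label, targets in targets_by_label.items()
                 if len(targets) > 1}
    return redirects, disambigs
```

For example, the "Adolf Frederik" links in the lists all point at one article, so it becomes a redirect candidate, while a label whose uses point at several different articles becomes a disambiguation candidate.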

Using this method, I've made example lists of "missing" redirects and disambiguation pages:

Redirects:

Disambig:

Notes:

  • This is currently just a proof-of-concept. The actual lists that would be used would not be these exact lists; they could be filtered a bit more, redirects that have since been created could be removed, etc. The lists are just here to provide something concrete for you to see to illustrate the idea.
  • The number of times something has matched a possible redirect or disambig is shown by the number to the right of each link in the above lists. I can change the cut-offs for these numbers if and when required, so if they're too low let me know.
  • If there are specific things that you think should be removed / filtered from the lists, then that can be done as well. Just give me some kind of regular expression or wildcard syntax that tells me what you don't want linked to or from. To give you some examples of these, here is the query file I'm currently using to remove unwanted redirects and disambiguations.

Now, the question I had is this: Do you think auto-creating redirects like this is a good idea? and Do you think auto-creating the disambiguation pages like this is a good idea? Would it help? If the consensus is that it's a bad idea, then I'll drop it, but if the consensus is that it's a good idea, then I'll make a simple bot that does this.

Also, to give you a ballpark idea of the number of new pages that we're talking about, in this dataset it's:

  • Number of redirect pages: 13571
  • Number of disambig pages: 1162

All the best, -- Nickj 09:05, 2 Dec 2004 (UTC)

Looking at just the first page of suggested redirects, it looks like it's giving a large number of inappropriate suggestions, such as:
  • 25 mm → 25 mm caliber -- could easily cause unexpected links.
  • 1963 election → Canadian federal election, 1963 -- will cause unexpected links once we start getting articles on other 1963 elections.
  • 102nd → 102nd United States Congress -- there's no logical reason for the redirect to point there.
  • '50s → 1950s -- if the Wikipedia is still around in this format 50 years from now, "'50s" will be ambiguous.
  • 586 BCE → 580s BC -- can easily cause problems if someone wants to link to 586 BC.
As a side note, it does seem to have caught an interesting misspelling:
  • Adolf Frederick → Adolf Frederick of Sweden 6
  • Adolf Frederik → Adolf Frederick of Sweden 20
--Carnildo 00:58, 4 Dec 2004 (UTC)


Thank you for your comments. I was starting to wonder whether anyone was going to reply! On those inappropriate suggestions:

  • 25 mm → 25 mm caliber -- could easily cause unexpected links.
    • But isn't the gun what people mean when they write 25 mm and make it a link?
  • 1963 election → Canadian federal election, 1963 -- will cause unexpected links once we start getting articles on other 1963 elections.
    • Isn't that true of any redirect? Extra articles can appear, which means a redirect becomes a disambiguation page. Nevertheless, I'll get any redirects from "% election%" (%=wildcard) to anything else removed.
  • 102nd → 102nd United States Congress -- there's no logical reason for the redirect to point there.
    • Agreed. I'll get any redirects to "% United States Congress" removed from suggested redirects.
  • '50s → 1950s -- if the Wikipedia is still around in this format 50 years from now, "'50s" will be ambiguous.
    • Then in 2050 this can become a disambiguation page ... I'll be 76 then, so personally I'll probably be more worried about getting my colostomy bag changed :)
  • 586 BCE → 580s BC -- can easily cause problems if someone wants to link to 586 BC.
    • Yes, but 586 BC (note no 'E') currently is a redirect to 580s BC. In other words, if you object to 586 BCE redirecting to 580s BC, then logically shouldn't you also object to the current 586 BC redirect to 580s BC ? ;-)

All the best, -- Nickj

Redirects should not be created for plural forms of most words. Perhaps only create these links when references exceed a certain threshold, e.g. 10? RedWolf 03:54, Dec 4, 2004 (UTC)

But wouldn't this Wikipedia naming convention mean plural redirects are OK? All the best, -- Nickj 04:03, 4 Dec 2004 (UTC)

It certainly seems like a good idea to me to suggest redirects/disambigs for humans to inspect. (And certainly once there's an approved list, it's much faster if a bot comes along and does the heavy clicking.) It's not unlike what I've been doing with Wikipedia:Auto-categorization and Category:Orphaned categories. This sort of prep work seems to make human editors more enthusiastic and also saves a lot of time once someone does actually get around to completing these tasks. With the latter, I sort by frequency and then alphabetically, which may or may not be helpful for your lists. I was thinking that supplying a snippet of context around the broken links might be helpful. If there are like 50 pages that point to the same place, it would be a bit cumbersome to show 50 lines just for that one page, but for 5-10, if context were included it might be a lot faster to verify that all of the articles were, in fact, referring to the same entity. It would make the pages longer, though. -- Beland 07:21, 4 Dec 2004 (UTC)

It's a good idea, but I feel there are too many false positives for it to be done by a bot. I think a better solution is to post your very useful lists somewhere prominent, and given time Wikipedians will work through them by hand. - SimonP 08:26, Dec 4, 2004 (UTC)
OK, the idea of automatically adding these by bot is dropped, but the lists have been recreated and tidied up, and placed on this Missing Redirects page, together with instructions. Additionally, I've added an "Easy Preview" function (which uses HTTP GET vars), so that these can be added without typing or copying/pasting anything at all. I've also incorporated the suggestion of sorting by frequency and then alphabetically by target. Hopefully this should provide a good solution of software doing the laborious stuff, but humans deciding whether the suggestions are any good or not. All the best, -- Nickj 05:57, 14 Dec 2004 (UTC)
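The exact GET variables behind the "Easy Preview" links aren't shown in the discussion, but the general idea can be sketched with MediaWiki's standard `index.php?action=edit` interface. The base URL and parameters here are assumptions, not Nickj's actual implementation:

```python
from urllib.parse import urlencode

def easy_preview_url(title, base="https://en.wikipedia.org/w/index.php"):
    """Build a one-click edit link for a suggested redirect page.

    This is an illustrative sketch: it assumes MediaWiki's standard
    action=edit GET interface, not the actual "Easy Preview" scheme.
    """
    return base + "?" + urlencode({"title": title, "action": "edit"})

print(easy_preview_url("Hispanic Americans"))
```

`urlencode` takes care of escaping spaces and non-ASCII characters in the page title, so the links stay valid for arbitrary suggested redirects.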

Trivial daily-update bot

This is not really the type of bot that you're worried about, but to be virtuous I'll list it here: I'd like to automate the once-per-day update of the WikiReader Cryptography Article of the Day box (an editor-oriented template not used in main article space), which is currently being uploaded by hand. The script would run once every 24 hours under User:Matt Crypto (bot). — Matt Crypto 14:44, 14 Dec 2004 (UTC)

Support. There are plenty of users here who would whine if you didn't ask. Ram-Man (comment) (talk)[[]]
Not an objection to this one. Anyone contemplating such daily article templates should know that a change in the template purges all of the pages which contain it from both the Squid and parser caches. We rely on the Squids to serve over 75% of all hits, and raising that even to 80% would cut the Apache web server capacity needed by perhaps 15-20%. That makes putting them on large numbers of pages contraindicated if performance is a concern. Better to put an unchanging ad on the pages and let people choose to add them to a sub-page of their talk page so they can keep an eye on their favorite things. These also tend to foul up the most wanted articles report, but they aren't popular enough yet to be a major pain there. Not a reason not to do something really good, but it's a factor in choosing how best to do things. Jamesday 13:37, 17 Dec 2004 (UTC)
From what I'm reading of this, I should probably take {{opentask}} off my user talk page and just link to it? anthony 警告 14:29, 17 Dec 2004 (UTC)

Friday and rambot

Since today is Friday and tomorrow is Saturday (typical low server load times) and no objections have been stated to the adding of UN/LOCODEs and adding the map templates, I plan to start immediately assuming I can get the rambot programmed in time, rather than wait until next weekend for the next "downtime". It would be nice to have a category which lists those cities that have LOCODEs attached to them, but that would be a category with a few thousand entries. Should I just hold off on this? They can always be added later, I suppose, if some better system is worked out. Update: See here for an example of some of the current work being done. Ram-Man (comment) (talk)[[]] 20:58, Dec 17, 2004 (UTC)

Good luck! Constafrequent, infrequently constant 05:14, 18 Dec 2004 (UTC)
That new map template is very cool. Obviously, Australia needs one too! So I've copied it, and added it to an Australian suburb. A big thumbs-up to adding this to the RamBot articles. All the best, -- Nickj 04:36, 20 Dec 2004 (UTC)

Request permission for a bot

I'd like to request permission to run a bot, User:DanBot, using pywikipediabot, whose primary purpose will be to make corrections to the various highly similar articles containing Formula One statistics. First, it will convert a bunch of navigation tables (such as the one at the bottom of 1990 Australian Grand Prix) to templates (such as the one at the bottom of 2004 Monaco Grand Prix). The bot's other tasks will be similar tweaks to this limited body of articles. Dan | Talk 04:25, 20 Dec 2004 (UTC)

  • Is this bot manual or automated? anthony 警告 12:53, 21 Dec 2004 (UTC)
    • Manual. Dan | Talk 16:06, 2 Jan 2005 (UTC)

I plan on creating a bot to edit my userpage and update my personal statistics once a day during off-peak hours. I may work on other projects too at some point; for instance, I have an idea for helping get media files from the various wikis to Commons. However, I will test these on the test wiki first, and post my intentions here before I do anything different with the bot. マイケル 14:46, Dec 21, 2004 (UTC)