Wikipedia:Bots/Requests for approval/SquelchBot


--Versageek 02:35, 29 January 2008 (UTC)[reply]


Will the bot edit-war with a user who repeatedly inserts the same blacklisted link? Could it sense that and report it to a central place? MBisanz talk 01:36, 29 January 2008 (UTC)[reply]

  • I'm concerned by this, and don't think we should be running this kind of bot without a technical and policy discussion on the subject. There's potential for rule and bureaucracy creep. Who is to decide which links are useful and which are not? Normally we assume good faith and allow individual editors to edit the project in the way they see fit. If an editor becomes a problem they can be counseled or sanctioned. Whether a link is spammy or not is a case-by-case decision that depends on a lot of subjective factors. It would be hard to simply blacklist certain sites. I've had one or two perfectly good links I added reverted by shadowbot. It's hard to argue with a determined bot - there's no deliberation, no appeal, no judgment. Is there going to be a way to get around the bot if an editor has considered the question, and personally feels the link in question is not spam? If it can be more like the vandal bots, I don't think anyone questions their operation. They're extremely useful and have a low error rate. A really good bot that removes bad links but allows more experienced editors to override it on a case by case basis would be fine. Wikidemo (talk) 01:43, 29 January 2008 (UTC)[reply]

It would help if the operator(s) described, in depth, the function of the bot (interface, behavior, etc.) for those not familiar with how Shadowbot worked. GracenotesT § 01:51, 29 January 2008 (UTC)[reply]

Which is what this bot does: it mainly targets new users/anons who add the links, and typically only reverts a single time (there is a method to have the bot revert aggressively, but that is used very, very sparingly). Most of the links that the bot targets are crap sites. YouTube is a good example: what fraction of YouTube is non-copyvio and reliable? 0.5% maybe? And the bot does report; see #wikipedia-spam-t and #wikipedia-en-spam on freenode. βcommand 01:54, 29 January 2008 (UTC)[reply]
  • It won't revert the same page twice in a row nor will it revert users who are "auto-confirmed", unless a human tells it (via IRC in #wikipedia-en-spam) to 'override' on a specific URL. This is only done when there is an onslaught of persistent spammers for that URL. As operator, my approach is that this bot provides a community service. While those who monitor the bot in real-time may add domains to the blacklist to deal with an active spammer, the community certainly has input into what stays on the blacklist and what gets removed. --Versageek 02:29, 29 January 2008 (UTC)[reply]

I don't support running any antivandalism bot under the name SquelchBot. These edits are already going to be perceived as unfriendly or unwelcoming, and the name will only compound problems. It might as well be called WeDon'tRequireYourServicesBot. The task itself I am not convinced about, but the name makes it impossible for me to support. — Carl (CBM · talk) 03:11, 29 January 2008 (UTC)[reply]

Since it doesn't seem like it will edit war or stop quality editing, I have no objection to it. MBisanz talk 03:39, 29 January 2008 (UTC)[reply]

A bot whose operation depends on off-wiki IRC is, IMHO, a violation of open, on-wiki discussion of editing. DGG (talk) 04:45, 29 January 2008 (UTC)[reply]
All of the anti-vandalism bots depend on off-wiki IRC as well. This is nothing new. AntiSpamBot/Shadowbot had over 44,000 edits.. This is the same bot.. --Versageek 05:13, 29 January 2008 (UTC)[reply]

No, bots are annoying. Granted, I think it's better when they're REMOVING rather than ADDING content, but still... In this case, the spam blacklist can already deal with the problem and it's far easier to keep track of.   Zenwhat (talk) 05:24, 29 January 2008 (UTC)[reply]

Ouch, it uses an off-wiki IRC? No, I'll have to definitely oppose this one. Granted, I admit I'm probably paranoid about cabals, but if we already have a blacklist that can deal with the problem, what's the point of this? What's the benefit of giving a handful of bot-owners the authority to collude on off-wiki IRC, to make up their own badsites list?   Zenwhat (talk) 05:33, 29 January 2008 (UTC)[reply]

The difference between this bot & the Mediawiki:Spam-blacklist is that the Mediawiki list prevents all usage of a URL: a page can't be saved if it contains a URL which is on the Mediawiki blacklist. This bot would only prevent IPs and new users from adding a URL; they would still be able to edit pages containing the URL without being reverted. Auto-confirmed users won't be reverted by the bot and can add links to domains blacklisted on the bot whenever they feel the link is appropriate. If it would reassure people, I'm sure we could program the bot to output a list of domains on the blacklist to a wiki page on a regular basis, so the content can be reviewed. --Versageek 05:45, 29 January 2008 (UTC)[reply]
Please do that. Without that there is absolutely no oversight from the community. How about the bot reads the blacklist on a regular basis from a fully protected page, similar in operation to the spam blacklist in that only admins can add and remove entries, but there is community oversight. ViridaeTalk 05:47, 29 January 2008 (UTC)[reply]
I really like Viridae's suggestion - that the contents be listed on a protected page and that when an admin edits that page, the bot picks up the edit (whether a removal or an addition) and applies it to the filter. --Iamunknown 06:09, 29 January 2008 (UTC)[reply]
First, it is technically possible to create that function as described above, using a fully protected wikipage instead of an off-wiki SQL database (though it is not completely trivial). I'll have a look into that; it is actually an interesting possibility. But I don't think that the anti-vandalism bots have this option either. As for here, additions of text should be considered to be in good faith, but if certain data is added in a certain edit, those anti-vandalism bots will also revert, without the same community oversight.
Some points:
  • It only reverts accounts which are less than 7 days old, and IP accounts. These accounts generally don't know much about the wiki policies and guidelines, especially regarding these links.
  • Users and IPs can be (and are) whitelisted when deemed appropriate. Such users are not reverted.
  • It normally only reverts once; if it gets reverted, an off-wiki alert is generated, and a human editor looks into it. There is an option to make it always override for persistent spamming, but it is not used often.
  • It does not revert when it detects that the link is used inside reference tags or inside a template.
This bot is indeed a step below the spam-blacklist. E.g. a lot of the blogspot links added to Wikipedia are in violation of, or at least questionable under, one of the policies or guidelines such as Wikipedia:NOT, Wikipedia:RS, Wikipedia:OR, Wikipedia:SPAM, Wikipedia:COI, etc. Once blacklisted, these links can NOT be used at all anymore, which would be a loss since a significant number are appropriate. The bot is a middle ground between having to check every addition by hand (as many editors do) and not being able to use the link at all anymore. I hope this explains a bit more about how it works.
We'll have a go at using a wikipage for the blacklisting. --Dirk Beetstra T C 14:32, 29 January 2008 (UTC)[reply]
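For readers unfamiliar with how such a bot behaves, here is a minimal Python sketch of the revert rules described in the points above. It is purely illustrative, not the bot's actual code; the names (Edit, should_revert), the example rules, and the whitelist entry are all assumptions made for the example.

```python
# Illustrative sketch only -- not SquelchBot's source. All names here
# (Edit, should_revert, the example rules and whitelist) are hypothetical.
import re
from dataclasses import dataclass, field
from datetime import timedelta

REVERT_RULES = [r'\bblogspot\.com', r'\bmyspace\.com']   # assumed example rules
WHITELIST = {'SomeTrustedUserOrIP'}                       # assumed whitelist entry

@dataclass
class Edit:
    user: str
    is_ip: bool
    account_age: timedelta
    added_links: list                 # external links added by this edit
    links_in_refs_or_templates: set = field(default_factory=set)
    last_edit_was_bot_revert: bool = False

def should_revert(edit: Edit) -> bool:
    # Only IPs and accounts younger than about 7 days are candidates.
    if not edit.is_ip and edit.account_age >= timedelta(days=7):
        return False
    # Whitelisted users and IPs are never reverted.
    if edit.user in WHITELIST:
        return False
    # Normally one revert only: never revert right back onto the bot's own edit.
    if edit.last_edit_was_bot_revert:
        return False
    for link in edit.added_links:
        # Links placed inside <ref> tags or templates are left alone.
        if link in edit.links_in_refs_or_templates:
            continue
        if any(re.search(rule, link) for rule in REVERT_RULES):
            return True
    return False
```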
So with this idea, the bot's blacklist will be maintained on-wiki for any admin to edit? Lawrence § t/e 14:33, 29 January 2008 (UTC)[reply]
See User:SquelchBot/Blacklist. I am testing it now only with blogspot.com. The problem is that SquelchBot is dependent on the linkfeeder bot, which also needs access to the blacklist, and that is the non-trivial part of the system: that one still relies on the SQL-based blacklist. I'll get to that if this works. --Dirk Beetstra T C 14:42, 29 January 2008 (UTC)[reply]
The part for SquelchBot itself works; adapting the linkwatcher is going to be more of a task. All I can do for that at the moment is keep that list up to date with SquelchBot's list (regexes on that list only get reverted by SquelchBot if the linkwatcher reports them, so it needs bot-operator attention). I have one concern, though: the bot is used to catch active spammers (that is, editors who only add links without discussion; it does not always mean that the link itself is bad), which means that links on that list may be there without a previous on-wiki discussion. I would like some opinions on that. --Dirk Beetstra T C 15:54, 29 January 2008 (UTC)[reply]
I now see that adapting the linkwatchers is going to be difficult (that bot is chewing through a high number of page edits, at the moment from 722 wikis in total, from which it has to extract the added links and check whether they are blacklisted via an on-wiki blacklist). Also, there are several problems with on-wiki blacklisting (similar to those for the normal blacklist). Here a faulty regex would bring down the bots (just as a faulty regex in the spam-blacklist can make it impossible to edit certain pages or perform certain edits). --Dirk Beetstra T C 16:32, 29 January 2008 (UTC)[reply]
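To illustrate the faulty-regex risk mentioned above, here is a small hedged sketch (not the bots' actual code) of loading rules from a wikitext page while skipping, rather than crashing on, an invalid pattern. The page format, comment convention, and function name are assumptions.

```python
import re

def load_rules(wikitext: str):
    """Compile each non-empty, non-comment line; skip and report bad regexes."""
    rules, bad = [], []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        try:
            rules.append(re.compile(line))
        except re.error as err:
            bad.append((line, str(err)))   # report instead of bringing the bot down
    if bad:
        print('Skipped invalid rules:', bad)
    return rules

rules = load_rules("\n".join([
    r'\bblogspot\.com',
    r'\bmyspace\.com',
    r'[unterminated',        # a faulty rule: reported and skipped, not fatal
]))
```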

I'd have no problem with this, provided that the following are set in stone and the bot-owners must come back to get approval if they change the following settings:

  • It must only affect unregistered users and accounts less than 7 days old, and it must NOT overturn this based on manual commands given by a handful of users in #wikipedia-en-spam
  • Its "blacklist" be renamed "greylist" to clarify that it is not dealing with blatant spam URLs (which should be blocked completely with the spam blacklist), but "generally suspicious" links used for subtle linkfarming and violations of Wikipedia:SOAP.
  • That its blacklist be a protected page that can only be edited by administrators and can be reviewed by the community
  • That collaboration for this bot be on-wiki, not off-wiki.

Also, an alternative proposal is for this bot to run a similar algorithm, but to allow human beings to make the final decision whether to revert. See User:Zenwhat/Greylist.   Zenwhat (talk) 20:03, 29 January 2008 (UTC)[reply]

Zenwhat, your complaints really don't have much ground. We had an anti-spam bot operational with 40K edits without any real issues. This is a copy of that same bot with a few code fixes. Unless there is something besides the fear of the "CABAL", there are no reasons to change the time-tested code. Zenwhat, I invite you to join one of the spam channels and see for yourself. βcommand
Zenwhat's suggestions (not complaints) do have ground. In fact, they are quite reasonable. On-wiki collaboration is a good and a desirable thing. An on-wiki page editable by administrators, as opposed to an off-site SQL database edited by someone with little or no oversight, is a good and a desirable thing. Affecting a minimal number of editors is a good and a desirable thing. Links that ought to be blocked completely (or, as said above, "aggressively pursued") being put on the spam blacklist is a good and a desirable thing. That you disagree with and dismiss them does not mean that they are not good suggestions, or that they will remain unsupported by others. --Iamunknown 22:56, 29 January 2008 (UTC)[reply]
  • All we are doing here is replacing the previous successful anti-spam bot with an identical anti-spam bot. I see no reason for any drama or disagreement. How about we give this a month's trial to check that the code works OK, and then review if there are any concerns raised in the meantime? Spartaz Humbug! 20:46, 29 January 2008 (UTC)[reply]
Surely you see reasons (all the above reasons?) for "drama" or disagreement, but you simply do not agree with them?  :-) --Iamunknown 22:57, 29 January 2008 (UTC)[reply]

BetaCommand, this isn't an "anti-spam bot," because you're adding links that are clearly not spam. Blogspot is NOT spam. It just might be used that way. Spam links should be added to the blacklist. As clarified above, you want to create a bot to revert suspicious links en masse and you want to be able to control it through manual commands in an off-wiki IRC.

If the "manual commands in IRC" feature is nixed, allowing community oversight, and "it only affects new users" is set in stone, then I'd have no problem with it.   Zenwhat (talk) 02:10, 30 January 2008 (UTC)[reply]

How is User:SquelchBot/Blacklist not community oversight? GracenotesT § 02:32, 30 January 2008 (UTC)[reply]
It was started today, he may not have known about it. ViridaeTalk 02:44, 30 January 2008 (UTC)[reply]
The on-wiki list is still a work in progress.. as Beetstra comments above, it may prove to be technically impractical to use a wiki page as the blacklist due to the volume of RC traffic the bot deals with.. It is certainly possible to dump the list to wiki on a schedule so it can be reviewed by the community (this can be done by an independent program if necessary). Given the recent concerns about things that happen off-wiki, I prefer the list be out in the open and subject to community review. Until we get the dump written, the patterns which were matched can be seen in the revert edit comments in the bot's edit history.. There are very few domains in the database right now (compared to what was there when the old AntiSpamBot was running). --Versageek 03:45, 30 January 2008 (UTC)[reply]

Gracenotes: User:SquelchBot/Blacklist is overseen by the community. However, manual commands in IRC that override its default behavior of not reverting more than once and not reverting auto-confirmed users are NOT something that can be overseen by the community. Since the "blacklist," which is really a "greylist," is going to contain links like Youtube and Blogspot, the existence of User:SquelchBot/Blacklist while allowing users to issue manual commands on an off-wiki IRC does NOT allow for community oversight, as User:DGG noted above.

And there should be no "off-wiki" list. Any blacklist the bot uses must be on-wiki.   Zenwhat (talk) 04:23, 30 January 2008 (UTC)[reply]

What happened here? Where was myspace.com defined as a bad link? this is the edit where it happened. Lawrence § t/e 06:48, 30 January 2008 (UTC)[reply]

I do know blog.myspace.com was blacklisted on meta per a request by Jimbo.--Hu12 (talk) 13:44, 30 January 2008 (UTC)[reply]
Re Lawrence Cohen. Yes, it reverts many links that are generally bad (that is, their additions are mostly in conflict with Wikipedia:SPAM, Wikipedia:COI, Wikipedia:EL, Wikipedia:NPOV, Wikipedia:NOT#REPOSITORY, Wikipedia:NOT#SOAPBOX, Wikipedia:COPYRIGHT, etc.) but that also have a good use sometimes. That is why the bot warns these users with (in the case of myspace):
Your edit here was reverted by an automated bot that attempts to remove unwanted links and spam from Wikipedia. If you were trying to insert a good link, please accept my creator's apologies, but note that the external link you added or changed is on my list of links to remove and probably shouldn't be included in Wikipedia.
The external links I reverted were matching the following regex rule(s): rule: '\bmyspace\.com' (link(s): ....) . If the external link you inserted or changed was to a blog, forum, free web hosting service, or similar site, then please check the information on the external site thoroughly. Note that such sites should probably not be linked to if they contain information that is in violation of the creator's copyright (see Linking to copyrighted works), or they are not written by a recognised, reliable source. Linking to sites that you are involved with is also strongly discouraged (see conflict of interest).
It will not revert these editors a second time; that is reserved for those moments where someone insists on adding a really bad link, though probably a regular editor/administrator will get to it first. That means that if the editor reads the warning and then realises that the link is appropriate, the bot can be reverted. You are of course free to check all reverted blogspot additions by hand (which make up the majority of the edits reverted by SquelchBot); you will see that over 99% of these should indeed not have been added (per the above-mentioned guidelines and policies, and yes, other bots also make mistakes sometimes).
This bot only closes the floodgates a bit, and I have tried to write a friendly remark explaining that there may be concerns with the links the editor added. So no, the link is not necessarily bad (that is what we have the blacklist for), but there are too often concerns with these links, which we expect regulars to know to take care with.
Also, the first warning is a good-faith warning. As the bot will not revert that link again, there should be no further warnings; the bot 'forgets' that it warned a user after a couple of hours, and the user should not get reverted after a couple of days (IPs have to be explicitly whitelisted). Only if a user insists on adding links of concern to several pages will it get to the point where administrators are alerted (the editor is still allowed to edit and will not be blocked; only when an administrator shares the concerns may a preventive block be applied).
And when a user insists on adding good links, which are caught by the bot's revert list and hence reverted, to a number of pages, then that should still be of concern, even though the editor works in good faith and is not doing something really wrong: we are writing an encyclopedia here, and it is content that we are after, not external links only.
Re Zenwhat. All the anti-vandalism bots we have work from an off-wiki revert list (and the VoABots also revert imageshack and similar generally misused external links). As Versageek says, we can make sure that SquelchBot's revert list is published regularly, and it can then be discussed whether links should be on that list. I assure you that most of the links added by inexperienced editors which get reverted should not have been added, but if the percentage of good-link reversions gets too big (say more than 1-2%), then either we remove the rule ourselves, or it can be removed after remarks from editors (and put on an attention list instead of a revert list). One of the strengths of the bot is that it works immediately, not after a blacklist discussion, and that is necessary for the blatant cases of spam.
I hope that we can assume some good faith in the people who have access to the bots; these people have long experience fighting spam and trying to keep the external links on Wikipedia in good shape.
I hope this takes away some of the concerns. --Dirk Beetstra T C 15:08, 30 January 2008 (UTC)[reply]
Note: I have changed the location of the 'blacklist'; it is now User:SquelchBot/RevertList. It contains an almost up-to-date dump of the list. --Dirk Beetstra T C 17:32, 30 January 2008 (UTC)[reply]
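To make the "matching regex rule" part of the warning quoted above more concrete, here is a rough sketch of how added external links might be checked against RevertList-style rules. The URL pattern and the two example rules are assumptions, not the bot's real implementation.

```python
import re

URL_RE = re.compile(r'https?://[^\s\]<>"|]+')      # rough URL matcher (assumption)
RULES = [r'\bmyspace\.com', r'\bblogspot\.com']     # example RevertList entries

def matched_rules(added_text: str):
    """Return (rule, url) pairs for every added link that matches a rule."""
    hits = []
    for url in URL_RE.findall(added_text):
        for rule in RULES:
            if re.search(rule, url):
                hits.append((rule, url))
    return hits

print(matched_rules("External link: http://www.myspace.com/exampleband"))
# -> [('\\bmyspace\\.com', 'http://www.myspace.com/exampleband')]
```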
  • I am strongly opposed to having bots make content decisions, which is basically what this amounts to. And, quite frankly, given Betacommand's history of brusque and abrasive interaction with other users, I don't want any more bots started up that he is associated with. The name is also unfriendly and unhelpful. I've seen too many cases in the past where a bot was approved with a relatively narrow mandate and that mandate was later unilaterally expanded by the operators to do something that didn't have consensus and where the bot wouldn't have been approved if it had been disclosed from the start. This should not be approved until and unless a reasonable cross-section of the whole Wikipedia community — not just the Bot Approval Group — says it is OK. *** Crotalus *** 17:50, 30 January 2008 (UTC)[reply]
Re Crotalus horridus: Bots don't make content decisions; that is done by the operators. That goes for this bot, for the other anti-vandalism bots, indeed for all bots. This bot does the same as what User:AntiSpamBot did for 44,000 edits (and yes, again, bots make mistakes; all bots do). It did what it was described to do, revert external link additions which are generally not wanted or plain spam, and its behavior has merely softened over time (originally it reverted all the time unless a user was whitelisted; now it only reverts IPs and new user accounts). Please do not try to extrapolate other bots onto this one. Also, the bot is Versageek's responsibility, not Betacommand's. Betacommand was elected to the BAG, and the BAG does have the community's consensus, from a reasonable cross-section of the whole Wikipedia community, to make these decisions. --Dirk Beetstra T C 18:55, 30 January 2008 (UTC)[reply]
I am concerned by the apparent ease with which a site gets added to the blacklist, and the extreme difficulty in getting one removed. There is apparently no policy to handle this situation adequately, and no place to discuss the issue, outside of OT conversations on related boards. I would like to propose a meta page to discuss how the blacklist is implemented and what it should and should not address. In particular, imho a spam blacklist should address clear cases of spamming the project, not simply sites which fail on content grounds. That is, sites which are spamming us, with many links completely unrelated to the content they are placed in. Sites, however, which have content issues and are placing links into appropriate articles should not be classified as spam; they should be addressed on the appropriate policy talk pages. Until the community can discuss these cogent issues, I would be against any bot activation, and be in favor of deactivating any bots we have that are using the blacklist as their functional instruction set.Wjhonson (talk) 22:15, 31 January 2008 (UTC)[reply]
Please don't confuse this bot with the Mediawiki:Spam-blacklist, which prevents saving a page which contains a link to a blacklisted site. This bot is actually a "step-down" from that level of exclusion. It allows established editors to add links as they deem appropriate. Since the old AntiSpamBot stopped functioning reliably about two months ago, there have been a few instances where we've had to use the Mediawiki:Spam-blacklist to stop persistent inappropriate addition of links to specific sites by new and IP users, even though certain pages on those sites also had appropriate uses on Wikipedia. This has frustrated a lot of people. If this bot (or its predecessor) had been running, we could have added those domains here - and established users would not have been inconvenienced. --Versageek 22:55, 31 January 2008 (UTC)[reply]

(OD) If admins have no compunction about using the list here to squelch sites, we're no further ahead than we are currently. I'm confused by your statement above. Are you stating that established editors can add sites, even though they are listed here or blacklisted, and that somehow the bot knows that a link has been added by an established editor and bypasses squelching it? If that is what you're saying, it isn't very clear. Are you stating that implementation of this bot would effectively eliminate the blacklist page entirely? That is, that blacklisting would no longer have any effect? Wjhonson (talk) 23:10, 31 January 2008 (UTC)[reply]

No, the two serve different purposes. The blacklist prevents everybody from using the link; this bot sees who adds a link, and if that user is new, or a (non-whitelisted) IP, then it reverts if that link is on its revertlist. So yes, the bot knows when the link has been added by an established editor. I hope this explains. --Dirk Beetstra T C 23:14, 31 January 2008 (UTC)[reply]
If you are prevented from *adding* a link at all by a blacklist entry, then what is left to revert in the first place? If a blacklist entry prevents addition, then there are no links to revert because they were prevented. It seems like you are saying this squelch list is an additional place where admins can freely ignore community input and do as they please until cornered. Until we have safeguards in place, I'm not comfortable with creating yet another hidden powerbase.Wjhonson (talk) 23:18, 31 January 2008 (UTC)[reply]
We are merely re-creating the previous bot. No, there are many links that cannot be blacklisted outright, because parts of certain sites are appropriate. Yet many people who are not familiar with Wikipedia use that loophole to add links on those same servers that are not appropriate. E.g. the vast majority of YouTube is bad: the files are copyrighted, or totally unreliable. Still we can't blacklist it, because there are several files there that are appropriate. Reverting the additions of unestablished editors, making them reconsider, and telling them why they probably should not add the link closes that floodgate of links, which otherwise have to be removed by hand (per policy, e.g. Wikipedia:COPYRIGHT). Also, blacklisting is bad for spammers; being reverted and warned may make them think twice before they spam Wikipedia further and risk being placed on the blacklist. --Dirk Beetstra T C 23:29, 31 January 2008 (UTC)[reply]
What is the effective community consensual input to this process? "Ask an admin" is, to my way of thinking, simply a non-starter. Our community is not based on the consensus of admins, but the consensus of the community. Without community involvement, this bot has the potential to simply become one more place where valid complaints by the community can be ignored without repercussions, because there is no policy controlling it.Wjhonson (talk) 23:47, 31 January 2008 (UTC)[reply]
A bot owner is responsible for the operation of a bot. Like every other editor on the project, the bot is obliged to follow community consensus. As bot operator, I am responsible to see that it does so. I welcome the input of the community with regard to the content of the bot's RevertList. The bot exists to serve the community. --Versageek 00:33, 1 February 2008 (UTC)[reply]
I'd like to see clarity that sites are only added to SquelchBot's blacklist by community consensus, and should not be added simply by a disaffected editor, no matter how highly they are placed in the hierarchy. That is, there should be a place to request an addition, and a place to discuss such requests to gain consensus, along with an archive of past discussions. We should not create a new place where editors can block sites without additionally giving confirmatory evidence of the site's negative impact on the project. The evidence should be clear and permanent. I agree with the touting of the ability to whitelist, but SquelchBot would also serve to blacklist, and that is my main concern. Blacklisting without appropriate community involvement, even for one minute, serves to cause dissension, not cohesion. Pre-emptive blacklisting based on a minority report would not serve the project's interest.Wjhonson (talk) 00:43, 1 February 2008 (UTC)[reply]
That defies the function of the bot. Spammers do not wait for consensus when they add their link, and neither do people who want to push their agenda by adding a certain external link (and yes, that sometimes includes good links. I saw that you are involved in such a case, where this bot would have been a less violent solution than the now meta-blacklisting of a link. Links are sometimes useful, yes, but we are not a linkfarm, and not an advertising service (striking this out, this situation seems to be controversial. --Dirk Beetstra T C 11:09, 1 February 2008 (UTC))). I am sorry: if they are reverted first by hand, and then still insist on performing their edits without wanting to discuss, then either the revertlist or the spam-blacklist (local or meta) is there. All policies and guidelines say that if there are concerns about the link you have added (which is clear if the addition gets reverted and warnings are posted on a talk page), then you should discuss it first on the talk page. The community consensus is in these policies and guidelines. I know that may result in good links being placed on a revert- or blacklist, but if there is no discussion from the persons pushing the link, and there is no appropriate use, then the consensus has already been reached (per the policies and guidelines of this encyclopedia). I would invite you to assume some good faith in the people who are active in, among others, Wikipedia talk:WPSPAM; blacklisting or putting a link on the revertlist is not the first measure we take, we first try to discuss with the involved editor(s). And the links on the revertlist are not links that are widely used appropriately; there certainly are many concerns with them, and they generally simply should not be used (the good use being more an exception than a rule)!! I hope this explains. --Dirk Beetstra T C 10:27, 1 February 2008 (UTC)[reply]
  • Already I'm seeing problematic edits, even though the bot hasn't officially been approved yet. For instance, see [11], where a published book reference was removed at the same time as a Myspace link. To top it off, the Myspace link might actually be acceptable in this particular circumstance, since it's supposed to be the profile of the same musician whose page this is. (However, the link was a 404 when I clicked on it.) This is not encouraging. *** Crotalus *** 16:23, 1 February 2008 (UTC)[reply]
  • Here's another example where good sources were removed along with questionable ones. This kind of thing is precisely why I don't want any automated tools trying to perform these functions. Only humans can accurately make determinations as to what belongs in an article and what doesn't. *** Crotalus *** 16:27, 1 February 2008 (UTC)[reply]
I got "Invalid Friend ID. This user has either cancelled their membership, or their account has been deleted." on the myspace example--Hu12 (talk) 16:30, 1 February 2008 (UTC)[reply]
Yes, I know. The problem is that a valid book reference (complete with citation template and page numbers) was removed at the same time. This is why I don't want bots doing work that needs to be done by humans. *** Crotalus *** 16:32, 1 February 2008 (UTC)[reply]
Actually Dirk, if you could find the policy that explicitly controls the blacklist I'd like to see it. That is part of the entire problem, in my humble view. So far I've found lots of people talking about spam, spammers, and the blacklist, but very little in the way of defining what is spam, who is a spammer, and how the blacklist should and should not be used. I disagree that merely putting a link on the blacklist represents consensus. Quite a few links got onto the blacklist really without any community input at all. Hopefully opening this discussion to the wider community can address some of that discrepancy. All content issues of any sort whatsoever should be going to our policy boards at Wikipedia:V, Wikipedia:NPOV, Wikipedia:BLP, etc. However in some cases content issues are going directly to the blacklist. Until this problematic situation is ironed out, I can't support any bots making the appearance that we have procedural consensus. Wjhonson (talk) 16:38, 1 February 2008 (UTC)[reply]
What is spam is easy: see Wikipedia:EL and Wikipedia:SPAM. External links that are in violation of these guidelines (generally, where users push an agenda, where the links are added to drive traffic to a site, or where links point to material that is in violation of copyright) should not be there. Moreover, Wikipedia is not a linkfarm. We are writing an encyclopedia here! That is the procedural consensus. Read Wikipedia:COI, Wikipedia:COPYRIGHT, Wikipedia:NOT#REPOSITORY, Wikipedia:SOAPBOX, Wikipedia:DIRECTORY (all (parts of) policy). The links on the revertlist, or on the blacklist, are generally in violation of one of these. If the violation is total, the link gets (meta-)blacklisted; if the majority of the site is in violation, but not everything on the domain, then it gets reverted by this bot, after which the editor, or another, uninvolved editor, can have a second thought about it (e.g. revert the bot while leaving out the questionable link, discuss on the talk page first, etc.). Yes, it sometimes reverts too much, but after the first (good-faith) warning, the editor can repair that part of their edits (the VoABots also revert all edits by an editor the bot thinks is vandalising a page). Feel free to join us on IRC and see for yourself that there are about 20 external links added per minute. I don't dare to guess what part of those is questionable; what I do know is that only a small percentage of the reverted links are actually appropriate. But if you are going to shoot this bot down for that reason, then I ask you to apply the same to all the other active bots. --Dirk Beetstra T C 17:19, 1 February 2008 (UTC)[reply]
I have (I hope) indeed repaired the referencing, template and remark problems; that's why we are in the test phase, I guess. --Dirk Beetstra T C 17:20, 1 February 2008 (UTC)[reply]
You're addressing whether there was consensus to create a blacklist and populate it with something. I'm not addressing that. I'm addressing the issue of how a link gets added to the blacklist and how it gets removed. That process, which has great potential for disruption, is not, in my humble opinion, well worked out and implemented. Would you have a problem with a system where proposed links are validated by community consensus? I'm not sure why you're having such an issue with what to me is our standard operating procedure. You're starting from a corrupted database and asking for approval here. Instead I suggest you start from a blank database, no links at all, and get community consensus for each additional one. Then at some point, we can simply archive the old blacklist process and use your bot instead. If you plan to start from the blacklist, I can't support that. The blacklist entries, not the blacklist concept, never achieved community consensus. Twenty links per minute simply cannot be verified. What is the proof that these aren't being added maliciously? Since the proofs are not stored, they are not permanent, and they are not being shared with the community, IRC completely circumvents the open policy we have here. For each link there should be a link to a discussion where the evidence of wrong-doing by that URL is shown. I'm never going to support the secret-government approach of "We're just doing it for your own good! No really! Go away now!"Wjhonson (talk) 17:28, 1 February 2008 (UTC)[reply]
Wjhonson, there is no CABAL; please get that through your head. IRC is open, and the #wikipedia-en-spam channel is logged. If there is any day's log you would like to see, I'll be glad to send it to you or post it. Misza13 runs a logging tool, so there is no perceived cabal. Due to the sheer size of these logs, posting them to the wiki would be difficult. As for requesting a link discussion, see Wikipedia:WPSPAM, Wikipedia talk:SBL, or the channel. If you have questions about a particular link, feel free to ask a bot operator, join the IRC channel, or leave a note on WP:WPSPAM. This is one of the more open methods; we don't need more paper-pushing and sources of drama. βcommand 17:44, 1 February 2008 (UTC)[reply]
I think the idea of a blank database to start, with only community-approved, on-wiki-approved URLs being added, is a perfect idea and would address many of my own concerns, especially if the bot was limited to 1RR for any one URL on any one article. And not 1RR per day--I mean period, so that the bot wouldn't come back each day for its 1RR fix. Lawrence § t/e 17:38, 1 February 2008 (UTC)[reply]
Lawrence, it only reverts once per link addition, not once per day. Once the bot reverts, it will not revert back to its own edit; instead, another editor will have to edit the page and add the link again for it to revert. βcommand 17:48, 1 February 2008 (UTC)[reply]
OK (sorry, this page exploded since I last took a good look). So if I add a link today, bot reverts once, I re-add, it stays. The bot won't come back tomorrow and take it out again? Lawrence § t/e 17:52, 1 February 2008 (UTC)[reply]
It will make a note in IRC about the fact that you added the link again, but it WILL NOT REVERT. The same applies if a second editor reverts the bot. βcommand 17:55, 1 February 2008 (UTC)[reply]

Arbitrary section break

OK, that is fair. The process was described above. For the revertlist: if the operators see 'bad' links being added by an account, then we have a look, revert, and warn the user. If the user decides to ignore those warnings, or changes IP or whatever, then we add the links to the revertlist, and leave reverting and warning to the bot. If a user insists too much, in the end they will get reported to Wikipedia:AIV. There another admin will evaluate what the user was doing and decide whether a block of the account is in order to persuade the editor to discuss more, or whether it is better to evaluate the edits and perform them properly (the admin's account will be older than 7 days, so the bot will not revert; and probably the admin is already on the bot's whitelist, in which case they will never be reverted). Since spamming is something that is actively happening, those links get added without discussion first; if you see how many edits have to be checked, it is impossible to handle those cases by hand (we don't have enough manpower).
The same goes for links where the general majority is rubbish: it is better to remove them, and when the link is actually good, the editor can add it again (though it will generate an alert and probably someone will have a look). Still, the link can be added, e.g. by 'established' editors.

Sorry, I don't think that consensus first for the revert-list is a good plan; I think we should abide by the above-mentioned policies and guidelines, and if there is too much error with a rule, then indeed it has to be removed and handled by hand.
Removal can be discussed on the talkpage of the bot, I am sure the operators will remove the rule if it can be shown that there are a significant number of good links being removed (relative to the total being reverted under that rule).
For the blacklisting (which is a more rigorous approach to blocking links), generally a request has to be made first, which is evaluated by other editors (sometimes persistent/true spam is added to the blacklist without discussion, but only after the original additions have already resulted in blocks on users, or the edits are performed by a widespread IP range which repeats additions on the same page several times; that should suggest to them that the links are not wanted).
For XLinkBot, another thing is that editors get warned pretty soon that there are/may be concerns with the links they add (I'd like the warning to be there within 30 seconds; I don't know if I can technically reach that). That is much better than having the edits sit there for some time: by the time the questionable edits are finally found by an editor, get reverted, and the spammer gets warned, the 'spammer' is already asleep or has changed IP, which often results in them not getting the warning (or the effects of a block).
Hope this explains a bit more. --Dirk Beetstra T C 17:59, 1 February 2008 (UTC)[reply]

"Sorry, I don't think that consensus first for the revert-list is a good plan"

Then that applies to the whole bot. Unless the Community has absolute control at all times over what counts as a bad link or URL, I don't think it's a good idea. Lawrence § t/e 18:04, 1 February 2008 (UTC)[reply]


Re the 1RR: if the bot reverts a link addition, and someone decides to add the link again, then it will not revert (that holds for the same user, or for another user, even if both are 'new'!). If a 'new' editor then makes a subsequent edit in which the link gets changed, that edit may result in a second revert (but the link has to change, even if only by one character; if the link does not change, it is not observed as an added link). The same goes if an editor uses one of the official reverts (the undo button, e.g.); those edits will NOT be reverted.
Both do have an override, but that only gets used in rare cases. If an active spammer insists too much, then his specific link may get onto that list. Because of the risks, that will never happen with broad rules (e.g. reverting blogspot.com), because those affect all blogspot links; we would instead apply it to e.g. 'johndoe.blogspot.com' (which is most probably only added by that one pushing editor). That list gets cleaned when there are no further additions of that link (links for which override is necessary will probably be reported to a spam-blacklist somewhere, as they are probably complete rubbish). --Dirk Beetstra T C 18:13, 1 February 2008 (UTC)[reply]
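A brief sketch of the "observed as an added link" idea above: a link counts as added only if it appears in the new revision but not in the old one, so re-saving an unchanged link (or using undo) would not trigger another revert. The URL regex and function name are illustrative assumptions, not the bot's actual code.

```python
import re

URL_RE = re.compile(r'https?://[^\s\]<>"|]+')

def added_links(old_wikitext: str, new_wikitext: str) -> set:
    """Links present in the new revision but absent from the old one."""
    return set(URL_RE.findall(new_wikitext)) - set(URL_RE.findall(old_wikitext))

old = "See http://johndoe.blogspot.com/post1"
new = "See http://johndoe.blogspot.com/post1 and http://johndoe.blogspot.com/post2"
print(added_links(old, new))   # only the genuinely new link is reported
```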
Re the comment above: then I invite you to help us by hand in the spam channels. What you suggest makes the whole bot superfluous. You obviously don't see how much spam gets added to this Wikipedia, and there is currently no way to keep that in hand otherwise. I am sorry. --Dirk Beetstra T C 18:13, 1 February 2008 (UTC)[reply]
Do I also personally need community consensus if I revert, by hand, the majority of blogspot.com/myspace/etc. links that get added? I think I make the same editorial decision as when a bot does it, and I am sure I will also make some mistakes in doing that, and I will probably also revert more established editors.
Just as a reference: diff. --Dirk Beetstra T C 18:24, 1 February 2008 (UTC)[reply]
  • Well, I think that you should have community consensus to do mass link removal. Mass removal of anything on Wikipedia is disruptive (even if well-intentioned), and should at least be discussed somewhere. --Iamunknown 20:44, 2 February 2008 (UTC)[reply]
  • The comments above state that the bot is not supposed to revert repeatedly. However, this doesn't correspond with what is currently happening. Here is a case where the bot reverted five times in one day (in blatant violation of Wikipedia:3RR). The Myspace page that was being removed is ostensibly the band's official page, which, if this is true, means it's probably an appropriate external link. What bothers me the most about this is that human editors were jumping on the bandwagon, assuming that of course the bot must be right and the IP editor must be wrong. I think this is a serious violation of Wikipedia:AGF and Wikipedia:BITE on the part of those reverters. *** Crotalus *** 05:09, 2 February 2008 (UTC)[reply]
I have tweaked the bot (now testing, which is a matter of waiting for a similar situation): it will not revert if it has reverted more than 2 times in the last 30 hours (that is a pretty strict 3RR). As there are some problems in detecting that, it will also not revert if there are more than 50 edits to a page in the last 30 hours and in those 50 edits there is an XLinkBot revert. It may happen on a super-heavily edited page that there are more than 3 reverts by XLinkBot, but that would mean roughly 4 reverts within 30 hours on a page that has about 150 edits in that same time. Note that this means 3 reverts total, not 3 reverts on the same editor or 3 identical reverts. Please drop a note if it still does it wrong; hope this helps. --Dirk Beetstra T C 16:22, 2 February 2008 (UTC)[reply]
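A minimal sketch of the throttle described in the comment above, under the assumption that the bot can see the page's recent history as (timestamp, username) pairs; this is not the actual bot code.

```python
from datetime import datetime, timedelta

BOT_NAME = 'XLinkBot'
WINDOW = timedelta(hours=30)

def throttle_allows_revert(history, now=None):
    """history: list of (timestamp, username) tuples for the page."""
    now = now or datetime.utcnow()
    recent = [(ts, user) for ts, user in history if now - ts <= WINDOW]
    bot_reverts = sum(1 for _, user in recent if user == BOT_NAME)
    # No more than 2 bot reverts in the last 30 hours (a strict 3RR margin).
    if bot_reverts > 2:
        return False
    # On very busy pages (>50 edits in 30 hours) that already contain a bot
    # revert, stand down as well.
    if len(recent) > 50 and bot_reverts >= 1:
        return False
    return True
```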

I'm satisfied that the plans for the bot's reverts are sane, that the operator is responsive to needed changes, etc. This is an area where the English Wikipedia needs some assistance; there simply isn't enough manpower available to do this task manually. – Mike.lifeguard | @en.wb 20:27, 2 February 2008 (UTC)[reply]

This bot will essentially be in control of a very large, important section of the system. Leaving it under automatic control, with any admin able to block any site they care to at any time and then claim the "oops" defense (as we all know will occur), is a bad idea. We've already seen examples of links added to the blacklist based on content wars. All content issues should be taken to the relevant boards. The blacklist is not the place to solve content warring between involved editors and admins. IRC is *not* an open public proxy. It simply isn't; it never has been. It's a secret, closed system, with no history, no policy, no linking. There is no way for me to go back into IRC history to review *why* a link was added. So if I protest I get: oh, it was added in IRC, go away and let us work. This is not satisfactory. It never was, and never will be. That is not the way the rest of the project works.Wjhonson (talk) 20:34, 2 February 2008 (UTC)[reply]
Lawrence Cohen above is exactly correct. All blacklist additions must be on-Wiki, community-approved additions. With history and examples of why the link was added.Wjhonson (talk) 20:37, 2 February 2008 (UTC)[reply]
Wjhonson, IRC is open, and upon request I will send any day's log to whoever wants it; if you want, I could set it up so that you get it sent to you daily. Due to the sheer size of the logs, posting them is difficult, but you can review them if you want. Wjhonson, remember we are not a bureaucracy; the way the previous bot was handled had no problems. So accept that there is no cabal and that IRC is open, join irc://freenode/%23wikipedia-en-spam and see for yourself; otherwise quit bringing up strawmen. βcommand 20:43, 2 February 2008 (UTC)[reply]
Okay, BetaCommand, please send the day's log for every day back to the beginning of Wikipedia. I will then post each one of them on-wiki for all people to review, which is what should have been going on from day one. BetaCommand, you know perfectly well that the vast majority of readers, and I dare say the vast majority of editors, would have no clue whatsoever how to access IRC. So effectively, a large part of the control of the system is shunted into a secret corner where only the elite ever view it, or would ever even know how to view it. That's not something to be overlooked. No one group should ever have that kind of control over the system.Wjhonson (talk) 20:50, 2 February 2008 (UTC)[reply]
Wjhonson, I am not going to do that, as you should not be adding that much data to Wikipedia. It's between 600 KB and 2 MB of data per day; there is just no point in uploading that much data that is not needed. βcommand 21:03, 2 February 2008 (UTC)[reply]
Regarding IRC being opaque: Please note that the vast vast majority of people don't know how to edit Wikipedia either - it's a total black box. Furthermore, the vast vast majority of people don't even know how to access the internet. The vast majority of people don't know what the internet is. Most people on the planet have never made a phone call. Please don't take us down this road. – Mike.lifeguard | @en.wb 20:55, 2 February 2008 (UTC)[reply]

(OD) That's a null argument, Mike. I have always been speaking solely of the readers and editors of our project. It's not only the articles that should be open, but all processes from A to Z. All of it should be open, not just at this moment, but all of history as well. At the moment, there is no way for any person to find out why www.blahblahblah.com was added to the blacklist. None. Zero. That's not an open, consensus-seeking society. It's simply not the way we operate in any other facet of our existence. The idea that there are 2 MB of blacklist discussion per day is mind-boggling to those of us who believe our project should be open-source.Wjhonson (talk) 00:29, 3 February 2008 (UTC)[reply]

Wjhonson, that log includes all output from the link watchers as well; 99.9% of that data is on-wiki. The IRC logs that I have contain ALL information in the channel, including the recent-changes feed about links. If you want to understand the workings of the anti-spam users, you should see what they see, not only what they say. Please walk in the shoes of the anti-spam users; instead of complaining, look for yourself. βcommand 00:37, 3 February 2008 (UTC)[reply]
As I've stated, I'm referring only to additions and deletions from the blacklist. Period. Nothing more. Why a link was added, why it was deleted, including the permanent evidence. All of that data should be on-wiki, all the time, with history. I am not, and never was, referring to anything else. Without the ability for any editor to review that information, we are asking for a level of trust that is simply too high.Wjhonson (talk) 00:40, 3 February 2008 (UTC)[reply]
That data can be found in the RC logs. βcommand 00:42, 3 February 2008 (UTC)[reply]
I see: the logs I was just denied any ability to access?Wjhonson (talk) 00:44, 3 February 2008 (UTC)[reply]
Wjhonson, if it had been a reasonable request, and not one for every log, I would have sent them, but you don't need to be uploading 500 MB of data to Wikipedia. βcommand 11:50, 3 February 2008 (UTC)[reply]
The point is that IRC logs are not acceptable for WP decision-making. Everything that does not have real reasons for confidentiality MUST be on-wiki. There should be no WP process requiring IRC to participate or to observe. DGG (talk) 09:29, 3 February 2008 (UTC)[reply]
DGG, if there are questions about a link, anyone is free to ask or to question a rule, and then we can explain it. As with any issue, there is always more to it than it seems, so please don't assume too much. The evidence that you want does exist on-wiki; it's just not as obvious. There are filters that report in IRC about specific users/links that are tracked over time. βcommand 11:50, 3 February 2008 (UTC)[reply]
I contest this statement that the evidence exists on-wiki. If it exists, provide a link to it. As long as this can occur, I will never support the use of SquelchBot. The link was added merely on the whim of an involved admin, with no consensus, no input from the community, no anything at all. This admin has been aggressively seeking to ban this link based on a content issue which has nothing to do with spam at all. Dirk did it. It was completely inappropriate. This sort of no-input addition has to stop.Wjhonson (talk) 01:12, 4 February 2008 (UTC)[reply]
You will see in that link that some statement is presented by the requestor. There is, in fact, no evidence that the statement is accurate. No one adding the link made any attempt to verify that the statement presented there was accurate or based on any evidence whatsoever. And yet additions of this source can have a chilling effect on the community. Wjhonson (talk) 01:15, 4 February 2008 (UTC)[reply]
Wjhonson, again you fail to understand. Per ANI: "Fine. It's not technically a spam site, so it does not technically belong on the spam blacklist. However, it is unacceptable as a source or an external link, as described to you many times. Therefore, there is no pressing need to take it off the blacklist, is there? ... Thatcher 13:18, 1 February 2008 (UTC)". Just because a user does not participate in a conversation does not mean that they don't follow it, so please assume good faith and do not attack good-standing users without any shred of proof. βcommand 01:23, 4 February 2008 (UTC)[reply]
Your argument is moot. The spam list is for spam, not for any unreliable site. The pressing need to remove it is process and consensus, which are here being flouted in such an extreme manner. When a particular admin goes on a mission to destroy my reputation, I do take particular issue with it. Many editors are in agreement that process and consensus were violated in this case. Since the community is not in consensus here, what is the standard for removal? Consensus is the standard. Do you deny that consensus should hold? The community is in consensus that the spam list is for spam, not for content-related wars.Wjhonson (talk) 01:30, 4 February 2008 (UTC)[reply]
Wrong; Jimbo himself had blog.myspace.com blacklisted on meta due to reliability issues. That is a content decision, so please learn your facts. βcommand 01:33, 4 February 2008 (UTC)[reply]
And there was long discussion and consensus about that, as you know. This site was added due to a specific content war in which the requesting admin was the creator of the article. I'm sure you can see how that is inappropriate. The community has, dozens if not hundreds of times, pointed out that admins involved in content wars should not be aggressively trying to silence critics with tools. I'm still waiting for that link to where all the evidence is stored for why certain sites are blacklisted, which you say is on-wiki. As you can see, this particular link was added merely on the basis of one request, with no evidence.Wjhonson (talk) 01:53, 4 February 2008 (UTC)[reply]
  • Wrong, as stated above. Wjhonson, it's fairly evident from your posts that you're attempting to advance "your" agenda by using this discussion as a mouthpiece for that agenda and to purposefully derail and disrupt this approval process.--Hu12 (talk) 03:17, 4 February 2008 (UTC)[reply]
Wrong, as stated above. Hu12, it's fairly evident from your posts that you created an article, which was then criticized off-wiki. There was an ongoing content dispute over that, including the potential for ArbCom intervention. That ArbCom case was only put on hold because Kingofmann decided to leave Wikipedia, but he is now back. While this was going on, you tried to do an end-run around policy and consensus to use the blacklist to win a content war, claiming without any evidence that the site was being used to attack Wikipedia editors other than the subject of the BLP article. Other uninvolved editors who have reviewed the situation have come to the same conclusion. Instead of merely admitting that you did this, you continue to reposition the argument, avoiding any inference that what you did was inappropriate. This is why Wikipedia needs people fighting for free speech, to counteract those who want to smother the other side in a conflict with process tools. That is not what the blacklist is for.Wjhonson (talk) 03:23, 4 February 2008 (UTC)[reply]
So you're not using this discussion as a mouthpiece for that agenda?--Hu12 (talk) 03:32, 4 February 2008 (UTC)[reply]
It was an appropriate example. It illustrates the core problem quite nicely. That a person can get a site black-listed, with no evidence, and then turn around and use that very blacklisting to try to win an ongoing content dispute. That's an inappropriate abuse of the blacklisting system.Wjhonson (talk) 03:37, 4 February 2008 (UTC)[reply]
Both inappropriate and a clear mischaracterization of the facts. Please stop using this discussion as a mouthpiece for your agenda[13] by attempting to disrupt and derail this approval process.--Hu12 (talk) 03:50, 4 February 2008 (UTC)[reply]
Wikipedia:KETTLE; if anyone is being disruptive regarding this process, it's you. Please adhere to project goals; your attempts to subvert consensus editing are anti-project. Thanks.Wjhonson (talk) 03:52, 4 February 2008 (UTC)[reply]
everyone, the point is that this can apply to any article that is disputed. That's why it's good to have things visible to everybody on wiki. DGG (talk) 03:54, 4 February 2008 (UTC)[reply]
and the same thing could happen with one of the MW blacklists too. βcommand 03:55, 4 February 2008 (UTC)[reply]
Which is why, Bc, the entire process, from A to Z, from start to finish, from top to bottom, should be open, at any time, without permission needed, to anyone who wishes to review any part of it. That is, in my own humble opinion, part of the entire ethical basis for the project. That is why we don't have secret email lists and secret closed-door meetings. Any aspect of the project, anywhere, where a solid, clear, and compelling reason cannot be given for why information is being withheld, should not be withheld; it should be opened up for independent review. The basis of a free society is that anyone can review the actions of the government through public records. We are supposed to be aiming to be even better than any existing democracy, in that we are a discursive democracy. Anything that even seems to be hidden works against the project's ethical basis. Wjhonson (talk) 04:05, 4 February 2008 (UTC)[reply]
It's fairly evident that this is not isolated. This appears to be just a variation in a long-term pattern by you of using Wikipedia as a platform for promotion and advancing "your" agendas. Please see Wikipedia talk:WikiProject_Spam#http:.2F.2F.countyhistorian.com--Hu12 (talk) 06:10, 4 February 2008 (UTC)[reply]
Your argument there is without merit, as anyone who reviews the evidence can plainly see for themselves. The few references to my site have, in each case, been upheld by community consensus. You can continue to try to win by discrediting the other side, but it's simply not going to fly.Wjhonson (talk) 06:21, 4 February 2008 (UTC)[reply]
If you feel this is about "winning" something, I recommend that you honestly re-examine your motivations. Are you here to contribute and make the project good? Or is your goal really to find fault, get your views across? Perhaps secretly inside you enjoy the thrill of a little confrontation, but to everyone who is busily trying to work together harmoniously to build an encyclopedia, that becomes an impediment.--Hu12 (talk) 06:39, 4 February 2008 (UTC)[reply]
No, I don't feel this is about winning. Do you? I recommend you review your own motivations. It seems like you've used process to overrule consensus. We are not a bureaucracy, as you know. Rules do not supersede what the community wants. The community defines and interprets rules, and they can change as new situations arise. I don't see a lot of activity by you on our policy pages; perhaps you're not too conversant with them yet. I'd recommend you review them a little more. I don't find your approach to this issue to be harmonious.Wjhonson (talk) 06:44, 4 February 2008 (UTC)[reply]
It seems you feel you speak for what the community wants; perhaps you see yourself as bearing the Truth™ also?--Hu12 (talk) 07:15, 4 February 2008 (UTC)[reply]

Big BAG Section


On technical merit, I'm approving the bot. From a "don't break the site" standpoint, it won't overload anything, it serves a useful purpose (spam sucks), and it will take care of it in a pretty graceful manner. Yes, the task may be a little controversial, but hey, people hated Tawkerbot2 when it first started operating, and nowadays people don't give anti-vandal bots a second glance - they take them for granted (and complain when they're offline). So, I'm not touching the politics, but on technical grounds, it has the green light. -- Tawker (talk) 05:28, 4 February 2008 (UTC)[reply]

Tawker listed this Bot at Wikipedia:RFBOT/A. I am afraid I am unwilling to assign a Bot flag to this account at this time, given that Tawker's approval above is clearly equivocal. The Bot Approvals Group is presently responsible for determining both the technical suitability of a proposed Bot and the consensus that it is compliant with policy. It seems to me that Tawker is only willing to sign off on the former of these. Unless the community (or BAG) intends bureaucrats to take a more active role in the Bot approval process, I cannot see that this is a valid approval. Agreement needs to be reached on the desirability of this Bot (not just that its code is sufficient to perform the proposed task without adverse server consequences). The "politics" need to be sorted as well as the technicalities before crats are asked to flag Bots, in my opinion... WjBscribe 22:40, 4 February 2008 (UTC)[reply]
This bot's edits should always appear in recent changes, so at least for the time being a bot flag is unnecessary. If we move to using the 'rollbacker' style of rollback, we may need it for technical reasons, but we'd manually tag the edits so they appeared in recent changes anyway (as ClueBot does).--Versageek 22:58, 4 February 2008 (UTC)[reply]

Moving forward


I think it is time to move forward. I have in the last couple of days performed some adaptations to the bot-code (some parts still in testing mode) to address some of the concerns.

It is possible to work with an on-wiki revertlist, though I do not yet know what impact that will have on the performance of the involved bots (especially the linkwatchers; the en-specific linkwatcher currently swallows about 160 edits per minute, of which 116 are in the watched namespaces (main, template, category), i.e. 116 diffs to parse per minute, which is the difficult work for the bot; 31 links are added per minute; and 4.8% of the edits, or 5.6 edits per minute, concern some form of link addition, i.e. a true addition or a change to a link). (Data for 51 minutes of linkwatcher activity, i.e. link statistics, not revert statistics.)

The problem now is that a lot of links are OK (already 4.8% of the added links match rules on the current whitelist, as do 0.7% of the users), but we know that a lot of questionable links are also added, especially by new/unestablished editors (1.2% of all added links match rules on the revertlist, on which the above statistics are based; this includes links added by established/whitelisted editors, which don't get reverted). There are links among those questionable domains (e.g. blogspot, myspace, wordpress, youtube) which are appropriate. There is a choice there: either all these edits are checked by hand (an impossible task), or we accept a 'reasonable' rate of errors by the bot (the bot reverts and warns the involved user that there may be/are concerns with the added link, the first time with a good-faith warning, which gets stronger after persistent addition; the edit can always be reverted, and the bot does NOT block a link addition!).

I am going to give an opening shot for the error rate: an average of 1 revert on an appropriate link in 200 reverts on a certain rule.

I will try the on-wiki revertlist (mainly to see how the bots react), where links can be added and removed on demand by administrators. Given that some spam has to be dealt with directly, things may go on there without previous discussion (without discussion, not necessarily without proof; to catch active blatant spam or those links which are too often in violation of policy/guidelines, according to the suggested 1:200 error rate). If the load on the systems is going to be too much, I will have to switch back to the off-wiki RevertList. In that case the RevertList will be published regularly, and then we need a review process for rules which are questionable (i.e. which result in too many good links being reverted). How does this sound? --Dirk Beetstra T C 17:20, 4 February 2008 (UTC)[reply]

Beetstra, a thought: how about a hybrid system, keeping the current off-wiki system while having the bot grab the list from a .js subpage in its own userspace, say, every 30 minutes? βcommand 17:36, 4 February 2008 (UTC)[reply]
Could be worth considering. I'll have a look into that. --Dirk Beetstra T C 18:26, 4 February 2008 (UTC)[reply]
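For what it's worth, here is a rough sketch of such a hybrid refresh: fetch the on-wiki list periodically and keep the last good copy if the fetch fails. The raw-wikitext URL uses the standard index.php?action=raw mechanism, and the page title is the one mentioned elsewhere in this discussion; the code itself is only an illustration, not the bot's implementation.

```python
import time
import urllib.parse
import urllib.request

RAW_URL = ('https://en.wikipedia.org/w/index.php?' +
           urllib.parse.urlencode({'title': 'User:XLinkBot/RevertList',
                                   'action': 'raw'}))
REFRESH_SECONDS = 30 * 60
cached_rules = []

def refresh_rules():
    """Fetch the on-wiki list; keep the previous copy if the fetch fails."""
    global cached_rules
    try:
        with urllib.request.urlopen(RAW_URL, timeout=30) as resp:
            text = resp.read().decode('utf-8')
        cached_rules = [line.strip() for line in text.splitlines()
                        if line.strip() and not line.startswith('#')]
    except Exception as err:
        print('Fetch failed, keeping cached list:', err)

def run_refresher():
    # Simple loop a bot process might run in the background.
    while True:
        refresh_rules()
        time.sleep(REFRESH_SECONDS)
```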

I am testing the latest versions now with an on-wiki revertlist (here: User:XLinkBot/RevertList), and now also with on-wiki settings (like VoABot II; here: User:XLinkBot/Settings). Both are only editable by admins, and both are subject to change, depending on how the testing goes. --Dirk Beetstra T C 20:00, 4 February 2008 (UTC)[reply]

I doubt we'll be able to move forward until the specific issues brought forth are addressed clearly and directly. I haven't seen any serious attempt to do that yet. Wjhonson (talk) 16:41, 8 February 2008 (UTC)[reply]


The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.