Wikipedia talk:Bots/Archive 11
Interwiki by RobotJcb
I would like to ask permission to do some interwiki bot work with my bot: RobotJcb. I run a multi-login bot, doing interwikis on several wikis. Today I already did a few edits on EN.wikipedia with the robot: [1]. I did not know that permission was needed; I'm sorry. But you may review those 18 edits to see what kind of upkeep I would like to do. Jcbos 15:27, 8 October 2005 (UTC)
- Is this a pywikipedia bot? If so, then I am OK with you running it, as long as you only make links to articles in languages you can understand. (This is based on the existing policy, which we are discussing, so this may change shortly.) Please let us know which languages you can understand well enough to judge whether or not two articles are on the same subject. -- Beland 03:09, 9 October 2005 (UTC)
- This is a pywikipedia bot: interwiki.py. It runs on about 20 languages, of which I can understand about 10. But I use the autonomous option, so if there are any doubts, it will not change anything and just skip the page. Jcbos 11:49, 9 October 2005 (UTC)
- I'd like to know if the en: community approves of this bot or not. By the way, this bot does good work on nl:, and in my opinion he deserves a bit more feedback than a response from just one user... oscar 00:33, 17 October 2005 (UTC)
- OK, I will keep that in mind and I will not remove broken interwiki links. Jcbos 23:22, 17 October 2005 (UTC)
- I already removed an interwiki link from a few pages yesterday, but I have since restored them manually. Jcbos 23:31, 17 October 2005 (UTC)
- It has been fixed, and as far as I can see it hasn't happened again. I will check my bot edits from time to time as well. Jcbos 17:23, 26 October 2005 (UTC)
Interwiki bot policy
The existing policy on the project page is a bit vague about which interwiki bots are "known safe", etc., and I have posed some questions above that affect a lot of interwiki bots, including the two most recent requests for permission.
I propose the following slightly revised policy on Interwiki bots. I have posted an RFC pointing here. -- Beland 03:32, 9 October 2005 (UTC)
pywikipedia interwiki bot operators making links from the English Wikipedia:
- Must run the latest version.
- Must update on a daily basis.
- Must speak English well enough to respond to complaints about incorrect links or other bot behavior.
- Must actively monitor their English talk page, or have a note directing English speakers to meta or another page which is actively monitored and where they are welcome to post in English.
- Must ask for permission on Wikipedia talk:Bots (primarily to allow people to voice any objections based on the history and reputation of the operator).
- May conduct a short demonstration run immediately after requesting permission, so that participants can examine bot edits.
- Need not speak the language of Wikipedias they are linking to, but this is encouraged. Obviously they will not be able to provide manual disambiguation for links to languages they do not understand.
- Are encouraged to indicate on their English user pages which languages they speak.
- Need not manually check all links made by their bot.
Non-pywikipedia interwiki bot operators:
- Must ask for permission before doing a demonstration run.
- Will need to have all links manually checked, at least during a test phase. This means they must speak the language of the target language(s), or find a volunteer who does.
- Must speak English well enough to respond to complaints about incorrect links or other bot behavior.
- Must actively monitor their English talk page, or have a note directing English speakers to meta or another page which is actively monitored and where they are welcome to post in English.
- Are encouraged to indicate on their English user pages which languages they speak.
Other editors who see semantically inappropriate interwiki links being made by bots are encouraged to fix those links manually. In general, our feeling is that automatic links which are mostly right are better than no links at all. If one or more bots are making a high percentage of erroneous links, editors are encouraged to leave a note on Wikipedia talk:Bots and the talk page of the bot or bot operator. General feedback on interwiki bot performance is also welcome.
Comments/support/objections
- Part of the reason I'm comfortable with bots making links to articles in languages the operators don't understand is that these links are being made in the English Wikipedia. Anyone who knows enough to complain about any such links should be able to do so in English. As long as the bot operator also speaks English, that should be fine. If the bot operator doesn't speak the target language, then it should be fine to trust the judgment of a human who claims to know the right answer over a bot which could easily have made a bad assumption. I'm comfortable letting pywikipedia bots make the assumptions they do because there seem to be a lot of them running, and there do not seem to be any complaints that they are habitually inaccurate. -- Beland 03:36, 9 October 2005 (UTC)
- What about the removal of (not the modification of) interwiki links? They should also be listed under the Interwiki bot section on Wikipedia:Bots. --AllyUnion (talk) 04:06, 9 October 2005 (UTC)
- pywikipedia doesn't remove links without manual prompting, does it? Certainly no one who doesn't speak the target language (not just the source language, which in our case would be English) should remove an existing interwiki link. (Unless we have a policy encouraging removal of links to non-existing foreign-language articles?) Is that what you are getting at? -- Beland 03:06, 20 October 2005 (UTC)
Kakashi Bot
Kakashi Bot will be used for two purposes: one-time requests and marking short articles (less than 1K) as stubs. Anything less than 42 bytes will be marked with {{db}}, and anything less than 14 bytes will be auto-deleted under my account. This is per the discussion held at Wikipedia:Village pump (proposals)#Auto deletion of nonsense. --AllyUnion (talk) 04:10, 9 October 2005 (UTC)
- I included the discussion below, before it gets deleted. -- Beland 02:35, 20 October 2005 (UTC)
Auto deletion of nonsense
Repast, The Tempest (Insane Clown Posse), Formal amendment, Hay sweep, Actual effects of invading Iraq, W. Ralph Basham, Adam bishop, Harrowlfptwé, Brancacci Chapel, Acacia ant (Pseudomyrmex ferruginea), Cyberpedia, Principles of Mathematics, Moss-troopers, Gunung Lambak
All of the articles above were created by people as a "test". On average there are 2-20 of these posts, which keep admins busy unnecessarily. All of these pages have one thing in common: they are less than 16 bytes. 15 bytes is a magical number because it is the smallest article size possible for a redirect: #redirect [[A]] = 15 chars. --Cool Cat Talk 02:30, 21 September 2005 (UTC)
We already have a way to detect such pages: my bot in #en.wikipedia.vandalism. It is trivial to add a function making it delete any newly created page with less than 15 bytes. I intend to do so; objections? --Cool Cat Talk 02:30, 21 September 2005 (UTC)
- Several people have suggested simply disallowing the creation of pages smaller than 15 bytes, but people would just create a slightly larger page with nonsense. People should be able to experiment, and if it is detectable, so much the better. This way admins can worry about real problems rather than wasting hours on people's "tests". --Cool Cat Talk 03:03, 21 September 2005 (UTC)
- I intend to restrict the bot to the article namespace. --Cool Cat Talk 03:03, 21 September 2005 (UTC)
- I was the one who originally started bugging people about this idea. The bot is very reliable at detecting articles too small to be a #redirect, and I thought it would be good to delete them automatically to save the admins time. Later, if this works, I think it would also be cool to test the possibility of auto-reverting page blanking: if more than 90% of a page is blanked without an explanation, the bot automatically reverts it. One thing at a time though; I hope to see this implemented soon. --appleboy 03:17, 21 September 2005 (UTC)
- I support the bot addition. I have not seen a single case where one of those tiny entries was a valid one. -- (☺drini♫|☎) 03:16, 21 September 2005 (UTC)
It might be a good idea to avoid deleting anything that's a template. One example is {{deletedpage}} (which happens to be 15 chars, but there is no reason it couldn't have been shorter). Or, it might theoretically be possible to have a very short template that performs some logic based on the {{PAGENAME}}. -- Curps 09:01, 21 September 2005 (UTC)
- The bot would not delete tiny pages created by admins. Only admins can have a valid reason (that I can see) to create a tiny page; I think we can trust the admins on this. --Cool Cat Talk 09:40, 21 September 2005 (UTC)
- Admins shouldn't, as I understand things, have editing privileges different from those of ordinary users in good standing. A page that uses templates could have an arbitrarily long expansion for a very short source text. Either the bot should ignore any page using transclusion, or it should count the length of the text after transclusion, whichever is easier to implement, IMO. Recognizing the presence of a template is easy, after all. DES (talk) 14:51, 21 September 2005 (UTC)
Pages created in the past 10 minutes: Jamie Lidell, Katy Lennon, Cassa Rosso
Exceptions the bot will not delete:
- #redirects
- templates
Given that, how large (in bytes) should a newly created article be for it to be kept?
- Aside from templates and #redirects, is anything of, say, less than 30 bytes a legitimate page?
objection to 15 limit - Please note: during debate on potential deletion of stub template redirects at WP:SFD and WP:TFD it is often easier - so as not to get the "this may be deleted" message on the real template - to replace the redirect message with a simple template message. Thus, if a redirect to {{a}} were being debated, the nominated template would contain the text {{tfd}}{{a}}. 12 characters. Then there are pages containing simply {{copyvio}} - 11 characters. I can't think of any smaller possibilities, but they may exist - and 15 is thus too big! Grutness...wha? 01:49, 22 September 2005 (UTC)
- Grutness, he already said he wouldn't delete templates. --Golbez 01:59, 22 September 2005 (UTC)
Will this bot automatically post an explanation to creators' talk pages explaining why their page was deleted? Perhaps it should, if that's feasible. --Aquillion 16:09, 26 September 2005 (UTC)
- First of all - good idea, go to it (with the no-templates, no-#redirects limits). 15 chars seems fine to me; we can expand it upward if we still get lots of slightly longer bad pages. I find it ironic that people have often cried that their articles were deleted automatically, and we were always able to say: no, we don't delete things automatically, someone was just watching really closely and deleted it by hand. Now we won't be able to say that. It would probably be good to put a tiny note in the start-a-new-article MediaWiki text saying something like "Articles with less than 15 characters will be automatically deleted", so we can point people there when they complain. In any case, good idea, go to it. JesseW, the juggling janitor 00:24, 4 October 2005 (UTC)
- Sounds good. Might I also suggest - if there wouldn't be one automatically - a 10-minute delay between the posting of the article and the automatic deletion, just in case the article was a valid one which glitched while being saved (as can happen with IE from time to time). Grutness...wha? 00:43, 4 October 2005 (UTC)
- Can be done. We can raise the cap a bit, as JesseW suggested. What should the cap be? 42 bytes? ;)
- The bot will need "some" admin power, at least enough to delete pages. Should we start a WikiProject? It's easy for me to code the bot; however, we will have people complaining unless we have a page explaining this entire mess. --Cool Cat Talk 22:33, 7 October 2005 (UTC)
Cool Cat requested my assistance, but since Cool Cat doesn't have admin powers, I have taken it upon myself to write this bot. I have decided on three levels for this bot: at the 15-byte level, pages that are not redirects or templates will be deleted automatically by the bot; at the 42-byte level, pages are automatically marked with {{db}} with the delete reason "Bot detected new article less than 42 bytes, possibly spam."; anything under 1K will be marked with the generic stub template. --AllyUnion (talk) 23:57, 8 October 2005 (UTC)
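A minimal Python sketch of the three length levels described above. The thresholds come from this discussion, but the exact boundary conditions were still being debated, and the function, constant names, and redirect/template check are illustrative assumptions rather than Kakashi Bot's actual code.

```python
# Illustrative sketch only -- not Kakashi Bot's actual code.
# Thresholds follow the discussion above; exact boundaries were still debated.

DELETE_LIMIT = 15    # bytes: too small even for a #redirect
DB_LIMIT = 42        # bytes: tag with {{db}} for admin review
STUB_LIMIT = 1024    # bytes: tag with a generic stub template

def classify_new_article(wikitext: str) -> str:
    """Return the action to take for a newly created article."""
    text = wikitext.strip()
    size = len(text.encode("utf-8"))
    # Never touch redirects, or pages that start with a template
    # (a rough proxy for template-only pages).
    if text.lower().startswith("#redirect") or text.startswith("{{"):
        return "skip"
    if size < DELETE_LIMIT:
        return "delete"
    if size < DB_LIMIT:
        return "tag-db"
    if size < STUB_LIMIT:
        return "tag-stub"
    return "leave"

print(classify_new_article("asdf"))         # -> delete
print(classify_new_article("{{copyvio}}"))  # -> skip
```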
- Does this include the 10 minute (or whatever) delay I suggested? I still think it would be useful... Grutness...wha? 00:13, 10 October 2005 (UTC)
Further discussion
- Just a note to state the obvious: Pages with edit histories, especially where any single version would not be eligible for deletion, should not be auto-deleted. -- Beland 03:02, 20 October 2005 (UTC)
- We have an existing stubsensor, though it may be inactive. Any new bot that's marking things as stubs should coordinate to make sure multiple bots or reports (not to mention WikiProjects or random human editors) aren't working at cross purposes. It might be worth looking at the more sophisticated detection methods it uses, and the "lessons learned" from that project. -- Beland 03:02, 20 October 2005 (UTC)
Bot submissions
OK, so I'm not sure if this is the right place for this, but I couldn't find any other talk pages devoted to Wikipedia bots, so here goes: if a bot grabs a page via the http://en.wikipedia.org/wiki/Special:Export method (both for ease of parsing and to ease the load on the server), is there any way to submit its edits? Judging by the page's source while editing, it seems each submission must be accompanied by an opaque hash value. Is this actually enforced? Or is there something I'm missing here? Thanks in advance. porges 11:03, 13 October 2005 (UTC)
- You mean the wpEditToken? You can get that by loading the edit form; it's just there to detect edit conflicts. If you choose to use the special:export interface for getting the wikitext instead of ripping it out of the edit form (why?), make sure you first get an edit token and then load the text from special:export though, as otherwise there's a race condition where you could overwrite someone else's edits. --fvw* 12:03, 13 October 2005 (UTC)
- I think you need to fetch and repost wpEdittime, wpStarttime, and wpEditToken from the edit form, to prevent various kinds of problems. Some of those are for security, and some are to detect edit conflicts. It would save you from doing an additional HTTP transaction (and reduce server load, which is required of bots) if you also extracted the wikitext from the edit form. -- Beland 02:23, 20 October 2005 (UTC)
- It would be nice if the tokens needed to edit a page were provided via the export interface (maybe if provided with an extra parameter or something)... I don't need to be grabbing all the page formatting etc whenever the bot needs to grab a page. porges 23:23, 24 October 2005 (UTC)
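For readers wondering what the fetch-and-repost sequence described above looks like in practice, here is a rough, hypothetical Python sketch. The hidden-field names wpEditToken, wpEdittime, and wpStarttime come from the discussion; the other form fields, the regexes, and the screen-scraping approach as a whole are assumptions about the edit form and would need adjusting against the real HTML (and a logged-in, throttled session).

```python
# Rough sketch of the fetch-token-then-submit sequence described above.
# Assumptions: hidden-field names and form layout. A real bot should log in
# first, throttle itself, and use a proper HTML parser rather than regexes.
import re
from html import unescape
import requests

SESSION = requests.Session()
BASE = "https://en.wikipedia.org/w/index.php"

def load_edit_form(title):
    """Fetch the edit form and pull out the hidden fields a later submit needs."""
    page = SESSION.get(BASE, params={"title": title, "action": "edit"}).text
    fields = {}
    for name in ("wpEditToken", "wpEdittime", "wpStarttime"):
        m = (re.search(r'name=[\'"]%s[\'"][^>]*value=[\'"]([^\'"]*)' % name, page)
             or re.search(r'value=[\'"]([^\'"]*)[\'"][^>]*name=[\'"]%s' % name, page))
        fields[name] = m.group(1) if m else ""
    # Taking the wikitext from the same response saves a second request.
    body = re.search(r'<textarea[^>]*>(.*?)</textarea>', page, re.S)
    fields["wikitext"] = unescape(body.group(1)) if body else ""
    return fields

def submit_edit(title, new_text, summary, fields):
    """Re-post the tokens with the new text so MediaWiki can validate the
    session and detect edit conflicts."""
    data = {
        "wpTextbox1": new_text,
        "wpSummary": summary,
        "wpSave": "Save page",
        "wpEditToken": fields["wpEditToken"],
        "wpEdittime": fields["wpEdittime"],
        "wpStarttime": fields["wpStarttime"],
    }
    return SESSION.post(BASE, params={"title": title, "action": "submit"}, data=data)
```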
Notification Bot: cleaning up of Images with unknown....
I would like NotificationBot to be allowed to run to notify uploaders about their images in Category:Images with unknown source and Category:Images with unknown copyright status. I would also like the bot to use two (not yet created) templates: {{No source notified}} and {{no license notified}}. The only difference in these two templates is that the following text will be added: "The user has been automatically notified by a bot." Also, two new categories will be created, Category:Images with unknown source - notified and Category:Images with unknown copyright status - notified, with the corresponding templates. These templates will replace the ones on the image, and the bot will sign a date on the page to indicate when it first notified the person. I would also like the bot to give a second notification on the 5th or 6th day before the 7-day period ends, then a final notification on the 7th day. On the 8th day, it will change the image page and add a {{db}} with the reason: "User already warned automatically 3 times about changing copyright information on image. Final notice was given yesterday, at the end of the 7 day period mark." This bot will run daily at midnight UTC.
Oh, and the notification text will be something that looks like this:
First notice: [table with a 75px thumbnail of the image ({{{1}}}) beside the notice text]
Second notice: [same layout, with the second-notice text]
Third and final notice: [same layout, with the final-notice text]
--AllyUnion (talk) 11:34, 16 October 2005 (UTC)
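As a sketch of the timetable proposed above: the day numbers come from the proposal itself, while the function, return labels, and bookkeeping are purely illustrative.

```python
# Illustrative only: maps "days since the first notice" to the action the
# proposal above describes. The real bot's bookkeeping would differ.

def notification_action(days_since_first_notice: int) -> str:
    if days_since_first_notice == 0:
        return "first-notice"    # swap in {{No source notified}} / {{no license notified}}
    if days_since_first_notice in (5, 6):
        return "second-notice"
    if days_since_first_notice == 7:
        return "final-notice"
    if days_since_first_notice >= 8:
        return "tag-db"          # image still unsourced after the 7-day window
    return "wait"
```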
- Hi, take a look at Template:No source since and see if that one will do you much good. I recently started using it. «»Who?¿?meta 11:36, 16 October 2005 (UTC)
- Sounds good. PS. Who, template sigs are evil. Can we get a bot to kill them? Alphax τεχ 11:40, 16 October 2005 (UTC)
- I think it would be great for a bot to automate these notifications, as they are a real pain to do manually. I think 3 notices might be excessive though. Why not just have the first and only notice state the date on which the image will be scheduled for deletion unless the source/copyright info is supplied? I don't think cluttering up the uploader's talk page with 3 messages per image is going to be all that effective. The 7-day notice is already provided on the upload page as well. RedWolf 17:50, 16 October 2005 (UTC)
Revised notice [modified further, see source link below]:
--AllyUnion (talk) 04:22, 17 October 2005 (UTC)
- Here is the source of the actual template: User:NotificationBot/Image source. --AllyUnion (talk) 04:30, 17 October 2005 (UTC)
- How will this bot handle the existing thousands of no-source images that are over seven days old? What about cases where someone has uploaded several dozen unsourced images? --Carnildo 06:56, 17 October 2005 (UTC)
- It goes through the above categories. Unfortunately, it would have to make several notifications, one after another. The other alternative is for me to program it far more extensively, to have it operate on a flat-file database: collect all the information on which users need to be notified about what, notify them, and change all the images afterward. That would be invariably more complex than it is now, and far more time-consuming for me to write. As for existing no-source images over seven days old, it won't mark a speedy on those. It could, but the point of the bot is to give the uploader 7 days' notification, THEN mark the image with a speedy deletion tag. --AllyUnion (talk) 13:09, 17 October 2005 (UTC)
- I agree one notice is plenty. Either they ignore it, they aren't editing this week, or they respond to the first message, and multiple messages will just be annoying if they're ignoring it or not around. -- Beland 02:17, 20 October 2005 (UTC)
- It would definitely be annoying for someone to continually be getting image-deletion notices on their talk page while they were trying to edit. People who upload lots of images (assuming they are worth keeping) are making important contributions, and it would be nice to keep them happy. At least on the first run, a simple compromise might be to just drop a line and say, "We noticed you have uploaded multiple images with unknown source or copyright status. To avoid future messages like the one previously posted, you may wish to review your contributions and identify sources and licensing." But this risks them not knowing which 2 out of the 20 they uploaded were problematic, or someone deleting those while the bot was waiting (perhaps another 7 days) to post the threatened messages.
You could also just skip repeat customers on the first run, on the assumption that it will take a few days to run through all 20,000 or however many images need to be processed, and that a lot of people will probably check their other images after becoming aware that this might happen. I don't know which alternative is worse (annoyance or ignorance) but be prepared for complaints if you post more than two or three messages of the same kind to someone's user page in a relatively short timeframe. In the long run, will the bot check these categories every hour or every day or something? If it's hourly or thereabouts, I wouldn't worry about posting multiple messages. After the first one or two, they should get the idea, and stop uploading without attribution. It would be nice to batch-process, but I'd hate for that to delay implementation of this bot. Images are being deleted all the time, so people need to get notices ASAP. -- Beland 02:17, 20 October 2005 (UTC)
- Oh, and what criteria does the bot use for marking {{db}}? I would think that if anyone has edited either the image description page or the corresponding talk page since the notification, it should hold off and/or flag for manual review. And I presume responses to the bot owner's talk page would prevent an article being so flagged, if only through prompt manual intervention. -- Beland 03:23, 20 October 2005 (UTC)
Issues
There are some issues I'm trying to resolve. One of them is that the bot is overwriting itself. My initial idea for the project/program was that the bot would make a first pass on the category of images and move them into a notified category. After the 7-day period in the notified category, it would be presumed that the images can be deleted if they are still in that category. The problem now is that I'd have to build up a list, a category, or a hash table of repeat customers. A database may seem like overkill, but it seems to me to be the most reasonable solution. We are talking about a great many images in that category that really need to be deleted, and all the users need to be notified about them, even if they are no longer active. This covers our butts against someone getting really upset about an image deletion they were not notified of. It's more of a "Yes, we did let you know, and you failed to do anything about it, so it's not our fault." More information is on my new project board: User:AllyUnion/Project board --AllyUnion (talk) 08:55, 20 October 2005 (UTC)
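One way to handle the "repeat customers" problem described above is to batch all flagged images by uploader before leaving any talk-page messages. The sketch below is only illustrative; get_uploader is a hypothetical helper that would wrap a pywikipedia call or screen-scraping.

```python
# Hypothetical sketch: group every flagged image by its uploader so each
# user gets one combined message per run instead of one message per image.
from collections import defaultdict

def build_notification_batches(image_titles, get_uploader):
    """Return {username: [image titles]} for a single notification pass."""
    batches = defaultdict(list)
    for title in image_titles:
        uploader = get_uploader(title)   # hypothetical helper
        if uploader:
            batches[uploader].append(title)
    return batches
```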
- If you can do it, go for it. HereToHelp (talk) 02:35, 2 November 2005 (UTC)
301 redirects fixing
As part of the project Wikipedia:Dead_external_links I would like to fix 301 redirects. The bot will be run as User:KhiviBot.
This will be a manually assisted bot. It will be run using the Perl module WWW-Mediawiki-Client-0.27. I believe cleaning up 301 redirects is a worthwhile goal. Generating the list of URLs is a manual process, since sometimes the redirects might not be valid; hence human intervention is needed to generate the list of URLs. Once the URL list is obtained, the bot can fix them.
Example of this is 114 instances of http://www.ex.ac.uk/trol/scol/ccleng.htm .
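The actual bot is written in Perl (WWW-Mediawiki-Client), but for illustration, here is a Python sketch of how the old-URL to new-URL list for manual review might be generated. The function name is an assumption, not part of the proposal.

```python
# Illustrative sketch: check whether a URL answers with HTTP 301 and, if so,
# record the new location for a human to review before any bot edits.
import requests

def find_301_target(url, timeout=10):
    """Return the Location header if the server answers 301, else None."""
    try:
        resp = requests.head(url, allow_redirects=False, timeout=timeout)
    except requests.RequestException:
        return None
    if resp.status_code == 301:
        return resp.headers.get("Location")
    return None

# A human then reviews the (old URL, new URL) pairs before the bot edits pages.
print(find_301_target("http://www.ex.ac.uk/trol/scol/ccleng.htm"))
```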
- Seems like a good plan. The only thing that has me slightly worried is that there could be people abusing HTTP 301 for temporary name changes, round-robining, or such. Barring any examples of this actually happening, I'd support a test run though. --fvw* 16:18, 18 October 2005 (UTC)
- Sounds like a good idea, in general. The bot will need to retain the text which is displayed to the reader, if it already exists. It will also need to be excluded from talk pages of all namespaces. It might be prudent to exclude it from Wikipedia: space as well. There is one case where a simple replacement might do the wrong thing: if the URL is part of a citation that gives the date accessed. If the citation says "URL X, accessed on day A", then knowing both X and A, someone can find the document referenced on archive.org. The change we want to make is to "URL Y, accessed on day B", but a simple bot would do "URL Y, accessed on day A", which might not exist at archive.org. You might need to experiment a bit to find a heuristic to detect these cases and flag them for manual review. -- Beland 01:40, 20 October 2005 (UTC)
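A possible heuristic for the "accessed on" problem Beland raises: if a dated-access phrase appears shortly after the URL being replaced, skip the page and flag it for manual review. The regex and the character window below are guesses, not a tested rule.

```python
# Illustrative heuristic only: flag citations of the form "URL X, accessed on
# day A" so a human can decide whether to update the access date as well.
import re

ACCESS_PATTERN = re.compile(r'\b(accessed|retrieved)\b(\s+on)?\s+\S*\d', re.IGNORECASE)

def needs_manual_review(wikitext, old_url, window=120):
    """True if a dated-access phrase appears within `window` chars after the URL."""
    for m in re.finditer(re.escape(old_url), wikitext):
        nearby = wikitext[m.end():m.end() + window]
        if ACCESS_PATTERN.search(nearby):
            return True
    return False
```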
Bots and double redirects
I notice User:Kakashi Bot is fixing double redirects like A -> B -> C, it sounds like by just assuming that A should point to C. Looking at Special:DoubleRedirects, this is incorrect maybe 1% of the time. And many of the problems seem to occur with loops or bad human edits. Have we decided that any collateral damage here is acceptable, and in general, bots should just make this assumption? That would obviate the need for an entire project. Unless the bot could flag loops - that would be really handy. -- Beland 03:33, 20 October 2005 (UTC)
- I see there are about 991 now, split across two pages. If we think this is OK, another run or two could clear these up entirely...I wonder how fast they are being created. Theoretically, this Special page is updated periodically. Hopefully, it's on a cron job, and it would be possible to put a bot on a cron job that ran just after the Special update. -- Beland 03:50, 20 October 2005 (UTC)
- There have been some objections on Wikipedia_talk:Computer_help_desk/cleanup/double_redirects/20051009 to making the "change A->B->C" to A->C, B->C" assumption in the case where A is a misspelling of B. -- Beland 18:34, 22 October 2005 (UTC)
- By the way, the list at the Special page is only 80% to 90% accurate. There are some inaccuracies because there are false positives in the data, which I have already reported as a bug. I eventually wrote Kakashi Bot on the following premises:
- If A equals C, then do nothing
- If A exists and C exists and A is a redirect page and A does not redirect to C and A redirects to B, we assume B redirects to C and therefore redirecting A to C is not harmful.
- If A's redirect target is not B, then do nothing and report the page as an error.
- Of course, it would be far more logical not to trust the Special page, but to find all the redirects that exist, determine which of them are double, triple, etc. redirects, and then have a bot operate on the following logic:
- Premise: A is a double or more redirect.
- In a do-while loop fashion:
- do:
- if page X is redirect, add page X to queue Q, get the redirect target for page X, and set page X to the redirect target
- else terminate the loop
- (after the loop) Change all redirects in Q to point to X, which should be an article which all the redirects point to.
- That would technically resolve all double redirects. Of course, this is on the theory that all redirects in Q are valid moves and redirects. --AllyUnion (talk) 11:49, 25 October 2005 (UTC)
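A Python rendering of the loop AllyUnion describes, with the loop detection Beland asks about below added. get_redirect_target is a hypothetical helper (returning a redirect page's target title, or None for a normal article); the sketch is illustrative, not Kakashi Bot's actual code.

```python
# Illustrative sketch of the chain-following idea above, with loop detection.
# get_redirect_target() is a hypothetical helper: it returns the target title
# of a redirect page, or None if the page is an ordinary article.

def resolve_redirect_chain(start, get_redirect_target, max_hops=10):
    """Follow A -> B -> C ... and return (final_article, pages_to_retarget)."""
    queue, seen = [], set()
    page = start
    while True:
        target = get_redirect_target(page)
        if target is None:               # reached a real article
            return page, queue
        if page in seen or len(queue) >= max_hops:
            return None, []              # redirect loop or suspiciously long chain
        seen.add(page)
        queue.append(page)
        page = target
```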
- Though I think this simply makes the A->B->C rule transitive, it may have trouble with redirect loops. But anyway... -- Beland 08:03, 26 October 2005 (UTC)
After pondering what "safe" assumptions would be, I propose the following:
- It's fine for redirect bots to assume that if A->B->C, A->C is OK, unless there's an exception
- Exceptions:
- There's any text on the page other than the redirect (such as a classification template)
- There's a link to a subsection (check for "#" in the link)
- A and C are the same page (A->B->A->B, etc.)
- When starting to edit A, B turns out to be D (someone unexpectedly edited A)
- Beware Bugzilla:3723
- Feature request Bugzilla:3747 is designed to reduce the number of double redirects created in the first place
- Previous discussions:
Then we'll see if there are any further complaints or problems. And yeah, querying a database directly would be great, though in the short run, I'm happy to have anything. Oh, and will someone be checking for double redirects that cannot be fixed automatically? It's fine with me if you want to just dump them into some category for random editors to fix. -- Beland 08:03, 26 October 2005 (UTC)
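A minimal sketch of the exception checks proposed in the list above. The function name, its parameters, and the wikitext parsing are illustrative assumptions, not any bot's actual implementation.

```python
# Illustrative only: encode the exceptions listed above as a single check.
import re

def safe_to_retarget(a_title, a_wikitext, expected_b, c_title):
    """Return True only if none of the listed exceptions apply to fixing A -> C."""
    m = re.match(r'\s*#redirect\s*\[\[([^\]]+)\]\](.*)', a_wikitext,
                 re.IGNORECASE | re.DOTALL)
    if not m:
        return False                      # not a plain redirect at all
    target, trailing = m.group(1).strip(), m.group(2)
    if trailing.strip():                  # extra text, e.g. a classification template
        return False
    if "#" in target:                     # redirect points at a subsection
        return False
    if a_title == c_title:                # A and C are the same page (a loop)
        return False
    if target != expected_b:              # A no longer points at B; it was edited
        return False
    return True
```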