Wikipedia:Dead external links
From Wikipedia, the free encyclopedia
Like almost all large websites, Wikipedia suffers from link rot: external links go stale over time. As of the December 14th, 2005 database dump, Wikipedia contained 1,120,609 external links, many of which no longer work.
Dead links should be fixed regularly. You can either try to find the document's current location using a Google inurl search, or use the {{dlw}} or {{dlw-inline}} templates to point to the Internet Archive copy of the document. The syntax is {{dlw|dead URL|caption}}, e.g. {{dlw|http://free.oszoo.org/|OS Zoo webpage}}. For dead links inside paragraphs, use {{dlw-inline|url=dead URL|title=caption}}, which disturbs the flow of text less.
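The template syntax above can be generated mechanically when fixing many links at once. This is a minimal Python sketch; `dlw` and `dlw_inline` are hypothetical helper names, not part of any Wikipedia tool. Two choices here are this sketch's own assumptions rather than anything stated on this page: a `|` is escaped (percent-encoded in the URL, `&#124;` in the caption) because it would otherwise start a new template parameter, and URLs containing `=` (e.g. query strings) use explicit `1=`/`2=` parameters, because MediaWiki reads text before an `=` in an unnamed parameter as a parameter name.

```python
def _escape_pipes(url: str, caption: str) -> tuple[str, str]:
    # A bare "|" would start a new template parameter. In a URL it can be
    # percent-encoded; in a caption, the HTML entity still renders as "|".
    return url.replace("|", "%7C"), caption.replace("|", "&#124;")


def dlw(url: str, caption: str) -> str:
    """Wikitext for {{dlw|dead URL|caption}} (hypothetical helper)."""
    url, caption = _escape_pipes(url, caption)
    if "=" in url or "=" in caption:
        # An "=" inside an unnamed parameter would be read as a parameter
        # name, so fall back to explicitly numbered parameters.
        return "{{dlw|1=" + url + "|2=" + caption + "}}"
    return "{{dlw|" + url + "|" + caption + "}}"


def dlw_inline(url: str, caption: str) -> str:
    """Wikitext for {{dlw-inline|url=dead URL|title=caption}} (hypothetical helper)."""
    url, caption = _escape_pipes(url, caption)
    # Named parameters may contain "=": only the first "=" splits name/value.
    return "{{dlw-inline|url=" + url + "|title=" + caption + "}}"


print(dlw("http://free.oszoo.org/", "OS Zoo webpage"))
# {{dlw|http://free.oszoo.org/|OS Zoo webpage}}
```

The inline form is the one to paste into running prose, as recommended above.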
Please do not simply remove dead links; they often still lead to valuable information, for example through an archived copy.
This page is intended to be a clearinghouse for all such external links. If you fix a broken link in an article, please record it in the Status section below to prevent duplicated effort.
Each section below gives a short description of its status code; see the list of HTTP status codes for a more complete description.
Status codes
200
The 200 status code indicates that the link is correctly formed and retrievable. Although such links do not need correction, they are included here for completeness. Wikipedia currently contains 944,509 of these links. Because so many links resolve correctly, they are not available for download.
300
The 300 (Multiple Choices) status code indicates that the server offers several representations of the content and asked the bot to choose one. Although such links are most likely correct, they should probably be double-checked. Wikipedia currently contains 76 of these links.
301
Indicates that the content has moved permanently, so the link inside Wikipedia should probably be updated to the new location. However, not every such link should be changed, because some sites use 301 redirects for pages whose destination changes often. Wikipedia currently contains 30,427 of these links.
302, 303, 307
Indicate that the content has moved temporarily and that the client should keep using the original link. Although these links should be correct in theory, they are often used by link farms and should probably be checked. Wikipedia currently contains 60,355 status 302 links, 158 status 303 links, and 8 status 307 links.
400
Indicates that the site could not understand the bot's request. These errors should diminish with future revisions of the bot, but it may still be useful to test them (low priority). Wikipedia currently contains 1,604 of these links. Note: links with anchors and HTML entities should be ignored (see talk page); excluding these, only 185 links are actually broken.
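The note about anchors and HTML entities can be applied mechanically when triaging the 400 list. The talk page is not reproduced here, so this Python sketch assumes that a link is a likely false positive if it contains a `#` fragment or an HTML entity such as `&amp;`; `likely_false_400` and `genuine_400s` are hypothetical names.

```python
import re

# Named or numeric HTML entities such as "&amp;", "&#38;" or "&#x26;".
ENTITY = re.compile(r"&(?:[A-Za-z]+|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def likely_false_400(url: str) -> bool:
    """True if the 400 was probably caused by an anchor or an HTML entity."""
    return "#" in url or ENTITY.search(url) is not None


def genuine_400s(urls: list[str]) -> list[str]:
    """Keep only the 400 links worth checking by hand."""
    return [u for u in urls if not likely_false_400(u)]
```

A plain `&` in a query string (as in `?x=1&y=2`) is not an entity, so such links are kept for checking.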
401
The page requires authorization, which the bot does not support. The page in question may include login information, but the bot has no way of knowing this. Such links should be fixed if the page does not contain login information. Wikipedia currently contains 262 of these links.
402
Although 402 (Payment Required) is reserved and not in active use, some servers return it anyway. In theory, it indicates that the server requires payment from the client. Such links should be fixed. Wikipedia currently contains 0 of these links.
404, 410
The 404 error is the most common symptom of link rot; it indicates that the page was not found. The 410 status code is similar, but indicates that the resource is permanently gone. Policy requires such links to be repaired, perhaps with a link to the Internet Archive. Wikipedia currently contains 31,913 status 404 links and 42 status 410 links.
406
Indicates that the server could not produce a response acceptable to the client (for example, one matching the content types the client said it accepts); this can happen for a number of reasons. These links should probably be fixed. Wikipedia currently contains 94 of these links.
Update: in the gzip file downloaded on 24 February 2006, the count was 261.
409
Indicates a conflict with the current state of the resource, which the client needs to resolve. These links should probably be fixed. Wikipedia currently contains 0 of these links.
423
Although not part of core HTTP (it comes from the WebDAV extension), servers use this code to indicate that a resource is locked. These links should probably be fixed. Wikipedia currently contains 3 of these links.
425
Another nonstandard status code. Although the bot was not mirroring content, the server used it to deny the request as a "mirroring" request. These links should probably be tested. Wikipedia currently contains 31 of these links.
5xx
Indicates a server-side error. This could result from a malformed HTTP request by the bot, or from many other causes. These links should be examined to determine whether the site has a permanent problem with the link in question. Wikipedia currently contains 5,911 status 500 links, 11 status 501 links, 176 status 502 links, and 443 status 503 links.
NA - Unsupported protocol
Indicates that the link uses a protocol, such as IRC or Gopher, that the bot cannot resolve. Such links should be checked to confirm that the protocol is correct (e.g. htttp://www.wikipedia.org). Wikipedia currently contains 212 of these links.
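The Status section on this page records that links using irc, mailto, news, gopher, rtsp, telnet, nntp, pnm and worldwind were ignored as legitimate, which leaves typos such as `htttp` as the links worth fixing. A minimal Python sketch of that triage; the helper name `triage_unsupported`, the assumption that the bot handles only `http` and `https`, and the closeness cutoff of 0.75 are all hypothetical choices, not the bot's actual behaviour.

```python
import difflib
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Schemes treated as legitimate (and left alone) per the Status section.
IGNORED_SCHEMES = {"irc", "mailto", "news", "gopher", "rtsp",
                   "telnet", "nntp", "pnm", "worldwind"}


def triage_unsupported(url: str) -> Tuple[str, Optional[str]]:
    """Return ("ignore", None) for a known protocol, else ("check", guess).

    guess is the web scheme the link most likely meant, or None.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme in IGNORED_SCHEMES:
        return ("ignore", None)
    # Hypothetical typo detection against the schemes assumed fetchable.
    guess = difflib.get_close_matches(scheme, ["http", "https"], n=1, cutoff=0.75)
    return ("check", guess[0] if guess else None)
```

For the example above, `triage_unsupported("htttp://www.wikipedia.org")` suggests `http` as the intended scheme.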
NA - Unknown error
Indicates that the bot had some difficulty resolving the link. This can be caused by a number of errors, such as DNS lookup failures or socket timeouts. The socket timeout was set to 30 seconds, which may be too short for some very slow sites. These links should probably be tested. Wikipedia currently contains 40,472 of these links.
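The categories above come from a bot that requested each link and recorded either an HTTP status or an "NA" label. The bot's own code is not given on this page, so the following Python sketch is only an approximation for re-testing links by hand, under stated assumptions: it sends a plain GET with the standard library, does not follow redirects (so 301/302 links are reported as such, as in the lists here), treats only `http` and `https` as supported, and uses the same 30-second timeout. `check_link` is a hypothetical name.

```python
import http.client
import urllib.error
import urllib.request
from typing import Union
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "https"}  # assumption: what the bot could fetch


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so 3xx codes surface as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_OPENER = urllib.request.build_opener(_NoRedirect)


def check_link(url: str, timeout: float = 30.0) -> Union[int, str]:
    """Return the HTTP status code, or an "NA - ..." label as used on this page."""
    if urlsplit(url).scheme.lower() not in SUPPORTED_SCHEMES:
        return "NA - Unsupported protocol"
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 3xx (not followed), 4xx and 5xx responses all arrive here.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failures, refused connections, timeouts, malformed responses.
        return "NA - Unknown error"
```

Raising the `timeout` argument is one way to give very slow sites a second chance before treating them as broken.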
Downloads
Below are links to download tab-separated text files (gzip-compressed) containing the links. Each line has the form:
article title, tab, URL, tab, further description (as in [http://www.wikipedia.org/ Wikipedia] links), tab, error code, tab, server response. These files should probably be moved somewhere more permanent in the future.
200 (not currently available)
400 - 401 - 402 - 404 - 406 - 409 - 410 - 423 - 425
NA (Unsupported protocol) (Checked) - NA (Unknown error)
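The tab-separated format described above is simple to read programmatically. A minimal Python sketch, assuming UTF-8 text (the page does not state the encoding) and skipping any line that does not have exactly five fields; `LinkRecord` and `read_link_file` are hypothetical names, and the status is kept as text because it may be "NA" rather than a number.

```python
import gzip
from typing import Iterator, NamedTuple


class LinkRecord(NamedTuple):
    article: str      # article title
    url: str          # external URL
    description: str  # link caption, as in [URL caption] links
    status: str       # status code, or "NA"
    response: str     # server response text


def read_link_file(path: str) -> Iterator[LinkRecord]:
    """Yield one LinkRecord per well-formed line of a gzipped TSV file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != 5:
                continue  # assumption: malformed lines are simply skipped
            yield LinkRecord(*fields)
```

For example, `collections.Counter(r.article for r in read_link_file(path))` would show which articles have the most dead links.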
The 404 errors have pages of their own (note: these have been updated to reflect the December 14th dump; apologies to those who worked on the earlier lists, as you may need to strike out your fixes again):
- misc, 751 entries
- a, 2309 entries
- b, 1645 entries
- c, 2236 entries
- d, 1380 entries
- e, 1020 entries
- f, 1115 entries
- g, 1111 entries
- h, 1283 entries
- i, 938 entries
- j, 1779 entries
- k, 708 entries
- l, 1883 entries
- m, 2555 entries
- n, 1236 entries
- o, 754 entries
- p, 1215 entries
- q, 84 entries
- r, 1283 entries
- s, 2536 entries
- t, 1897 entries
- u, 554 entries
- v, 419 entries
- w, 912 entries
- x, 41 entries
- y, 157 entries
- z, 120 entries
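The 404 list above is split into one page per initial letter plus a "misc" page. A Python sketch of that split, assuming (the page does not say) that "misc" holds titles that do not start with an ASCII letter and that case is ignored; `bucket_for` and `split_by_letter` are hypothetical helpers.

```python
from collections import defaultdict


def bucket_for(title: str) -> str:
    """Page name for a title: its lowercased first letter, or "misc"."""
    first = title[:1].lower()
    return first if len(first) == 1 and "a" <= first <= "z" else "misc"


def split_by_letter(titles: list[str]) -> dict[str, list[str]]:
    """Group article titles into per-letter pages, keeping input order."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        buckets[bucket_for(title)].append(title)
    return dict(buckets)


counts = {k: len(v) for k, v in split_by_letter(["Apple", "apricot", "2004", "Zebra"]).items()}
print(counts)  # {'a': 2, 'misc': 1, 'z': 1}
```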
Status
Please indicate your correction status in the form "123: ABC - XYZ", where 123 is the status code and ABC and XYZ are the first and last article titles you worked through, e.g. "404: African Academy of Sciences - anonymous remailer".
Important Notice: All status entries have been reset as of January 3rd, to reflect the new broken link files.
300: All done!
301: None
302: None
303: None
307: All done!
400: None
401: None
402: Nothing to fix
404: All of q; all of x; all of z
406: 2004-2004 (7 links across 3 pages)
409: All done (there's just 1 link and it's OK)
410: All done (except links to http://today.reuters.com/)
423: All done!
425: All of these linked to the same website, which worked fine, and thus no changes were required.
500: None
501: All done!
502: None
503: None
NA (Unsupported protocol): All done (irc|mailto|news|gopher|rtsp|telnet|nntp|pnm|worldwind ignored)
NA (Unknown error): None
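The status entries above loosely follow the "123: ABC - XYZ" convention, with free-text entries such as "All done!" mixed in. Before starting on a range, it can help to check whether someone has already claimed it. A Python sketch, assuming case-insensitive alphabetical order of titles and accepting a bare hyphen when no spaced " - " is present (as in "406: 2004-2004"); titles that themselves contain " - " would be split wrongly. `Claim`, `parse_claim` and `covers` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Claim(NamedTuple):
    code: str   # e.g. "404"
    start: str  # first article title in the range
    end: str    # last article title in the range


def parse_claim(entry: str) -> Optional[Claim]:
    """Parse "123: ABC - XYZ"; return None for free-text entries."""
    code, sep, rest = entry.partition(":")
    if not sep:
        return None
    # Drop a trailing note such as "(7 links across 3 pages)".
    rest = re.sub(r"\s*\(.*\)$", "", rest.strip())
    if " - " in rest:
        start, _, end = rest.partition(" - ")
    elif rest.count("-") == 1:
        start, _, end = rest.partition("-")
    else:
        return None
    return Claim(code.strip(), start.strip(), end.strip())


def covers(claim: Claim, code: str, title: str) -> bool:
    """True if the claim's range includes this article for this status code."""
    key = title.casefold()
    return (claim.code == code
            and claim.start.casefold() <= key <= claim.end.casefold())
```

With the example entry from the instructions, "Albert Einstein" falls inside the claimed 404 range, while "Zebra" does not.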