Talk:Web archiving

Libraries Low‑importance

	This article is within the scope of WikiProject Libraries, a collaborative effort to improve the coverage of Libraries on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the project's importance scale.

Internet High‑importance

	This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-importance on the project's importance scale.

United States Low‑importance

	This article is within the scope of WikiProject United States, a collaborative effort to improve the coverage of topics relating to the United States of America on Wikipedia. If you would like to participate, please visit the project page, where you can join the ongoing discussions.
Low	This article has been rated as Low-importance on the project's importance scale.

Digital Preservation

This article is within the scope of WikiProject Digital Preservation, a collaborative effort to improve the coverage of digital preservation on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

Internet culture High‑importance

This article is within the scope of WikiProject Internet culture, a collaborative effort to improve the coverage of internet culture on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

High	This article has been rated as High-importance on the project's importance scale.

Library of Congress Low‑importance

	This article is within the scope of WikiProject Library of Congress, a collaborative effort to improve the coverage of the Library of Congress on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the importance scale.

Collections Care (inactive)

This article is within the scope of WikiProject Collections Care, a project which is currently considered to be inactive.

Wiki Education Foundation-supported course assignment

This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Jannymomoko.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 12:47, 17 January 2022 (UTC)

New category

There's no good topic category for Web archiving. This makes it hard to find some pages (e.g., file formats used for web archives) and has resulted in the set category Web archiving initiatives possibly being overused. I have created the category Web archiving as a child of Digital preservation and will leave it a few days to see if anyone objects before manually populating it with the other pages I think belong in it. Zosterae (talk) 15:30, 7 December 2015 (UTC)

Query: Archiving webpages produced by database queries.

It currently seems difficult to arrange for an archival copy of a webpage that is produced as the result of a database query. The issue comes up, for example, at the Internet Movie Database, when one makes a query for all films involving two collaborators; the page that is produced is not readily archived by on-demand services such as WebCite. This leads to two questions: does anyone know of a solution to the problem, and has anyone written about the problem so it can be noted in the present article? Easchiff (talk) 20:46, 18 January 2009 (UTC)

A page has to have a URL of its own in order to be archived. If it doesn't, you obviously can't post or cite the link per se, much less archive it. If it's a short page with not too much information, sometimes a solution is to copy and paste the information somewhere, perhaps in a subpage of an article's or user's Talk page, if you just want to preserve the information for somewhat temporary future reference. But the thing is, if a webpage doesn't have its own URL, then it likely isn't anything that would be used on Wikipedia as a citation or External Link anyway. Softlavender (talk) 09:48, 18 July 2009 (UTC)

Archive blocking

Blocking the archival of TOS and privacy policies seems notable to me. Any thoughts on whether the reasons in the edit summary of this edit make it meritorious? It's mine and was just undone. --Elvey (talk) 21:24, 28 June 2010 (UTC)

No answer; will attempt a compromise edit. (Is this another (less interesting) example of archive blocking: http://forums.wireless.att.com/user/viewprofilepage/user-id/2343207 vs http://www.webcitation.org/query?url=http%3A%2F%2Fforums.wireless.att.com%2Fuser%2Fviewprofilepage%2Fuser-id%2F2343207&date=2010-07-03 ? Forbidden (403) is not the same as Page Not Found (404), but I suppose this could be a WebCite bug. http://forums.wireless.att.com/t5/user/viewprofilepage/user-id/2343207 works; I suspect this is irrelevant, but only WebCite staff have the access necessary to really answer this one.) --Elvey (talk) 18:20, 3 July 2010 (UTC)

It seems more folks are preventing/blocking the archival of TOS and privacy policies. I just tried to archive the Merrill Lynch Brokerage Website Terms and Conditions as of June 18, 2010 (the date they were last changed), and not only was I unsuccessful, it triggered the locking of my account! I spent over 40 minutes on the phone getting the account unlocked, and they also helped me navigate to a PDF of the terms and conditions, but they block webcite.org from archiving it; here's the archive attempt, which also shows the full URL: http://www.webcitation.org/5stWIeKa4 . They'll look into it and get back to me; it'll be interesting to see if anything changes. It's not archived by Google. (http://www.google.com/search?q=site:ml.com+%22brokerage+website+terms+and+conditions%22) I'm trying to add the URL to Google's index. I just successfully added it to the list of URLs Google intends to crawl...someday. I will be very surprised if Google archives it, as that requires ML to treat Google differently from WebCite, and for Google to get around to doing the crawling, and to choose to index and archive the PDF. (http://www.google.com/addurl/) Merrill is happy to serve, and WebCite is happy to archive, other PDFs, e.g.: http://www.webcitation.org/www.ml.com/media/86941.pdf The website's search feature only finds the T&C if the search is done by a logged-in user; the result is hidden when the search is done otherwise. --Elvey (talk) 22:48, 20 September 2010 (UTC)
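
For reference, the WebCite lookup link quoted in the 3 July 2010 comment above follows a simple pattern: the target address is percent-encoded and passed as a url parameter, followed by a date parameter. The Python sketch below only reproduces that pattern from the one link in this thread. The helper names webcite_query_url and describe_status are hypothetical, and nothing here should be read as WebCite's documented API. The second helper just spells out the 403 versus 404 distinction raised in the same comment.

```python
from http import HTTPStatus
from urllib.parse import urlencode

# Endpoint copied from the WebCite link quoted in this thread (an assumption
# about the service based on that single example, not on its documentation).
WEBCITE_QUERY = "http://www.webcitation.org/query"


def webcite_query_url(target_url: str, date: str) -> str:
    """Build a WebCite-style lookup URL for target_url as of date (YYYY-MM-DD)."""
    # urlencode percent-encodes ':' and '/' in the target so the whole
    # address travels as one query-parameter value.
    return WEBCITE_QUERY + "?" + urlencode({"url": target_url, "date": date})


def describe_status(code: int) -> str:
    """Return the standard reason phrase for an HTTP status code."""
    # 403 means the server refused the request; 404 means nothing was found.
    return HTTPStatus(code).phrase


if __name__ == "__main__":
    page = "http://forums.wireless.att.com/user/viewprofilepage/user-id/2343207"
    print(webcite_query_url(page, "2010-07-03"))
    print(describe_status(403), "vs", describe_status(404))
```

Whether a given archive attempt was refused by the site (403) or simply never stored is something only the archive's own records can settle; the sketch does not contact either server.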

Web archiving#On Demand

Aside from marketing jargon, the commercial services are functionally identical. Some are on-demand, some offer scheduled backup services. IMHO they should all be listed with identical common terminology. --Lexein (talk) 19:33, 7 July 2010 (UTC)

BackupURL.com Blacklisted?

Why is BackupURL.com blacklisted? Is it merely because it gets cited often in references?

I looked at the blacklist page, but it's pretty confusing:

http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Spam/LinkReports/backupurl.com

As I understand it, backupurl.com is simply a web archive. Sempi (talk) 04:58, 1 November 2011 (UTC)

WARC Tools

Following the link in the article, it appears that WARC Tools has been bought out by Symantec? I don't see any source code or downloads listed anymore, except for a .pdf. Sempi (talk) 05:45, 1 November 2011 (UTC)

Search Tool Forbidden Site

In this section, Search Tool of Google Code is listed, but it cannot be accessed. --Tito Dutta (Send me a message) 22:57, 29 January 2012 (UTC)

Big list of enterprise and subscription services

Do we need the Web_archiving#Enterprise_and_subscription_services section?

It is a big list of enterprise services that are very expensive (for example, a PageFreezer subscription costs $50,000/year) and have no version open for public use.

Perma.CC and Wikipedia

One question - perma.cc requires that a link be used in a published journal and verified before being stored permanently. Will it "verify" links being cited in Wikipedia articles if submitted? We don't want to lose the content of one of the best open source "journals" in the world. Mdawn (talk) 16:55, 29 September 2013 (UTC)

Transactional Archiving of Remote Sites

Why is this stated to be impossible? All you need is a remote-controlled browser that cannot bypass the intercepting proxy. See www.icanprove.com.

91.12.26.8 (talk) 11:18, 26 January 2014 (UTC)

Archive.today finds pages that have the same content

Cool example: http://www.archive.today/scr:4ed261b531c9b7d37ccfae3738c0bf4f48cffee1 [dead link] (who's Nathan?) --Elvey (t•c) 21:59, 15 November 2014 (UTC)

Abandoned revision

User:Jannymomoko/sandbox is an abandoned user draft of this page. Please would an interested editor assess the material added there, incorporate what is useful into the live article, and leave a note here when that is done? – Fayenatic London 09:11, 26 July 2020 (UTC)

External links

There are six entries in the "External links". Three seems to be an acceptable number and of course, everyone has their favorite to add for four. The problem is that none is needed for article promotion.

ELpoints #3 states: Links in the "External links" section should be kept to a minimum. A lack of external links or a small number of external links is not a reason to add external links.
LINKFARM states: There is nothing wrong with adding one or more useful content-relevant links to the external links section of an article; however, excessive lists can dwarf articles and detract from the purpose of Wikipedia. On articles about topics with many fansites, for example, including a link to one major fansite may be appropriate.
ELMIN: Minimize the number of links.
ELCITE: access dates are not appropriate in the external links section. Do not use {{cite web}} or other citation templates in the External links section. Citation templates are permitted in the Further reading section.
ELBURDEN: Disputed links should be excluded by default unless and until there is a consensus to include them.

All down

All three web archive services I routinely use (archive.org, archive.today, and ghostarchive.org) are down right now. Tuvalkin (talk) 16:51, 27 May 2024 (UTC)