Re: wget archiving for dummies

From: Stuart Yeates <stuart.yeates_at_nyob>
Date: Mon, 6 Oct 2014 19:02:35 +0000
To: CODE4LIB_at_LISTSERV.ND.EDU
A number of others have suggested other approaches, but since you started with wget, here are the two wget commands I recently used to archive a WordPress-behind-EZproxy site. The first logs into EZproxy and saves the login as a cookie. The second uses the cookie to access the site through EZproxy.

wget --no-check-certificate --keep-session-cookies --save-cookies cookies.txt --post-data 'user=yeatesst&pass=PASSWORD&auth=d1&url' https://login.EZPROXYMACHINE/login

wget --restrict-file-names=windows --default-page=index.php -e robots=off --mirror --user-agent="" --ignore-length --keep-session-cookies --save-cookies cookies.txt --load-cookies cookies.txt --recursive --page-requisites --convert-links --backup-converted "http://WORDPRESSMACHINE.EZPROXYMACHINE/BLOGNAME"
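For a site that isn't behind EZproxy, as in the original question quoted below, a rough sketch (untested against any particular site) is to drop the cookie options from the second command and add --no-parent so wget stays inside the blog's directory. The function name mirror_blog and the example URL are hypothetical; setting WGET=echo prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: mirror a WordPress site that is NOT behind EZproxy.
# Flags are a subset of the second command above (cookie, user-agent and
# --ignore-length options dropped), plus --no-parent so wget does not
# climb above the blog's directory.
# Set WGET=echo to print the command instead of running it (dry run).

mirror_blog() {
    site_url="$1"    # hypothetical example: http://example.org/blog/
    "${WGET:-wget}" \
        --mirror \
        --no-parent \
        --page-requisites \
        --convert-links \
        --backup-converted \
        --restrict-file-names=windows \
        --default-page=index.php \
        -e robots=off \
        "$site_url"
}
```

For example: mirror_blog http://example.org/blog/ . Keep the trailing slash: for HTTP, wget treats a last path component without a slash as a file name, so --no-parent would then only stop it at the domain root.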

cheers
stuart


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB_at_LISTSERV.ND.EDU] On Behalf Of Eric Phetteplace
Sent: Monday, 6 October 2014 7:44 p.m.
To: CODE4LIB_at_LISTSERV.ND.EDU
Subject: [CODE4LIB] wget archiving for dummies

Hey C4L,

If I wanted to archive a WordPress site, how would I do so?

In more detail: our library recently received a "donation" of a remote WordPress site that sits one directory below the root of a domain. A cursory look shows it's a WordPress site. We've never archived a website before, and I don't need to do anything fancy, just download a workable copy of the site as it presently exists. I've heard this can be as simple as:

wget -m $PATH_TO_SITE_ROOT

but that's not working as planned. Wget's convert-links feature doesn't seem to be that simple: if I download the site, disable my network connection, and host it locally, some 20 resources are unavailable. They're mostly images under the same directory, possibly loaded via AJAX. Advice?
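One hedged way to track down the missing resources: with --convert-links, wget rewrites links to files it did not download as absolute URLs pointing back at the original host, so any live-site URLs left in the mirror are candidates for what's missing offline. The sketch below (hypothetical function find_live_links, example host example.org) greps the saved files for such URLs. It only finds URLs written literally in the files; anything a script assembles at runtime, e.g. for an AJAX request, won't show up.

```shell
#!/bin/sh
# Sketch: list URLs in a wget mirror that still point at the live site.
# Assumes the mirror is in one directory and the live host name is known.
# URLs built at runtime by JavaScript (e.g. AJAX calls) will NOT be found.

find_live_links() {
    mirror_dir="$1"   # directory wget wrote to, e.g. ./example.org (hypothetical)
    host="$2"         # live host name, e.g. example.org (hypothetical)
    # Escape dots so the host name matches literally in the regex.
    host_re=$(printf '%s' "$host" | sed 's/\./\\./g')
    # -r recurse, -h no file names, -o print only the match, -E extended regex.
    # Options come before the pattern so non-GNU grep parses them too.
    grep -rhoE \
        --include='*.html' --include='*.htm' --include='*.css' \
        --include='*.js' --include='*.php*' \
        -e "https?://$host_re/[^\"' )<>]*" \
        "$mirror_dir" | sort -u
}
```

For example, after a mirror run: find_live_links ./example.org example.org . Each URL it lists can then be checked in a browser or fed back to wget.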

(Anticipated) pertinent advice: I shouldn't be doing this at all; we should outsource to Archive-It or similar, who actually know what they're doing. Yes/no?

Best,
Eric
Received on Mon Oct 06 2014 - 15:07:08 EDT