wget -A html,htm deletes downloaded files


RhinoBoy
03-12-2002, 10:43 AM
I just installed wget 1.8.1 and have had a terrible time configuring it. Most importantly, I can't get the accept command-line switch (-A) or the .wgetrc "accept" setting to work.

It seems to work with a single file suffix like html. When I add more suffixes, it downloads the file and then deletes it because it was rejected. I have tried all permutations I can think of:

-A html,htm,jpg,gif
-A .html,.htm,.jpg,.gif
-A “html,htm,jpg,gif”
-A ‘html,htm,jpg,gif’
-A “html”,”htm”,”jpg”,”gif”
-A ‘html’,’htm’,’jpg’,’gif’
etc., etc., etc.

I also tried putting these in .wgetrc with the same result. BTW, I tried creating the .wgetrc file with BBEdit and wget never accessed it as far as I can tell (is there a way to tell?). I then copied the wgetrc file installed with wget and edited it with BBEdit and that worked. Is this a file/creator type issue?

Anyway, HELP. What am I missing here? I think wget holds the greatest promise for the way I like to download and read offline.

TIA, Dan

fireproof
03-12-2002, 11:49 AM
I'd guess that the .wgetrc file issue has more to do with permissions or line endings (LF vs CR -- check out BBEdit Lite for changing between DOS, Unix, and Mac line endings).
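One way to test the line-ending guess from the command line is sketched below. It assumes a POSIX shell with tr and wc available. The scratch file name and its two settings are made up for the demo and stand in for a .wgetrc saved with classic Mac (CR) line endings.

```shell
# Hypothetical scratch file standing in for a .wgetrc saved with CR line endings.
rc_file="${TMPDIR:-/tmp}/wgetrc-cr-demo"
printf 'accept = html,htm\rrecursive = on\r' > "$rc_file"

# Delete everything except carriage returns and count what is left.
# A non-zero count means the file is not a pure Unix (LF) text file.
cr_count=$(tr -cd '\r' < "$rc_file" | wc -c | tr -d ' ')
echo "CR characters found: $cr_count"

# Convert CR to LF into a new file, leaving the original untouched.
tr '\r' '\n' < "$rc_file" > "$rc_file.unix"
lf_count=$(tr -cd '\n' < "$rc_file.unix" | wc -c | tr -d ' ')
echo "LF characters after conversion: $lf_count"
```

For a DOS-style file (CR followed by LF), deleting the carriage returns with tr -d '\r' would be the safer conversion, since replacing them would leave blank lines.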

Also, for lines like -A ‘html’,’htm’,’jpg’,’gif’ perhaps you want to use the ` (backtick) character at the beginning AND the end?

Just a thought.

RhinoBoy
03-12-2002, 12:14 PM
I created that post with a word processor. In reality, I tried permutations of the single quote ' and double quote ".
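A quick way to see what the shell actually hands to a program for each quoting style is sketched here. It assumes a POSIX-style shell (tcsh treats these particular quotes the same way, but that is not tested here). show_args is a made-up helper for the demo, not part of wget.

```shell
# Hypothetical helper: prints each argument it receives on its own line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# The same accept list written with the plain ' and " quoting styles tried above.
plain=$(show_args -A html,htm,jpg,gif)
single=$(show_args -A 'html,htm,jpg,gif')
double=$(show_args -A "html,htm,jpg,gif")
pieces=$(show_args -A "html","htm","jpg","gif")

echo "$plain"
if [ "$plain" = "$single" ] && [ "$plain" = "$double" ] && [ "$plain" = "$pieces" ]; then
    echo "all four forms pass identical arguments"
fi
```

Since the shell strips the quotes before the program ever runs, wget would receive the same two arguments in every case, which suggests the quoting style is unlikely to be what makes the difference.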

mervTormel
03-12-2002, 12:41 PM
i got this to work, but with some errors:

% wget -r -l1 --no-parent -A.jpg 'http://home.mindspring.com/~bduart/'

% ls -s home.mindspring.com/%7Ebduart/
total 36
16 moonIsAharshMistress.jpg 20 retroTVwTestPattern.jpg

RhinoBoy
03-12-2002, 04:36 PM
The problem isn't with a single accept keyword. I can't figure out how to get more than one to work. The following works fine:

wget -l1 --no-parent -A.html http://wopr.norad.org/articles/firewall/

This one below, with two suffixes, downloads the html file and then deletes it because it should be rejected.

wget -l1 --no-parent -A.html,.htm http://wopr.norad.org/articles/firewall/

15:33:36 (138.94 KB/s) - `wopr.norad.org/articles/firewall/basics.html' saved [1565/1565]

Removing wopr.norad.org/articles/firewall/basics.html since it should be rejected.
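For comparison, here is a rough model of what a comma-separated list of accept suffixes would be expected to do with the file name from the log above. It is only a sketch in POSIX shell: is_accepted is a made-up function, not wget's code, and it covers plain suffixes only, not wildcard patterns.

```shell
# Hypothetical model of suffix-style accept matching; NOT wget's actual code.
# Returns 0 when the file name ends with any suffix in the comma-separated list.
is_accepted() {
    name=$1
    list=$2
    old_ifs=$IFS
    IFS=,                      # split the list on commas only
    for suffix in $list; do
        case $name in
            *"$suffix") IFS=$old_ifs; return 0 ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

# Try the one-suffix and two-suffix lists from the two commands above.
for accept_list in .html .html,.htm; do
    if is_accepted basics.html "$accept_list"; then
        echo "-A$accept_list: basics.html would be kept"
    else
        echo "-A$accept_list: basics.html would be rejected"
    fi
done
```

Under this model basics.html is kept with either list, so the "Removing ... since it should be rejected" line is not what a plain suffix match would predict for -A.html,.htm.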