| Seeking a tool that downloads all files from an internet directory |
|
Michael Faerber
Member #4,800
July 2004
|
Hi! I have found a specific site (http://fun.barnal.de/videos) and want to download all files from this page to view them offline. Question is: how? I tried it with HTTrack, but I can't resume the download after I have stopped it. Do you know a tool (must be available for Linux) that could do this for me? -- |
|
gnolam
Member #2,030
March 2002
|
Wget. -- |
|
kentl
Member #2,905
November 2002
|
If you use Firefox, you could try the "Download them all" extension. I'm not sure about the name, but it's popular, so you'll find it. |
|
BAF
Member #2,981
December 2002
|
downTHEMall is the name of the extension... at least the one I have. |
|
Kitty Cat
Member #2,815
October 2002
|
man wget said:
o Retrieve the first two levels of wuarchive.wustl.edu, saving them to /tmp.
wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
-- |
|
ReyBrujo
Moderator
January 2001
|
wget or httrack -- |
|
miran
Member #2,407
June 2002
|
I used to use httrack. It was teh awesome. -- |
|
Michael Faerber
Member #4,800
July 2004
|
Problem with wget is that I cannot resume the download after I have stopped it. HTTrack seemed to offer that option, but it seemed too random: sometimes it worked, sometimes it started redownloading the whole page again. DownThemAll, however, seems to work fine! So, if nobody proposes a better program, I will use this. -- |
|
gnolam
Member #2,030
March 2002
|
-c resumes partially downloaded files. -N makes sure only new files get downloaded. What's the problem? -- |
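A minimal sketch of the resume advice above, assuming bash and a wget on the PATH. The function name fetch_dir, the DRY_RUN switch and the destination argument are illustrative only and do not come from the thread; -r, -l1, --no-parent, -c and -P are standard wget options.

```shell
#!/usr/bin/env bash
# Sketch: fetch everything linked from one directory listing, resumably.
# fetch_dir and DRY_RUN are hypothetical helpers, not part of wget.
# Set DRY_RUN=1 to print the wget command instead of running it.
fetch_dir() {
    local url="$1"        # directory URL to mirror
    local dest="${2:-.}"  # local target directory (default: current dir)
    #   -r -l1       recurse, but only one level below the start URL
    #   --no-parent  never climb above the start directory
    #   -c           continue partially downloaded files instead of restarting
    #   -P           directory prefix under which files are saved
    ${DRY_RUN:+echo} wget -r -l1 --no-parent -c -P "$dest" "$url"
}
```

Calling `fetch_dir http://fun.barnal.de/videos/ ~/videos` again after an interruption should pick up where it stopped, provided the server supports ranged requests. Swapping -c for -N would instead skip files whose local copy is not older than the remote one.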
|
Evert
Member #794
November 2000
|
I second wget. |
|
Mars
Member #971
February 2001
|
"DownThemAll!" is really quite nice for in-Firefox use. -- |
|
Michael Faerber
Member #4,800
July 2004
|
Hey, gnolam, you really helped me with your "-c" option. I suppose I have to read the man pages more often. So I'll use wget now! Thanks for your help! -- |
|
jhuuskon
Member #302
April 2000
|
Now if only someone made a frontend for wget that would make it actually behave like it should. Wget is another fine example of opensource at its prime: Needlessly complicated, poorly documented and it doesn't work like it's supposed to. You don't deserve my sig. |
|
Evert
Member #794
November 2000
|
Quote: Now if only someone made a frontend for wget that would make it actually behave like it should. Wget is another fine example of opensource at its prime: Needlessly complicated, poorly documented and it doesn't work like it's supposed to.
Care to elaborate? |
|
jhuuskon
Member #302
April 2000
|
I tried numerous times to download an image gallery (an HTML page that links to the JPEGs). However, it only downloads the index page and stops, regardless of the recursion options specified. Another funky thing: even when I tell wget to retain only downloaded JPEGs, it keeps the index. The help file (yes, I've tried it in Windows) lists all the options all right, but the explanations are arbitrary at best, and the examples, while demonstrating the flexibility of Wget well, are totally useless from a practical point of view. You don't deserve my sig. |
|
Kitty Cat
Member #2,815
October 2002
|
man wget said:
o You have a file that contains the URLs you want to download? Use the -i
switch:
wget -i <file>
man wget also said:
-F
--force-html
When input is read from a file, force it to be treated as an HTML file. This
enables you to retrieve relative links from existing HTML files on your local
disk, by adding "<base href="url">" to HTML, or using the --base command-line
option.
-B URL
--base=URL
Prepends URL to relative links read from the file specified with the -i option.
If the images all have the same extension and are in the same directory on the site:
Quote:
o You want to download all the GIFs from a directory on an HTTP server. You
tried wget http://www.server.com/dir/*.gif, but that didn't work because HTTP
retrieval does not support globbing. In that case, use:
wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
More verbose, but the effect is the same. -r
That method won't work if the site has a robots.txt file set up, though. -- |
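Two hedged sketches for the gallery case (an index page that links to JPEGs), assuming bash and a wget on the PATH. The function names, the DRY_RUN switch and the index.html file name are illustrative and not from the thread. Whether either works depends on the site; one common reason recursion stops at the index is that the images live on another host, which is what -H addresses.

```shell
#!/usr/bin/env bash
# Sketch only: gallery_recursive, gallery_from_index and DRY_RUN are
# hypothetical helpers. Set DRY_RUN=1 to print the wget commands
# instead of running them.

# Approach 1: recurse one level from the index page and keep only JPEGs.
#   -nd            do not recreate the remote directory tree locally
#   -H             also follow links that point to other hosts
#   -e robots=off  ignore the site's robots.txt
#   -A jpg,jpeg    accept list: keep only files with these suffixes
gallery_recursive() {
    local url="$1"
    ${DRY_RUN:+echo} wget -r -l1 -nd -H -e robots=off -A jpg,jpeg "$url"
}

# Approach 2: save the index page, then feed it back to wget as a list
# of links (-i), parsed as HTML (-F), with relative links resolved
# against the original URL (-B). This fetches every link in the page,
# not just the images.
gallery_from_index() {
    local url="$1"
    ${DRY_RUN:+echo} wget -O index.html "$url" &&
    ${DRY_RUN:+echo} wget -i index.html -F -B "$url"
}
```

Running either with DRY_RUN=1 first, for example `DRY_RUN=1 gallery_recursive http://www.server.com/dir/`, shows the exact command line before anything is downloaded.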
|
jhuuskon
Member #302
April 2000
|
Don't you think I tried that? It just didn't work. I even forged the user agent and told it to ignore robots.txt, but to no avail. You don't deserve my sig. |
|
|