Wget

From Leo's Notes
Last edited on 30 December 2021, at 01:36.

wget to console

Use the -O - option to output contents to the console.

$ wget -O - http://google.ca

Since the contents go to stdout, you can pipe them to a shell to execute a remote script:

$ wget -O - http://scripthost.local/fix_graphics.sh | sh

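Note: This is roughly equivalent to using curl with no options, e.g. curl http://google.ca. One difference is that wget follows HTTP redirects by default, while curl only does so with -L.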
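Piping straight into sh runs whatever the server returns. As a minimal sketch of a more cautious variant, the hypothetical helper below (fetch_script is not part of wget) saves the script to a temporary file and prints its path, so it can be read before it is executed. It assumes mktemp is available.

```shell
# Hypothetical helper: download a script to a temp file and print the path.
fetch_script() {
    url=$1
    tmp=$(mktemp) || return 1
    # -q silences progress output; -O writes the response body to the given file
    if wget -q -O "$tmp" "$url"; then
        printf '%s\n' "$tmp"
    else
        # Don't leave an empty or partial file behind on failure
        rm -f "$tmp"
        return 1
    fi
}
```

Possible usage: script=$(fetch_script http://scripthost.local/fix_graphics.sh) && less "$script" && sh "$script"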

Spoofing Googlebot

Some sites sit behind a paywall but serve the full page to Googlebot. To spoof Googlebot, pass a Googlebot user agent and, optionally, an X-Forwarded-For header set to a Googlebot IP address.

$ wget --header="X-Forwarded-For: 66.249.66.1" --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://server/page
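To avoid retyping the long options, they can be wrapped in a shell function. This is only a sketch: wget_as_googlebot, GOOGLEBOT_UA and GOOGLEBOT_IP are hypothetical names, and the user agent and IP address are the ones from the command above.

```shell
# Values taken from the example above; adjust as needed.
GOOGLEBOT_UA='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
GOOGLEBOT_IP='66.249.66.1'

# Hypothetical wrapper: run wget with the Googlebot headers.
# Any extra options and the URL(s) are passed through unchanged.
wget_as_googlebot() {
    wget --header="X-Forwarded-For: $GOOGLEBOT_IP" \
         --user-agent="$GOOGLEBOT_UA" \
         "$@"
}
```

Possible usage: wget_as_googlebot -O - http://server/page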

Save to Directory

Use the -P option (--directory-prefix) to specify the directory that downloaded files are saved into.

$ wget -P /tmp http://google.ca/
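As a small sketch, -P also combines well with several URLs in one call. The hypothetical helper below (download_into is not a wget feature) takes a target directory followed by any number of URLs and creates the directory first.

```shell
# Hypothetical helper: download_into DIR URL...
download_into() {
    dir=$1
    shift
    # Make sure the target directory exists before downloading
    mkdir -p "$dir" || return 1
    # -P sets the directory prefix; remaining arguments are the URLs
    wget -P "$dir" "$@"
}
```

Possible usage: download_into /tmp/pages http://google.ca/ http://server/remote/dir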


Saving an Entire Open Directory

To download an entire open directory from a remote server:

$ wget --no-check-certificate -np -nH --cut-dirs=1 -r -c --reject "index.html*" -e robots=off http://server/remote/dir

Where:

Flag                     Description
--no-check-certificate   Don't validate the server's SSL/TLS certificate
-np                      No parent: never ascend to the parent directory
-nH                      No host directory: don't create a directory named after the host
--cut-dirs=1             Strip 1 leading directory level (here 'remote/') from the download destination
-r                       Recursive download
-c                       Continue partially downloaded files
--reject "index.html*"   Don't save the index.html* files generated by the directory listing
-e robots=off            Ignore robots.txt
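The right --cut-dirs value depends on how deep the remote directory is. The sketch below works it out from the URL so that only the last directory is kept locally (1 for http://server/remote/dir, matching the table). Both function names are hypothetical, and the URL is assumed to have no query string.

```shell
# Hypothetical helper: print how many leading path components to strip
# so that only the last directory of the URL remains.
cut_dirs_for() {
    path=${1#*://}                  # drop the scheme, if any
    case $path in
        */*) path=${path#*/} ;;     # drop the host part
        *)   path= ;;               # URL has no path at all
    esac
    path=${path%/}                  # ignore one trailing slash
    if [ -z "$path" ]; then
        echo 0
        return
    fi
    # Each remaining slash separates one parent directory from the next level
    n=$(printf '%s' "$path" | tr -cd '/' | wc -c)
    echo $(( $n ))
}

# Hypothetical wrapper around the command shown above.
mirror_open_dir() {
    url=$1
    wget --no-check-certificate -np -nH \
         --cut-dirs="$(cut_dirs_for "$url")" \
         -r -c --reject "index.html*" -e robots=off "$url"
}
```

Possible usage: mirror_open_dir http://server/remote/dir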

wget on FreeBSD

wget is available in ports under ftp/wget. If you don't want to compile it, you can use fetch(1), which is part of the base system, to download files instead.
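For scripts that have to run both with and without wget installed, a fallback along these lines may help. It is a sketch only: download is a hypothetical function name, and it relies on fetch's -o option for naming the output file.

```shell
# Hypothetical helper: download URL FILE
# Uses wget when it is installed, otherwise falls back to FreeBSD's fetch.
download() {
    if command -v wget >/dev/null 2>&1; then
        wget -O "$2" "$1"
    elif command -v fetch >/dev/null 2>&1; then
        fetch -o "$2" "$1"
    else
        echo "neither wget nor fetch found" >&2
        return 127
    fi
}
```

Possible usage: download http://google.ca/ /tmp/index.html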

See Also