Latest revision as of 13:09, 11 January 2020

wget to console

Use the -O - option to write the downloaded content to stdout (the console).

$ wget -O - http://google.ca

Since the contents go to stdout, you can pipe them to a shell to execute remote commands:

$ wget -O - http://scripthost.local/fix_graphics.sh | sh

Note: This is equivalent to running curl with no options, e.g. curl http://google.ca.

Spoofing Googlebot

Some sites are paywalled but will still serve their full content to Googlebot. To spoof Googlebot, pass a Googlebot user agent and optionally an X-Forwarded-For header containing a Googlebot IP address.

$ wget --header="X-Forwarded-For: 66.249.66.1" --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://example.com/
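The same spoof can be expressed with curl (a sketch; http://example.com/ is a placeholder for the paywalled URL):

```shell
# -A sets the User-Agent; -H adds the X-Forwarded-For header.
# http://example.com/ is a placeholder for the paywalled page.
curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
     -H "X-Forwarded-For: 66.249.66.1" \
     http://example.com/
```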

Save to Directory

Use the -P option to specify the directory prefix into which downloaded files are saved.

$ wget -P /tmp http://google.ca/
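For example (the URL and file name are placeholders), the prefix directory is created if it doesn't exist and the file is saved inside it rather than in the current directory:

```shell
# -P /tmp/downloads saves into that directory, creating it if needed;
# the URL and file name are placeholders.
wget -P /tmp/downloads http://example.com/file.txt
# the download is written to /tmp/downloads/file.txt
```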


Saving an Entire Open Directory

To download an entire open directory from a remote server:

$ wget --no-check-certificate -np -nH -r -c --cut-dirs=1 --reject "index.html*" -e robots=off http://server/remote/dir

Where:

  Flag                     Description
  --no-check-certificate   Don't validate the server's SSL/TLS certificate
  -np                      No parent (don't crawl up the directory tree)
  -nH                      No host directory (don't create a 'server/' directory locally)
  --cut-dirs=1             Strip one leading path component ('remote/') from the saved path
  -r                       Recursive download
  -c                       Continue partially downloaded files
  --reject "index.html*"   Don't save the index.html* directory-listing pages
  -e robots=off            Ignore robots.txt
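As a sketch of how the path-trimming flags interact (server, remote, and dir are the placeholder names from the command above), -nH drops the host component and --cut-dirs=1 strips one further leading directory:

```shell
# Default layout when mirroring http://server/remote/dir/file.txt:
#   ./server/remote/dir/file.txt
# With -nH (no host directory):
#   ./remote/dir/file.txt
# With -nH --cut-dirs=1 (strip one more leading component):
#   ./dir/file.txt
wget -np -nH -r -c --cut-dirs=1 --reject "index.html*" -e robots=off \
     http://server/remote/dir/
```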

wget on FreeBSD

wget is available in ports under ftp/wget. If you don't want to compile it, you can use fetch(1), which ships in the FreeBSD base system, to download files.
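For example (the URL and file name are placeholders; -o names the local output file):

```shell
# fetch(1) is in the FreeBSD base system, so no ports/pkg install is needed.
# -o sets the output path; the URL is a placeholder.
fetch -o /tmp/archive.tar.gz https://example.com/archive.tar.gz
```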

See Also