wget and cURL

wget

wget is a GNU utility for retrieving files over the web using the popular internet transfer protocols (HTTP, HTTPS, FTP). It's useful for obtaining individual files or for mirroring entire web sites, since it can convert the absolute links in downloaded documents to relative links. The GNU wget manual is the definitive resource.

Some Useful wget Switches and Options

Usage: wget [options] url1 [url2 ...]

Option                  Purpose
-A, -R                  Accept and reject lists; -A.jpg will download all .jpgs from the remote directory
--backup-converted      When converting links in a file, back up the original version with a .orig suffix; synonymous with -K
--backups=backups       Back up existing files with .1, .2, .3, etc. suffixes before overwriting them; 'backups' specifies the maximum number of backups made for each file
-c                      Continue an interrupted download
--convert-links         Convert links in downloaded files to point to local files
-i file                 Specify an input file from which to read URLs
-l depth                Specify the maximum recursion depth; the default is 5
-m                      Shortcut for mirroring options: -r -N -l inf --no-remove-listing, i.e., turns on recursion and time-stamping, sets infinite recursion depth, and keeps FTP directory listings
-N                      Turn on time-stamping
-O file                 Specify the name of an output file, if you want it to differ from the name of the downloaded file
-p                      Download the prerequisite files for displaying a web page (.css, .js, images, etc.)
-r                      Download files recursively [RTFM here, as it can get ugly fast]
-S                      Print the HTTP headers or FTP responses sent by remote servers
-T seconds              Set a timeout, in seconds, for operations that take too long
--user=user,            Specify the username and/or password for HTTP/FTP logins
--password=password
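
A few of these switches combine naturally for batch downloads that survive interruptions. As a rough sketch (urls.txt and its contents are invented for illustration), -i reads the URL list, -c resumes any partially downloaded files, and -T caps how long wget waits on a stalled operation:

$ cat urls.txt
http://ftp.gnu.org/gnu/wget/wget-1.16.3.tar.gz
http://ftp.gnu.org/gnu/wget/wget-1.16.2.tar.gz
$ wget -c -T 30 -i urls.txt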

Some wget Examples

Basic file download:

$ wget http://ftp.gnu.org/gnu/wget/wget-1.16.3.tar.gz

Download a file, rename it locally:

$ wget -O file.tar.gz http://ftp.gnu.org/gnu/wget/wget-1.16.3.tar.gz

Download multiple files (skip -O here, since it would concatenate both downloads into a single file):

$ wget http://ftp.gnu.org/gnu/wget/wget-1.16.3.tar.gz ftp://ftp.gnu.org/gnu/wget/wget-1.16.2.tar.gz

Download a file with your HTTP or FTP login/pass encoded:

$ wget ftp://hniksic:mypassword@ftp.gnu.org/gnu/wget/wget-1.16.3.tar.gz

Retrieve a single web page and all its support files (css, images, etc.) and change the links to reference the downloaded files:

$ wget -p --convert-links http://tldp.org/index.html

Retrieve the first three levels of tldp.org, saving them to local directory tldp:

$ wget -r -l3 -P tldp http://tldp.org/

Create a five levels-deep mirror of TLDP, keeping its directory structure, re-pointing the links to local files, saving the activity log to tldplog:

$ wget --convert-links -r http://tldp.org/ -o tldplog

Download all JPEGs from a given web directory, but not its child or parent directories:

$ wget -r -l1 --no-parent -A.jpg http://www.someserver.com/dir/

Mirror a site, converting the links for local viewing and backing up the original HTML files locally as *.orig before the links are rewritten, saving the activity log to /home/me/tldplog:

$ wget --mirror --convert-links --backup-converted http://tldp.org/ -o /home/me/tldplog

cURL

cURL is a free software utility for transferring data over a network. Although cURL can retrieve files over the web like wget, it speaks many more protocols (HTTP/S, FTP/S, SCP, LDAP, IMAP, POP, SMTP, SMB, Telnet, etc.), and it can both send commands to remote servers and reliably read and interpret their responses. cURL can send HTTP POST requests to interact with HTML forms and buttons, for example, and if it receives a 3xx HTTP response (moved), it can follow the resource to its new location. Keep in mind that cURL thinks in terms of data streams, not necessarily in terms of tidy, human-readable files. The cURL manpage is the definitive resource.
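
That stream orientation means cURL output pipes cleanly into other tools. A small illustrative example (the grep pattern is arbitrary): -s suppresses the progress meter so only the page body reaches the pipe:

$ curl -s http://tldp.org/ | grep -i '<title>'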

Some cURL Examples

Get a web site's index page (cURL writes it to stdout by default), save it locally with a specified file name, or save it locally using its remote name:

$ curl http://tldp.org/
$ curl -o myfile.html http://tldp.org/
$ curl -O http://tldp.org/index.html

FTP — get a particular file, or a directory listing:

$ curl ftp://ftp.supermicro.com/CDR-INTC_1.31_for_Intel_platform/NT40_README.txt
$ curl ftp://ftp.supermicro.com/CDR-INTC_1.31_for_Intel_platform/

Download a file from a web site that uses a redirect script, like Sourceforge (-L tells cURL to observe the Location header):

$ curl -o cdk.tar.gz -L http://sourceforge.net/projects/textstudio/files/latest/download?source=directory

Specify a port number:

$ curl http://tldp.org:6000

Specify a username and password:

$ curl ftp://name:passwd@supermicro.com:port/full/path/to/file 
-OR-
$ curl -u name:passwd ftp://supermicro.com:port/full/path/to/file
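
Either form exposes the password to other local users via the process list. One way around that (the machine entry below is a placeholder) is cURL's -n/--netrc option, which reads credentials from ~/.netrc instead:

$ cat ~/.netrc
machine supermicro.com login name password passwd
$ curl -n ftp://supermicro.com/full/path/to/file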

Get a file over SSH (scp) using an RSA key for password-less login:

$ curl -u username: --key ~/.ssh/id_rsa scp://myserver.com/~/file.txt
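
Assuming your cURL build includes SSH support, the same key-based login works over SFTP; this is the SCP example above with only the scheme changed:

$ curl -u username: --key ~/.ssh/id_rsa sftp://myserver.com/~/file.txt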

Get a file from a Samba server:

$ curl -u "domainusername:passwd" smb://server.myserver.com/share/file.txt

Send an email using Gmail:

$ curl --url "smtps://smtp.gmail.com:465" --ssl-reqd --mail-from "geoffstratton@gmail.com" \
   --mail-rcpt "someguy@example.com" --upload-file mail.txt \
   --user "username@gmail.com:password" --insecure

Get a file using an HTTP proxy that requires login:

$ curl -u user:passwd -x some-proxy:888 http://tldp.org/
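
cURL also honors the conventional proxy environment variables, so the same transfer can be written without the -x flag (proxy host and credentials are placeholders):

$ http_proxy=http://user:passwd@some-proxy:888 curl http://tldp.org/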

Get the first or last 500 bytes of a file:

$ curl -r 0-499 http://tldp.org/index.html
$ curl -r -500 http://tldp.org/index.html

Upload a file to an FTP server:

$ curl -T localfile.txt -u user:passwd ftp://ftp.amd.com/remotefile

Show the HTTP headers returned by a web server, or save them to a file:

$ curl -I http://www.debian.org/
$ curl -D headers.txt http://www.debian.org/
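
If all you want is the status code, -w/--write-out can print selected transfer variables after the request completes; here -o /dev/null discards the body and -s hides the progress meter:

$ curl -s -o /dev/null -w "%{http_code}\n" http://www.debian.org/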

Send POST data to an HTTP server:

$ curl -d "name=Geoff%20Stratton&phone=5555555" http://www.someguyspage.com/guestbook.cgi

Emulate a fill-in form, using local file myfile.txt as the source for the ‘file’ field:

$ curl -F "file=@myfile.txt" -F "yourname=Geoff" -F "filedescription=Look at this!" http://www.post.com/postit.cgi

Set a custom referrer or user agent:

$ curl -e cia.gov http://www.somewebsite.com/
$ curl -A "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101" http://www.walmart.com/
