Curl Versus Wget

What is Curl?

Curl is short for Client for URLs.

  • It is a Unix command line tool

  • It transfers data to and from a server

  • It is used to download data from HTTP(S) sites and FTP servers

Checking curl installation

Type man curl on the command line. If curl is not installed, you will get a message saying that curl was not found; go to the curl download page for full instructions. If curl is installed, the curl manual opens.

Press Enter to scroll. To exit and return to your console, press q.
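The same check can be scripted. The sketch below uses the shell's command -v builtin instead of man; check_tool is a hypothetical helper name chosen for illustration, not something curl provides.

```shell
# check_tool is a hypothetical helper: report whether a tool is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 is installed at $(command -v "$1")"
  else
    echo "$1 was not found"
  fi
}

check_tool curl
```

Because the helper takes the tool name as an argument, check_tool wget works the same way.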

Basic syntax in curl

curl [option flag] [url]

The URL is required for the command to execute successfully.

Check out curl --help for a full list of the options available in curl.


How to download a single file

Suppose a single file is stored at:

https://websitename.com/datafilename.txt

Use the option flag -O (uppercase) to save the file with its original name.

curl -O https://websitename.com/datafilename.txt

To save the file under a different name, use the lowercase -o followed by the new file name.

curl -o newfilename.txt https://websitename.com/datafilename.txt
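To preview which local file name each flag produces without touching the network, the sketch below derives the name -O would use from the last path segment of the URL. It assumes a URL without a query string; the variable names are illustrative only.

```shell
url="https://websitename.com/datafilename.txt"

# -O keeps the remote name, i.e. the last path segment of the URL.
original_name=$(basename "$url")
echo "curl -O $url saves to: $original_name"

# -o ignores the remote name and uses the one we supply.
new_name="newfilename.txt"
echo "curl -o $new_name $url saves to: $new_name"
```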

How to download multiple files

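To download every file hosted on websitename.com that starts with datafilename and ends with .txt, a shell-style wildcard is tempting:

curl -O https://websitename.com/datafilename*.txt

However, curl does not expand * in URLs, so this requests a file literally named datafilename*.txt. Instead, use the globbing parser, which expands numeric ranges written in square brackets: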

curl -O https://websitename.com/datafilename[001-100].txt

This downloads the files with suffixes 001 to 100.

Or add a step to the range:

curl -O https://websitename.com/datafilename[001-100:10].txt

This downloads every 10th file in the range (001, 011, 021, and so on up to 091).
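To see which URLs a stepped range such as [001-100:10] stands for, the expansion can be imitated with a plain loop. This is a sketch of the arithmetic only, not curl's own code; expand_range is a hypothetical helper name.

```shell
# expand_range is a hypothetical helper that mimics a [start-end:step] range.
# $1 = first number, $2 = last number, $3 = step
expand_range() {
  i=$1
  while [ "$i" -le "$2" ]; do
    # %03d zero-pads to three digits, matching the 001-style suffixes
    printf 'https://websitename.com/datafilename%03d.txt\n' "$i"
    i=$((i + $3))
  done
}

expand_range 1 100 10
```

The call prints ten URLs, from datafilename001.txt to datafilename091.txt.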

Preemptive Troubleshooting

curl has two particularly useful option flags in case of redirects or timeouts during download:

  • -L follows the redirect if the server responds with a 3xx status code.

  • -C resumes a previous file transfer that timed out before completion. It expects an offset; -C - tells curl to work out the offset itself.

  • By convention, all flags come before the URL; the order of the flags does not matter.

# Download all 100 data files
curl -O https://s3.amazonaws.com/assets.datacamp.com/production/repositories/4180/datasets/files/datafile[001-100].txt
# Download and rename redirected file 
curl -o Spotify201812.zip -L https://assets.datacamp.com/production/repositories/4180/datasets/eb1d6a36fa3039e4e00064797e1a1600d267b135/201812SpotifyData.zip
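The two flags can also be combined with -O. A minimal sketch, assuming a hypothetical wrapper named download_resumable; the URL in the usage comment is the placeholder used earlier in this post.

```shell
# download_resumable is a hypothetical wrapper around curl.
# -L    follow redirects
# -C -  resume from wherever the partial local file ends
# -O    keep the remote file name
download_resumable() {
  curl -L -C - -O "$1"
}

# Usage (needs network access, so it is left commented out):
# download_resumable https://websitename.com/datafilename.txt
```

Running the same call again after an interrupted download picks up where the partial file stopped, provided the server supports resuming.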

Download data using wget

What is Wget?

  • It derives its name from World Wide Web and get.

  • It is native to Linux but compatible with all major operating systems.

  • It is used to download files from HTTP(S) and FTP.

  • It is better than curl at downloading multiple files recursively.

Checking Wget Installation

To check if Wget is installed:

which wget

This returns the location in which wget is installed.

If wget has not been installed, there will be no output.

Wget installation by Operating System

wget source code: https://www.gnu.org/software/wget/

Linux users (Debian/Ubuntu): run sudo apt-get install wget on the command line

macOS: use Homebrew and run brew install wget

Windows: download via GnuWin32

Browsing the Wget manual

Once the installation is complete, use man wget to open the manual.

Wget Syntax

wget [option flag] [URL]

For a full list of wget options, refer to wget --help


Downloading a Single file

Option flags unique to wget:

  • -b: go to background immediately after startup

  • -q: turn off the wget output

  • -c: resume a broken download (i.e., continue getting a partially downloaded file)

You can combine all of these flags in a single command:

wget -bqc https://websitename.com/datafilename.txt
# Preview the log file 
cat wget-log

Multiple file downloading with Wget

Save the list of file locations in a text file, one URL per line. You can preview the file with cat urlist.txt.

Then pass the file to wget with the -i flag to download every file listed in it:

wget -i urlist.txt

It is important not to put any other option flag between -i and the file name.
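The list file itself is easy to build from the shell. The sketch below writes three hypothetical file locations on the placeholder host to urlist.txt and previews the result; the final wget call is left as a comment because it needs network access.

```shell
# Write three hypothetical file locations, one per line, to urlist.txt
printf '%s\n' \
  "https://websitename.com/datafilename001.txt" \
  "https://websitename.com/datafilename002.txt" \
  "https://websitename.com/datafilename003.txt" > urlist.txt

# Preview the list and count the entries
cat urlist.txt
url_count=$(wc -l < urlist.txt | tr -d ' ')
echo "$url_count URLs listed"

# Then download them all:
# wget -i urlist.txt
```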

To resume a partially downloaded file, use a -c switch in your command as follows:

wget -c URL

To make your wget download silent, add the -q switch to your initial wget command:

wget -q URL

Setting download constraints for large files

Set an upper download bandwidth limit (by default in bytes per second) with --limit-rate.

Syntax

wget --limit-rate={rate}k {file location}

Example

wget --limit-rate=200k -i urlist.txt
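A rate limit also puts a floor on how long a download takes. The arithmetic below is a rough sketch with a hypothetical 50 MB file; it takes the k suffix to mean kilobytes per second and ignores connection overhead.

```shell
# Hypothetical numbers: a 50 MB file downloaded with --limit-rate=200k
file_size_kb=$((50 * 1024))   # 50 MB expressed in kilobytes
rate_kb_per_s=200             # the value passed to --limit-rate, in KB/s

# Integer division gives the minimum transfer time in whole seconds
min_seconds=$((file_size_kb / rate_kb_per_s))
echo "At ${rate_kb_per_s}k the download takes at least ${min_seconds} seconds"
```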

Setting download constraints for small files

Set a mandatory pause time (in seconds) between file downloads with --wait.

Syntax

wget --wait={Seconds} {file location}

Example

wget --wait=2.5 -i urlist.txt
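The pause adds up over a long list. The sketch below estimates the extra time for a hypothetical batch of 100 files, assuming one pause between each pair of consecutive downloads; awk is used because shell arithmetic handles integers only.

```shell
# Hypothetical batch: 100 files fetched with --wait=2.5
file_count=100
wait_seconds=2.5

# One pause between consecutive downloads means file_count - 1 pauses
total_pause=$(awk -v n="$file_count" -v w="$wait_seconds" 'BEGIN { print (n - 1) * w }')
echo "Pauses add about ${total_pause} seconds to the batch"
```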

Curl versus wget

curl advantages

  • can be used for downloading and uploading files over 20+ protocols

  • is easier to install across operating systems

wget advantages

  • has many built-in functionalities for handling multiple file downloads

  • can handle various file formats for download (e.g., a directory or an HTML page)