What is Curl?
Curl is short for Client for URLs.
It is a Unix command-line tool.
It transfers data to and from a server.
It is used to download data from HTTP(S) sites and FTP servers.
Checking curl installation
Type man curl in the command line. If curl is not installed, you will get a message saying that curl was not found; go to the curl download page for full instructions. If curl is installed, the curl manual opens.
Press Enter to scroll. To exit and return to your console, press q.
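The same check can be scripted. The sketch below is one possible way to do it, assuming a POSIX shell; `check_tool` is a hypothetical helper name, and `command -v` is used instead of `man` because it only reports whether the program is on the PATH.

```shell
# Report whether a command-line tool is available on the PATH.
# check_tool is a hypothetical helper name; command -v is a POSIX builtin.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 is installed at $(command -v "$1")"
    else
        echo "$1 was not found"
        return 1
    fi
}

check_tool curl || echo "see the curl download page for installation instructions"
```

The same helper works for any other tool name, for example `check_tool wget`.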
Basic syntax in curl
curl [option flags] [URL]
The URL is required for the command to execute successfully.
Check out curl --help for a list of the options available in curl. Recent versions print only the most common options there; use curl --help all for the complete list.
How to download a single file
Suppose a single file is stored at:
https://websitename.com/datafilename.txt
Use the option flag -O (uppercase) to save the file with its original name.
curl -O https://websitename.com/datafilename.txt
To save the file under a different name, use the lowercase -o followed by the new file name.
curl -o newfilename.txt https://websitename.com/datafilename.txt
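To make the difference between -O and -o concrete, the sketch below prints the local file name each form would produce for the example URL. It assumes a simple URL without a query string, where the name curl keeps is the part after the last slash; `remote_name` is a hypothetical helper, not a curl feature, and nothing is downloaded.

```shell
# Print the file name that -O would keep for a simple URL:
# everything after the last slash. remote_name is a hypothetical helper.
remote_name() {
    printf '%s\n' "${1##*/}"
}

url="https://websitename.com/datafilename.txt"
echo "curl -O $url saves to $(remote_name "$url")"
echo "curl -o newfilename.txt $url saves to newfilename.txt"
```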
How to download multiple files
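curl does not expand shell-style wildcards such as * in a URL, so the following command does not download every file hosted on websitename.com that starts with datafilename and ends with .txt:
curl -O https://websitename.com/datafilename*.txt
Instead, use curl's globbing parser with a numeric range:
curl -O https://websitename.com/datafilename[001-100].txt
This downloads the files with suffixes 001 to 100.
Or
curl -O https://websitename.com/datafilename[001-100:10].txt
This downloads every 10th file in that range (datafilename001.txt, datafilename011.txt, and so on up to datafilename091.txt).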
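To see which URLs a range with a step stands for, the expansion can be reproduced in plain shell. This is only an illustrative sketch: `expand_range` is a hypothetical helper, not part of curl, and it prints the URLs rather than downloading anything.

```shell
# Print the URLs that a curl range such as [001-100:10] stands for.
# Arguments: prefix, first, last, step, zero-padded width, suffix.
# expand_range is a hypothetical helper, not part of curl.
expand_range() {
    prefix=$1 i=$2 last=$3 step=$4 width=$5 suffix=$6
    while [ "$i" -le "$last" ]; do
        printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
        i=$((i + step))
    done
}

# Same sequence as datafilename[001-100:10].txt
expand_range "https://websitename.com/datafilename" 1 100 10 3 ".txt"
```

Calling it with a step of 1 lists all 100 URLs covered by [001-100].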
Preemptive Troubleshooting
curl has two particularly useful option flags in case of redirects or timeouts during download:
-L follows the redirect if the server responds with a 3xx status code.
-C - resumes a previous file transfer if it timed out before completion (-C expects an offset; the - tells curl to work it out from the partial file).
Option flags conventionally come before the URL, and the order of the flags does not matter.
# Download all 100 data files
curl -O https://s3.amazonaws.com/assets.datacamp.com/production/repositories/4180/datasets/files/datafile[001-100].txt
# Download and rename redirected file
curl -o Spotify201812.zip -L https://assets.datacamp.com/production/repositories/4180/datasets/eb1d6a36fa3039e4e00064797e1a1600d267b135/201812SpotifyData.zip
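The two troubleshooting flags can be combined in a small wrapper. The following is a minimal sketch, not a definitive implementation: `fetch_resumable` and the `CURL_BIN` override are hypothetical names introduced here so that the command can be previewed with echo instead of contacting a server.

```shell
# Download a URL, following redirects (-L) and resuming a partial file (-C -).
# CURL_BIN is a hypothetical override so the command can be previewed with echo.
fetch_resumable() {
    "${CURL_BIN:-curl}" -L -C - -O "$1"
}

# Dry run: print the arguments curl would receive instead of downloading.
CURL_BIN=echo fetch_resumable "https://websitename.com/datafilename.txt"
```

Dropping the `CURL_BIN=echo` prefix runs the real download.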
Download data using wget
What is Wget?
Its name derives from World Wide Web and get.
It is native to Linux but compatible with all operating systems.
It is used to download files over HTTP(S) and FTP.
It is better than curl at downloading multiple files recursively.
Checking Wget Installation
To check if Wget is installed:
which wget
This returns the location where wget is installed.
If wget has not been installed, there will be no output.
Wget installation by Operating System
Wget source code: https://www.gnu.org/software/wget/
Linux (Debian/Ubuntu): run sudo apt-get install wget on the command line
macOS: use Homebrew and run brew install wget
Windows: download via GnuWin32
Browsing the Wget manual
Once the installation is complete, use man wget to print the manual.
Wget Syntax
wget [option flag] [URL]
For a full list of wget options, refer to wget --help
Downloading a Single file
Option flags unique to wget:
-b: go to the background immediately after startup
-q: turn off the wget output
-c: resume a broken download (i.e. continue getting a partially downloaded file)
You can combine all of these flags in one download:
wget -bqc https://websitename.com/datafilename.txt
# Preview the log file
cat wget-log
Multiple file downloading with Wget
Save a list of file locations in a text file, for example urlist.txt (preview it with cat urlist.txt), then pass the file to wget with -i:
wget -i urlist.txt
This downloads the files listed in the file. It is important not to put any other option flag between -i and the filename.
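As a sketch of the list-based workflow, the snippet below writes a small URL list and previews it. The three file locations are hypothetical placeholders on the example host, and the wget call is left commented out so that running the snippet does not start a download.

```shell
# Write three hypothetical file locations to urlist.txt, one per line.
printf '%s\n' \
    "https://websitename.com/datafilename001.txt" \
    "https://websitename.com/datafilename002.txt" \
    "https://websitename.com/datafilename003.txt" > urlist.txt

# Preview the list before downloading.
cat urlist.txt

# Other flags go before -i, so the filename directly follows -i.
# Uncomment to start the download:
# wget -c -i urlist.txt
```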
To resume a partially downloaded file, use the -c switch in your command as follows:
wget -c URL
To make your wget download silent, add the -q switch to your initial wget command:
wget -q URL
##Setting download constraints for large files
Set an upper download bandwidth limit (by default in bytes per second) with --limit-rate.
Syntax
wget --limit-rate={rate}k {file location}
Example
wget --limit-rate=200k -i urlist.txt
##Setting download constraints for small files
Set a mandatory pause time (in seconds) between file downloads with --wait
Syntax
wget --wait={Seconds} {file location}
Example
wget --wait=2.5 -i urlist.txt
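Both constraints can be applied in the same command. The sketch below is one way to wrap them, under stated assumptions: `polite_download` and the `WGET_BIN` override are hypothetical names, and the override exists only so the command can be previewed with echo rather than executed.

```shell
# Download every URL in a list while capping bandwidth and pausing between files.
# WGET_BIN is a hypothetical override so the command can be previewed with echo.
polite_download() {
    rate=$1 pause=$2 list=$3
    "${WGET_BIN:-wget}" --limit-rate="$rate" --wait="$pause" -i "$list"
}

# Dry run: print the arguments wget would receive instead of downloading.
WGET_BIN=echo polite_download 200k 2.5 urlist.txt
```

Keeping -i as the last flag leaves the list file name directly after it.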
#curl versus wget
curl advantages
can be used for downloading and uploading files over 20+ protocols
is easier to install across operating systems
wget advantages
has many built-in functionalities for handling multiple file downloads
can handle various file formats for download (e.g. a file directory or an HTML page)