- How to download a website page on the Linux terminal?
- Check if wget already available
- Check if cURL already available
- 5 Linux Command Line Based Tools for Downloading Files and Browsing Websites
- 1. rTorrent
- Installation of rTorrent in Linux
- 2. Wget
- Installation of Wget in Linux
- Basic Usage of Wget Command
- 3. cURL
- Installation of cURL in Linux
- Basic Usage of cURL Command
- 4. w3m
- Installation of w3m in Linux
- Basic Usage of w3m Command
- 5. Elinks
- Installation of Elinks in Linux
- Basic Usage of elinks Command
- How to Use the wget Linux Command to Download Web Pages and Files
- Download directly from the Linux command line
- What to Know
- Features of the wget Command
- How to Download a Website Using wget
- Run wget as a Background Command
- Logging
- Download From Multiple Sites
- Retry Options
- Protect Download Limits
- Get Through Security
- Other Download Options
- How to Download Certain File Types
- Cliget
- Summary
How to download a website page on the Linux terminal?
The Linux command line provides great features for web crawling in addition to its inherent capabilities to handle web servers and web browsing. In this article we look at a few tools that are either already available or can be installed and used in the Linux environment for offline web browsing. This is achieved by downloading a single webpage or many webpages.
Wget is probably the most famous of all the downloading options. It allows downloading over HTTP, HTTPS, and FTP, can download an entire website, and also supports downloading through a proxy.
Below are the steps to get it installed and start using it.
Check if wget already available
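A simple way to do this (one possible approach; the check works the same way in most shells) is to ask where wget lives and then look at the exit code:

which wget
echo $?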
Running the above code gives us the following result:
If the exit code ($?) is 1, then we run the below command to install wget.
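The exact command depends on your package manager; on a Debian or Ubuntu style system it would be apt-get, while RPM based distributions use yum or dnf:

sudo apt-get install wget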
Now we run the wget command for a specific webpage or a website to be downloaded.
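For instance, using example.com as a stand-in for the page you actually want:

wget https://www.example.com/index.html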
Running the above code gives us the following result. We show the result only for the web page and not the whole website. The downloaded file gets saved in the current directory.
cURL is a client-side application. It supports downloading files over HTTP, HTTPS, FTP, FTPS, Telnet, IMAP, and more. Compared to wget, it has additional support for different protocols and types of downloads.
Below are the steps to get it installed and start using it.
Check if cURL already available
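The same style of check works for cURL:

which curl
echo $?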
Running the above code gives us the following result:
An exit value of 1 indicates that cURL is not available on the system, so we will install it using the below command.
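Again assuming a Debian-family package manager (substitute yum or dnf on RPM based systems):

sudo apt-get install curl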
Running the above code gives us the following result indicating the installation of cURL.
Next we use cURL to download a webpage.
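A minimal sketch, saving a stand-in page to a local file named index.html:

curl https://www.example.com -o index.html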
Running the above code gives us the following result. You can locate the downloaded file in the current working directory.
5 Linux Command Line Based Tools for Downloading Files and Browsing Websites
The Linux command line, the most adventurous and fascinating part of GNU/Linux, is a very powerful tool. The command line itself is highly productive, and the availability of various built-in and third-party command-line applications makes Linux robust and versatile. The Linux shell supports a variety of web tools of various kinds, be it a torrent downloader, a dedicated file downloader, or a text-based browser for surfing the internet.
5 Command-Line Internet Tools
Here we present 5 great command-line Internet tools that are very useful and prove handy for downloading files in Linux.
1. rTorrent
rTorrent is a text-based BitTorrent client written in C++ and aimed at high performance. It is available for most standard Linux distributions, as well as FreeBSD and Mac OS X.
Installation of rTorrent in Linux
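The package source varies by distribution (on RPM based systems rtorrent may come from an extra repository such as EPEL), but a typical install looks like this:

sudo apt-get install rtorrent     # Debian/Ubuntu
sudo yum install rtorrent         # RHEL/CentOS/Fedora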
Check if rtorrent is installed correctly by running the following command in the terminal.
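Simply launching the client is enough to confirm the install; the screenshot below shows the interface you should see:

rtorrent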
rTorrent Command Line Tool
Functioning of rTorrent
Some useful key bindings and what they do:
- CTRL+q – Quit the rTorrent application
- CTRL+s – Start a download
- CTRL+d – Stop an active download or remove an already stopped download
- CTRL+k – Stop and close an active download
- CTRL+r – Hash check a torrent before the upload/download begins
- CTRL+q – When this key combination is executed twice, rTorrent shuts down without sending a stop signal
- Left Arrow Key – Go back to the previous screen
- Right Arrow Key – Go to the next screen
2. Wget
Wget is a part of the GNU Project; its name is derived from World Wide Web (WWW). Wget is a brilliant tool, useful for recursive downloads and offline viewing of HTML from a local server, and it is available for most platforms, be it Windows, Mac, or Linux.
Wget makes it possible to download files over HTTP, HTTPS, and FTP. Moreover, it can mirror a whole website and supports proxy browsing and pausing/resuming downloads.
Installation of Wget in Linux
Wget, being a GNU project, comes bundled with most standard Linux distributions, and there is no need to download and install it separately. If it's not installed by default, you can still install it using apt, yum, or dnf.
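A quick sketch of the usual commands:

sudo apt install wget     # Debian/Ubuntu
sudo yum install wget     # older RHEL/CentOS
sudo dnf install wget     # Fedora and newer RHEL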
Basic Usage of Wget Command
Download a single file using wget.
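For example (the URL is only a placeholder):

wget https://www.example.com/file.zip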
Download a whole website, recursively.
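The -r switch turns on recursive retrieval; again with a stand-in address:

wget -r https://www.example.com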
Download specific types of files (say pdf and png) from a website.
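The -A (accept) switch takes a comma-separated list of file suffixes, for example:

wget -r -A pdf,png https://www.example.com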
Wget is a wonderful tool that enables custom and filtered downloads even on a machine with limited resources. Below is a screenshot of a wget download in progress, where we are mirroring a website (Yahoo.com).
Wget Command Line File Download
For more such wget download examples, read our article that shows 10 Wget Download Command Examples.
3. cURL
cURL is a command-line tool for transferring data over a number of protocols. It is a client-side application that supports protocols like FTP, HTTP, FTPS, TFTP, TELNET, IMAP, POP3, and more.
cURL is a simple downloader, differing from wget mainly in the extra protocols it supports, such as LDAP and POP3. Moreover, proxy downloading, pausing a download, and resuming a download are all well supported in cURL.
Installation of cURL in Linux
By default, cURL is available in most distributions, either pre-installed or in the repository. If it's not installed, just use apt or yum to get the required package from the repository.
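For instance:

sudo apt install curl     # Debian/Ubuntu
sudo yum install curl     # RHEL/CentOS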
Basic Usage of cURL Command
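As a minimal illustration with a placeholder URL, fetch a page and keep its remote file name with -O:

curl -O https://www.example.com/index.html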
For more such curl command examples, read our article that shows 15 Tips On How to Use ‘Curl’ Command in Linux.
4. w3m
w3m is a text-based web browser released under the GPL. It supports tables, frames, colors, SSL connections, and inline images, and it is known for fast browsing.
Installation of w3m in Linux
Again, w3m is available by default in most Linux distributions. If it is not available, you can always install the required package with apt or yum.
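Something along these lines, depending on your package manager:

sudo apt install w3m     # Debian/Ubuntu
sudo yum install w3m     # RHEL/CentOS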
Basic Usage of w3m Command
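Two common invocations, with a stand-in address: open a page interactively, or dump it as plain text for reading offline:

w3m https://www.example.com
w3m -dump https://www.example.com > page.txt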
5. Elinks
ELinks is a free text-based web browser for Unix and Unix-like systems. It supports HTTP and HTTP cookies, and also supports browsing scripts written in Perl and Ruby.
Tab-based browsing is well supported. The best thing is that it supports the mouse and display colors, and handles a number of protocols such as HTTP, FTP, and SMB, over both IPv4 and IPv6.
Installation of Elinks in Linux
By default, elinks is also available in most Linux distributions. If not, install it via apt or yum.
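The package is typically named elinks:

sudo apt install elinks     # Debian/Ubuntu
sudo yum install elinks     # RHEL/CentOS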
Basic Usage of elinks Command
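To open a site interactively (the address is a stand-in), or dump it as plain text:

elinks https://www.example.com
elinks -dump https://www.example.com > page.txt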
That's all for now. I'll be back with another interesting article that you will love to read. Till then, stay tuned and connected to Tecmint, and don't forget to give your valuable feedback in the comment section.
How to Use the wget Linux Command to Download Web Pages and Files
Download directly from the Linux command line
What to Know
- To download a full site, use the following command with the web address of the site: wget -r [site address]
- To run wget as a background command use: wget -b [site address]
Features of the wget Command
You can download entire websites using wget and convert the links to point to local sources so that you can view a website offline. The wget utility also retries a download when the connection drops and, if possible, resumes from where it left off when the connection returns.
Other features of wget are as follows:
- Download files using HTTP, HTTPS, and FTP.
- Resume downloads.
- Convert absolute links in downloaded web pages to relative URLs so that websites can be viewed offline.
- Supports HTTP proxies and cookies.
- Supports persistent HTTP connections.
- It can run in the background even when you aren’t logged on.
- Works on Linux and Windows.
How to Download a Website Using wget
The wget utility downloads web pages, files, and images from the web using the Linux command line. You can use a single wget command to download from a site or set up an input file to download multiple files across multiple sites.
According to the manual page, wget can be used even when the user has logged out of the system. To do this, use the nohup command.
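A sketch of that combination, using the blog address from this guide; nohup writes the output to nohup.out, and the download keeps running after you log out:

nohup wget -r www.everydaylinuxuser.com &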
For this guide, you will learn how to download this Linux blog:
Before you begin, create a folder on your machine using the mkdir command, and then move into the folder using the cd command.
mkdir everydaylinuxuser
cd everydaylinuxuser
wget www.everydaylinuxuser.com
The result is a single index.html file that contains the content pulled from the site. The images and stylesheets are still held on the remote server.
To download the full site and all the pages, use the following command:
wget -r www.everydaylinuxuser.com
This downloads the pages recursively up to a maximum of 5 levels deep. Five levels deep might not be enough to get everything from the site. Use the -l switch to set the number of levels you wish to go to, as follows:
wget -r -l10 www.everydaylinuxuser.com
If you want infinite recursion, use the following:
wget -r -l inf www.everydaylinuxuser.com
You can also replace the inf with 0, which means the same thing.
There is one more problem. You might get all the pages locally, but the links in the pages still point to their original location, so it isn't possible to click from page to page locally.
To get around this problem, use the -k switch to convert the links on the pages to point to the locally downloaded equivalent, as follows:
wget -r -k www.everydaylinuxuser.com
If you want to get a complete mirror of a website, use the following switch, which takes away the necessity of using the -r and -l switches; add -k as well if you want the links converted for offline viewing.
wget -m www.everydaylinuxuser.com
If you have a website, you can make a complete backup using this one simple command.
Run wget as a Background Command
You can get wget to run as a background command, leaving you free to get on with your work in the terminal window while the files download. Use the following command:
wget -b www.everydaylinuxuser.com
You can combine switches. To run the wget command in the background while mirroring the site, use the following command:
wget -b -m www.everydaylinuxuser.com
You can simplify this further, as follows:
wget -bm www.everydaylinuxuser.com
Logging
If you run the wget command in the background, you don't see any of the normal messages it sends to the screen. You can send those messages to a log file instead and then check on progress at any time with the tail command.
To output information from the wget command to a log file, use the following command:
wget -o /path/to/mylogfile www.everydaylinuxuser.com
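You can then follow the log as it grows:

tail -f /path/to/mylogfile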
The reverse is to require no logging at all and no output to the screen. To omit all output, use the following command:
wget -q www.everydaylinuxuser.com
Download From Multiple Sites
You can set up an input file to download from many different sites. Open a file using your favorite editor or the cat command and list the sites or links to download from on each line of the file. Save the file, and then run the following wget command:
wget -i /path/to/inputfile
Apart from backing up your website or finding something to download to read offline, it is unlikely that you will want to download an entire website. You are more likely to download a single URL with images or download files such as zip files, ISO files, or image files.
With that in mind, you don't have to type out full URLs like the following in the input file, which is time consuming:
- http://www.myfileserver.com/file1.zip
- http://www.myfileserver.com/file2.zip
- http://www.myfileserver.com/file3.zip
If you know the base URL is the same, specify the following in the input file:
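For instance, the earlier list could be reduced to just the file names (keeping the same hypothetical names):

file1.zip
file2.zip
file3.zip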
You can then provide the base URL as part of the wget command, as follows:
wget -B http://www.myfileserver.com -i /path/to/inputfile
Retry Options
If you set up a queue of files to download in an input file and you leave your computer running, a download may get stuck while you're away, and wget will keep retrying it. You can cap the number of retries using the following switch:
wget -t 10 -i /path/to/inputfile
Use the above command in conjunction with the -T switch to specify a timeout in seconds, as follows:
wget -t 10 -T 10 -i /path/to/inputfile
The above command retries 10 times and times out after 10 seconds for each file link.
It is also inconvenient when you have downloaded 75% of a 4-gigabyte file on a slow broadband connection, only for the connection to drop. To have wget resume from where it stopped downloading, use the following command:
wget -c www.myfileserver.com/file1.zip
If you hammer a server, the host might not like it and might block or kill your requests. You can specify a waiting period to control how long wget waits between each retrieval, as follows:
wget -w 60 -i /path/to/inputfile
The above command waits 60 seconds between each download. This is useful if you download many files from a single source.
Some web hosts might spot the frequency and block you. You can make the waiting period random to make it look like you aren’t using a program, as follows:
wget --random-wait -i /path/to/inputfile
Protect Download Limits
Many internet service providers apply download limits for broadband usage, especially for those who live outside of a city. You may want to add a quota so that you don’t go over your download limit. You can do that in the following way:
wget -Q 100m -i /path/to/inputfile
The -Q (quota) switch won't work with a single file. If you download a file that is 2 gigabytes in size, using -Q 1000m doesn't stop the file from downloading.
The quota is only applied when recursively downloading from a site or when using an input file.
Get Through Security
Some sites require you to log in to access the content you wish to download. Use the following switches to specify the username and password.
wget --user=yourusername --password=yourpassword
On a multi-user system, when someone runs the ps command, they can see your username and password.
Other Download Options
By default, the -r switch recursively downloads the content and creates directories as it goes. To get all the files to download to a single folder, use the following switch:
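That is the -nd (no directories) switch; a sketch with a stand-in address:

wget -r -nd https://www.example.com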
The opposite of this is to force the creation of directories, which can be achieved using the following command:
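That is the -x (force directories) switch, which recreates the full host and path hierarchy even for a single file; the URL below is only an example:

wget -x https://www.example.com/file.zip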
How to Download Certain File Types
If you want to download recursively from a site, but you only want to download a specific file type such as an MP3 or an image such as a PNG, use the following syntax:
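Recursive download combined with the -A switch does this; for example, to fetch only MP3 and PNG files from a stand-in site:

wget -r -A mp3,png https://www.example.com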
The reverse of this is to ignore certain files. Perhaps you don’t want to download executables. In this case, use the following syntax:
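The matching switch is -R (reject); for example, to skip executables:

wget -r -R exe https://www.example.com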
Cliget
There is a Firefox add-on called cliget. To add this to Firefox:
Visit https://addons.mozilla.org/en-US/firefox/addon/cliget/ and click the Add to Firefox button.
Click the install button when it appears, and then restart Firefox.
To use cliget, visit a page or file you wish to download and right-click. A context menu appears called cliget, and there are options to copy to wget and copy to curl.
Click the copy to wget option, open a terminal window, then right-click and choose paste. The appropriate wget command is pasted into the window.
This saves you from having to type the command yourself.
Summary
The wget command has several options and switches. To read the manual page for wget, type the following in a terminal window:
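man wget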