wget

Introduction to wget

wget is a network download tool written in C. It supports the HTTP and FTP protocols, works through proxy servers, can resume interrupted downloads, can recursively retrieve directories from a remote host, and can run in the background.

Using wget

Downloading

Usage: wget [OPTION]... [URL]...   (that is, wget followed by the URL to download, e.g. wget http://example.com/packages/). To adjust its behavior, add options as described below.

Options

The most commonly used options are: -t sets the number of retries (0 means unlimited, i.e. keep retrying until the download succeeds); -b runs the download in the background; -c continues a partially downloaded file (if a download is interrupted, for example by a dropped connection, adding this option resumes it from where it stopped); -P /directory saves downloaded files into the specified directory; and -i filelist.txt downloads every URL listed in the given file.
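
As a quick illustration, several of these options can be combined in one command; the URL below is a placeholder and any large file works the same way:

wget -t 0 -c -b -P /tmp http://example.com/packages/bigfile.iso

This retries indefinitely, resumes automatically if interrupted, runs in the background, and saves the file under /tmp.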

1. Download the home page of 192.168.1.168 and print debugging information about the download

wget -d http://192.168.1.168

2. Download the home page of 192.168.1.168 without printing any output

wget -q http://192.168.1.168

3. Batch download: put the URLs of all the files you want to download into filelist.txt, and wget will automatically download each one for you.

wget -i filelist.txt
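
filelist.txt is simply a plain-text file with one URL per line, for example (the URLs here are hypothetical):

http://example.com/packages/pkg1.tar.gz
http://example.com/packages/pkg2.tar.gz
ftp://example.com/pub/pkg3.tar.gz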

4. Download to a specified directory

wget -P/tmp ftp://user:passwd@url/file

This saves the remote file file under the /tmp directory.
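
Note that -P only chooses the target directory and keeps the remote file name. To pick the local file name as well, the -O option can be used instead; the URL and file name below are placeholders:

wget -O /tmp/renamed.tar.gz http://example.com/file.tar.gz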

Startup:

  -V,  --version                  display the version of Wget and exit

  -h,  --help                      print this help

  -b,  --background                go to background after startup

  -e,  --execute=COMMAND          execute a `.wgetrc'-style command

Logging and input file:

  -o,  --output-file=FILE          log messages to FILE

  -a,  --append-output=FILE        append messages to FILE

  -d,  --debug                    print lots of debugging information

  -q,  --quiet                    quiet (no output)

  -v,  --verbose                  be verbose (this is the default)

  -nv, --no-verbose                turn off verboseness, without being quiet

      --report-speed=TYPE        output bandwidth as TYPE.  TYPE can be bits

  -i,  --input-file=FILE          download URLs found in local or external FILE

  -F,  --force-html                treat input file as HTML

  -B,  --base=URL                  resolves HTML input-file links (-i -F)

                                    relative to URL

      --config=FILE              specify config file to use

      --no-config                do not read any config file

      --rejected-log=FILE        log reasons for URL rejection to FILE

Download:

  -t,  --tries=NUMBER              set number of retries to NUMBER (0 unlimits)

      --retry-connrefused        retry even if connection is refused

      --retry-on-http-error=ERRORS    comma-separated list of HTTP errors to retry

  -O,  --output-document=FILE      write documents to FILE

  -nc, --no-clobber                skip downloads that would download to

                                    existing files (overwriting them)

      --no-netrc                  don't try to obtain credentials from .netrc

  -c,  --continue                  resume getting a partially-downloaded file

      --start-pos=OFFSET          start downloading from zero-based position OFFSET

      --progress=TYPE            select progress gauge type

      --show-progress            display the progress bar in any verbosity mode

  -N,  --timestamping              don't re-retrieve files unless newer than

                                    local

      --no-if-modified-since      don't use conditional if-modified-since get

                                    requests in timestamping mode

      --no-use-server-timestamps  don't set the local file's timestamp by

                                    the one on the server

  -S,  --server-response          print server response

      --spider                    don't download anything

  -T,  --timeout=SECONDS          set all timeout values to SECONDS

      --dns-timeout=SECS          set the DNS lookup timeout to SECS

      --connect-timeout=SECS      set the connect timeout to SECS

      --read-timeout=SECS        set the read timeout to SECS

  -w,  --wait=SECONDS              wait SECONDS between retrievals

      --waitretry=SECONDS        wait 1..SECONDS between retries of a retrieval

      --random-wait              wait from 0.5*WAIT...1.5*WAIT secs between retrievals

      --no-proxy                  explicitly turn off proxy

  -Q,  --quota=NUMBER              set retrieval quota to NUMBER

      --bind-address=ADDRESS      bind to ADDRESS (hostname or IP) on local host

      --limit-rate=RATE          limit download rate to RATE

      --no-dns-cache              disable caching DNS lookups

      --restrict-file-names=OS    restrict chars in file names to ones OS allows

      --ignore-case              ignore case when matching files/directories

  -4,  --inet4-only                connect only to IPv4 addresses

  -6,  --inet6-only                connect only to IPv6 addresses

      --prefer-family=FAMILY      connect first to addresses of specified family,

                                    one of IPv6, IPv4, or none

      --user=USER                set both ftp and http user to USER

      --password=PASS            set both ftp and http password to PASS

      --ask-password              prompt for passwords

      --use-askpass=COMMAND      specify credential handler for requesting

                                    username and password.  If no COMMAND is

                                    specified the WGET_ASKPASS or the SSH_ASKPASS

                                    environment variable is used.

      --no-iri                    turn off IRI support

      --local-encoding=ENC        use ENC as the local encoding for IRIs

      --remote-encoding=ENC      use ENC as the default remote encoding

      --unlink                    remove file before clobber

      --xattr                    turn on storage of metadata in extended file attributes

Directories:

  -nd, --no-directories            don't create directories

  -x,  --force-directories        force creation of directories

  -nH, --no-host-directories      don't create host directories

      --protocol-directories      use protocol name in directories

  -P,  --directory-prefix=PREFIX  save files to PREFIX/..

      --cut-dirs=NUMBER          ignore NUMBER remote directory components

HTTP options:

      --http-user=USER            set http user to USER

      --http-password=PASS        set http password to PASS

      --no-cache                  disallow server-cached data

      --default-page=NAME        change the default page name (normally

                                    this is 'index.html'.)

  -E,  --adjust-extension          save HTML/CSS documents with proper extensions

      --ignore-length            ignore 'Content-Length' header field

      --header=STRING            insert STRING among the headers

      --compression=TYPE          choose compression, one of auto, gzip and none. (default: none)

      --max-redirect              maximum redirections allowed per page

      --proxy-user=USER          set USER as proxy username

      --proxy-password=PASS      set PASS as proxy password

      --referer=URL              include 'Referer: URL' header in HTTP request

      --save-headers              save the HTTP headers to file

  -U,  --user-agent=AGENT          identify as AGENT instead of Wget/VERSION

      --no-http-keep-alive        disable HTTP keep-alive (persistent connections)

      --no-cookies                don't use cookies

      --load-cookies=FILE        load cookies from FILE before session

      --save-cookies=FILE        save cookies to FILE after session

      --keep-session-cookies      load and save session (non-permanent) cookies

      --post-data=STRING          use the POST method; send STRING as the data

      --post-file=FILE            use the POST method; send contents of FILE

      --method=HTTPMethod        use method "HTTPMethod" in the request

      --body-data=STRING          send STRING as data. --method MUST be set

      --body-file=FILE            send contents of FILE. --method MUST be set

      --content-disposition      honor the Content-Disposition header when

                                    choosing local file names (EXPERIMENTAL)

      --content-on-error          output the received content on server errors

      --auth-no-challenge        send Basic HTTP authentication information

                                    without first waiting for the server's

                                    challenge

HTTPS (SSL/TLS) options:

      --secure-protocol=PR        choose secure protocol, one of auto, SSLv2,

                                    SSLv3, TLSv1, TLSv1_1, TLSv1_2 and PFS

      --https-only                only follow secure HTTPS links

      --no-check-certificate      don't validate the server's certificate

      --certificate=FILE          client certificate file

      --certificate-type=TYPE    client certificate type, PEM or DER

      --private-key=FILE          private key file

      --private-key-type=TYPE    private key type, PEM or DER

      --ca-certificate=FILE      file with the bundle of CAs

      --ca-directory=DIR          directory where hash list of CAs is stored

      --crl-file=FILE            file with bundle of CRLs

      --pinnedpubkey=FILE/HASHES  Public key (PEM/DER) file, or any number

                                  of base64 encoded sha256 hashes preceded by

                                  'sha256//' and separated by ';', to verify

                                  peer against

      --random-file=FILE          file with random data for seeding the SSL PRNG

      --ciphers=STR          Set the priority string (GnuTLS) or cipher list string (OpenSSL) directly.

                                  Use with care. This option overrides --secure-protocol.

                                  The format and syntax of this string depend on the specific SSL/TLS engine.

HSTS options:

      --no-hsts                  disable HSTS

      --hsts-file                path of HSTS database (will override default)

FTP options:

      --ftp-user=USER            set ftp user to USER

      --ftp-password=PASS        set ftp password to PASS

      --no-remove-listing        don't remove '.listing' files

      --no-glob                  turn off FTP file name globbing

      --no-passive-ftp            disable the "passive" transfer mode

      --preserve-permissions      preserve remote file permissions

      --retr-symlinks            when recursing, get linked-to files (not dir)

FTPS options:

      --ftps-implicit                use implicit FTPS (default port is 990)

      --ftps-resume-ssl              resume the SSL/TLS session started in the control connection when

                                        opening a data connection

      --ftps-clear-data-connection    cipher the control channel only; all the data will be in plaintext

      --ftps-fallback-to-ftp          fall back to FTP if FTPS is not supported in the target server

WARC options:

      --warc-file=FILENAME        save request/response data to a .warc.gz file

      --warc-header=STRING        insert STRING into the warcinfo record

      --warc-max-size=NUMBER      set maximum size of WARC files to NUMBER

      --warc-cdx                  write CDX index files

      --warc-dedup=FILENAME      do not store records listed in this CDX file

      --no-warc-compression      do not compress WARC files with GZIP

      --no-warc-digests          do not calculate SHA1 digests

      --no-warc-keep-log          do not store the log file in a WARC record

      --warc-tempdir=DIRECTORY    location for temporary files created by the

                                    WARC writer

Recursive download:

  -r,  --recursive                specify recursive download

  -l,  --level=NUMBER              maximum recursion depth (inf or 0 for infinite)

      --delete-after              delete files locally after downloading them

  -k,  --convert-links            make links in downloaded HTML or CSS point to

                                    local files

      --convert-file-only        convert the file part of the URLs only (usually known as the basename)

      --backups=N                before writing file X, rotate up to N backup files

  -K,  --backup-converted          before converting file X, back up as X.orig

  -m,  --mirror                    shortcut for -N -r -l inf --no-remove-listing

  -p,  --page-requisites          get all images, etc. needed to display HTML page

      --strict-comments          turn on strict (SGML) handling of HTML comments

Recursive accept/reject:

  -A,  --accept=LIST              comma-separated list of accepted extensions

  -R,  --reject=LIST              comma-separated list of rejected extensions

      --accept-regex=REGEX        regex matching accepted URLs

      --reject-regex=REGEX        regex matching rejected URLs

      --regex-type=TYPE          regex type (posix|pcre)

  -D,  --domains=LIST              comma-separated list of accepted domains

      --exclude-domains=LIST      comma-separated list of rejected domains

      --follow-ftp                follow FTP links from HTML documents

      --follow-tags=LIST          comma-separated list of followed HTML tags

      --ignore-tags=LIST          comma-separated list of ignored HTML tags

  -H,  --span-hosts                go to foreign hosts when recursive

  -L,  --relative                  follow relative links only

  -I,  --include-directories=LIST  list of allowed directories

      --trust-server-names        use the name specified by the redirection

                                    URL's last component

  -X,  --exclude-directories=LIST  list of excluded directories

  -np, --no-parent                don't ascend to the parent directory

Additional notes

When we download with wget and add the -b option, the command returns to the shell immediately and prints a short message.
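
A typical message looks roughly like the following (the pid and the numeric suffix of the log file vary from run to run):

Continuing in background, pid 1840.
Output will be written to 'wget-log.2'.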

The pid is the process ID of the background task; killing that process terminates the download. The wget-log.2 file records the download output (wget keeps appending to it until the download completes; the numeric suffix appears because earlier wget-log files already exist in the directory).

Checking the progress of a background wget task

1. Check via resume: run wget -c against the URL that is currently being downloaded.

2. Check the wget-log file: run tail -f wget-log or cat wget-log; press Ctrl+C when you are done watching. A complete example of this workflow is sketched below.
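
Putting the pieces together, a background-download session might look like this; the URL is a placeholder and the pid is the one wget reports at startup:

wget -b -c -t 0 http://example.com/packages/bigfile.iso   # start the download in the background
tail -f wget-log                                          # watch progress; Ctrl+C stops watching, not the download
kill 1840                                                 # abort the download itself by killing the reported pid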

