- Learn GET and POST requests: use requests or urllib to send a GET request to https://www.baidu.com/ and print the response.
- Then disconnect from the network, send the request again, and see what happens. Learn about the status codes a request can return.
- Learn what request headers are and how to add them.

(Study-notes blog: https://desmonday.github.io/2019/02/28/python%E7%88%AC%E8%99%AB%E5%AD%A6%E4%B9%A0-day1/#more)
1. GET and POST requests
1.1 Making HTTP requests with requests (recommended)
The third-party requests library must be installed first.
The parameters of the get and post functions are documented as follows:
In:
help(requests.get)
help(requests.post)

Out:
get(url, params=None, **kwargs)
    Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

post(url, data=None, json=None, **kwargs)
    Sends a POST request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
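In short, a GET request's params are encoded into the URL's query string, while a POST request's data goes into the request body. As a small sketch of the query-string side, a request can be prepared without ever being sent (the search URL and parameter here are only illustrative):

```python
import requests

# Prepare (but do not send) a GET request with query parameters
req = requests.Request('GET', 'https://www.baidu.com/s', params={'wd': 'python'})
prepared = req.prepare()
print(prepared.url)  # params are appended to the URL as a query string
```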
1.1.1 A basic request and response example:
import requests
# GET request
r = requests.get('https://www.baidu.com')
print(r.content)
# POST request
postdata = {'key': 'value'}
r = requests.post('https://www.baidu.com/login', data=postdata)
print(r.content)
Output:

1.1.2 Response and encoding
import requests
# GET request as an example
r = requests.get('https://www.baidu.com')
print('content:')
print(r.content)
print('text:')
print(r.text)
print('encoding:')
print(r.encoding)
r.encoding = 'utf-8'
print('new text:')
print(r.text)
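The difference between r.content and r.text comes down to decoding: content holds the raw bytes from the server, and text is those bytes decoded with r.encoding. A small offline sketch of that relationship, using a byte string as a stand-in for a real response body:

```python
# r.content is raw bytes; r.text is those bytes decoded with r.encoding
raw = '百度一下'.encode('utf-8')   # stand-in for r.content
print(raw)                         # raw bytes, e.g. b'\xe7\x99\xbe...'
print(raw.decode('utf-8'))         # what r.text yields once r.encoding = 'utf-8'
```

If r.encoding is guessed wrongly (requests infers it from the response headers), r.text shows mojibake, which is why the example above sets r.encoding = 'utf-8' before re-reading text.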
Output:

1.2 Making HTTP requests with urllib
urllib.request is part of the standard library; help(urllib.request) shows:
Help on module urllib.request in urllib:

NAME
    urllib.request - An extensible library for opening URLs using a variety of protocols

DESCRIPTION
    The simplest way to use this module is to call the urlopen function,
    which accepts a string containing a URL or a Request object (described
    below). It opens the URL and returns the results as file-like
    object; the returned object has some extra methods described below.

    The OpenerDirector manages a collection of Handler objects that do
    all the actual work. Each Handler implements a particular protocol or
    option. The OpenerDirector is a composite object that invokes the
    Handlers needed to open the requested URL. For example, the
    HTTPHandler performs HTTP GET and POST requests and deals with
    non-error returns. The HTTPRedirectHandler automatically deals with
    HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
    deals with digest authentication.

    urlopen(url, data=None) -- Basic usage is the same as original
    urllib. pass the url and optionally data to post to an HTTP URL, and
    get a file-like object back. One difference is that you can also pass
    a Request instance instead of URL. Raises a URLError (subclass of
    OSError); for HTTP errors, raises an HTTPError, which can also be
    treated as a valid response.

    build_opener -- Function that creates a new OpenerDirector instance.
    Will install the default handlers. Accepts one or more Handlers as
    arguments, either instances or Handler classes that it will
    instantiate. If one of the argument is a subclass of the default
    handler, the argument will be installed instead of the default.

    install_opener -- Installs a new opener as the default opener.

    objects of interest:

    OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
    the Handler classes, while dealing with requests and responses.

    Request -- An object that encapsulates the state of a request. The
    state can be as simple as the URL. It can also include extra HTTP
    headers, e.g. a User-Agent.

    BaseHandler --

    internals:
    BaseHandler and parent
    _call_chain conventions

    Example usage:

    import urllib.request

    # set up authentication info
    authinfo = urllib.request.HTTPBasicAuthHandler()
    authinfo.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='geheim$parole')

    proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})

    # build a new opener that adds authentication and caching FTP handlers
    opener = urllib.request.build_opener(proxy_support, authinfo,
                                         urllib.request.CacheFTPHandler)

    # install it
    urllib.request.install_opener(opener)

    f = urllib.request.urlopen('http://www.python.org/')
1.2.1 A basic request and response example
import urllib.request
# GET request
f = urllib.request.urlopen('https://www.baidu.com')
firstline = f.readline()  # read the first line of the HTML page
print(firstline)
# POST request: supplying data makes urlopen send a POST
req = urllib.request.Request(url='https://www.baidu.com',
                             data=b'The first day of Web Crawler')
resp = urllib.request.urlopen(req)
print(resp.read())
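The POST body above is raw bytes; for ordinary form data, the idiomatic approach is to encode a dict with urllib.parse.urlencode first. A sketch that builds (but does not send) such a request:

```python
import urllib.parse
import urllib.request

# Encode a form dict into a percent-encoded byte string for the POST body
postdata = urllib.parse.urlencode({'key': 'value'}).encode('utf-8')
req = urllib.request.Request(url='https://www.baidu.com', data=postdata)
print(req.data)          # the encoded body bytes
print(req.get_method())  # supplying data switches the method to POST
```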
Output:

2. Status codes returned by a request
2.1 HTTP status codes explained
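Broadly: 2xx codes mean success (200 OK), 3xx mean redirection (301/302), 4xx mean a client-side error (403 Forbidden, 404 Not Found), and 5xx mean a server-side error (500). The standard library's http.HTTPStatus enumerates them all, which is handy for looking a code up without a network request:

```python
from http import HTTPStatus

# Look up the standard reason phrase for a few common codes
for code in (200, 301, 404, 500):
    status = HTTPStatus(code)
    print(code, status.phrase)
```

In requests, the code of a response is available as r.status_code, and r.raise_for_status() raises an exception for 4xx/5xx responses.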
2.2 Sending a request while the network is disconnected
Using requests as an example, run the same code as before with the network down.
import requests
# GET request
r = requests.get('https://www.baidu.com')
print(r.content)
# POST request
postdata = {'key': 'value'}
r = requests.post('https://www.baidu.com/login', data=postdata)
print(r.content)
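With the network disconnected, the calls above never return a response: the connection attempt fails and requests raises requests.exceptions.ConnectionError, so there is no status code at all. A sketch that provokes the same error without unplugging anything, by requesting a hostname that can never resolve (the .invalid TLD is reserved and guaranteed not to exist):

```python
import requests

try:
    r = requests.get('https://no-such-host.invalid/', timeout=5)
    result = r.status_code
except requests.exceptions.ConnectionError:
    result = 'ConnectionError'  # no response object, hence no status code
print(result)
```

Status codes only describe responses the server actually sent; a dead network surfaces as an exception on the client side instead.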
Output:

3. Request headers
3.1 What is a request header
Put simply, request headers tell the server being requested what kind of client is asking and what format of content it expects to send and receive.
(Reference blog: common request and response headers)
3.2 How to add request headers
When crawling, a site may reject requests that lack proper headers (for example, blocking a user from logging in). Adding request headers lets the crawler disguise itself as a normal browser. In Python this can be done as follows (using requests as an example):
import requests
# request headers
headers = {"User-Agent" : "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.6) ",
           "Accept" : "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
           "Accept-Language" : "en-us",
           "Connection" : "keep-alive",
           "Accept-Charset" : "GB2312,utf-8;q=0.7,*;q=0.7"}
r = requests.post("http://baike.baidu.com/item/哆啦A梦", headers=headers, allow_redirects=False)  # allow_redirects=False disables automatic redirects
r.encoding = 'UTF-8'
print(r.url)
print(r.headers)          # response headers
print(r.request.headers)  # request headers
Output:
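To check which headers will actually be sent without touching the network, the request can be prepared but not sent (the User-Agent string below is a made-up placeholder, not a real browser signature):

```python
import requests

# Build and prepare a request; custom headers are merged into it
req = requests.Request('GET', 'https://www.baidu.com',
                       headers={'User-Agent': 'my-crawler/0.1'})
prepared = req.prepare()
print(prepared.headers['User-Agent'])  # the custom header is in place
```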