I. HTTP Basics
1. URI and URL
URI: Uniform Resource Identifier
URL: Uniform Resource Locator
Note: every URL is a URI, so URLs are a subset of URIs.
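To make the URI/URL relationship concrete, here is a minimal sketch using the standard library's `urllib.parse.urlsplit`. Both strings are made-up examples, not addresses from these notes: the first is a URL (it says where the resource lives), while the second is a URI that only names a resource and has no network location.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "https://www.example.com:8443/images/cat.png?size=large#top"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # www.example.com
print(parts.port)      # 8443
print(parts.path)      # /images/cat.png
print(parts.query)     # size=large
print(parts.fragment)  # top

# A URI that is not a URL: it identifies a resource but gives no location.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme)      # urn
print(repr(urn.netloc))
```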
2. HTTP and HTTPS
HTTP: Hypertext Transfer Protocol.
HTTPS: HTTP with an SSL/TLS layer added; the transmitted content is encrypted by SSL/TLS.
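II. The requests library
1. requests is a high-level library built on HTTP. It revolves around two main pieces:
    1. request: handles the client's request
    2. response: handles the server's response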
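The request half can be inspected without touching the network. The sketch below builds a request with `requests.Request` and calls `prepare()` to see exactly what would be sent. The URL, query parameter and header values are placeholders chosen for illustration.

```python
import requests

# Build a request object without sending it (no network access needed).
# URL, params and headers are placeholder values.
req = requests.Request(
    method="GET",
    url="http://www.example.com/search",
    params={"q": "python"},
    headers={"Accept": "text/html"},
)
prepared = req.prepare()

print(prepared.method)             # GET
print(prepared.url)                # http://www.example.com/search?q=python
print(prepared.headers["Accept"])  # text/html
```

In everyday use `requests.get(...)` performs these steps internally and returns the response object directly.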
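2. Getting response information
import requests

# Fetch the page content
response = requests.get('http://www.baidu.com/')
html = response.text
print(html)

# Get the response status code and headers
print(response.status_code)            # numeric status, e.g. 200
print(response.headers)                # all response headers (case-insensitive dict)
print(response.headers.get("Server"))  # a single header, or None if it is absent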
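The status code is a plain integer, so it helps to know how to turn it into something readable. Below is a small sketch using the standard library's `http.HTTPStatus`. The helper names `describe_status` and `is_success` are hypothetical, introduced only for this example.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short human-readable label for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Not a status code known to the standard library
        return f"{code} (unknown status code)"
    return f"{status.value} {status.phrase}"

def is_success(code):
    """2xx codes mean the request succeeded."""
    return 200 <= code < 300

print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
print(is_success(404))       # False
```

With a requests response, the value to pass in would be `response.status_code`.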
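3. Setting request headers
import requests

def get_page():
    url = 'http://www.baidu.com/'
    # Send a browser-like User-Agent instead of the default requests one
    headers = {
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)"
    }
    response = requests.get(url, headers=headers)
    # Return the page text only when the request succeeded
    if response.status_code == 200:
        return response.text
    return None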
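When many requests share the same headers, one option is to set them once on a `requests.Session` rather than passing `headers=` every time. This is a sketch of that alternative, not something the notes above prescribe. The target URL is a placeholder, and the request is only prepared, not sent, so the merged headers can be inspected offline.

```python
import requests

session = requests.Session()
# Default headers applied to every request made through this session
session.headers.update({
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)"
})

# Per-request headers are merged on top of the session defaults.
# The URL is a placeholder; nothing is sent over the network here.
req = requests.Request("GET", "http://www.example.com/",
                       headers={"Accept": "text/html"})
prepared = session.prepare_request(req)

print(prepared.headers["User-Agent"])
print(prepared.headers["Accept"])  # text/html
```

Calling `session.get(url)` would send a request carrying the same merged headers.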
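4. Downloading images
import os
import requests

# Fetch a binary resource
def get_resource(url):
    headers = {
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)"
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.content  # raw bytes, not decoded text
    return None

# Save the image
def save_pic(url):
    img_content = get_resource(url)
    if img_content is None:
        return  # download failed, nothing to save
    # Use the last path segment as the file name, dropping any '@...' suffix
    file_name = url.split('/')[-1].split('@')[0]
    os.makedirs('./images', exist_ok=True)  # make sure the target directory exists
    with open('./images/%s' % file_name, 'wb') as f:
        f.write(img_content)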
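Splitting the URL on '/' keeps any query string in the file name (for example `cat.jpg?x=1`). A slightly more defensive way to derive the name is sketched below with the standard library. The function name `file_name_from_url`, the fallback name `download.bin` and the sample URLs are all assumptions made for this example.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def file_name_from_url(url, default="download.bin"):
    """Derive a local file name from the path part of a URL."""
    path = urlsplit(url).path                 # drops ?query and #fragment
    name = posixpath.basename(unquote(path))  # last path segment, percent-decoded
    name = name.split('@')[0]                 # same '@' rule as save_pic
    return name or default                    # fall back if the path has no file part

print(file_name_from_url("http://www.example.com/img/cat.jpg@200w?x=1"))  # cat.jpg
print(file_name_from_url("http://www.example.com/img/cat.jpg?x=1"))       # cat.jpg
print(file_name_from_url("http://www.example.com/"))                      # download.bin
```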