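When writing crawler scripts we often need to grab request headers, but every time we paste them from the browser into code, it takes some effort to fix up the formatting.
So I wrote a small conversion script that turns a request-header string copied from the browser into a dict and prints it.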
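import re


def headers_to_dict(headers_str, out_put=True):
    # One header per line; surrounding whitespace is discarded.
    items = headers_str.strip().split('\n')
    headers_dict = {}
    for t in items:
        t = t.strip()
        if not t:
            continue  # skip blank lines instead of raising IndexError
        # Key: non-space characters before a colon; value: the rest of the line.
        key, value = re.findall(r'^(\S+):\s*([\s\S]+)$', t)[0]
        headers_dict[key] = value
        if out_put:
            # Print in a form that can be pasted into a dict literal.
            print(f"'{key}': '{value}',")
    return headers_dict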
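Usage notes:
- headers_str: the request-header string copied from the browser, wrapped in triple single quotes.
- out_put: whether to print the formatted result. When True, each key-value pair is printed in the form 'key': 'value', which can be pasted directly into a dict literal. If you would rather not keep a multi-line string in your code, use this to print the headers and paste the output by hand.
- Return value: the dict of request headers parsed from the string.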
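The regex in headers_to_dict is greedy on the key, so a line with no space after the colon, such as `host:example.com:8080`, would be split at the last colon, and a header with an empty value would not match at all. As a minimal sketch of a regex-free variant (the name `parse_headers` is hypothetical and not part of the original script), the following splits on the first colon found after the first character, so that names such as `:authority` stay intact:

```python
def parse_headers(headers_str):
    """Parse 'key: value' lines into a dict without regular expressions."""
    headers = {}
    for line in headers_str.splitlines():
        line = line.strip()
        # Search from index 1 so a leading colon (":authority") stays in the key.
        sep = line.find(':', 1)
        if sep == -1:
            continue  # blank line or no separator: skip it
        key = line[:sep].strip()
        value = line[sep + 1:].strip()
        headers[key] = value
    return headers


sample = ":authority: www.jianshu.com\n\nhost:example.com:8080\nx-empty:"
print(parse_headers(sample))
```

Lines without a separator are silently skipped here; raising an error instead would be an equally reasonable choice if malformed input should be noticed.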
Usage example:
headers_to_dict(''':authority: www.jianshu.com
:method: GET
:path: /p/b671f67a5960
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
cache-control: max-age=0
if-none-match: W/"0d1384f05bc47dfa8d8d26187e1b3f4f"
referer: https://www.jianshu.com/writer
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36''')
# Output
"""
':authority': 'www.jianshu.com',
':method': 'GET',
':path': '/p/b671f67a5960',
':scheme': 'https',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9',
'cache-control': 'max-age=0',
'if-none-match': 'W/"0d1384f05bc47dfa8d8d26187e1b3f4f"',
'referer': 'https://www.jianshu.com/writer',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
"""
# Return value
"""
{
':authority': 'www.jianshu.com',
':method': 'GET',
':path': '/p/b671f67a5960',
':scheme': 'https',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9',
'cache-control': 'max-age=0',
'if-none-match': 'W/"0d1384f05bc47dfa8d8d26187e1b3f4f"',
'referer': 'https://www.jianshu.com/writer',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}
"""