python requests: encoding error when the request contains Chinese

Making log checking a breeze

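Almost every morning after getting to the office, I habitually check the production error logs. If something shows up, I decide, based on whether it affects the business, whether to keep digging through the related logs to locate or fix the problem.
Repeating this mechanically every day is no big deal, really. With Chrome + Vimium + gVim + IntelliJ IDEA + fancy shortcuts my hands never leave the keyboard, and there didn't seem to be a better way, so I kept at it until now. Even after Xiao An, a colleague in my group, wrote a small Python tool a month ago that queries the production error logs and automatically emails a notification to Wei Ge, yours truly still opened the log system directly every day and checked, one by one, whether the systems were running OK. A few days ago, with nothing better to do, I implemented the email-sending feature in Python myself, and then it was simply left to rot. (It was no use anyway.)
Lately I've grown a bit tired of checking logs mechanically. If a program could deduplicate the repeated log entries and keep a simple summary on record, wouldn't that be a breeze? So I started the log-crawling project.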

Writing the code

Well, below is the log system's request, which took me under ten minutes in total to reproduce:


[Screenshot: the request in Postman]

Then the following code was generated:

def listErrorLogs():
    import requests

    url = "http://boss.we.com/logs/idc/elasticsearch/_msearch"

    payload = "{\"index\":[\"log-java-idc-2018.12.27\"],\"ignore_unavailable\":true," \
              "\"preference\":1545975388333}\r\n{\"query\":{\"bool\":{\"must\":[{\"query_string\":{" \
              "\"analyze_wildcard\":true,\"query\":\"*\"}},{\"match\":{\"proj\":{\"query\":\"t8t-scm-mdm\"," \
              "\"type\":\"phrase\"}}},{\"match\":{\"lv\":{\"query\":\"ERROR\",\"type\":\"phrase\"}}},{\"range\":{" \
              "\"@timestamp\":{\"gte\":1545840000000,\"lte\":1545926399999,\"format\":\"epoch_millis\"}}}]," \
              "\"must_not\":[{\"match\":{\"txt\":{\"query\":\"完成请求,200返回:\",\"type\":\"phrase\"}}}]}},\"size\":3000," \
              "\"sort\":[{\"@timestamp\":{\"order\":\"desc\",\"unmapped_type\":\"boolean\"}}],\"_source\":{" \
              "\"excludes\":[]},\"aggs\":{\"2\":{\"date_histogram\":{\"field\":\"@timestamp\",\"interval\":\"30m\"," \
              "\"time_zone\":\"Asia/Shanghai\",\"min_doc_count\":1}}},\"stored_fields\":[\"*\"],\"script_fields\":{}," \
              "\"docvalue_fields\":[\"@timestamp\"],\"highlight\":{\"pre_tags\":[\"@kibana-highlighted-field@\"]," \
              "\"post_tags\":[\"@/kibana-highlighted-field@\"],\"fields\":{\"*\":{\"highlight_query\":{\"bool\":{" \
              "\"must\":[{\"query_string\":{\"analyze_wildcard\":true,\"query\":\"*\",\"all_fields\":true}}," \
              "{\"match\":{\"proj\":{\"query\":\"t8t-scm-mdm\",\"type\":\"phrase\"}}},{\"match\":{\"lv\":{" \
              "\"query\":\"ERROR\",\"type\":\"phrase\"}}},{\"range\":{\"@timestamp\":{\"gte\":1545840000000," \
              "\"lte\":1545926399999,\"format\":\"epoch_millis\"}}}],\"must_not\":[{\"match\":{\"txt\":{" \
              "\"query\":\"完成请求,200返回:\",\"type\":\"phrase\"}}}]}}}},\"fragment_size\":2147483647}}\r\n"
    headers = {
        'authorization': "Basic ZGV2OnRlc3Q=",
        'kbn-version': "5.3.0-SNAPSHOT",
        'cache-control': "no-cache",
        "content-encoding": "gzip",
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    print(response.text)

if __name__ == "__main__":
    listErrorLogs()

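Running it

Yours truly sat back waiting for the correct result, and instead got a bug!!!
The moment I saw this output my mood soured. This makes no sense!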

 File "D:/git/pythonProject/to8to/logs/IdcLogs.py", line 15, in listErrorLogs
    response = requests.request("POST", url, data=payload, headers=headers)
  File "C:\Users\SHUAI.PAN\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\SHUAI.PAN\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\SHUAI.PAN\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\SHUAI.PAN\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\SHUAI.PAN\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Users\SHUAI.PAN\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Users\SHUAI.PAN\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\SHUAI.PAN\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1274, in _send_request
    body = _encode(body, 'body')
  File "C:\Users\SHUAI.PAN\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 160, in _encode
    (name.title(), data[err.start:err.end], name)) from None
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 413-417: Body ('完成请求,') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

The exception is raised by `_encode` in the standard library's `http/client.py`, the last frame in the traceback above:

def _encode(data, name='data'):
    """Call data.encode("latin-1") but show a better error message."""
    try:
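        return data.encode("latin-1")  # Encodes as latin-1. Real life is full of Easter eggs. Are you kidding me? Blatant discrimination!!!
        # Never mind, Chinese is not the only thing being discriminated against; that thought instantly made me feel a bit better.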
    except UnicodeEncodeError as err:
        raise UnicodeEncodeError(
            err.encoding,
            err.object,
            err.start,
            err.end,
            "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') "
            "if you want to send it encoded in UTF-8." %
            (name.title(), data[err.start:err.end], name)) from None
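
The failure can be reproduced without touching the network. The following is a minimal sketch, not the code path inside `requests` itself: it takes a shortened, made-up body containing the same Chinese phrase and tries both codecs. The helper name `try_encode` is hypothetical.

```python
# A shortened stand-in for the real request body (assumption: only the
# Chinese phrase matters for reproducing the error).
body = '{"query": "完成请求,200返回:"}'


def try_encode(text, codec):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as err:
        return err


latin1_result = try_encode(body, "latin-1")
utf8_result = try_encode(body, "utf-8")

if isinstance(latin1_result, UnicodeEncodeError):
    # err.start / err.end delimit the characters latin-1 cannot represent.
    bad = body[latin1_result.start:latin1_result.end]
    print("latin-1 failed on:", bad)

print("utf-8 produced", len(utf8_result), "bytes")
```

latin-1 only covers code points 0 to 255, so any CJK character fails, while UTF-8 can encode every character in the string.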

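I originally wanted to search for why the developers of the requests package would challenge everyone like this (as the traceback shows, the latin-1 default actually lives in the standard library's http.client, not in requests itself), and stumbled on an article titled "python requests: the problem of sending Chinese parameters". Encoding only the Chinese part, as that article does, works fine. But if the Chinese text is scattered all over the payload you would have to encode it in every place, so it is better to settle it once and for all, as follows: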

def listErrorLogs():
    import requests

    url = "http://boss.we.com/logs/idc/elasticsearch/_msearch"

    # index = '{"index": ["log-java-idc-2018.12.27"], "ignore_unavailable": true, "preference": 1545975388333}'
    # query = '{"query":{"bool":{"must":[{"query_string":{"analyze_wildcard":true,"query":"*"}},{"match":{"proj":{"query":"t8t-scm-mdm","type":"phrase"}}},{"match":{"lv":{"query":"ERROR","type":"phrase"}}},{"range":{"@timestamp":{"gte":1545840000000,"lte":1545926399999,"format":"epoch_millis"}}}],"must_not":[]}},"size":3000,"sort":[{"@timestamp":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"aggs":{"2":{"date_histogram":{"field":"@timestamp","interval":"30m","time_zone":"Asia/Shanghai","min_doc_count":1}}},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["@timestamp"],"highlight":{"pre_tags":["@kibana-highlighted-field@"],"post_tags":["@/kibana-highlighted-field@"],"fields":{"*":{"highlight_query":{"bool":{"must":[{"query_string":{"analyze_wildcard":true,"query":"*","all_fields":true}},{"match":{"proj":{"query":"t8t-scm-mdm","type":"phrase"}}},{"match":{"lv":{"query":"ERROR","type":"phrase"}}},{"range":{"@timestamp":{"gte":1545840000000,"lte":1545926399999,"format":"epoch_millis"}}}],"must_not":[]}}}},"fragment_size":2147483647}}'
    payload = "{\"index\":[\"log-java-idc-2018.12.27\"],\"ignore_unavailable\":true," \
              "\"preference\":1545975388333}\r\n{\"query\":{\"bool\":{\"must\":[{\"query_string\":{" \
              "\"analyze_wildcard\":true,\"query\":\"*\"}},{\"match\":{\"proj\":{\"query\":\"t8t-scm-mdm\"," \
              "\"type\":\"phrase\"}}},{\"match\":{\"lv\":{\"query\":\"ERROR\",\"type\":\"phrase\"}}},{\"range\":{" \
              "\"@timestamp\":{\"gte\":1545840000000,\"lte\":1545926399999,\"format\":\"epoch_millis\"}}}]," \
              "\"must_not\":[{\"match\":{\"txt\":{\"query\":\"完成请求,200返回:\",\"type\":\"phrase\"}}}]}},\"size\":3000," \
              "\"sort\":[{\"@timestamp\":{\"order\":\"desc\",\"unmapped_type\":\"boolean\"}}],\"_source\":{" \
              "\"excludes\":[]},\"aggs\":{\"2\":{\"date_histogram\":{\"field\":\"@timestamp\",\"interval\":\"30m\"," \
              "\"time_zone\":\"Asia/Shanghai\",\"min_doc_count\":1}}},\"stored_fields\":[\"*\"],\"script_fields\":{}," \
              "\"docvalue_fields\":[\"@timestamp\"],\"highlight\":{\"pre_tags\":[\"@kibana-highlighted-field@\"]," \
              "\"post_tags\":[\"@/kibana-highlighted-field@\"],\"fields\":{\"*\":{\"highlight_query\":{\"bool\":{" \
              "\"must\":[{\"query_string\":{\"analyze_wildcard\":true,\"query\":\"*\",\"all_fields\":true}}," \
              "{\"match\":{\"proj\":{\"query\":\"t8t-scm-mdm\",\"type\":\"phrase\"}}},{\"match\":{\"lv\":{" \
              "\"query\":\"ERROR\",\"type\":\"phrase\"}}},{\"range\":{\"@timestamp\":{\"gte\":1545840000000," \
              "\"lte\":1545926399999,\"format\":\"epoch_millis\"}}}],\"must_not\":[{\"match\":{\"txt\":{" \
              "\"query\":\"完成请求,200返回:\",\"type\":\"phrase\"}}}]}}}},\"fragment_size\":2147483647}}\r\n"\
        .encode("UTF-8")  # encode the whole string as UTF-8, so the body is sent as bytes
    headers = {
        'authorization': "Basic ZGV2OnRlc3Q=",
        'kbn-version': "5.3.0-SNAPSHOT",
        'cache-control': "no-cache",
        "content-encoding": "gzip",
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    print(response.text)

if __name__ == "__main__":
    listErrorLogs()
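
As an alternative to the hand-escaped string above, the body could be built from Python dicts and serialized in one place. This is only a sketch under assumptions: the query is trimmed to the `must_not` clause for brevity, and `build_msearch_body` is a hypothetical helper name, not part of requests or Elasticsearch.

```python
import json


def build_msearch_body(header, query):
    """Serialize one header/query pair as newline-delimited JSON, encoded as UTF-8."""
    lines = [
        json.dumps(header, ensure_ascii=False),
        json.dumps(query, ensure_ascii=False),
    ]
    # _msearch takes newline-delimited JSON; the body must end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


header = {"index": ["log-java-idc-2018.12.27"], "ignore_unavailable": True}
# Trimmed query for illustration only; the real one has more clauses.
query = {
    "query": {
        "bool": {
            "must_not": [
                {"match": {"txt": {"query": "完成请求,200返回:", "type": "phrase"}}}
            ]
        }
    },
    "size": 3000,
}

body = build_msearch_body(header, query)
print(type(body).__name__, len(body))
```

The resulting `bytes` object can be passed as `data=body` exactly like the encoded string above. Leaving `ensure_ascii` at its default of `True` would instead escape the Chinese characters as `\uXXXX`, which keeps the body pure ASCII and also avoids the latin-1 error.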

Result

Looks like it works beautifully.

# After encoding, the Chinese text in the request body turns into the following byte sequence
"完成请求,200返回:" = encode => "\xe5\xae\x8c\xe6\x88\x90\xe8\xaf\xb7\xe6\xb1\x82\xef\xbc\x8c200\xe8\xbf\x94\xe5\x9b\x9e\xef\xbc\x9a"

## Before encoding
{"index":["log-java-idc-2018.12.27"],"ignore_unavailable":true,"preference":1545959585097}
{"query":{"bool":{"must":[{"query_string":{"query":"*","analyze_wildcard":true}},{"match":{"proj":{"query":"t8t-scm-mdm","type":"phrase"}}},{"match":{"lv":{"query":"ERROR","type":"phrase"}}},{"range":{"@timestamp":{"gte":1545840000000,"lte":1545926399999,"format":"epoch_millis"}}}],"must_not":[{"match":{"txt":{"query":"完成请求,200返回:","type":"phrase"}}}]}},"size":3000,"sort":[{"@timestamp":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"aggs":{"2":{"date_histogram":{"field":"@timestamp","interval":"30m","time_zone":"Asia/Shanghai","min_doc_count":1}}},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["@timestamp"],"highlight":{"pre_tags":["@kibana-highlighted-field@"],"post_tags":["@/kibana-highlighted-field@"],"fields":{"*":{"highlight_query":{"bool":{"must":[{"query_string":{"query":"*","analyze_wildcard":true,"all_fields":true}},{"match":{"proj":{"query":"t8t-scm-mdm","type":"phrase"}}},{"match":{"lv":{"query":"ERROR","type":"phrase"}}},{"range":{"@timestamp":{"gte":1545840000000,"lte":1545926399999,"format":"epoch_millis"}}}],"must_not":[{"match":{"txt":{"query":"完成请求,200返回:","type":"phrase"}}}]}}}},"fragment_size":2147483647}}

## After encoding
'{"index":["log-java-idc-2018.12.27"],"ignore_unavailable":true,"preference":1545975388333}\r\n{"query":{"bool":{"must":[{"query_string":{"analyze_wildcard":true,"query":"*"}},{"match":{"proj":{"query":"t8t-scm-mdm","type":"phrase"}}},{"match":{"lv":{"query":"ERROR","type":"phrase"}}},{"range":{"@timestamp":{"gte":1545840000000,"lte":1545926399999,"format":"epoch_millis"}}}],"must_not":[{"match":{"txt":{"query":"\xe5\xae\x8c\xe6\x88\x90\xe8\xaf\xb7\xe6\xb1\x82\xef\xbc\x8c200\xe8\xbf\x94\xe5\x9b\x9e\xef\xbc\x9a","type":"phrase"}}}]}},"size":3000,"sort":[{"@timestamp":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"aggs":{"2":{"date_histogram":{"field":"@timestamp","interval":"30m","time_zone":"Asia/Shanghai","min_doc_count":1}}},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["@timestamp"],"highlight":{"pre_tags":["@kibana-highlighted-field@"],"post_tags":["@/kibana-highlighted-field@"],"fields":{"*":{"highlight_query":{"bool":{"must":[{"query_string":{"analyze_wildcard":true,"query":"*","all_fields":true}},{"match":{"proj":{"query":"t8t-scm-mdm","type":"phrase"}}},{"match":{"lv":{"query":"ERROR","type":"phrase"}}},{"range":{"@timestamp":{"gte":1545840000000,"lte":1545926399999,"format":"epoch_millis"}}}],"must_not":[{"match":{"txt":{"query":"\xe5\xae\x8c\xe6\x88\x90\xe8\xaf\xb7\xe6\xb1\x82\xef\xbc\x8c200\xe8\xbf\x94\xe5\x9b\x9e\xef\xbc\x9a","type":"phrase"}}}]}}}},"fragment_size":2147483647}}\r\n'
The request now succeeds.

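Summary

The program was auto-generated. Counting creating the .py file, tweaking it, running it, and finally getting it to succeed, the whole thing took no more than 20 minutes, yet it shot like an arrow straight through the entire day.
My plan still includes deduplicating and categorizing the data, then persisting it and generating visual, human-friendly analysis results... How many more days will that shoot through?
Lu Xun's saying does have a point: "Time is like water in a sponge; squeeze and there is always some." But real life is always so rushed that with a simple thing like this, one squeeze costs a whole day. One can imagine that Lu Xun, too, felt the helplessness of squeezing when he wrote that famous line!!!

--- To Lu Xun's famous saying & the company's log system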
