0. Prologue

1) What is data extraction?

Simply put, data extraction is the process of pulling the data we want out of a response.

2) Data categories

Unstructured data: HTML, plain text, etc.
How to process it: regular expressions, XPath, Beautiful Soup
Structured data: JSON, XML, etc.
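
To make the difference concrete, here is a minimal sketch that uses only the standard library. Both responses and their field names are invented for illustration; real pages are usually messier, which is why XPath or Beautiful Soup is often preferred over a bare regular expression.

```python
import json
import re

# Hypothetical responses, invented for this example
html_response = "<html><head><title>Student list</title></head><body></body></html>"
json_response = '{"title": "Student list", "count": 2}'

# Unstructured data: we have to search the text for a pattern
match = re.search(r"<title>(.*?)</title>", html_response)
html_title = match.group(1) if match else None

# Structured data: parse once, then look fields up by key
data = json.loads(json_response)
json_title = data["title"]

print(html_title)
print(json_title)
```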

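3) What is JSON?

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for people to read and write, and easy for machines to parse and generate.

It suits scenarios that involve exchanging data, such as communication between a website's front end and back end.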
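
As a rough illustration of how JSON values map onto Python types, the sketch below parses a small made-up document (the field names are assumptions, not taken from any real API) and prints the Python type of each value.

```python
import json

# A JSON document as it might arrive from a server (made-up content)
text = '{"name": "Tom", "age": 23, "scores": [90, 85.5], "graduated": false, "nickname": null}'

obj = json.loads(text)

# JSON object -> dict, array -> list, number -> int/float,
# true/false -> True/False, null -> None
for key, value in obj.items():
    print(key, type(value).__name__, value)
```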

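4) How do we find the URL that returns JSON?

Analyze the traffic with the browser's developer tools or a packet-capture tool such as Wireshark (Windows/Linux) or tcpdump (Linux).
For mobile apps, use software that can capture the app's traffic.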
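
Once a response has been captured, one way to decide whether it carries JSON is to check its Content-Type header and, failing that, to try parsing the body. The helper below is a hypothetical sketch (the name looks_like_json and the sample inputs are assumptions); note that a bare number or quoted string also counts as valid JSON.

```python
import json

def looks_like_json(content_type, body):
    """Return True if a captured response appears to carry JSON."""
    # Servers usually (not always) label JSON as application/json
    if "json" in content_type.lower():
        return True
    # Fall back to trying to parse the body
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

print(looks_like_json("application/json; charset=utf-8", '{"ok": true}'))
print(looks_like_json("text/html", "<html></html>"))
print(looks_like_json("text/plain", "[1, 2, 3]"))
```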

Figure 1-1: conversion between JSON and Python

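1. json.dumps()

json.dumps() converts dict data to a str. Writing a dict directly to a JSON file raises an error, because a file's write() method only accepts a string, so the data has to go through this function before it is written.

Let's look at some code first: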

import json

# Student information
myinfo_dict = {"name":"李易阳", "age":23, "sex":"男"}

# Type before conversion
print(type(myinfo_dict))

# Convert to a JSON string
json_obj_str = json.dumps(myinfo_dict)

# Type after conversion
print(type(json_obj_str))

The output is:

<class 'dict'>
<class 'str'>
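
Two details are worth checking by hand. The sketch below uses io.StringIO as a stand-in for a real text file (an assumption made so that nothing is written to disk) to show that write() rejects a dict, and then compares the default escaped output with ensure_ascii=False for the non-ASCII values in the sample data.

```python
import io
import json

myinfo_dict = {"name": "李易阳", "age": 23, "sex": "男"}

# A file-like object opened in text mode only accepts str
buffer = io.StringIO()
try:
    buffer.write(myinfo_dict)
except TypeError as exc:
    print("write failed:", exc)

# Default: non-ASCII characters are escaped as \uXXXX sequences
escaped = json.dumps(myinfo_dict)
# ensure_ascii=False keeps the original characters
readable = json.dumps(myinfo_dict, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same dict; ensure_ascii only affects how the text looks.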

Next, here is how the official docstring describes the parameters:

def dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, 
allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw):

    Serialize ``obj`` to a JSON formatted ``str``.

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
    instead of raising a ``TypeError``.

    If ``ensure_ascii`` is false, then the return value can contain non-ASCII
    characters if they appear in strings contained in ``obj``. Otherwise, all
    such characters are escaped in JSON strings.

    If ``check_circular`` is false, then the circular reference check
    for container types will be skipped and a circular reference will
    result in an ``OverflowError`` (or worse).

    If ``allow_nan`` is false, then it will be a ``ValueError`` to
    serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``) in
    strict compliance of the JSON specification, instead of using the
    JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).

    If ``indent`` is a non-negative integer, then JSON array elements and
    object members will be pretty-printed with that indent level. An indent
    level of 0 will only insert newlines. ``None`` is the most compact
    representation.

    If specified, ``separators`` should be an ``(item_separator, key_separator)``
    tuple.  The default is ``(', ', ': ')`` if *indent* is ``None`` and
    ``(',', ': ')`` otherwise.  To get the most compact JSON representation,
    you should specify ``(',', ':')`` to eliminate whitespace.

    ``default(obj)`` is a function that should return a serializable version
    of obj or raise TypeError. The default simply raises TypeError.

    If *sort_keys* is true (default: ``False``), then the output of
    dictionaries will be sorted by key.

    To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
    ``.default()`` method to serialize additional types), specify it with
    the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.
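
A few of these parameters are easier to understand in action. The following is a minimal sketch, not a recommended configuration: the record and the encode_extra helper are made up for illustration, and it shows default, indent, sort_keys, and separators together.

```python
import json
from datetime import date

# Made-up record; date objects are not JSON serializable by default
record = {"name": "Tom", "enrolled": date(2020, 9, 1), "age": 23}

def encode_extra(obj):
    # Called only for objects json cannot serialize by itself
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# Pretty-printed, with keys sorted alphabetically
pretty = json.dumps(record, default=encode_extra, indent=2, sort_keys=True)
# Most compact form: no whitespace after separators
compact = json.dumps(record, default=encode_extra, separators=(",", ":"))

print(pretty)
print(compact)
```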

2. json.loads()

json.loads() converts a JSON-formatted str back into a dict.

import json  
   
name_emb = {'a':'1111','b':'2222','c':'3333','d':'4444'}   
  
jsDumps = json.dumps(name_emb)      
 
# jsDumps is a JSON-formatted string
jsLoads = json.loads(jsDumps)   
  
print(name_emb)  
print(jsDumps)  
print(jsLoads)  
  
print(type(name_emb))  
print(type(jsDumps))  
print(type(jsLoads))
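
Strictly speaking, json.loads() returns whatever Python type matches the top-level JSON value, and it raises an exception on text that is not valid JSON. A small sketch of both behaviours (the input strings are made up):

```python
import json

# The top-level JSON value decides the Python type that comes back
print(type(json.loads('{"a": 1}')))
print(type(json.loads('[1, 2, 3]')))
print(type(json.loads('"hello"')))

# JSON requires double quotes, so the str() of a Python dict is usually not valid JSON
not_json = str({'a': '1111'})
try:
    json.loads(not_json)
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc.msg)
```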

The official docstring describes the parameters as follows:

def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):

    Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.

    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).

    ``object_pairs_hook`` is an optional function that will be called with the
    result of any object literal decoded with an ordered list of pairs.  The
    return value of ``object_pairs_hook`` will be used instead of the ``dict``.
    This feature can be used to implement custom decoders.  If ``object_hook``
    is also defined, the ``object_pairs_hook`` takes priority.

    ``parse_float``, if specified, will be called with the string
    of every JSON float to be decoded. By default this is equivalent to
    float(num_str). This can be used to use another datatype or parser
    for JSON floats (e.g. decimal.Decimal).

    ``parse_int``, if specified, will be called with the string
    of every JSON int to be decoded. By default this is equivalent to
    int(num_str). This can be used to use another datatype or parser
    for JSON integers (e.g. float).

    ``parse_constant``, if specified, will be called with one of the
    following strings: -Infinity, Infinity, NaN.
    This can be used to raise an exception if invalid JSON numbers
    are encountered.

    To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
    kwarg; otherwise ``JSONDecoder`` is used.

    The ``encoding`` argument is ignored and deprecated.
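
To illustrate two of these hooks, the sketch below parses a made-up document with parse_float=Decimal and then with an object_hook. The upper_keys helper is hypothetical and exists only to make the hook's effect visible.

```python
import json
from decimal import Decimal

text = '{"name": "Tom", "balance": 10.10, "tags": {"level": 3}}'

# parse_float receives the raw numeric string, so Decimal sees "10.10" exactly
exact = json.loads(text, parse_float=Decimal)
print(exact["balance"])

def upper_keys(d):
    # object_hook is called for every decoded JSON object, innermost first
    return {key.upper(): value for key, value in d.items()}

hooked = json.loads(text, object_hook=upper_keys)
print(hooked)
```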

3. json.dump()

json.dump() serializes dict data to JSON text and writes it straight to a file object, so there is no need to call json.dumps() and write() separately.

import json

name_emb = {'a':'1111','b':'2222','c':'3333','d':'4444'}

emb_filename = '/home/cqh/faceData/emb_json.json'

# Use a context manager so the file is flushed and closed after writing
with open(emb_filename, "w") as f:
    json.dump(name_emb, f)
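
The same file can also be produced by calling json.dumps() and writing the resulting string yourself. The sketch below compares the two approaches; it writes into a temporary directory and uses the made-up file names via_dump.json and via_dumps.json so that it runs on any machine.

```python
import json
import os
import tempfile

name_emb = {'a': '1111', 'b': '2222', 'c': '3333', 'd': '4444'}

# A temporary directory stands in for a real output location
with tempfile.TemporaryDirectory() as tmp_dir:
    path_dump = os.path.join(tmp_dir, "via_dump.json")
    path_dumps = os.path.join(tmp_dir, "via_dumps.json")

    # Way 1: json.dump serializes and writes in one call
    with open(path_dump, "w", encoding="utf-8") as f:
        json.dump(name_emb, f)

    # Way 2: json.dumps builds the string, then we write it ourselves
    with open(path_dumps, "w", encoding="utf-8") as f:
        f.write(json.dumps(name_emb))

    # Read both files back while the directory still exists
    with open(path_dump, encoding="utf-8") as f1, open(path_dumps, encoding="utf-8") as f2:
        content_dump = f1.read()
        content_dumps = f2.read()

print(content_dump == content_dumps)
```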

4. json.load()

json.load() reads JSON data from a file.

import json

emb_filename = '/home/cqh/faceData/emb_json.json'

# Open with a context manager so the file is closed after reading
with open(emb_filename) as f:
    jsObj = json.load(f)

print(jsObj)
print(type(jsObj))

for key in jsObj.keys():
    print('key: %s   value: %s' % (key, jsObj.get(key)))

Running it under Python 3 prints:

{'a': '1111', 'b': '2222', 'c': '3333', 'd': '4444'}
<class 'dict'>
key: a   value: 1111
key: b   value: 2222
key: c   value: 3333
key: d   value: 4444

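Summary:

json.dumps: dict to str, i.e. serializes a dictionary into a JSON string
json.loads: str to dict, i.e. parses a JSON string back into a dictionary
json.dump: saves Python data to a JSON file
json.load: reads JSON data from a file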

Any object that has a read() or write() method is a file-like object.
In f = open("a.txt", "r"), f is a file-like object.
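
Because json.dump() and json.load() only need write() and read(), they also work with in-memory file-like objects. A minimal sketch using io.StringIO (chosen here purely for illustration) in place of a file on disk:

```python
import io
import json

# io.StringIO has read() and write(), so it counts as a file-like object
buffer = io.StringIO()
json.dump({"a": "1111", "b": "2222"}, buffer)

# Rewind to the start before reading back
buffer.seek(0)
restored = json.load(buffer)

print(buffer.getvalue())
print(restored)
```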

Produced by @墨雨: quality guaranteed. Any resemblance to other work is purely coincidental.

05 - Differences between [dumps, loads] and [dump, load]