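Left-Hand MongoDB (Advanced MongoDB Syntax)

I. AND and OR operations

Dataset

Data types

1. Querying people who meet both conditions (AND)

Implicit AND

Query all documents where age is greater than 20 and sex is “男” (male):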

db.getCollection('example_data_1').find({'age':{'$gt':20},'sex':'男'})

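Explicit AND

The syntax of an explicit AND is:

db.getCollection('example_data_1').find({'$and':[dict1, dict2, dict3, ..., dictN]})

Query all documents where age is greater than 20 and sex is “男”, with each condition in its own dict:

db.getCollection('example_data_1').find({'$and':[{'age':{'$gt':20}},{'sex':'男'}]})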


Mixing explicit and implicit AND

Query all documents where age is greater than 20, sex is “男”, and id is less than 10:

db.getCollection('example_data_1').find({'id':{'$lt':10},'$and':[{'age':{'$gt':20}},{'sex':'男'}]})
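You can check the AND rules above without a database. Below is a minimal pure-Python sketch of a filter evaluator. It supports only equality, $gt/$gte/$lt/$lte, $and and $or, and it does not reproduce how MongoDB evaluates queries internally. The sample documents in `people` are hypothetical. The sketch shows that the implicit, explicit and mixed forms all behave as a plain AND of their conditions.

```python
# Minimal sketch: evaluate a small subset of MongoDB filter syntax in plain Python.
# Hypothetical helper, not MongoDB's implementation (no type bracketing, no arrays).
OPS = {
    '$gt': lambda v, x: v is not None and v > x,
    '$gte': lambda v, x: v is not None and v >= x,
    '$lt': lambda v, x: v is not None and v < x,
    '$lte': lambda v, x: v is not None and v <= x,
}

def matches(doc, query):
    """True if doc satisfies every top-level key of query (implicit AND)."""
    for key, cond in query.items():
        if key == '$and':
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif key == '$or':
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = doc.get(key)
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif doc.get(key) != cond:
            return False
    return True

# Hypothetical sample documents
people = [
    {'id': 1, 'age': 25, 'sex': '男'},
    {'id': 5, 'age': 18, 'sex': '男'},
    {'id': 8, 'age': 30, 'sex': '女'},
    {'id': 12, 'age': 40, 'sex': '男'},
]
implicit = {'age': {'$gt': 20}, 'sex': '男'}
explicit = {'$and': [{'age': {'$gt': 20}}, {'sex': '男'}]}
mixed = {'id': {'$lt': 10}, '$and': [{'age': {'$gt': 20}}, {'sex': '男'}]}

print([p['id'] for p in people if matches(p, implicit)])  # [1, 12]
print([p['id'] for p in people if matches(p, explicit)])  # [1, 12]
print([p['id'] for p in people if matches(p, mixed)])     # [1]
```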

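2. Querying people who meet any one of the conditions (OR)

Explicit OR example

The syntax of an explicit OR is:

db.getCollection('example_data_1').find({'$or':[dict1, dict2, dict3, ..., dictN]})

age greater than 28, or salary greater than 9900: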

db.getCollection('example_data_1').find({'$or':[{'age':{'$gt':28}},{'salary':{'$gt':9900}}]})

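A query that cannot be written as an implicit AND

This is an AND that contains several ORs. A single dict cannot hold two '$or' keys, so both ORs must go inside an explicit '$and'. Both of the following conditions must hold:

age greater than 28, or salary greater than 9900
sex is “男”, or id less than 20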

db.getCollection('example_data_1').find({
  '$and':[
    {'$or':[{'age':{'$gt':28}},{'salary':{'$gt':9900}}]},
    {'$or':[{'sex':'男'},{'id':{'$lt':20}}]}
  ]
})
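Here is why the nested query needs '$and'. In Python, a dict literal that repeats a key is legal, but only the last value survives. Writing two '$or' keys therefore silently drops the first condition. This small plain-Python illustration needs no database:

```python
# Two '$or' keys in one dict literal: no error, but the first one is overwritten.
broken = {
    '$or': [{'age': {'$gt': 28}}, {'salary': {'$gt': 9900}}],
    '$or': [{'sex': '男'}, {'id': {'$lt': 20}}],
}
print(len(broken))     # 1
print(broken['$or'])   # [{'sex': '男'}, {'id': {'$lt': 20}}]

# Wrapping both ORs in an explicit '$and' keeps both conditions.
fixed = {'$and': [
    {'$or': [{'age': {'$gt': 28}}, {'salary': {'$gt': 9900}}]},
    {'$or': [{'sex': '男'}, {'id': {'$lt': 20}}]},
]}
print(len(fixed['$and']))  # 2
```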

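3. Implementing MongoDB AND and OR in Python

The query below returns every document that falls into at least one of these cases:

Men older than 28.
Anyone (including women) older than 28 whose id is less than 20.
Men whose salary is greater than 9900.
Anyone (including women) whose salary is greater than 9900 and whose id is less than 20.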

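import pymongo

# MongoClient() connects to localhost:27017 by default;
# attribute access selects database chapter_7 and collection example_data_1
handler = pymongo.MongoClient().chapter_7.example_data_1

# (age > 28 OR salary > 9900) AND (sex == '男' OR id < 20)
rows = handler.find({
  '$and':[
    {'$or':[{'age':{'$gt':28}},
            {'salary':{'$gt':9900}}]},
    {'$or':[{'sex':'男'},
            {'id':{'$lt':20}}]}
  ]
})

for row in rows:
    print(row)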
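The four cases above come from distributing AND over OR: (A or B) and (C or D) equals (A and C) or (A and D) or (B and C) or (B and D). The sketch below uses itertools.product to build the expanded $or-of-$and form from the same conditions. It only explains where the cases come from; the nested query above is shorter and equivalent.

```python
from itertools import product

first_or = [{'age': {'$gt': 28}}, {'salary': {'$gt': 9900}}]
second_or = [{'sex': '男'}, {'id': {'$lt': 20}}]

# The query used above: (A or B) and (C or D)
original = {'$and': [{'$or': first_or}, {'$or': second_or}]}

# Distribute AND over OR: one $and clause per (left, right) pair
expanded = {'$or': [{'$and': [left, right]}
                    for left, right in product(first_or, second_or)]}

for clause in expanded['$or']:
    print(clause)
```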

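II. Querying data in sub-documents and arrays

1. Embedded documents

In this dataset, “user” is called an embedded document, and the fields under “user” are called nested fields.

2. Working with embedded documents

Use dot notation to address a nested field

Query the document whose user_id is 102:

db.getCollection('example_data_2').find({'user.user_id':102})

Query all documents whose “followed” value is greater than 10:

db.getCollection('example_data_2').find({'user.followed':{'$gt':10}})

Returning specific nested fields

Return only the “name” and “user_id” fields (in the result they stay nested under “user”):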

db.getCollection('example_data_2').find({'user.followed':{'$gt':10}},
{'_id':0,'user.name':1,'user.user_id':1})
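To make dot notation concrete, here is a minimal pure-Python sketch that needs no MongoDB. It walks a dotted path such as 'user.user_id' into an embedded document. It also shows that an inclusion projection on dotted paths keeps the nesting. The document `post` and both helper names are hypothetical, and arrays along the path are not handled.

```python
def get_path(doc, path):
    """Follow a dotted path like 'user.user_id' through nested dicts.
    Returns None when any step is missing (so a stored None looks missing too)."""
    current = doc
    for part in path.split('.'):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def project_paths(doc, paths):
    """Inclusion projection on dotted paths; the result keeps the nesting."""
    result = {}
    for path in paths:
        value = get_path(doc, path)
        if value is None:
            continue
        *parents, last = path.split('.')
        target = result
        for part in parents:
            target = target.setdefault(part, {})
        target[last] = value
    return result

# Hypothetical document shaped like example_data_2
post = {'_id': 1, 'content': 'hello',
        'user': {'name': 'Alice', 'user_id': 102, 'followed': 15}}

print(get_path(post, 'user.user_id'))                      # 102
print(project_paths(post, ['user.name', 'user.user_id']))  # {'user': {'name': 'Alice', 'user_id': 102}}
```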

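3. Array fields

A Python list becomes an array when it is written to MongoDB. The script below inserts 10 random documents whose “size” and “price” fields are arrays:

import random
from pymongo import MongoClient

# Collection that will hold the generated clothing documents
collection = MongoClient().chapter_7.example_post2
name_list = ['衬衣', '裤子', '鞋子', '帽子']  # shirt, trousers, shoes, hat
size_list = ['S', 'M', 'L', 'XL']
price_list = [100, 200, 300, 600, 800]

for _ in range(10):
    # Pick 2 to 4 distinct sizes and the same number of distinct prices
    random_ = random.randint(2, 4)
    collection.insert_one({
        'name': random.choice(name_list),
        'size': random.sample(size_list, random_),    # Python list -> MongoDB array
        'price': random.sample(price_list, random_)
    })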

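4. Array usage: querying arrays that do or do not contain a value

Query all documents whose “size” contains “M”:

db.getCollection('example_post2').find({'size':'M'})

Query all documents whose “size” does not contain “M” (this also matches documents that have no “size” field at all):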

db.getCollection('example_post2').find({'size':{'$ne':'M'}})
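For a scalar value, {'size': 'M'} on an array field asks "does any element equal 'M'?". {'$ne': 'M'} is its negation, which is why documents without a “size” field also match. The plain-Python sketch below shows this logic on hypothetical documents. It does not model every MongoDB rule, for example matching against a whole array.

```python
def array_contains(doc, field, value):
    """Sketch of {field: value} for a scalar value: any array element equals value;
    a non-array field is compared directly."""
    current = doc.get(field)
    if isinstance(current, list):
        return value in current
    return current == value

def array_ne(doc, field, value):
    """Sketch of {field: {'$ne': value}}: the negation, so a missing field matches."""
    return not array_contains(doc, field, value)

# Hypothetical documents
items = [
    {'name': '衬衣', 'size': ['S', 'M']},
    {'name': '裤子', 'size': ['L', 'XL']},
    {'name': '鞋子'},  # no size field at all
]
print([d['name'] for d in items if array_contains(d, 'size', 'M')])  # ['衬衣']
print([d['name'] for d in items if array_ne(d, 'size', 'M')])        # ['裤子', '鞋子']
```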

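At least one array element within a range

At least one value in the price array lies between 200 (inclusive) and 300 (exclusive). “$elemMatch” requires a single element to satisfy both bounds. Without it, one element could satisfy '$gte' and a different element '$lt':

db.getCollection('example_post2').find({'price':{'$elemMatch':{'$gte':200,'$lt':300}}})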
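In MongoDB, a compound range condition on an array field, such as {'price': {'$gte': 200, '$lt': 300}}, is satisfied when some element meets each bound, and those elements may differ. {'$elemMatch': {...}} instead requires one element to meet all conditions. The pure-Python sketch below models both behaviors. The function names are hypothetical, and the price list is one the generator script could produce.

```python
def compound_any(values, low, high):
    """Like {'price': {'$gte': low, '$lt': high}} on an array:
    each bound only needs SOME element, possibly a different one."""
    return any(v >= low for v in values) and any(v < high for v in values)

def elem_match(values, low, high):
    """Like {'price': {'$elemMatch': {'$gte': low, '$lt': high}}}:
    a single element must satisfy both bounds."""
    return any(low <= v < high for v in values)

prices = [100, 600]  # no element lies in [200, 300)
print(compound_any(prices, 200, 300))  # True  (600 >= 200 and 100 < 300)
print(elem_match(prices, 200, 300))    # False
```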

5. Array usage: querying by array length

Query all records whose “price” array has exactly 2 elements:

db.getCollection('example_post2').find({'price':{'$size':2}})
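$size only accepts an exact length and cannot be combined with operators such as $gt. A common workaround for "at least two elements" is {'price.1': {'$exists': True}}, meaning "an element exists at index 1". The pure-Python sketch below mirrors both checks. The documents and helper names are hypothetical.

```python
def size_equals(doc, field, n):
    """Sketch of {field: {'$size': n}}: the array has exactly n elements."""
    value = doc.get(field)
    return isinstance(value, list) and len(value) == n

def has_index(doc, field, i):
    """Sketch of {field + '.' + str(i): {'$exists': True}} on an array:
    an element exists at index i, i.e. at least i + 1 elements."""
    value = doc.get(field)
    return isinstance(value, list) and len(value) > i

# Hypothetical documents
docs = [{'price': [100, 200]}, {'price': [100, 200, 300]}, {'price': [800]}]

print([d['price'] for d in docs if size_equals(d, 'price', 2)])  # [[100, 200]]
print([d['price'] for d in docs if has_index(d, 'price', 1)])    # [[100, 200], [100, 200, 300]]
```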

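6. Array usage: querying by array index

Query all records whose first “size” element is “S” (indices start at 0):

db.getCollection('example_post2').find({'size.0':'S'})

Comparing the element at a given index

Query all records whose first “price” element is greater than 500: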

db.getCollection('example_post2').find({'price.0':{'$gt':500}})
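An index path such as 'size.0' matches only when the array actually has an element at that position. Here is a pure-Python sketch of the two index queries above. It uses hypothetical documents, and the helper names are made up for illustration.

```python
def element_at(doc, field, i):
    """Return doc[field][i], or None if the field is not an array or is too short
    (such documents simply do not match a 'field.i' condition)."""
    value = doc.get(field)
    if isinstance(value, list) and 0 <= i < len(value):
        return value[i]
    return None

def first_price_over(doc, limit):
    """Sketch of {'price.0': {'$gt': limit}}."""
    first = element_at(doc, 'price', 0)
    return first is not None and first > limit

# Hypothetical documents
goods = [
    {'size': ['S', 'M'], 'price': [600, 100]},
    {'size': ['M', 'S'], 'price': [200, 800]},
    {'size': [], 'price': []},
]
first_is_s = [g for g in goods if element_at(g, 'size', 0) == 'S']
over_500 = [g for g in goods if first_price_over(g, 500)]
print(first_is_s)  # only the first document
print(over_500)    # only the first document
```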

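7. Working with embedded documents and array fields in Python

Query all records whose size contains M.
Query records where at least one price element is between 200 (inclusive) and 300 (exclusive).
Query records whose price has exactly two elements.
Query all records whose price element at index 0 is greater than 500.

import pymongo

handler = pymongo.MongoClient().chapter_7.example_data_3

rows_1 = handler.find({'size': 'M'})                                         # size contains M
rows_2 = handler.find({'price': {'$elemMatch': {'$gte': 200, '$lt': 300}}})  # one element in [200, 300)
rows_3 = handler.find({'price': {'$size': 2}})                               # exactly two elements
rows_4 = handler.find({'price.0': {'$gt': 500}})                             # first element > 500

# find() returns lazy cursors; iterate to fetch the documents
for row in rows_1:
    print(row)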

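III. MongoDB aggregation

1. Basic aggregation syntax

The aggregation command is “aggregate”, with the basic form:

collection.aggregate([stage1, stage2, stage3, ..., stageN])

An aggregation can have zero, one, or many stages.
With zero stages, the command is written as: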

db.getCollection('example_data_1').aggregate([])

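This does the same as collection.find({}).
If there is at least one stage, each stage is a dict, for example:

“$match”, which filters documents
“$project”, which selects and reshapes fields
“$group”, which groups documents
and so on.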
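One way to picture a pipeline is that each stage takes the documents produced by the previous stage and returns a new list. The sketch below models stages as plain Python functions. The `match` and `project` helpers are hypothetical simplifications, not pymongo APIs. The sketch also shows that an empty pipeline hands back every document, like find({}).

```python
from functools import reduce

def run_pipeline(docs, stages):
    """Feed the output of each stage into the next, like aggregate([...])."""
    return reduce(lambda current, stage: stage(current), stages, list(docs))

def match(pred):
    """Hypothetical stand-in for $match: keep documents where pred(doc) is true."""
    return lambda docs: [d for d in docs if pred(d)]

def project(fields):
    """Hypothetical stand-in for a simple inclusion $project (no _id handling)."""
    return lambda docs: [{f: d[f] for f in fields if f in d} for d in docs]

# Hypothetical documents
people = [{'_id': 1, 'age': 30, 'sex': '女'}, {'_id': 2, 'age': 22, 'sex': '男'}]

print(run_pipeline(people, []))  # every document, like find({})
print(run_pipeline(people, [match(lambda d: d['age'] >= 27), project(['age', 'sex'])]))
# [{'age': 30, 'sex': '女'}]
```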

2. Filtering data

The keyword for filtering is “$match”:

collection.aggregate([{'$match':{same query expression as in find()}}])

Query all records where “age” is at least 27 and “sex” is “女” (female):

db.getCollection('example_data_1').aggregate([{'$match':{'age':{'$gte':27},'sex':'女'}}])

3. Selecting and modifying fields

Returning only some fields

collection.aggregate([{'$project':{field selection}}])

Exclude the “_id” field and return only “age” and “sex”:

db.getCollection('example_data_1').aggregate([{'$project':{'_id':0,'sex':1,'age':1}}])

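Combine with “$match” to filter records first and then select fields.
Select all records where “age” is at least 27 and “sex” is “女”, returning only “age” and “sex”: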

db.getCollection('example_data_1').aggregate([
{'$match':{'age':{'$gte':27},'sex':'女'}},
{'$project':{'_id':0,'sex':1,'age':1}}])

Adding new fields

Adding a constant text value

db.getCollection('example_data_1').aggregate([
{'$match':{'age':{'$gte':27},'sex':'女'}},
{'$project':{'_id':0,'sex':1,'age':1,'hello':'world'}}])

Copying an existing field (a string that starts with “$” refers to a field's value)

db.getCollection('example_data_1').aggregate([
{'$match':{'age':{'$gte':27},'sex':'女'}},
{'$project':{'_id':0,'sex':1,'age':1,'hello':'$age'}}])

Overwriting the value of an existing field with a constant

db.getCollection('example_data_1').aggregate([
{'$match':{'age':{'$gte':27},'sex':'女'}},
{'$project':{'_id':0,'sex':1,'age':'this is age'}}])

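Extracting nested fields

Using find() (the values stay nested under “user”, and “_id” is returned as well)

db.getCollection('example_data_2').find({},{'user.name':1,'user.user_id':1})

Using $project (the nested values become top-level fields)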

db.getCollection('example_data_2').aggregate([
{'$project':{'name':'$user.name','user_id':'$user.user_id'}}])

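Handling special values

Neither the “hello” field nor the “abcd” field is added. '$normalstring' is read as a reference to a field named normalstring, which does not exist. 1 is read as "include the existing abcd field", which does not exist either.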

db.getCollection('example_data_1').aggregate([
{'$match':{'age':{'$gte':28}}},
{'$project':{'_id':0,'sex':1,'hello':'$normalstring','abcd':1}}])

Use “$literal” to add a string that starts with “$”, or a number, as a constant value:

db.getCollection('example_data_1').aggregate([
{'$match':{'age':{'$gte':28}}},
{'$project':{'_id':0,'sex':1,'hello':{'$literal':'$normalstring'},'abcd':{'$literal':1}}}])
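The $project rules used in this subsection fit in a small pure-Python sketch:
- 1 keeps an existing field.
- '_id': 0 drops _id.
- A string starting with “$” copies a field, possibly a nested one.
- {'$literal': x} yields x verbatim.
- Any other plain value becomes a constant.

The sketch simplifies under these assumptions: other expression operators and the exclusion of non-_id fields are not modeled. The sample document is hypothetical.

```python
MISSING = object()  # marker for "field does not exist"

def get_path(doc, path):
    """Follow a dotted path like 'user.name'; return MISSING if any step is absent."""
    current = doc
    for part in path.split('.'):
        if not isinstance(current, dict) or part not in current:
            return MISSING
        current = current[part]
    return current

def project_stage(doc, spec):
    """Simplified $project for one document (top-level output fields only).
    Expression objects other than {'$literal': ...} are NOT modeled here."""
    out = {}
    # _id is kept unless explicitly excluded with 0/False
    if spec.get('_id', 1) not in (0, False) and '_id' in doc:
        out['_id'] = doc['_id']
    for key, rule in spec.items():
        if key == '_id':
            continue
        if isinstance(rule, int):  # also covers True/False
            if not rule:
                raise ValueError('excluding non-_id fields is not modeled in this sketch')
            value = doc.get(key, MISSING)      # 1/True: keep the existing field
        elif isinstance(rule, str) and rule.startswith('$'):
            value = get_path(doc, rule[1:])    # '$a.b': copy a field's value
        elif isinstance(rule, dict) and '$literal' in rule:
            value = rule['$literal']           # constant, taken verbatim
        else:
            value = rule                       # any other plain value: a constant
        if value is not MISSING:               # missing sources add nothing
            out[key] = value
    return out

# Hypothetical document
doc = {'_id': 7, 'age': 30, 'sex': '女', 'user': {'name': 'Alice'}}

print(project_stage(doc, {'_id': 0, 'sex': 1, 'hello': '$normalstring', 'abcd': 1}))
# {'sex': '女'}
print(project_stage(doc, {'_id': 0, 'sex': 1,
                          'hello': {'$literal': '$normalstring'}, 'abcd': {'$literal': 1}}))
# {'sex': '女', 'hello': '$normalstring', 'abcd': 1}
print(project_stage(doc, {'name': '$user.name'}))
# {'_id': 7, 'name': 'Alice'}
```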

4. Grouping

Deduplicating in a grouping stage

collection.aggregate([{'$group':{'_id':'$field_to_deduplicate'}}])

Deduplicate on the “姓名” (name) field:

db.getCollection('example_post3').aggregate([{'$group':{'_id':'$姓名'}}])

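Deduplicating with $group returns documents, one per distinct value, with the value stored in “_id”. The “distinct” function instead returns an array of values.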
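The difference in return shape can be sketched in plain Python on hypothetical score records. The helpers below keep first-seen order for readability; MongoDB does not guarantee the order of $group output.

```python
# Hypothetical records shaped like example_post3
scores = [
    {'姓名': '张三', '分数': 80},
    {'姓名': '李四', '分数': 90},
    {'姓名': '张三', '分数': 70},
]

def group_ids(docs, field):
    """Like [{'$group': {'_id': '$' + field}}]: one document per distinct value."""
    seen = []
    for d in docs:
        if d[field] not in seen:
            seen.append(d[field])
    return [{'_id': v} for v in seen]

def distinct(docs, field):
    """Like collection.distinct(field): a plain list of distinct values."""
    return [g['_id'] for g in group_ids(docs, field)]

print(group_ids(scores, '姓名'))  # [{'_id': '张三'}, {'_id': '李四'}]
print(distinct(scores, '姓名'))   # ['张三', '李四']
```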

Grouping and computing statistics

collection.aggregate([{'$group':{'_id':'$group_field',
'max_score':{'$max':'$field_name'},
'min_score':{'$min':'$field_name'},
'average_score':{'$avg':'$field_name'},
'sum_score':{'$sum':'$field_name'}
}}])

Compute each person's highest score, lowest score, total score, and average score:

db.getCollection('example_post3').aggregate([{'$group':{'_id':'$姓名',
'max_score':{'$max':'$分数'},
'min_score':{'$min':'$分数'},
'average_score':{'$avg':'$分数'},
'sum_score':{'$sum':'$分数'}
}}])

The value of “$sum” can also be the number 1, which counts how many records each group contains:

db.getCollection('example_post3').aggregate([{'$group':{'_id':'$姓名',
'max_score':{'$max':'$分数'},
'min_score':{'$min':'$分数'},
'average_score':{'$avg':'$分数'},
'sum_score':{'$sum':'$分数'},
'doc_count':{'$sum':1}
}}])
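Here is what the $group stage above computes per person, sketched in plain Python on hypothetical records. The sketch assumes every record has a numeric “分数”; missing or non-numeric values are not handled.

```python
from collections import defaultdict

# Hypothetical records shaped like example_post3
scores = [
    {'姓名': '张三', '分数': 80},
    {'姓名': '李四', '分数': 90},
    {'姓名': '张三', '分数': 70},
]

def group_stats(docs, key, value):
    """Collect the values per group key, then compute the same statistics."""
    groups = defaultdict(list)
    for d in docs:
        groups[d[key]].append(d[value])
    return [
        {
            '_id': k,
            'max_score': max(vs),
            'min_score': min(vs),
            'average_score': sum(vs) / len(vs),
            'sum_score': sum(vs),
            'doc_count': len(vs),  # what {'$sum': 1} computes
        }
        for k, vs in groups.items()
    ]

for row in group_stats(scores, '姓名', '分数'):
    print(row)
```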

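Deduplicating while keeping the newest or oldest data

Deduplicate on “姓名” and take the newest value of each field. “$last” takes the value from the last document of each group, in the order the documents reach “$group”, so sort by “日期” (date) first:

db.getCollection('example_post3').aggregate([
{'$sort':{'日期':1}},
{'$group':{'_id':'$姓名',
'日期':{'$last':'$日期'},
'分数':{'$last':'$分数'}
}}])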

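Deduplicate on “姓名” and take the oldest value of each field. Again sort by “日期” first so that “$first” picks the earliest record:

db.getCollection('example_post3').aggregate([
{'$sort':{'日期':1}},
{'$group':{'_id':'$姓名',
'日期':{'$first':'$日期'},
'分数':{'$first':'$分数'}
}}])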
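$first and $last simply pick the first or last document of each group in the order the documents arrive, which is why sorting by date first matters. Below is a pure-Python sketch with hypothetical records. The dates are ISO strings, so text order equals date order.

```python
# Hypothetical records, deliberately not stored in date order
records = [
    {'姓名': '张三', '日期': '2019-03-01', '分数': 70},
    {'姓名': '张三', '日期': '2019-01-15', '分数': 80},
    {'姓名': '李四', '日期': '2019-02-10', '分数': 90},
]

def first_last_by(docs, key, sort_field):
    """Sort like {'$sort': {sort_field: 1}}, then keep the first and the last
    document of each group, like $first and $last."""
    ordered = sorted(docs, key=lambda d: d[sort_field])
    first, last = {}, {}
    for d in ordered:
        first.setdefault(d[key], d)  # $first: the first document seen per group
        last[d[key]] = d             # $last: overwritten by every later document
    return first, last

first, last = first_last_by(records, '姓名', '日期')
print(first['张三']['分数'], last['张三']['分数'])  # 80 70
```

Without the sort, the first record of 张三 in storage order (score 70) would be reported as the "oldest".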

5. Unwinding arrays

collection.aggregate([{'$unwind':'$field_name'}])

Unwind the “size” array:

db.getCollection('example_post2').aggregate([{'$unwind':'$size'}])

Unwind both “size” and “price”:

db.getCollection('example_post2').aggregate([
{'$unwind':'$size'},
{'$unwind':'$price'}
])
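Here is a pure-Python sketch of what $unwind does by default. It emits one output document per array element and drops documents whose field is missing, null, or an empty array. A non-array value is treated as a one-element array. Unwinding two arrays in a row therefore yields every size/price combination. The sample documents are hypothetical.

```python
def unwind(docs, field):
    """Sketch of {'$unwind': '$' + field} with default options."""
    out = []
    for d in docs:
        value = d.get(field)
        if value is None or value == []:
            continue  # missing, null or empty array: no output document
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            new_doc = dict(d)
            new_doc[field] = element  # same _id, one element per copy
            out.append(new_doc)
    return out

# Hypothetical documents
goods = [
    {'name': '衬衣', 'size': ['S', 'M'], 'price': [100, 200, 300]},
    {'name': '帽子', 'size': [], 'price': [800]},
]

by_size = unwind(goods, 'size')
print(len(by_size))                    # 2 (帽子 has an empty size array)
print(len(unwind(by_size, 'price')))   # 6 = 2 sizes x 3 prices
```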

6. Querying across collections

example_user

example_post

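Querying several collections at once

main_collection.aggregate([{
'$lookup':{
'from':'name of the collection to look up',
'localField':'field in the main collection',
'foreignField':'field in the looked-up collection',
'as':'name of the field that stores the matches'
}
}])

Get the content of each Weibo post together with the name and occupation of the user who posted it: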

db.getCollection('example_post').aggregate([
    {'$lookup': {
        'from': 'example_user',
        'localField': 'user_id',
        'foreignField': 'id',
        'as': 'user_info'
        }
    }
])
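Here is a pure-Python sketch of an equality $lookup. It assumes the field names from the query above and uses hypothetical users and posts. Each post gets the list of users whose id equals its user_id, and a post with no matching user gets an empty list, as in a left outer join.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Sketch of {'$lookup': {'from': ..., 'localField': local_field,
    'foreignField': foreign_field, 'as': as_field}} (equality match only)."""
    out = []
    for doc in local_docs:
        found = [f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)]
        new_doc = dict(doc)
        new_doc[as_field] = found  # always a list, possibly empty
        out.append(new_doc)
    return out

# Hypothetical collections
users = [{'id': 1, 'name': '张小二', 'work': 'engineer'},
         {'id': 2, 'name': '王五', 'work': 'teacher'}]
posts = [{'_id': 1, 'content': 'hello', 'user_id': 1},
         {'_id': 2, 'content': 'world', 'user_id': 1},
         {'_id': 3, 'content': 'orphan', 'user_id': 9}]

joined = lookup(posts, users, 'user_id', 'id', 'user_info')
for row in joined:
    print(row['content'], [u['name'] for u in row['user_info']])
```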

Tidying up the output

Unwind the user array:

db.getCollection('example_post').aggregate([
    {'$lookup': {
        'from': 'example_user',
        'localField': 'user_id',
        'foreignField': 'id',
        'as': 'user_info'
        }},
{'$unwind':'$user_info'}
])

Extract the “name” and “work” fields:

db.getCollection('example_post').aggregate([
    {'$lookup': {
        'from': 'example_user',
        'localField': 'user_id',
        'foreignField': 'id',
        'as': 'user_info'
        }},
{'$unwind':'$user_info'},
{'$project':{
    'content':1,
    'post_time':1,
    'name':'$user_info.name',
    'work':'$user_info.work'
}}
])

Querying the post collection from the user collection's side

Query the posts of each user:

db.getCollection('example_user').aggregate([
    {'$lookup': {
        'from': 'example_post',
        'localField': 'id',
        'foreignField': 'user_id',
        'as': 'weibo_info'
        }
    }
])

Tidying up the result:

db.getCollection('example_user').aggregate([
    {'$lookup': {
        'from': 'example_post',
        'localField': 'id',
        'foreignField': 'user_id',
        'as': 'weibo_info'
        }
    },
    {'$unwind': '$weibo_info'},
    {'$project': {
        'name': 1,
        'work': 1,
        'content': '$weibo_info.content',
        'post_time': '$weibo_info.post_time'}}
])
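Note that $unwind drops any user whose weibo_info array is empty, so users who have never posted disappear from this result. $unwind also accepts a document form, {'$unwind': {'path': '$weibo_info', 'preserveNullAndEmptyArrays': True}}, which keeps such documents. The pure-Python sketch below shows the difference on a hypothetical lookup result.

```python
# The real stage that keeps users without posts (option names from MongoDB's $unwind)
keep_users_without_posts = {'$unwind': {'path': '$weibo_info',
                                        'preserveNullAndEmptyArrays': True}}

def unwind(docs, field, preserve_null_and_empty_arrays=False):
    """Sketch of $unwind, optionally preserving missing/null/empty-array documents."""
    out = []
    for d in docs:
        value = d.get(field)
        if value is None or value == []:
            if preserve_null_and_empty_arrays:
                kept = dict(d)
                if value == []:
                    kept.pop(field)  # in this sketch the empty array field is dropped
                out.append(kept)
            continue
        for element in (value if isinstance(value, list) else [value]):
            new_doc = dict(d)
            new_doc[field] = element
            out.append(new_doc)
    return out

# Hypothetical result of the $lookup stage
joined = [
    {'id': 1, 'name': '张小二', 'weibo_info': [{'content': 'a'}, {'content': 'b'}]},
    {'id': 2, 'name': '王五', 'weibo_info': []},
]

print(len(unwind(joined, 'weibo_info')))        # 2: 王五 disappears
print(len(unwind(joined, 'weibo_info', True)))  # 3: 王五 is kept
```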

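Combining aggregation stages

Put “$match” first where possible. It can then use MongoDB's indexes, and fewer documents flow into the later stages, which makes the query faster.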

db.getCollection('example_user').aggregate([
    {'$match':{'name':'张小二'}},
    {'$lookup': {
        'from': 'example_post',
        'localField': 'id',
        'foreignField': 'user_id',
        'as': 'weibo_info'
        }
    },
    {'$unwind': '$weibo_info'},
    {'$project': {
        'name': 1,
        'work': 1,
        'content': '$weibo_info.content',
        'post_time': '$weibo_info.post_time'}}
])

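7. Running aggregations from Python

A pipeline is just a list of dicts, so aggregation code written in the shell can almost always be pasted into Python unchanged. The main difference is shell literals: true, false, and null become True, False, and None in Python.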

import pymongo

handler = pymongo.MongoClient().chapter_7.example_user

rows = handler.aggregate([
    {'$lookup': {
        'from': 'example_post',
        'localField': 'id',
        'foreignField': 'user_id',
        'as': 'weibo_info'
        }
    },
    {'$unwind': '$weibo_info'},
    {'$project': {
        'name': 1,
        'work': 1,
        'content': '$weibo_info.content',
        'post_time': '$weibo_info.post_time'}}
])
for row in rows:
    print(row)