一,聚合操作
将数据经过计算,映射,汇总等操作后返回最终处理数据。
mongo有三种方式进行聚合操作:
1,Aggregation Pipeline
2,Map-Reduce
3,Single Purpose Aggregation Operations
先下载一份测试数据,https://gitee.com/geektime-geekbang/geektime-mongodb-course/raw/master/aggregation/dump.tar.gz
解压后执行mongorestore(新版mongo需要自行下载mongo tools),可以得到一个mock数据
看下order数据结构
db.orders.findOne({})
{
"_id" : ObjectId("5dbe7a545368f69de2b4d36e"),
"street" : "493 Hilll Curve",
"city" : "Champlinberg",
"state" : "Texas",
"country" : "Malaysia",
"zip" : "24344-1715",
"phone" : "425.956.7743 x4621",
"name" : "Destinee Schneider",
"userId" : 3573,
"orderDate" : ISODate("2019-03-26T03:20:08.805Z"),
"status" : "created",
"shippingFee" : NumberDecimal("8.00"),
"orderLines" : [
{
"product" : "Refined Fresh Tuna",
"sku" : "2057",
"qty" : 25,
"price" : NumberDecimal("56.00"),
"cost" : NumberDecimal("46.48")
},
{
"product" : "Refined Concrete Ball",
"sku" : "1738",
"qty" : 61,
"price" : NumberDecimal("47.00"),
"cost" : NumberDecimal("47")
},
{
"product" : "Rustic Granite Towels",
"sku" : "500",
"qty" : 62,
"price" : NumberDecimal("74.00"),
"cost" : NumberDecimal("62.16")
},
{
"product" : "Refined Rubber Salad",
"sku" : "1400",
"qty" : 73,
"price" : NumberDecimal("93.00"),
"cost" : NumberDecimal("87.42")
},
{
"product" : "Intelligent Wooden Towels",
"sku" : "5674",
"qty" : 72,
"price" : NumberDecimal("84.00"),
"cost" : NumberDecimal("68.88")
},
{
"product" : "Refined Steel Bacon",
"sku" : "5009",
"qty" : 8,
"price" : NumberDecimal("53.00"),
"cost" : NumberDecimal("50.35")
}
],
"total" : NumberDecimal("407")
}
二,Aggregation Pipeline
db.collection.aggregate( [ { <stage> }, ... ] )
aggregate需要传入stage的数组作为管道,数据从上一个stage流出,经过当前stage处理,然后传递给下一个stage进行处理,当所有stage处理完之后得到最终数据。
2.1$match
操作
通过标准的mongo查询语句过滤文档流,例如:
//id匹配
db.orders.aggregate([{"$match":{"_id":ObjectId("5dbe7a545368f69de2b4d36e")}}])
//total>0的
db.orders.aggregate([{"$match":{"total":{"$gt":NumberDecimal("0")}}}])
//要么userId=3573,要么country为United States of America
db.orders.aggregate(
{"$match":{"$or":[
{"userId":3573},
{"country":"United States of America"}
]}}
)
2.2$project
操作
投影,用来决定文档字段是否要返回
//只返回userId和country
db.orders.aggregate(
[
{"$match":{"$or":[
{"userId":3573},
{"country":"United States of America"}
]}},
{"$project":{"userId":1,"country":1,"_id":0}}
]
)
2.3$group
操作
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
group操作会根据_id字段的表达式来进行分组
//根据userId汇总,查看每个用户产生的订单数
db.orders.aggregate(
[{
"$group":{
"_id":"$userId",
"count":{$sum:1}
}
}]
)
2.4$limit, $sort,$skip
db.orders.aggregate(
[
{
"$group":{
"_id":"$userId",
"count":{$sum:1}
}
},
{"$sort":{"count":1}},
{"$skip":15},
{"$limit":10}
]
)
2.5$unwind
用于展开文档的数组,path的值代表要展开文档的哪个字段,includeArrayIndex的值代表处理之后的文档用includeArrayIndex的值作为字段保存数组位置,preserveNullAndEmptyArrays为true时,空数组不会填充数组字段
{
$unwind:
{
path: <field path>,
includeArrayIndex: <string>,
preserveNullAndEmptyArrays: <boolean>
}
}
汇总_id为5dbe7a545368f69de2b4d36e的订单总花费
db.orders.aggregate(
[
{"$match":{"_id":ObjectId("5dbe7a545368f69de2b4d36e")}},
{"$unwind":{"path":"$orderLines"}},
{"$group":{"_id":"_id","totalCost":{"$sum":"$orderLines.cost"},"userId":{"$first":"$userId"}}}
]
)
2.6$lookup
左外连接
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
//往users表插入一条数据
db.users.insertOne({"userId":3573,"userName":"alex"})
//查出这个订单的总花费及user
db.orders.aggregate(
[
{"$match":{"_id":ObjectId("5dbe7a545368f69de2b4d36e")}},
{"$unwind":{"path":"$orderLines"}},
{"$group":{"_id":"_id","totalCost":{"$sum":"$orderLines.cost"},"userId":{"$first":"$userId"}}},
{"$lookup":{
"from":"users",
"localField":"userId",
"foreignField":"userId",
"as":"user"
}}
]
)
2.6$bucket
分桶
{
$bucket: {
groupBy: <expression>,
boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
default: <literal>,
output: {
<output1>: { <$accumulator expression> },
...
<outputN>: { <$accumulator expression> }
}
}
}
//将3月,4月的订单分成2个通,并统计月份订单数
db.orders.aggregate(
[
{"$bucket":{
"groupBy":"$orderDate",
"boundaries":[ISODate("2019-03-01T00:00:00.000Z"),ISODate("2019-04-01T00:00:00.000Z"),
ISODate("2019-05-01T00:00:00.000Z")],
"default":"other",
"output":{
"count":{"$sum":1}
}
}}
]
)
//返回
{ "_id" : "other", "count" : 79716 }
{ "_id" : ISODate("2019-03-01T00:00:00Z"), "count" : 10201 }
{ "_id" : ISODate("2019-04-01T00:00:00Z"), "count" : 10083 }
2.7 $facet
使用$facet
,可以在一个聚合中使用多组聚合
{ $facet:
{
<outputField1>: [ <stage1>, <stage2>, ... ],
<outputField2>: [ <stage1>, <stage2>, ... ],
...
}
}
//userCount字段里面包含所有userId的订单数,orderCount计算订单总数
db.orders.aggregate(
[{
"$facet":{
"userCount":[{"$group":{
"_id":"$userId",
"count":{"$sum":1}
}}],
"orderCount":[{"$group":{
"_id":null,
"count":{"$sum":1}
}}]
}
}]
)
三,compass
compass是mongo官方出的mongodb管理工具,提供了可视化界面,可以直观的查看聚合每个stage的输出结果
还可以直接分析查询语句