1.hadoop
How MapReduce Works
https://blog.csdn.net/tanggao1314/article/details/51275812
MapReduce Execution Internals Explained
http://blog.csdn.net/u011007180/article/details/52434382
MapReduce Explained
https://blog.csdn.net/qq_24309787/article/details/82970116
The YARN MapReduce Framework Explained
https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/
Architecture of the MapReduce Framework
http://dongxicheng.org/mapreduce-nextgen/nextgen-mapreduce-introduction/
NameNode High Availability (HA) Implementation Analysis
https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-name-node/
Roundup of Essential HDFS Articles
https://blog.csdn.net/androidlushangderen/article/details/78700392
Understanding the Hadoop YARN Architecture
https://blog.csdn.net/bingduanlbd/article/details/51880019
YARN Architecture Design Explained
https://www.cnblogs.com/wcwen1990/p/6737985.html
2.zk
ZooKeeper, a Distributed Service Framework
https://www.ibm.com/developerworks/cn/opensource/os-cn-zookeeper/index.html
ZooKeeper Internals Analysis
https://blog.csdn.net/tang06211015/article/details/51921428
ZooKeeper Features and How It Works
https://www.cnblogs.com/felixzh/p/5869212.html
ZooKeeper Leader Election Mechanism
https://www.cnblogs.com/shuaiandjun/p/9383655.html
3.hive
Hive Source Code Analysis: Overall Code Structure
https://www.xuebuyuan.com/2181081.html
Hive Internals and Query Optimization
https://blog.csdn.net/lw_ghy/article/details/51469753
Notes on Hive Performance Optimization
https://blog.csdn.net/mrlevo520/article/details/76339075
4.hbase
HBase Official Documentation (Chinese Edition)
http://abloz.com/hbase/book.html
Understanding the HBase System Architecture in Depth
https://blog.csdn.net/Yaokai_AssultMaster/article/details/72877127
Common HBase Shell Commands
https://www.cnblogs.com/nexiyi/p/hbase_shell.html
HBase Development Examples
https://www.cnblogs.com/fangdai/p/5991620.html
- Locate the region a record belongs to
- Check the data volume of a region
- View all versions of a Cell
https://blog.csdn.net/javajxz008/article/details/51913533
HBase Rowkey Design
https://blog.csdn.net/u014091123/article/details/73163088
Summary of the Various Ways to Write to HBase
https://blog.csdn.net/shudaqi2010/article/details/88653796
Why Is It Not Recommended to Define Too Many Column Families in HBase?
https://blog.csdn.net/weixin_43888806/article/details/100127854
5.spark
DataFrame Operations
https://www.cnblogs.com/nucdy/p/6541564.html
RDD Operations Explained
https://blog.csdn.net/zhaojw_420/article/details/53261965
Working with Hive Tables from Spark SQL
https://blog.csdn.net/zhao897426182/article/details/78435234/
Spark/spark-sql: Processing Schema Data
https://www.cnblogs.com/kangoroo/p/6891540.html
Handling Data Skew in Spark
http://www.jasongj.com/spark/skew/
Actions, Functions, and Transformations in Dataset
https://blog.csdn.net/legotime/article/details/52562796
StructField, StructType, and Schema in Dataset
https://blog.csdn.net/legotime/article/details/52643243
Spark Shared Variables: Broadcast Variables and Accumulators
https://blog.csdn.net/wangpei1949/article/details/83335273
Four Ways to Add a Column to a DataFrame
https://www.cnblogs.com/itboys/p/9762808.html
Pitfalls When Writing Data to MySQL Tables with Spark SQL
https://blog.csdn.net/dai451954706/article/details/52840011/
Spark Dynamic Resource Allocation Explained
http://www.imooc.com/article/267186
6.flink
Flink Architecture, Internals, and Deployment Testing
https://blog.csdn.net/jdoouddm7i/article/details/62039337
Broadcast Streams: Broadcast State Use Cases
https://cloud.tencent.com/developer/article/1378332
State Management in Flink
Yunqi Community: https://yq.aliyun.com/articles/225623#
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html
Watermarks
Jianshu: https://www.jianshu.com/p/9db56f81fa2a
Investigating timeWindow Tumbling Window Boundaries and Late Data in Flink
https://blog.csdn.net/xsdxs/article/details/82415450
Window Join
https://blog.csdn.net/xsdxs/article/details/82750254
Parsing JSON Data with Flink SQL
http://www.mamicode.com/info-detail-2644620.html
Flink UDF Examples
https://www.jianshu.com/p/5dc2cab91c78
source function
https://cloud.tencent.com/developer/article/1366981
An Analysis of Flink Network Flow Control and Backpressure
https://yq.aliyun.com/articles/725982/
7.misc
Introduction to Kudu
https://www.jianshu.com/p/93c602b637a4
Introduction to Azkaban
https://blog.csdn.net/clypm/article/details/79076801
Java API for Common Elasticsearch Queries and Aggregations
https://blog.csdn.net/majun_guang/article/details/81103623
Elasticsearch Indexing Mechanism
https://bbs.huaweicloud.com/blogs/160632
Kafka Architecture and Internals
https://blog.csdn.net/u013256816/article/details/71091774
Why Using Too Many Column Families in HBase Is Not Recommended
https://blog.csdn.net/bingdianone/article/details/86062506
Logstash Filter Plugins Explained, with Examples
https://www.cnblogs.com/FengGeBlog/p/10305318.html