1. SNN (SecondaryNameNode)
1.1 How the SecondaryNameNode works
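1. When the SecondaryNameNode performs a checkpoint, the NameNode stops writing to the current edits file (515-516) and records new write operations in a new edits file (517).
2. The SecondaryNameNode downloads the NameNode's fsimage (514) and the edits file (515-516) to its local disk.
3. The SecondaryNameNode loads fsimage 514 into memory, replays edits 515-516 against it from beginning to end, and writes out a new fsimage (516).
4. The SecondaryNameNode pushes the new fsimage 516 back to the NameNode.
5. The NameNode receives fsimage 516.ckpt and renames it to fsimage 516; the new edits file 517.new is renamed to edits 517, which is now the latest edits file.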
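The txid bookkeeping in the steps above can be checked with a small bash sketch. It is a toy model, not how the NameNode or SNN is implemented. It only verifies that the edits segment starts right after the fsimage's last transaction, then prints the resulting file names. The function name `checkpoint_round` and the short file names are illustrative; real files are zero-padded, e.g. `fsimage_0000000000000000514`.

```shell
#!/usr/bin/env bash
# Toy model of one checkpoint round (illustrative only).
# $1 = last txid contained in the current fsimage
# $2, $3 = first and last txid of the edits segment the SNN downloads
checkpoint_round() {
  local image_txid=$1 edits_start=$2 edits_end=$3
  # The edits must continue exactly where the fsimage stops; otherwise
  # replaying them would skip or repeat transactions.
  if (( edits_start != image_txid + 1 || edits_end < edits_start )); then
    echo "edits ${edits_start}-${edits_end} do not follow fsimage ${image_txid}" >&2
    return 1
  fi
  # Step 3: old image + replayed edits = image current up to edits_end.
  # Step 5: writes made during the checkpoint went to edits (edits_end + 1).
  echo "fsimage_${edits_end} edits_$(( edits_end + 1 ))"
}

checkpoint_round 514 515 516   # prints: fsimage_516 edits_517
```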
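1.2 Why the SecondaryNameNode is worth learning
The SNN workflow mostly comes up in interviews, but you should still understand it: it gives you a basic grasp of how HDFS works underneath.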
In production we don't use the SecondaryNameNode; we use HDFS HA (hot standby) instead,
which runs two NameNodes:
one NN active and one NN standby (hot standby)
2. hadoop commands
2.1 Where the hadoop commands come from
[hadoop@ruozedata001 bin]$ ./hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
s3guard manage data on S3
trace view and modify Hadoop tracing settings
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
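2.2 Common compression formats in Hadoop
zlib, snappy, lz4, bzip2 (hadoop checknative also reports the hadoop native library itself and openssl, which is used for encryption rather than compression)
2.3 Checking whether compression is supported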
[hadoop@ruozedata001 bin]$ hadoop checknative
20/11/28 20:51:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop: false
zlib: false
snappy: false
lz4: false
bzip2: false
openssl: false
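A small filter can turn the checknative report above into something a script can act on, for example to fail a deployment check when snappy is missing. It lists every library reported as false. This is a sketch that assumes the `name: true|false ...` line format shown above; `missing_native_libs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Reads "hadoop checknative" output on stdin and prints the name of every
# library reported as "false", one per line.
missing_native_libs() {
  # Library lines look like "snappy: false" or "zlib: true /lib64/libz.so.1":
  # the first field ends with ':' and the second is true/false.
  awk '$1 ~ /:$/ && $2 == "false" { sub(/:$/, "", $1); print $1 }'
}

# Usage (the WARN line goes to stderr, so only the report is piped):
#   hadoop checknative | missing_native_libs
```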
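Compiling Hadoop with native library support: https://blog.csdn.net/u010452388/article/details/99691421
This involves Maven.
When a command or a program throws an exception, check the classpath: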
[hadoop@ruozedata001 bin]$ hadoop classpath
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/etc/hadoop:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/hdfs:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/hdfs/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/hdfs/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/yarn/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/yarn/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce/*:/home/hadoop/app/hadoop/contrib/capacity-scheduler/*.jar
[hadoop@ruozedata001 bin]$
http://cn.voidcc.com/question/p-tenieuea-bex.html
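When a program fails with a ClassNotFoundException or a similar error, it helps to see the classpath one entry per line and filter it. This is a minimal sketch; `classpath_entries` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Splits a ':'-separated classpath read from stdin into one entry per line,
# dropping empty entries.
classpath_entries() {
  tr ':' '\n' | grep -v '^$'
}

# Usage: list only the HDFS-related entries
#   hadoop classpath | classpath_entries | grep hdfs
```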
3. hdfs commands
3.1 Where the hdfs commands come from
[hadoop@ruozedata001 bin]$ ./hdfs
Usage: hdfs [--config confdir] COMMAND
where COMMAND is one of:
dfs run a filesystem command on the file systems supported in Hadoop.
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
journalnode run the DFS journalnode
zkfc run the ZK Failover Controller daemon
datanode run a DFS datanode
dfsadmin run a DFS admin client
diskbalancer Distributes data evenly among disks on a given node
haadmin run a DFS HA admin client
fsck run a DFS filesystem checking utility
balancer run a cluster balancing utility
jmxget get JMX exported values from NameNode or DataNode.
mover run a utility to move block replicas across
storage types
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to an legacy fsimage
oev apply the offline edits viewer to an edits file
fetchdt fetch a delegation token from the NameNode
getconf get config values from configuration
groups get the groups which users belong to
snapshotDiff diff two snapshots of a directory or diff the
current directory contents with a snapshot
lsSnapshottableDir list all snapshottable dirs owned by the current user
Use -help to see options
portmap run a portmap service
nfs3 run an NFS version 3 gateway
cacheadmin configure the HDFS cache
crypto configure HDFS encryption zones
storagepolicies list/get/set block storage policies
version print the version
Most commands print help when invoked w/o parameters.
[hadoop@ruozedata001 bin]$
3.2 Tip
hadoop fs and hdfs dfs are equivalent:
both scripts run the same class.
hadoop fs (bin/hadoop):
# the core commands
if [ "$COMMAND" = "fs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
hdfs dfs (bin/hdfs):
elif [ "$COMMAND" = "dfs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
hdfs dfs commands:
Usage: hadoop fs [generic options]
[-cat [-ignoreCrc] <src> ...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>] equivalent to put
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>] equivalent to get
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-du [-s] [-h] [-x] <path> ...]
[-find <path> ... <expression> ...]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
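[-mv <src> ... <dst>] 【Not recommended in production: if something goes wrong during the move, the data can end up incomplete. Use cp instead, verify the copy, then delete the source.】
[-rm [-f] [-r|-R] [-skipTrash] <src> ...] 【-skipTrash is not recommended】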
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
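The cp, verify, then delete advice for `-mv` can be wrapped in a small function. This is a sketch under three assumptions. First, `safe_move` is a hypothetical name and the paths in the usage line are placeholders. Second, the target path does not exist yet; otherwise `-cp` copies the source into it and the counts differ. Third, comparing `hdfs dfs -count` output (directories, files, bytes) is treated as good-enough verification; it is not a content checksum.

```shell
#!/usr/bin/env bash
# Copy, verify, then delete: a cautious replacement for "hdfs dfs -mv".
# $1 = source path, $2 = target path (must not exist yet)
safe_move() {
  local src=$1 dst=$2 src_stat dst_stat
  # 1. Copy; -p preserves timestamps, ownership and permissions.
  hdfs dfs -cp -p "$src" "$dst" || return 1
  # 2. Verify: -count prints DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME,
  #    so compare the first three fields of source and target.
  src_stat=$(hdfs dfs -count "$src" | awk '{print $1, $2, $3}')
  dst_stat=$(hdfs dfs -count "$dst" | awk '{print $1, $2, $3}')
  if [[ -z "$src_stat" || "$src_stat" != "$dst_stat" ]]; then
    echo "verify failed: $src=[$src_stat] $dst=[$dst_stat]" >&2
    return 1
  fi
  # 3. Delete the source only after verification, through the trash
  #    (no -skipTrash).
  hdfs dfs -rm -r "$src"
}

# Usage:
#   safe_move /data/2020-11-28 /archive/2020-11-28
```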
3.3 dfs administration commands
[hadoop@ruozedata001 bin]$ hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
[-report [-live] [-dead] [-decommissioning]]
[-safemode <enter | leave | get | wait>] 【safe mode】
[-saveNamespace]
[-rollEdits]
[-restoreFailedStorage true|false|check]
[-refreshNodes]
[-setQuota <quota> <dirname>...<dirname>]
[-clrQuota <dirname>...<dirname>]
[-setSpaceQuota <quota> <dirname>...<dirname>]
[-clrSpaceQuota <dirname>...<dirname>]
[-finalizeUpgrade]
[-rollingUpgrade [<query|prepare|finalize>]]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshCallQueue]
[-refresh <host:ipc_port> <key> [arg1..argn]
[-reconfig <datanode|...> <host:ipc_port> <start|status|properties>]
[-printTopology]
[-refreshNamenodes datanode_host:ipc_port]
[-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
[-setBalancerBandwidth <bandwidth in bytes per second>]
[-fetchImage <local directory>]
[-allowSnapshot <snapshotDir>]
[-disallowSnapshot <snapshotDir>]
[-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
[-getDatanodeInfo <datanode_host:ipc_port>]
[-metasave filename]
[-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
[-listOpenFiles [-blockingDecommission] [-path <path>]]
[-help [cmd]]
3.4 Shell script wrapper: an alert script for HA failover status
(Advanced class topic: wrap haadmin in a shell script that alerts when the HA state switches.)
[hadoop@ruozedata001 bin]$ hdfs haadmin
Usage: DFSHAAdmin [-ns <nameserviceId>]
[-transitionToActive <serviceId> [--forceactive]]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
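Here is a minimal sketch of the HA failover alert idea. It asks each NameNode for its state with `hdfs haadmin -getServiceState`, remembers which one was active in a local state file, and reports when that changes. Assumptions: the serviceIds `nn1`/`nn2` are placeholders for the IDs in your `dfs.ha.namenodes.<nameservice>` setting, the state-file path is up to you, and "alerting" is just an echo that you would replace with mail or a webhook. `check_ha_switch` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Reports the active NameNode and alerts when it differs from the last run.
# $1 = local file remembering the last active serviceId
# $2.. = serviceIds to query (placeholders such as nn1 nn2)
check_ha_switch() {
  local state_file=$1; shift
  local active="" id state previous=""
  for id in "$@"; do
    # Prints "active" or "standby"; a NameNode that is down fails, and its
    # error on stderr is discarded.
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null) || state=""
    if [[ "$state" == "active" ]]; then
      active=$id
    fi
  done
  if [[ -z "$active" ]]; then
    echo "ALERT: no active NameNode among: $*"
    return 2
  fi
  if [[ -s "$state_file" ]]; then
    previous=$(<"$state_file")
  fi
  echo "$active" > "$state_file"
  if [[ -n "$previous" && "$previous" != "$active" ]]; then
    echo "ALERT: active NameNode switched from $previous to $active"
    return 1
  fi
  echo "OK: active NameNode is $active"
}

# Example, from a script run by cron every minute:
#   check_ha_switch /tmp/nn_active nn1 nn2
```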
Health check:
[hadoop@ruozedata001 bin]$ hdfs fsck /
20/11/28 21:14:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://ruozedata001:50070/fsck?ugi=hadoop&path=%2F
FSCK started by hadoop (auth:SIMPLE) from /192.168.0.3 for path / at Sat Nov 28 21:14:58 CST 2020
.
/1.log: Under replicated BP-1245831-192.168.0.3-1605965291938:blk_1073741868_1044. Target Replicas is 2 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
.
/2.log: Under replicated BP-1245831-192.168.0.3-1605965291938:blk_1073741869_1045. Target Replicas is 2 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
.....................................Status: HEALTHY
Total size: 257432 B
Total dirs: 19
Total files: 39
Total symlinks: 0
Total blocks (validated): 37 (avg. block size 6957 B)
Minimally replicated blocks: 37 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 2 (5.4054055 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 2 (5.1282053 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Sat Nov 28 21:14:58 CST 2020 in 4 milliseconds
The filesystem under path '/' is HEALTHY
[hadoop@ruozedata001 bin]$
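For a cron check or an alert, the key lines of the fsck report above can be reduced to one summary line. This is a sketch assuming the English output format shown above; `fsck_summary` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Summarises "hdfs fsck" output read from stdin as
# "<status> under_replicated=<n> corrupt=<n>".
fsck_summary() {
  awk '
    /Under-replicated blocks:/   { under = $3 }
    /Corrupt blocks:/            { corrupt = $3 }
    /^The filesystem under path/ { status = $NF }
    END { printf "%s under_replicated=%d corrupt=%d\n", status, under, corrupt }
  '
}

# Usage:
#   hdfs fsck / 2>/dev/null | fsck_summary
```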
4. Safe mode
[hadoop@ruozedata001 bin]$ hdfs dfsadmin -safemode get 【start HDFS first】
20/11/28 21:37:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF
[hadoop@ruozedata001 bin]$
OFF: reads and writes both work
ON: writes fail, reads still work
[hadoop@ruozedata001 bin]$ hdfs dfs -put 3.log /
20/11/28 21:39:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: Cannot create file/3.log._COPYING_. Name node is in safe mode.
[hadoop@ruozedata001 bin]$
[hadoop@ruozedata001 bin]$ hdfs dfs -ls /
20/11/28 21:40:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 6 items
-rw-r--r-- 2 hadoop supergroup 4 2020-11-25 22:31 /1.log
-rw-r--r-- 2 hadoop supergroup 4 2020-11-25 22:33 /2.log
drwxr-xr-x - hadoop supergroup 0 2020-11-28 19:09 /system
drwx------ - hadoop supergroup 0 2020-11-22 19:52 /tmp
drwxr-xr-x - hadoop supergroup 0 2020-11-21 21:50 /user
drwxr-xr-x - hadoop supergroup 0 2020-11-22 19:52 /wordcount
[hadoop@ruozedata001 bin]$ hdfs dfs -cat /1.log
20/11/28 21:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
123
[hadoop@ruozedata001 bin]$
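4.1 Passive: HDFS enters safe mode on its own
Sooner or later you will see the words "safe mode" in the HDFS logs. Don't panic,
but if HDFS stays in safe mode (beyond the brief safe mode during NameNode startup), something is wrong with your cluster: it has gone into a kind of protection mode.
Usually you should first try to leave safe mode manually with the command 【do this first】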
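The "try to leave manually first" step can be scripted as below. This is a sketch; `leave_safemode_if_on` is a hypothetical name. Leaving safe mode does not fix the underlying cause, which you still need to find in the NameNode log (for example missing blocks or low disk space).

```shell
#!/usr/bin/env bash
# If HDFS is in safe mode, try to leave it manually; print the final state.
# Needs to run as the HDFS superuser (see the dfsadmin note above).
leave_safemode_if_on() {
  local state
  state=$(hdfs dfsadmin -safemode get 2>/dev/null) || return 1
  if grep -q "Safe mode is ON" <<<"$state"; then
    echo "safe mode is ON, trying to leave it"
    hdfs dfsadmin -safemode leave >/dev/null || return 1
    state=$(hdfs dfsadmin -safemode get 2>/dev/null) || return 1
  fi
  echo "$state"
}
```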
4.2 Active: enter safe mode deliberately to do maintenance
During this window HDFS is guaranteed not to accept any new data.
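For a planned maintenance window, a small wrapper can enter safe mode, run the maintenance command, and always leave safe mode again, even if the command fails. This is a sketch with a hypothetical name, `with_safemode`. The usage line uses `-saveNamespace`, one real dfsadmin command that requires safe mode.

```shell
#!/usr/bin/env bash
# Runs a maintenance command with HDFS in safe mode, then always leaves
# safe mode and returns the command's exit code.
with_safemode() {
  hdfs dfsadmin -safemode enter || return 1
  local rc=0
  "$@" || rc=$?
  hdfs dfsadmin -safemode leave
  return "$rc"
}

# Usage: write a fresh fsimage while no new data can come in
#   with_safemode hdfs dfsadmin -saveNamespace
```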
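5. Trash
5.1 Setting the trash retention time
In hdfs-site.xml (fs.trash.interval is defined in core-default.xml, so core-site.xml is the more common place for it):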
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
7*24*60 = 10080 (the value is in minutes, so 10080 minutes = 7 days)
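The same conversion works as a one-line bash helper, handy when you want a different retention period. `trash_interval_minutes` is a hypothetical name.

```shell
#!/usr/bin/env bash
# fs.trash.interval is expressed in minutes: days * 24 hours * 60 minutes.
trash_interval_minutes() {
  echo $(( $1 * 24 * 60 ))
}

trash_interval_minutes 7   # prints 10080
```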
[hadoop@ruozedata001 ~]$ hdfs dfs -rm /1.log
20/11/28 21:59:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/28 21:59:48 INFO fs.TrashPolicyDefault: Moved: 'hdfs://ruozedata001:9000/1.log' to trash at: hdfs://ruozedata001:9000/user/hadoop/.Trash/Current/1.log
hdfs://ruozedata001:9000/user/hadoop/.Trash/Current/1.log is the file's location in the trash
[hadoop@ruozedata001 ~]$ hdfs dfs -rm -skipTrash /2.log
20/11/28 22:00:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Deleted /2.log
[hadoop@ruozedata001 ~]$
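【Production must have the trash enabled, with a retention period that is as long as practical, e.g. 7 days】
【When deleting, never use -skipTrash; let files go to the trash, just in case】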
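Restoring a file means copying it back out of the trash path shown above. This is a sketch with hypothetical helper names. It assumes the file is still under `.Trash/Current`; after the periodic trash checkpoint, the file moves into a timestamped directory next to `Current`. It copies rather than moves, in line with the cp-first advice for `-mv`, and the trash copy expires on its own.

```shell
#!/usr/bin/env bash
# Where "hdfs dfs -rm" put a file: /user/<user>/.Trash/Current followed by
# the file's original absolute path.
trash_path() {
  local user=$1 path=$2
  echo "/user/${user}/.Trash/Current${path}"
}

# Restore by copying the file from the trash back to its original path.
restore_from_trash() {
  local user=$1 path=$2
  hdfs dfs -cp -p "$(trash_path "$user" "$path")" "$path"
}

# Usage: restore /1.log deleted by user hadoop
#   restore_from_trash hadoop /1.log
```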
6. Balancing data across nodes
[hadoop@ruozedata001 sbin]$ sh ./start-balancer.sh
[hadoop@ruozedata001 sbin]$ cat /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-balancer-ruozedata001.log
2020-11-28 22:07:35,135 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: namenodes = [hdfs://ruozedata001:9000]
2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, run during upgrade = false]
2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: included nodes = []
2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: excluded nodes = []
2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: source nodes = []
2020-11-28 22:07:35,242 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-11-28 22:07:36,086 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2020-11-28 22:07:36,086 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
2020-11-28 22:07:36,087 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
2020-11-28 22:07:36,087 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2020-11-28 22:07:36,090 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2020-11-28 22:07:36,103 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.0.3:50010
2020-11-28 22:07:36,104 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 over-utilized: []
2020-11-28 22:07:36,104 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 underutilized: []
[hadoop@ruozedata001 sbin]$
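threshold = 10.0
Rule: for every node, |node disk usage - average disk usage| must stay below 10%
Average: (90% + 60% + 80%) / 3 = 230% / 3 ≈ 76%
Node 1: 90% - 76% = 14%, 4 points over the threshold
Node 2: 60% - 76% = -16%, 6 points beyond the threshold on the low side
Node 3: 80% - 76% = 4%, within the threshold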
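The threshold check above can be reproduced with bash integer arithmetic. This is a sketch of the rule as described here, not the balancer's actual code. Like the text, it truncates the average (230 / 3 to 76). `balance_report` is a hypothetical name; its arguments are the threshold followed by each node's disk usage in percent.

```shell
#!/usr/bin/env bash
# $1 = threshold in percent, $2.. = disk usage of each node in percent
balance_report() {
  local threshold=$1; shift
  local sum=0 n=0 u
  for u in "$@"; do
    sum=$(( sum + u )); n=$(( n + 1 ))
  done
  local avg=$(( sum / n ))   # integer division: 230 / 3 -> 76
  local i=0 diff
  for u in "$@"; do
    i=$(( i + 1 ))
    diff=$(( u - avg ))
    # A node is balanced only if it is within +/- threshold of the average.
    if (( diff > threshold || -diff > threshold )); then
      echo "node$i ${u}% diff=${diff} UNBALANCED"
    else
      echo "node$i ${u}% diff=${diff} OK"
    fi
  done
}

balance_report 10 90 60 80
# node1 90% diff=14 UNBALANCED
# node2 60% diff=-16 UNBALANCED
# node3 80% diff=4 OK
```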
【In production, write a scheduled script that runs it every night during the off-peak window】
./start-balancer.sh
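Parameter dfs.datanode.balance.bandwidthPerSec: raise it from 10m to 50m;
it controls how much bandwidth balancing may use to move data.
If production has only 3 machines and 3 replicas, is this scheduled script useful? No: every block already has one replica on each of the 3 DataNodes, so the balancer has nothing to move.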
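The nightly job recommended above could look like the sketch below. The function name and the cron script path are hypothetical. Rather than editing `dfs.datanode.balance.bandwidthPerSec`, it raises the bandwidth at runtime with `hdfs dfsadmin -setBalancerBandwidth` (listed in the dfsadmin usage above). It also runs `hdfs balancer` in the foreground instead of `start-balancer.sh`, so cron waits for it to finish.

```shell
#!/usr/bin/env bash
# Nightly HDFS balancing run for an off-peak cron job.
# $1 = threshold in percent (default 10), $2 = bandwidth in MB/s (default 50)
nightly_balance() {
  local threshold=${1:-10} bandwidth_mb=${2:-50}
  # Raise the per-DataNode balancing bandwidth at runtime (bytes per second).
  # The value is not persisted: a DataNode restart falls back to
  # dfs.datanode.balance.bandwidthPerSec.
  hdfs dfsadmin -setBalancerBandwidth $(( bandwidth_mb * 1024 * 1024 )) || return 1
  # Run the balancer in the foreground so cron waits for it to finish.
  hdfs balancer -threshold "$threshold"
}

# Example crontab line, assuming the function is saved in a script that
# calls it (the script path is a placeholder), running at 02:00 every day:
#   0 2 * * * /home/hadoop/bin/nightly_balance.sh >> /tmp/nightly_balance.log 2>&1
```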
7. Balancing multiple disks within a single node
7.1 Configure hdfs-site.xml
<property>
<name>dfs.datanode.data.dir</name>
<value>/data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn</value>
</property>
/data01 100G
/data02 200G
/data03 490G
[hadoop@ruozedata001 sbin]$ hdfs diskbalancer
usage: hdfs diskbalancer [command] [options]
DiskBalancer distributes data evenly between different disks on a
datanode. DiskBalancer operates by generating a plan, that tells datanode
how to move data between disks. Users can execute a plan by submitting it
to the datanode.
To get specific help on a particular command please run
hdfs diskbalancer -help <command>.
--help <arg> valid commands are plan | execute | query | cancel |
report
[hadoop@ruozedata001 sbin]$
Apache Hadoop 2.x: not available; dfs.disk.balancer.enabled cannot be found in
https://hadoop.apache.org/docs/r2.10.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Apache Hadoop 3.x: supported; dfs.disk.balancer.enabled is listed, default true
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
CDH Hadoop 2.x: supported; dfs.disk.balancer.enabled is present, default false
How do you run it?
Docs: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html
<property>
<name>dfs.disk.balancer.enabled</name>
<value>true</value>
</property>
[hadoop@ruozedata001 hadoop]$ hdfs diskbalancer -plan ruozedata001
20/11/28 22:37:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/28 22:37:02 INFO planner.GreedyPlanner: Starting plan for Node : ruozedata001:50020
20/11/28 22:37:02 INFO planner.GreedyPlanner: Compute Plan for Node : ruozedata001:50020 took 1 ms
20/11/28 22:37:03 INFO command.Command: No plan generated. DiskBalancing not needed for node: ruozedata001 threshold used: 10.0
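hdfs diskbalancer -execute ruozedata001.plan.json (execute the plan)
hdfs diskbalancer -query ruozedata001 (check its progress)
In production:
【write a scheduled script that runs it every night during the off-peak window】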
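The plan, execute, and query steps can be chained for such a nightly script. This is a sketch with a hypothetical name, `disk_balance_node`. The `-out` option comes from the HDFS Disk Balancer docs linked above. The plan file name `<host>.plan.json` inside that directory is an assumption based on the file name used above, so check the `-plan` output on your version. The "No plan generated" check reuses the message from the log above.

```shell
#!/usr/bin/env bash
# Plans, executes and queries a disk-balancing run for one DataNode.
# $1 = DataNode host, $2 = directory for the plan file (passed to -out)
disk_balance_node() {
  local host=$1 out_dir=$2 plan_log
  plan_log=$(hdfs diskbalancer -plan "$host" -out "$out_dir" 2>&1) || return 1
  # Same message as in the log above when the disks are already balanced.
  if grep -q "No plan generated" <<<"$plan_log"; then
    echo "SKIP: $host does not need disk balancing"
    return 0
  fi
  # Assumption: the plan file is named <host>.plan.json inside out_dir.
  hdfs diskbalancer -execute "${out_dir}/${host}.plan.json" || return 1
  hdfs diskbalancer -query "$host"
}
```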
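8. Summary: how to troubleshoot
1. Analyze it yourself first: you must find the log, and the error in it (log --> error).
2. Search Baidu / Google.
3. Ask the teacher, colleagues, and people in the chat group.
4. Search the Apache issue tracker (JIRA).
5. Import the source code into IDEA and debug it.
6. How to find log files:
the configuration file, e.g. MySQL's my.cnf points to data/hostname.err
the logs folder under the software's own directory (e.g. /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs)
/var/log
ps -ef to view the process's full command line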
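For Hadoop daemons, the log directory can usually be read straight off the `ps -ef` command line. This sketch assumes the daemons were started with the standard scripts, which pass `-Dhadoop.log.dir=...` to the JVM; `log_dirs_from_ps` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Extracts the values of -Dhadoop.log.dir=... from process command lines
# read on stdin (e.g. the output of "ps -ef"), de-duplicated.
log_dirs_from_ps() {
  grep -o -- '-Dhadoop\.log\.dir=[^ ]*' | sed 's/^-Dhadoop\.log\.dir=//' | sort -u
}

# Usage:
#   ps -ef | log_dirs_from_ps
```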
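Homework:
1. Write up the SNN notes
2. Go through the hadoop and hdfs commands
3. Write up the four topics above
4. Publish them on your blog
5. Compile Hadoop with compression support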