HDFS Design Goals
From the previous article we already know what HDFS is and how it uses its multi-replica mechanism to provide high reliability. Its design goals can be summarized as follows:
- A very large distributed file system
- Runs on ordinary, inexpensive hardware
- Easy to scale, and provides users with a file storage service with good performance
HDFS Architecture
Let's go through the basic architecture of HDFS with the help of the official documentation (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html):
Introduction
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project. The project URL is http://hadoop.apache.org/.
This passage is the basic introduction to HDFS: the Hadoop Distributed File System is a distributed file system designed to run on inexpensive commodity hardware. It has many similarities with existing distributed file systems, but the differences are what matter. HDFS is highly fault-tolerant and can be deployed on low-cost machines, and it provides high-throughput access to data, which makes it well suited to applications with large data sets. HDFS relaxes a few POSIX requirements in order to support streaming access to file system data. It was originally built as infrastructure for the Apache Nutch search engine project and is now part of the Apache Hadoop Core project.
Assumptions and Goals
Hardware Failure
Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system’s data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
Hardware failure: hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of servers, each storing part of the file system's data. With so many components, each with a non-trivial probability of failing, some part of HDFS is effectively always out of service. Detecting faults and recovering from them quickly and automatically is therefore a core architectural goal of HDFS.
Streaming Data Access
Applications that run on HDFS need streaming access to their data sets. They are not general purpose applications that typically run on general purpose file systems. HDFS is designed more for batch processing rather than interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications that are targeted for HDFS. POSIX semantics in a few key areas has been traded to increase data throughput rates.
Streaming data access: applications that run on HDFS need streaming access to their data sets; they are not general-purpose applications running on general-purpose file systems. HDFS is designed for batch processing rather than interactive use, so the emphasis is on high throughput of data access rather than low latency. POSIX imposes many hard requirements that HDFS applications do not need, and POSIX semantics have been traded away in a few key areas to increase data throughput.
Large Data Sets
Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.
Large data sets: applications that run on HDFS work with large data sets. A typical file in HDFS is gigabytes to terabytes in size, so HDFS is tuned to support large files. It should provide high aggregate data bandwidth, scale to hundreds of nodes in a single cluster, and support tens of millions of files in a single instance.
Simple Coherency Model
HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed except for appends and truncates. Appending the content to the end of the files is supported but cannot be updated at arbitrary point. This assumption simplifies data coherency issues and enables high throughput data access. A MapReduce application or a web crawler application fits perfectly with this model.
Simple coherency model: HDFS applications need a write-once-read-many file access model. Once a file is created, written, and closed, it does not need to change except for appends and truncates. Appending data to the end of a file is supported, but modifying it at an arbitrary position is not. This assumption simplifies data coherency and enables high-throughput access; a MapReduce job or a web crawler fits this model perfectly.
“Moving Computation is Cheaper than Moving Data”
A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.
Moving computation is cheaper than moving data: a computation requested by an application is much more efficient when it runs close to the data it operates on, especially when the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is usually better to move the computation to where the data lives than to move the data to where the application is running, and HDFS provides interfaces that let applications move themselves closer to the data.
Portability Across Heterogeneous Hardware and Software Platforms
HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.
Portability across heterogeneous hardware and software platforms: HDFS is designed to be easily portable from one platform to another, which encourages its widespread adoption as the platform of choice for a large set of applications.
NameNode and DataNodes
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
HDFS uses a master/slave architecture. An HDFS cluster contains a single NameNode, a master server that manages the file system namespace and regulates client access to files. In addition there are a number of DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored on a set of DataNodes. The NameNode performs namespace operations such as opening, closing, and renaming files and directories, and it determines the mapping of blocks to DataNodes. The DataNodes serve read and write requests from the file system's clients, and they also create, delete, and replicate blocks on instruction from the NameNode.
The NameNode and DataNode are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the highly portable Java language means that HDFS can be deployed on a wide range of machines. A typical deployment has a dedicated machine that runs only the NameNode software. Each of the other machines in the cluster runs one instance of the DataNode software. The architecture does not preclude running multiple DataNodes on the same machine but in a real deployment that is rarely the case.
The NameNode and DataNode are pieces of software designed to run on commodity machines, which typically run a GNU/Linux operating system. HDFS is built in Java, so any machine that supports Java can run the NameNode or DataNode software, and using such a highly portable language means HDFS can be deployed on a wide range of machines. A typical deployment has a dedicated machine that runs only the NameNode software, while each of the other machines in the cluster runs one DataNode instance. The architecture does not preclude running multiple DataNodes on the same machine, but in real deployments that is rarely the case.
The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the NameNode.
Having a single NameNode in the cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata, and the system is designed in such a way that user data never flows through the NameNode.
That covers the HDFS architecture and the roles of the NameNode and DataNode; let's briefly summarize:
- HDFS architecture: HDFS uses a master/slaves architecture, that is, one master (NameNode) serving multiple slaves (DataNodes).
- Internally, a file is split into multiple blocks. If the block size is 128 MB, a 130 MB file is split into two blocks, one of 128 MB and one of 2 MB (see the sketch right after this list).
- NameNode
  - Responds to client requests.
  - Manages the metadata. What is metadata? In the architecture diagram, a client that wants to access a file in HDFS first sends a metadata operation to the NameNode. The NameNode holds the metadata of the requested file, including the file name, the replication factor, and which DataNodes each block is stored on; with that information the client then reads the blocks directly from the corresponding DataNodes (block ops).
- DataNode
  - Stores the data blocks that make up users' files.
  - Periodically sends heartbeats to the NameNode, reporting its own state, its health, and all of the blocks it holds. As the manager, the NameNode needs to know exactly which data each node it manages stores and what condition each node is in. If a DataNode reports that it has a problem and can no longer store data, the NameNode will stop assigning new storage work to that node. That is why DataNodes must report this information to the NameNode regularly.
- Deployment
  - A typical deployment has a dedicated machine that runs only the NameNode software. Each of the other machines in the cluster runs one instance of the DataNode software. In other words, a typical deployment is one NameNode plus N DataNodes.
  - The architecture does not preclude running multiple DataNodes on the same machine but in a real deployment that is rarely the case. Running multiple DataNodes (or the NameNode and a DataNode) on one node is possible, but not recommended for real deployments.
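As a rough sketch of the two points about blocks above: the block size HDFS uses when splitting files is controlled by the dfs.blocksize property in hdfs-site.xml (128 MB is the default in Hadoop 2.x), and hdfs fsck can show how a particular file was actually split and which DataNodes hold each replica. The file name below is just an example (the hello.txt uploaded later in this post):

hdfs fsck /hello.txt -files -blocks -locations

The -files, -blocks, and -locations flags make fsck print, for each file, its blocks and the DataNodes that store them, which is a handy way to see the NameNode's block-to-DataNode mapping for yourself.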
The File System Namespace
HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside these directories. The file system namespace hierarchy is similar to most other existing file systems; one can create and remove files, move a file from one directory to another, or rename a file. HDFS supports user quotas and access permissions. HDFS does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features.
HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside them. The file system namespace hierarchy is similar to most existing file systems: you can create and delete files, move a file from one directory to another, or rename a file. HDFS supports user quotas and access permissions. It does not support hard links or soft links, although the architecture does not preclude implementing these features.
The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.
The NameNode maintains the file system namespace, and any change to the namespace or its properties is recorded by the NameNode. An application can specify how many replicas of a file HDFS should maintain; this number is called the replication factor of the file, and it is stored by the NameNode.
Data Replication
HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file.
All blocks in a file except the last block are the same size, while users can start a new block without filling out the last block to the configured block size after the support for variable length block was added to append and hsync.
An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once (except for appends and truncates) and have strictly one writer at any time.
The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.
HDFS is designed to store very large files reliably across the machines of a large cluster. It stores each file as a sequence of blocks, and the blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file.
All blocks in a file except the last one are the same size. If the block size is 128 MB, a file's size is usually not an exact multiple of 128 MB, so when the file is split, the blocks before the last one are all exactly 128 MB and the last block is less than or equal to 128 MB.
An application can specify the number of replicas of a file. The replication factor can be set when the file is created and changed later. Files in HDFS are write-once (except for appends and truncates), and there is a strict rule that only one writer can write to a file at any time; concurrent writes are not supported.
The NameNode makes all decisions about block replication. It periodically receives a heartbeat and a block report from each DataNode in the cluster: receiving a heartbeat means the DataNode is functioning properly, and a block report contains the list of all blocks on that DataNode.
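To make the replication factor concrete, here is a small sketch. It assumes a local hello.txt like the one used later in this post; the -D override of dfs.replication is a generic Hadoop option, so treat the first line as an assumption if your version behaves differently. The factor can be set when a file is written and changed later with -setrep, and the second column of hadoop fs -ls shows the current value for a file:

hadoop fs -D dfs.replication=2 -put hello.txt /
hadoop fs -setrep 3 /hello.txt
hadoop fs -ls /hello.txt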
Installing HDFS
I won't go into the installation of HDFS in detail here, since there are plenty of setup tutorials online, but I do want to mention a few problems I ran into while configuring it:
- When copying the Hadoop configuration files from somewhere on the web, make sure the XML tags actually match. While copying I found that a closing </value> tag had been pasted as /value>, which broke the configuration (a quick way to check for this is sketched right after this list).
- Some older tutorials start Hadoop with the ./sbin/start-all.sh command. If that does not work, use ./sbin/start-dfs.sh to start the NameNode, DataNodes, and SecondaryNameNode, and then start-yarn.sh for YARN.
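One way to catch this kind of mismatched-tag mistake before starting Hadoop is to run the configuration files through an XML validator. This is just a sketch; it assumes xmllint (part of libxml2) is installed and that the configuration lives in etc/hadoop under the Hadoop install directory:

xmllint --noout etc/hadoop/core-site.xml
xmllint --noout etc/hadoop/hdfs-site.xml

With --noout, xmllint prints nothing when the file is well-formed and reports the line of the first broken tag otherwise.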
After running start-dfs.sh and start-yarn.sh, Hadoop is up. The running processes can be checked with jps, which shows something like the following:
11296 NodeManager
11344 Jps
11061 SecondaryNameNode
10838 NameNode
10938 DataNode
11194 ResourceManager
If jps does not show the NameNode, DataNode, and the other processes, the cause is usually a mistake in the configuration files, or possibly a port that is already in use; either way the logs will tell you. The logs live in the logs folder under the Hadoop installation directory and record the errors raised while the NameNode, DataNode, and the other daemons start up. If the problem is an occupied port, change the port number in the configuration files; if you are not sure which port each daemon (NameNode, DataNode, and so on) uses and how to configure it, that is easy to look up.
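For example, the tail of the NameNode log usually makes the failure obvious. This is only a sketch: it assumes HADOOP_HOME points at your install directory and that the default log location and file naming (hadoop-<user>-namenode-<hostname>.log) are in use:

tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log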
Here is a setup tutorial I can recommend: http://www.yiibai.com/hadoop/hadoop_enviornment_setup.html
Common HDFS Shell Commands
The HDFS shell provides a set of commands for accessing and manipulating data on HDFS and on the local file system.
If you are familiar with Linux there is really nothing difficult here; the usage is almost identical to the Linux equivalents.
The standard form of these commands is hadoop fs -[command]. For example, with ls, hadoop fs -ls / lists all files under the root directory.
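If you forget a command's options, the shell can describe itself; as far as I know this works on recent Hadoop 2.x releases:

hadoop fs -help
hadoop fs -help ls

The first form prints the help text for every available command, and the second prints it for a single command.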
put
Usage: hadoop fs -put <local source path> <HDFS destination path>
Copies one or more source paths from the local file system to the target file system. In other words, it uploads local files to HDFS: the first argument is the local source path and the second is the destination path in HDFS.
Example:
hadoop fs -put hello.txt /
Puts the hello.txt file into the root directory.
wangsheng@MacPro[10:18:56]:~/desktop (つ•̀ω•́)つcat hello.txt
hello java
hello text
hello hadoop
hello wangsheng
wangsheng@MacPro[10:19:05]:~/desktop (つ•̀ω•́)つhadoop fs -put hello.txt /
17/10/15 10:19:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
wangsheng@MacPro[10:19:23]:~/desktop (つ•̀ω•́)つhadoop fs -ls /
17/10/15 10:19:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 wangsheng supergroup 51 2017-10-15 10:19 /hello.txt
ls
Usage: hadoop fs -ls <directory or file>
If the path is a file, this returns information about that file.
If it is a directory, it returns a list of its immediate children; add the -R option to list recursively.
Example:
hadoop fs -ls /
Lists the immediate children of the root directory.
hadoop fs -ls -R /
As in Linux, recursively lists the files under the root directory.
wangsheng@MacPro[10:58:11]:~/desktop (つ•̀ω•́)つhadoop fs -ls /
17/10/15 10:58:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 wangsheng supergroup 51 2017-10-15 10:19 /hello.txt
drwxr-xr-x - wangsheng supergroup 0 2017-10-15 10:58 /test
wangsheng@MacPro[10:58:17]:~/desktop (つ•̀ω•́)つhadoop fs -ls -R /
17/10/15 10:58:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 1 wangsheng supergroup 51 2017-10-15 10:19 /hello.txt
drwxr-xr-x - wangsheng supergroup 0 2017-10-15 10:58 /test
drwxr-xr-x - wangsheng supergroup 0 2017-10-15 10:58 /test/a
mkdir
Usage: hadoop fs -mkdir <directory path>
Creates a directory, just like in Linux; add the -p option to create any missing parent directories along the path.
Example:
hadoop fs -mkdir /test
Creates a test directory under the root.
hadoop fs -mkdir -p /data/a
Creates the /data/a path under the root, including the missing parent directory.
wangsheng@MacPro[11:36:42]:~/desktop (つ•̀ω•́)つhadoop fs -mkdir -p /data/a
17/10/15 11:37:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
wangsheng@MacPro[11:37:05]:~/desktop (つ•̀ω•́)つhadoop fs -ls -R /
17/10/15 11:37:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x - wangsheng supergroup 0 2017-10-15 11:37 /data
drwxr-xr-x - wangsheng supergroup 0 2017-10-15 11:37 /data/a
-rw-r--r-- 1 wangsheng supergroup 51 2017-10-15 10:19 /hello.txt
drwxr-xr-x - wangsheng supergroup 0 2017-10-15 10:58 /test
drwxr-xr-x - wangsheng supergroup 0 2017-10-15 10:58 /test/a
rm
Usage: hadoop fs -rm <file or directory>
Deletes the specified files. To delete a directory and everything inside it, add the -R option for recursive deletion.
Example:
hadoop fs -rm /hello.txt
Deletes the hello.txt file.
hadoop fs -rm -R /data/a
Recursively deletes the /data/a directory.
wangsheng@MacPro[11:41:06]:~/desktop (つ•̀ω•́)つhadoop fs -rm /hello.txt
17/10/15 11:41:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Deleted /hello.txt
wangsheng@MacPro[11:41:27]:~/desktop (つ•̀ω•́)つhadoop fs -rm -R /data/a
17/10/15 11:41:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Deleted /data/a
get
Usage: hadoop fs -get <HDFS source path>
Downloads a file from HDFS to the local file system.
Example:
hadoop fs -get /hello.txt
Copies hello.txt from HDFS to the local current directory.
wangsheng@MacPro[11:43:46]:~/desktop (つ•̀ω•́)つls
161208082042352.png
app
wangsheng@MacPro[11:43:48]:~/desktop (つ•̀ω•́)つhadoop fs -get /hello.txt
17/10/15 11:43:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
wangsheng@MacPro[11:43:56]:~/desktop (つ•̀ω•́)つls
161208082042352.png
hello.txt
app
For more commands, please visit: http://hadoop.apache.org/docs/r1.0.4/cn/hdfs_shell.html
References
Translation of the official Hadoop documentation (HDFS Architecture): http://www.linuxidc.com/Linux/2016-12/138027.htm