At its core, Hadoop consists of HDFS and MapReduce.
First, install Hadoop.
Download Hadoop and extract it to a local directory,
or install it with Homebrew:
$ brew install hadoop
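Once it is installed, a quick sanity check is to print the version that ended up on your PATH (the exact output depends on your install):
$ hadoop version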
配置ssh免密码登录
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
(RSA is used here because DSA keys are deprecated and disabled by default in recent OpenSSH releases.)
Append the generated public key to the authorized-keys file:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
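If ssh still prompts for a password afterwards, check the permissions; OpenSSH ignores key files that are too permissive:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys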
Next, test whether the setup works:
$ ssh localhost
If you see a "connection refused" error, check whether your Mac has Remote Login enabled; it can be turned on in System Preferences under Sharing.
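Remote Login can also be toggled from the terminal (administrator privileges required):
$ sudo systemsetup -setremotelogin on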
Set the environment variables (adjust HADOOP_HOME to wherever you extracted or installed Hadoop):
$ export HADOOP_HOME=/Users/hadoop/hadoop-1.2.1
$ export PATH=$PATH:$HADOOP_HOME/bin
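These exports only apply to the current shell session. To make them permanent, append them to your shell profile (assuming bash; use ~/.zshrc for zsh):
$ echo 'export HADOOP_HOME=/Users/hadoop/hadoop-1.2.1' >> ~/.bash_profile
$ echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bash_profile
$ source ~/.bash_profile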
Configure Hadoop
../hadoop/etc/hadoop/hadoop-env.sh
# Point JAVA_HOME at the installed JDK; on macOS, /usr/libexec/java_home resolves it
export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_HEAPSIZE=2000
# Works around "Unable to load realm info from SCDynamicStore" warnings on macOS
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
../hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
Note that hadoop.tmp.dir takes a local filesystem path (the stock default is shown above), not an HDFS URI.
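Also, fs.default.name still works in Hadoop 2.x but is deprecated; the current name for the same setting is fs.defaultFS:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>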
../hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
A replication factor of 1 is enough here, since this is a single-node, pseudo-distributed setup.
../hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
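Note that mapred.job.tracker and the tasktracker limits are Hadoop 1.x (MRv1) settings. When running Hadoop 2.x with YARN, as the start-up steps below do, mapred-site.xml should instead declare the MapReduce framework:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>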
Run
# Change into the Hadoop directory (the Homebrew location is shown; use your own $HADOOP_HOME for a manual install)
$ cd /usr/local/Cellar/hadoop/2.8.0/libexec
# Format the filesystem
$ bin/hdfs namenode -format
# Start the NameNode and DataNode daemons
$ sbin/start-dfs.sh
# Start the ResourceManager and NodeManager daemons
$ sbin/start-yarn.sh
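To verify that everything came up, list the running Java processes with jps; the PIDs will differ, but you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager entries:
$ jps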
ResourceManager web UI (cluster and application status)
http://localhost:8088
NameNode web UI (HDFS status)
http://localhost:50070
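As a final smoke test, create a home directory in HDFS and run one of the bundled example jobs (the examples jar path matches the Hadoop 2.8.0 layout; adjust the version to your install):
$ bin/hdfs dfs -mkdir -p /user/$(whoami)
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar pi 2 5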