Getting Started with Hadoop (1): Pseudo-Distributed Mode


Pseudo-Distributed Operation

*   Configuration
*   Setup passphraseless ssh
*   Execution
*   YARN on a Single Node

Installing the software

Ubuntu 18.04.2

sudo apt-get install ssh 
sudo apt-get install rsync
tar -xzvf hadoop-3.1.2.tar.gz
sudo vim /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
# HADOOP_HOME must be exported before PATH references it
export HADOOP_HOME=/home/njupt4145438/Downloads/hadoop-3.1.2
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin

source /etc/profile
cd hadoop-3.1.2
mkdir logs
ls
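The exports above only take effect in shells that have sourced /etc/profile. A minimal Python sketch (illustrative only, not part of the Hadoop tooling) for sanity-checking that the variables are actually visible before continuing:

```python
import os

def missing_vars(env, required=("JAVA_HOME", "HADOOP_HOME")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    # Inspect the environment of the current shell; an empty list means all set
    print(missing_vars(os.environ))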

Do not run Hadoop as root; use a regular user.

sudo vim etc/hadoop/hadoop-env.sh 

# set to the root of your Java installation (use the same path as in /etc/profile)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Edit the configuration files under etc/hadoop/, replacing the IP with your host's address:

sudo vim etc/hadoop/core-site.xml
<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://192.168.179.128:9000</value>
   </property>
</configuration>
sudo vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
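All of the *-site.xml files share the same layout: a flat list of `<name>`/`<value>` properties under `<configuration>`. A minimal stdlib-only Python sketch (for illustration) showing how that structure maps to key–value pairs:

```python
import xml.etree.ElementTree as ET

def parse_site_xml(text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

HDFS_SITE = """
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
"""

print(parse_site_xml(HDFS_SITE))  # {'dfs.replication': '1'}
```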

Set up passphraseless SSH

$ ssh localhost
$ ssh 192.168.179.128
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
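The `chmod 0600` matters: sshd silently ignores an authorized_keys file that other users can write to. A small Python sketch (illustrative only) of checking the permission bits on a file:

```python
import os
import stat
import tempfile

def mode_of(path):
    """Return the permission bits of a file (e.g. 0o600)."""
    return stat.S_IMODE(os.stat(path).st_mode)

# Demonstrate on a throwaway temp file rather than the real authorized_keys
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
os.chmod(path, 0o600)      # same effect as `chmod 0600 ~/.ssh/authorized_keys`
print(oct(mode_of(path)))  # 0o600
os.unlink(path)
```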

Format the filesystem

$ bin/hdfs namenode -format

Start the NameNode and DataNode daemons

$ sbin/start-dfs.sh

Make sure the current user has write access to the Hadoop directories. A blanket `chmod 777` works, but `chown -R <user>` is the safer fix:

chmod 777

Make the HDFS directories required to execute MapReduce jobs:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>

Upload a test.txt, delete the local copy, then fetch it back:

$ bin/hdfs dfs -put test.txt
$ rm test.txt
$ bin/hdfs dfs -ls /user/njupt4145438
$ bin/hdfs dfs -get test.txt
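Besides `hdfs dfs`, the same files are reachable over HTTP through WebHDFS, served by the NameNode (port 9870 by default in Hadoop 3.x, assuming WebHDFS is enabled). A hedged sketch of building the request URL, using the host and path from this walkthrough:

```python
def webhdfs_url(host, path, op, port=9870):
    """Build a WebHDFS REST URL for the given file-system operation."""
    return "http://{}:{}/webhdfs/v1{}?op={}".format(host, port, path, op)

# e.g. fetch the file uploaded above; `curl -L "<url>"` follows the
# redirect to the DataNode that actually serves the blocks
print(webhdfs_url("192.168.179.128", "/user/njupt4145438/test.txt", "OPEN"))
```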

# When you’re done, stop the daemons:
$ sbin/stop-dfs.sh

You can run MapReduce jobs on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.

sudo vim etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
sudo vim etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
# Start the ResourceManager and NodeManager daemons:
  $ sbin/start-yarn.sh
# When you’re done, stop the daemons:
  $ sbin/stop-yarn.sh

Maven dependencies for a Java client (pom.xml):

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-auth</artifactId>
            <version>3.1.2</version>
        </dependency>
    </dependencies>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;

public class Main {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Connect to the NameNode configured above; try-with-resources
        // guarantees the streams and FileSystem are closed
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.179.128:9000"), conf);
             // fs already points at the cluster, so an absolute HDFS path suffices
             FSDataInputStream is = fs.open(new Path("/user/njupt4145438/test.txt"));
             OutputStream os = new FileOutputStream("D:/a.txt")) {
            byte[] buff = new byte[1024];
            int length;
            // Copy the HDFS file to local disk in 1 KB chunks
            while ((length = is.read(buff)) != -1) {
                System.out.println(new String(buff, 0, length));
                os.write(buff, 0, length);
            }
        }
    }
}
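The read loop above is the standard copy-in-chunks pattern. The same control flow in a self-contained Python sketch (plain in-memory streams, no Hadoop involved), to make the EOF handling explicit:

```python
import io

def copy_stream(src, dst, chunk=1024):
    """Copy src to dst in fixed-size chunks, like the Java while-loop above."""
    total = 0
    while True:
        buff = src.read(chunk)
        if not buff:            # read() returns b'' at EOF, like read() == -1 in Java
            break
        dst.write(buff)
        total += len(buff)
    return total

src = io.BytesIO(b"hello hdfs" * 500)   # ~5 KB of data, forcing several chunks
dst = io.BytesIO()
print(copy_stream(src, dst))            # 5000
```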
