Getting Started with Hadoop (1): Pseudo-Distributed Mode


Pseudo-Distributed Operation

*   Configuration
*   Setup passphraseless ssh
*   Execution
*   YARN on a Single Node

Installing the software

Ubuntu 18.04.2

sudo apt-get install ssh 
sudo apt-get install rsync
tar -xzvf hadoop-3.1.2.tar.gz
sudo vim /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
# HADOOP_HOME must be exported before PATH references it
export HADOOP_HOME=/home/njupt4145438/Downloads/hadoop-3.1.2
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin

source /etc/profile
cd hadoop-3.1.2
mkdir logs
ls
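The exports above only take effect in shells that have sourced /etc/profile. A minimal Python sketch (illustrative only, not part of the Hadoop tooling) for sanity-checking that the variables are actually visible before continuing:

```python
import os

def missing_vars(env, required=("JAVA_HOME", "HADOOP_HOME")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    # Inspect the environment of the current shell; an empty list means all set
    print(missing_vars(os.environ))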

Do not run Hadoop as root; use a regular user.

sudo vim etc/hadoop/hadoop-env.sh 

# set to the root of your Java installation (use the same path as in /etc/profile)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Edit the configuration files under etc/hadoop/, replacing the IP with your host's address:

sudo vim etc/hadoop/core-site.xml
<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://192.168.179.128:9000</value>
   </property>
</configuration>
sudo vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
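All of the *-site.xml files share the same layout: a flat list of `<name>`/`<value>` properties under `<configuration>`. A minimal stdlib-only Python sketch (for illustration) showing how that structure maps to key–value pairs:

```python
import xml.etree.ElementTree as ET

def parse_site_xml(text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

HDFS_SITE = """
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
"""

print(parse_site_xml(HDFS_SITE))  # {'dfs.replication': '1'}
```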

Set up passphraseless SSH

$ ssh localhost
$ ssh 192.168.179.128
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
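The `chmod 0600` matters: sshd silently ignores an authorized_keys file that other users can write to. A small Python sketch (illustrative only) of checking the permission bits on a file:

```python
import os
import stat
import tempfile

def mode_of(path):
    """Return the permission bits of a file (e.g. 0o600)."""
    return stat.S_IMODE(os.stat(path).st_mode)

# Demonstrate on a throwaway temp file rather than the real authorized_keys
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
os.chmod(path, 0o600)      # same effect as `chmod 0600 ~/.ssh/authorized_keys`
print(oct(mode_of(path)))  # 0o600
os.unlink(path)
```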

Format the filesystem

$ bin/hdfs namenode -format

Start the NameNode and DataNode daemons

$ sbin/start-dfs.sh

Make sure the current user has write access to the Hadoop directories. A blanket `chmod 777` works, but `chown -R <user>` is the safer fix:

chmod 777

Make the HDFS directories required to execute MapReduce jobs:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>

Upload a test.txt, delete the local copy, then fetch it back:

$ bin/hdfs dfs -put test.txt
$ rm test.txt
$ bin/hdfs dfs -ls /user/njupt4145438
$ bin/hdfs dfs -get test.txt
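Besides `hdfs dfs`, the same files are reachable over HTTP through WebHDFS, served by the NameNode (port 9870 by default in Hadoop 3.x, assuming WebHDFS is enabled). A hedged sketch of building the request URL, using the host and path from this walkthrough:

```python
def webhdfs_url(host, path, op, port=9870):
    """Build a WebHDFS REST URL for the given file-system operation."""
    return "http://{}:{}/webhdfs/v1{}?op={}".format(host, port, path, op)

# e.g. fetch the file uploaded above; `curl -L "<url>"` follows the
# redirect to the DataNode that actually serves the blocks
print(webhdfs_url("192.168.179.128", "/user/njupt4145438/test.txt", "OPEN"))
```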

# When you’re done, stop the daemons:
$ sbin/stop-dfs.sh

You can run MapReduce jobs on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.

sudo vim etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
sudo vim etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
# Start the ResourceManager and NodeManager daemons:
  $ sbin/start-yarn.sh
# When you’re done, stop the daemons:
  $ sbin/stop-yarn.sh

Maven dependencies for a Java client (pom.xml):

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-auth</artifactId>
            <version>3.1.2</version>
        </dependency>
    </dependencies>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;

public class Main {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Connect to the NameNode configured above; try-with-resources
        // guarantees the streams and FileSystem are closed
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.179.128:9000"), conf);
             // fs already points at the cluster, so an absolute HDFS path suffices
             FSDataInputStream is = fs.open(new Path("/user/njupt4145438/test.txt"));
             OutputStream os = new FileOutputStream("D:/a.txt")) {
            byte[] buff = new byte[1024];
            int length;
            // Copy the HDFS file to local disk in 1 KB chunks
            while ((length = is.read(buff)) != -1) {
                System.out.println(new String(buff, 0, length));
                os.write(buff, 0, length);
            }
        }
    }
}
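The read loop above is the standard copy-in-chunks pattern. The same control flow in a self-contained Python sketch (plain in-memory streams, no Hadoop involved), to make the EOF handling explicit:

```python
import io

def copy_stream(src, dst, chunk=1024):
    """Copy src to dst in fixed-size chunks, like the Java while-loop above."""
    total = 0
    while True:
        buff = src.read(chunk)
        if not buff:            # read() returns b'' at EOF, like read() == -1 in Java
            break
        dst.write(buff)
        total += len(buff)
    return total

src = io.BytesIO(b"hello hdfs" * 500)   # ~5 KB of data, forcing several chunks
dst = io.BytesIO()
print(copy_stream(src, dst))            # 5000
```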
