Environment Preparation
- jdk1.8.0_301
- scala-2.11.8
- spark-2.4.8-bin-hadoop2.7
- hadoop-2.7.6 (required only for Spark on YARN)
- Working directory: /root/***/packages/
- Current machine: bigdata112
1. Local Mode
Install the JDK
wget -O jdk-8u301-linux-x64.tar.gz 'https://download.oracle.com/otn/java/jdk/8u301-b09/d3c52aa6bfa54d3ca74e617f18309292/jdk-8u301-linux-x64.tar.gz?AuthParam=1631169458_b753f63069d375ab0a6a52e1d9cd9013'
tar xzvf jdk-8u301-linux-x64.tar.gz -C ../software/
- Configure the environment variables:
vim ~/.profile
and append:
export JAVA_HOME=/root/***/software/jdk1.8.0_301
export PATH=$PATH:$JAVA_HOME/bin
- Apply the changes:
source ~/.profile
- Verify the installation:
java -version
Output like the following indicates a successful install:
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
Install Scala
wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
tar xzvf scala-2.11.8.tgz -C ../software/
- Configure the environment variables:
vim ~/.profile
and append:
export SCALA_HOME=/root/***/software/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
- Apply the changes:
source ~/.profile
- Verify the installation:
scala
Output like the following indicates a successful install (type :q to exit the REPL):
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_301).
Type in expressions for evaluation. Or try :help.
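If you would rather not enter the REPL, an optional non-interactive check works too (a sketch; it simply prints the version string of the Scala library on the PATH):
scala -e 'println(util.Properties.versionString)'    # expect: version 2.11.8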
Install Spark
wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz
tar xzvf spark-2.4.8-bin-hadoop2.7.tgz -C ../software/
- Configure the environment variables:
vim ~/.profile
and append:
export SPARK_HOME=/root/***/software/spark-2.4.8-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
- Apply the changes:
source ~/.profile
- Verify the installation:
spark-shell
Output like the following indicates a successful install (type :q to exit):
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.8
/_/
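Beyond the REPL banner, a quick way to exercise the local runtime is to submit the bundled SparkPi example (a minimal smoke test; local[2] and the iteration count 100 are arbitrary choices):
# Run SparkPi locally with 2 threads; look for "Pi is roughly 3.14..." in the output
spark-submit --master 'local[2]' \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.8.jar 100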
2. Standalone Mode
| hostname   | role   |
| ---------- | ------ |
| bigdata112 | master |
| bigdata113 | worker |
| bigdata114 | worker |
| bigdata115 | worker |
- Building on the Local mode setup (current machine: bigdata112):
cd ../software/spark-2.4.8-bin-hadoop2.7/conf/
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/root/***/software/jdk1.8.0_301
export SCALA_HOME=/root/***/software/scala-2.11.8
export SPARK_HOME=/root/***/software/spark-2.4.8-bin-hadoop2.7
export SPARK_EXECUTOR_MEMORY=5G
export SPARK_EXECUTOR_CORES=2
export SPARK_WORKER_CORES=2
cp slaves.template slaves
vim slaves
bigdata113
bigdata114
bigdata115
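The start-slaves.sh script used below logs in to each host listed in conf/slaves over SSH, so passwordless SSH from bigdata112 to every worker is required. If that is not set up yet, a sketch (assuming everything runs as root, as the paths above suggest):
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa     # skip if a key pair already exists
for h in bigdata113 bigdata114 bigdata115; do
  ssh-copy-id root@$h                        # appends the public key on each worker
done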
- Copy the Spark directory to the other machines (make sure the environment variables stay consistent as well):
scp -r /root/***/software/spark-2.4.8-bin-hadoop2.7 bigdata113:/root/***/software/
scp -r /root/***/software/spark-2.4.8-bin-hadoop2.7 bigdata114:/root/***/software/
scp -r /root/***/software/spark-2.4.8-bin-hadoop2.7 bigdata115:/root/***/software/
- Start the master (its web UI listens on port 8080 by default):
/root/***/software/spark-2.4.8-bin-hadoop2.7/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /root/***/software/spark-2.4.8-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-***2021.out
- Start the slaves (each worker's web UI listens on port 8081 by default):
/root/***/software/spark-2.4.8-bin-hadoop2.7/sbin/start-slaves.sh
bigdata113: starting org.apache.spark.deploy.worker.Worker, logging to /root/***/software/spark-2.4.8-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-***2021.out
bigdata114: starting org.apache.spark.deploy.worker.Worker, logging to /root/***/software/spark-2.4.8-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-2-***2021.out
bigdata115: starting org.apache.spark.deploy.worker.Worker, logging to /root/***/software/spark-2.4.8-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-3-***2021.out
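A quick way to confirm the daemons are up is jps on each node; the master web UI at http://bigdata112:8080 should also list the three workers as ALIVE:
jps    # expect a Master process on bigdata112 and a Worker process on bigdata113-115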
- Run:
spark-shell --master spark://***2021:7077
The output should report master = spark://***2021:7077:
Spark context available as 'sc' (master = spark://***2021:7077, app id = app-20210909163213-0001).
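An actual job submission against the cluster looks like the following (a sketch reusing the bundled SparkPi example; the resource flags are illustrative, and the master URL is assumed to be spark://bigdata112:7077 per the table above):
spark-submit --master spark://bigdata112:7077 \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 1G --total-executor-cores 2 \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.8.jar 100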
3. Spark on YARN Mode
- For the Hadoop/YARN installation, see my other post: Installing and Configuring Hadoop's Three Modes.
- Building on the Standalone setup, add the locations of the Hadoop and YARN configuration files to spark-env.sh:
HADOOP_CONF_DIR=/root/***/software/hadoop-2.7.6/etc/hadoop
YARN_CONF_DIR=/root/***/software/hadoop-2.7.6/etc/hadoop
- Copy the updated spark-env.sh to the other machines.
- Start Hadoop and YARN (there is no need to start Spark's master and slaves: in this mode YARN handles resource management and launches the executors).
- Run:
spark-shell --master yarn --deploy-mode client
The output should report master = yarn:
Spark context available as 'sc' (master = yarn, app id = application_1560334779290_0001).
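For completeness, the same example submitted through YARN in cluster mode (client mode, as used above, keeps the driver on the local machine; cluster mode runs it inside a YARN container):
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.8.jar 100
# In cluster mode the "Pi is roughly ..." line lands in the driver container's log, not on the console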