| | Master | Worker | Client |
|---|---|---|---|
| Node01 | ✓ | | |
| Node02 | | | ✓ |
| Node03 | | ✓ | |
| Node04 | | ✓ | |
1. Extract spark-2.3.1-bin-hadoop2.6.tgz
[root@node01 software]# tar -zvxf spark-2.3.1-bin-hadoop2.6.tgz -C /opt/ycyz/
2. Go to the conf directory of the Spark installation and configure slaves (one worker hostname per line)
[root@node01 conf]# cp slaves.template slaves
[root@node01 conf]# vi slaves
node03
node04
3. Edit spark-env.sh
[root@node01 conf]# cp spark-env.sh.template spark-env.sh
[root@node01 conf]# vi spark-env.sh
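# Host the master runs on
export SPARK_MASTER_HOST=node01
# Port the master listens on for job submission
export SPARK_MASTER_PORT=7077
# Number of cores each worker may use
export SPARK_WORKER_CORES=2
# Amount of memory each worker may use
export SPARK_WORKER_MEMORY=2g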
4. Copy the configured Spark directory to the other nodes (shown here for node03; repeat for node04)
[root@node01 ycyz]# scp -r spark-2.3.1-bin-hadoop2.6/ node03:`pwd`
5. Run start-all.sh from the sbin directory of the Spark installation to start the cluster; the master web UI listens on port 8080
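Step 4 has to be repeated for every host listed in conf/slaves. A minimal sketch of how that could be scripted, assuming passwordless SSH from node01 and the same installation path on every node. The name `print_distribute_cmds` is hypothetical (it is not part of Spark), and the function only prints the `scp` commands as a dry run so they can be reviewed first.

```shell
#!/bin/sh
# Dry run: print one scp command per worker listed in a slaves file.
# print_distribute_cmds is a hypothetical helper, not part of Spark.
print_distribute_cmds() {
    slaves_file=$1   # e.g. /opt/ycyz/spark-2.3.1-bin-hadoop2.6/conf/slaves
    src_dir=$2       # local Spark directory to copy
    dest_dir=$3      # parent directory on the remote host

    # Skip comment lines and blank lines; every remaining line is a hostname.
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$slaves_file" |
    while read -r host; do
        echo "scp -r $src_dir $host:$dest_dir"
    done
}
```

Called as `print_distribute_cmds conf/slaves /opt/ycyz/spark-2.3.1-bin-hadoop2.6 /opt/ycyz`, it prints one line per worker; piping that output to `sh` would perform the copies.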
Configuring jobs to run on YARN
[root@node01 ~]# vi /opt/ycyz/spark-2.3.1-bin-hadoop2.6/conf/spark-env.sh
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
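If `HADOOP_HOME` is unset or the path is wrong, the problem typically shows up only when a job is submitted. A small pre-flight check could catch it earlier. This is a sketch: `check_hadoop_conf` is a hypothetical helper, and treating `core-site.xml` and `yarn-site.xml` as the files worth checking is an assumption made for illustration.

```shell
#!/bin/sh
# Verify that a directory looks like a Hadoop client configuration directory.
# check_hadoop_conf is a hypothetical helper; the file list is an assumption.
check_hadoop_conf() {
    conf_dir=$1
    if [ -z "$conf_dir" ] || [ ! -d "$conf_dir" ]; then
        echo "missing directory: $conf_dir"
        return 1
    fi
    for f in core-site.xml yarn-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing file: $conf_dir/$f"
            return 1
        fi
    done
    echo "ok: $conf_dir"
}
```

For example, `check_hadoop_conf "$HADOOP_HOME/etc/hadoop"` returns a non-zero status and names the first missing item if the directory is not usable.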
Job submission modes
1. standalone
- client
[root@node01 bin]# ./spark-submit \
--master spark://node01:7077 \
--class org.apache.spark.examples.SparkPi \
../examples/jars/spark-examples_2.11-2.3.1.jar 100
- cluster
[root@node01 bin]# ./spark-submit \
--master spark://node01:7077 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
../examples/jars/spark-examples_2.11-2.3.1.jar 100
2. yarn
- client
[root@node01 bin]# ./spark-submit \
--master yarn \
--deploy-mode client \
--class org.apache.spark.examples.SparkPi \
../examples/jars/spark-examples_2.11-2.3.1.jar 100
- cluster
[root@node01 bin]# ./spark-submit \
--master yarn \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
../examples/jars/spark-examples_2.11-2.3.1.jar 100
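The four submissions above differ only in the cluster manager and the deploy mode. Since Spark 2.0 the YARN variants can be written as `--master yarn` plus `--deploy-mode`, which makes the pattern uniform. A sketch of a wrapper that makes this explicit: `build_submit_cmd` is a hypothetical helper that prints the command instead of running it, and the master URL and example jar are the ones used above.

```shell
#!/bin/sh
# Print the spark-submit command for a cluster manager and deploy mode.
# build_submit_cmd is a hypothetical helper, not part of Spark.
build_submit_cmd() {
    manager=$1      # standalone | yarn
    deploy_mode=$2  # client | cluster

    case $manager in
        standalone) master=spark://node01:7077 ;;
        yarn)       master=yarn ;;
        *) echo "unknown cluster manager: $manager" >&2; return 1 ;;
    esac
    case $deploy_mode in
        client|cluster) ;;
        *) echo "unknown deploy mode: $deploy_mode" >&2; return 1 ;;
    esac

    echo "./spark-submit --master $master --deploy-mode $deploy_mode" \
         "--class org.apache.spark.examples.SparkPi" \
         "../examples/jars/spark-examples_2.11-2.3.1.jar 100"
}
```

`build_submit_cmd yarn cluster` prints the YARN cluster-mode submission; in client mode the driver runs on the submitting machine, in cluster mode it runs inside the cluster.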
If a job fails because a container runs out of virtual memory, edit yarn-site.xml as follows.
Add these properties:
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>9000</value>
<description>Maximum memory that can be allocated to a single container, in MB (default 8192)</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>200</value>
<description>Minimum memory that can be allocated to a single container, in MB (default 1024)</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4.1</value>
<description>Ratio of virtual memory to physical memory allowed for a container (default 2.1)</description>
</property>
Alternatively, disable the virtual memory check:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
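While the virtual memory check is enabled, the NodeManager kills a container once its virtual memory exceeds the container's physical memory allocation multiplied by `yarn.nodemanager.vmem-pmem-ratio`. The arithmetic can be sketched as follows; `vmem_limit_mb` is a hypothetical helper and the 1024 MB container size is only an illustrative value.

```shell
#!/bin/sh
# Virtual memory limit (MB) = physical allocation (MB) * vmem-pmem ratio.
# vmem_limit_mb is a hypothetical helper used only for this illustration.
vmem_limit_mb() {
    pmem_mb=$1
    ratio=$2
    awk -v p="$pmem_mb" -v r="$ratio" 'BEGIN { printf "%.1f\n", p * r }'
}

vmem_limit_mb 1024 2.1   # YARN default ratio   -> 2150.4
vmem_limit_mb 1024 4.1   # ratio of 4.1         -> 4198.4
```

Raising the ratio from 2.1 to 4.1 roughly doubles the virtual memory a 1024 MB container may map before it is killed, without changing its physical allocation.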