Spark on Kubernetes

1. Spark

  • Service

    apiVersion: v1
    kind: Service
    metadata:
      name: spark-master
      namespace: yarn
      labels:
        app: spark-master
    spec:
      ports:
      - name: webui
        port: 8080
        protocol: TCP
        targetPort: 8080
      - name: master
        port: 7077
        protocol: TCP
        targetPort: 7077
      - name: rm
        port: 8032
        protocol: TCP
        targetPort: 8032
      - name: tracker
        port: 8031
        protocol: TCP
        targetPort: 8031
      selector:
        app: spark-master
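
    A quick check that the Service is in place before the other components reference it (a minimal sketch; the manifest file name spark-master-svc.yaml and a pre-existing yarn namespace are assumptions here):

    # the yarn namespace is assumed to already exist (kubectl create namespace yarn)
    kubectl apply -f spark-master-svc.yaml
    kubectl -n yarn get svc spark-master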
    
  • Configmap

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: spark-cm
      namespace: yarn
    data:
      core-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hdfs-namenode.yarn.svc.cluster.local:9000</value>
            <description>namenode address</description>
          </property>
          <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
          </property>
          <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/hadoop/tmp</value>
          </property>
        </configuration>
      mapred-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
            <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
            </property>
            <property>
                <name>mapreduce.jobhistory.address</name>
                <value>0.0.0.0:10020</value>
            </property>
            <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>0.0.0.0:19888</value>
            </property>
        </configuration>
      yarn-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>yarn-rm.yarn.svc.cluster.local</value>
          </property>
          <property>
              <name>yarn.nodemanager.local-dirs</name>
              <value>/data/hadoop/yarn/local-dirs</value>
          </property>
          <property>
              <name>yarn.nodemanager.log-dirs</name>
              <value>/data/hadoop/yarn/log-dirs</value>
          </property>
          <property>
              <name>yarn.log.server.url</name>
              <value>http://0.0.0.0:19888/jobhistory/logs</value>
          </property>
          <property>
              <name>yarn.log-aggregation-enable</name>
              <value>true</value>
              <description>Whether to enable log aggregation</description>
          </property>
          <property>
              <name>yarn.log-aggregation.retain-seconds</name>
              <value>10080</value>
              <description>How long aggregated logs are retained, in seconds</description>
          </property>
          <property>
              <name>yarn.nodemanager.remote-app-log-dir</name>
              <value>/yarn/app/logs</value>
          </property>
          <property>
              <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
              <value>logs</value>
          </property>
        </configuration>
      spark-defaults.conf: |-
        spark.eventLog.enabled           true
        spark.eventLog.dir               hdfs://hdfs-namenode.yarn.svc.cluster.local:9000/spark/event
        spark.yarn.historyServer.address 0.0.0.0:18080
      spark-env.sh: |-
        export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.fs.logDirectory=hdfs://hdfs-namenode.yarn.svc.cluster.local:9000/spark/event -Dspark.history.retainedApplications=30"
        export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"
        export SPARK_CONF_DIR="/usr/local/spark/conf"
        export SPARK_LOG_DIR="/data/spark/logs"
        export YARN_CONF_DIR="/usr/local/hadoop/etc/hadoop"
        export SPARK_MASTER_HOST=0.0.0.0
        export SPARK_MASTER_PORT=7077
        export SPARK_MASTER_WEBUI_PORT=8080
        export SPARK_WORKER_PORT=7078
        export SPARK_WORKER_WEBUI_PORT=8081
    

    Create the Spark event log directory referenced by spark.eventLog.dir on HDFS before starting the master:

    hdfs dfs -mkdir -p /spark/event
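
    yarn-site.xml also points aggregated application logs at an HDFS directory (yarn.nodemanager.remote-app-log-dir), so it is worth pre-creating that path as well, under the same assumption that the hdfs client can reach the namenode from core-site.xml:

    hdfs dfs -mkdir -p /yarn/app/logs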

  • Master

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: spark-master
      namespace: yarn
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: spark-master
      template:
        metadata:
          labels:
            app: spark-master
        spec:
          containers:
            - name: historyserver
              image: spark:3.3.2
              command: ["spark-class"]
              args:
                - "org.apache.spark.deploy.history.HistoryServer"
              ports:
              - containerPort: 18080
                name: historyserver
              # mount the Spark conf so SPARK_HISTORY_OPTS from spark-env.sh is picked up
              volumeMounts:
              - name: spark-cm
                mountPath: /usr/local/spark/conf/spark-env.sh
                subPath: spark-env.sh
              - name: spark-cm
                mountPath: /usr/local/spark/conf/spark-defaults.conf
                subPath: spark-defaults.conf
            - name: master
              image: spark:3.3.2
              command: ["spark-class"]
              args:
                - "org.apache.spark.deploy.master.Master"
                - "--properties-file"
                - "/usr/local/spark/conf/spark-defaults.conf"
              ports:
              - containerPort: 7077
                name: master
              - containerPort: 8080
                name: webui
              volumeMounts:
              - name: spark-cm
                mountPath: /usr/local/hadoop/etc/hadoop/yarn-site.xml
                subPath: yarn-site.xml
              - name: spark-cm
                mountPath: /usr/local/hadoop/etc/hadoop/core-site.xml
                subPath: core-site.xml
              - name: spark-cm
                mountPath: /usr/local/hadoop/etc/hadoop/mapred-site.xml
                subPath: mapred-site.xml
              - name: spark-cm
                mountPath: /usr/local/spark/conf/spark-env.sh
                subPath: spark-env.sh
              - name: spark-cm
                mountPath: /usr/local/spark/conf/spark-defaults.conf
                subPath: spark-defaults.conf
              - name: spark-logs
                mountPath: /data/spark/logs
          volumes:
          - name: spark-cm
            configMap:
              name: spark-cm
              items:
              - key: yarn-site.xml
                path: yarn-site.xml
              - key: core-site.xml
                path: core-site.xml
              - key: mapred-site.xml
                path: mapred-site.xml
              - key: spark-env.sh
                path: spark-env.sh
              - key: spark-defaults.conf
                path: spark-defaults.conf
          - name: spark-logs
            hostPath:
              path: /data/spark/logs
              type: Directory
          nodeSelector:
            spark-master: "true"
          restartPolicy: Always
    

    TODO:

    The base image still needs to be replaced.
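
    Once the Service, ConfigMap, and StatefulSet are applied, the deployment can be smoke-tested by running the bundled SparkPi example against the master from inside the pod (a sketch: the examples jar path and spark-submit being on PATH are assumptions about the image, just as the manifest already assumes for spark-class):

    kubectl -n yarn exec -it spark-master-0 -c master -- \
      spark-submit \
        --master spark://spark-master:7077 \
        --class org.apache.spark.examples.SparkPi \
        /usr/local/spark/examples/jars/spark-examples_2.12-3.3.2.jar 100   # jar path is an assumption about the image layout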

2. Dockerfiles

  • jdk

    FROM alpine:3.4
    
    # A few problems with compiling Java from source:
    #  1. Oracle.  Licensing prevents us from redistributing the official JDK.
    #  2. Compiling OpenJDK also requires the JDK to be installed, and it gets
    #       really hairy.
    
    # Default to UTF-8 file.encoding
    ENV LANG C.UTF-8
    
    # add a simple script that can auto-detect the appropriate JAVA_HOME value
    # based on whether the JDK or only the JRE is installed
    RUN { \
                    echo '#!/bin/sh'; \
                    echo 'set -e'; \
                    echo; \
                    echo 'dirname "$(dirname "$(readlink -f "$(which javac || which java)")")"'; \
            } > /usr/local/bin/docker-java-home \
            && chmod +x /usr/local/bin/docker-java-home
    ENV JAVA_HOME /usr/lib/jvm/java-1.8-openjdk
    ENV PATH $PATH:/usr/lib/jvm/java-1.8-openjdk/jre/bin:/usr/lib/jvm/java-1.8-openjdk/bin
    
    ENV JAVA_VERSION 8u111
    ENV JAVA_ALPINE_VERSION 8.111.14-r0
    
    RUN set -x \
            && apk add --no-cache bash \
                    openjdk8="$JAVA_ALPINE_VERSION" \
            && [ "$JAVA_HOME" = "$(docker-java-home)" ]
    

    docker build -t jdk:1.8 .
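
    The image can be sanity-checked before it is used as a base; java is on PATH and the docker-java-home helper is installed by the Dockerfile above:

    docker run --rm jdk:1.8 java -version
    docker run --rm jdk:1.8 docker-java-home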

  • hadoop

    FROM jdk:1.8
    
    WORKDIR /usr/local/hadoop
    # the unpacked Hadoop 3.2.4 distribution is expected in the build context as ./hadoop
    ADD hadoop /usr/local/hadoop
    
    
    ENV HADOOP_HOME /usr/local/hadoop
    ENV PATH $PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
    

    docker build -t hadoop:3.2.4 .
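
    As with the JDK image, a quick run confirms the layout and the inherited JAVA_HOME work together:

    docker run --rm hadoop:3.2.4 hadoop version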

  • spark
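
    The Spark image was left out of the original notes; a minimal sketch following the same pattern as the hadoop image could look like this (assumptions: the unpacked Spark 3.3.2 distribution is in the build context as ./spark, and the /usr/local/spark layout matches the paths used in spark-env.sh above):

    FROM hadoop:3.2.4

    WORKDIR /usr/local/spark
    # assumption: the unpacked Spark 3.3.2 (pre-built for Hadoop 3) distribution sits in the build context as ./spark
    ADD spark /usr/local/spark

    ENV SPARK_HOME /usr/local/spark
    ENV PATH $PATH:/usr/local/spark/bin:/usr/local/spark/sbin


    docker build -t spark:3.3.2 .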

3. Configuring containerd for a private registry

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "k8s.gcr.io/pause:3.8"
    max_container_log_line_size = -1
    enable_unprivileged_ports = false
    enable_unprivileged_icmp = false
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "runc"
      snapshotter = "overlayfs"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          base_runtime_spec = "/etc/containerd/cri-base.json"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.cn-hangzhou.aliyuncs.com"]
          endpoint = ["https://registry.cn-hangzhou.aliyuncs.com"]
      [plugins."io.containerd.grpc.v1.cri".registry.configs]
        [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.cn-hangzhou.aliyuncs.com".tls]
          insecure_skip_verify = true
        [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.cn-hangzhou.aliyuncs.com".auth]
          username = "your-username"
          password = "your-password"
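
After changing /etc/containerd/config.toml, restart containerd so the registry mirror and credentials take effect, then verify with a pull through the CRI (the image reference below is only a placeholder for something hosted in the private registry):

systemctl restart containerd
crictl pull registry.cn-hangzhou.aliyuncs.com/your-namespace/spark:3.3.2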