Hadoop on Kubernetes

1. HDFS

  • Service

    apiVersion: v1
    kind: Service
    metadata:
      name: hdfs-namenode
      namespace: yarn
      labels:
        app: hdfs-namenode
    spec:
      ports:
      - name: fs
        port: 9000
        protocol: TCP
        targetPort: 9000
      - name: webui
        port: 50070
        protocol: TCP
        targetPort: 50070
      selector:
        app: hdfs-namenode
      sessionAffinity: None
      type: ClusterIP
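
    A minimal sketch of creating the namespace and applying this Service (the manifest filename is illustrative):

    # the yarn namespace must exist before any of the manifests below are applied
    kubectl create namespace yarn
    kubectl apply -f hdfs-namenode-svc.yaml

    # confirm the ClusterIP service exposes 9000 (fs) and 50070 (webui)
    kubectl -n yarn get svc hdfs-namenode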
    
  • ConfigMap

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: hdfs-cm
      namespace: yarn
    data:
      core-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
            <name>fs.defaultFS</name>
            <value>hdfs://0.0.0.0:9000</value>
            <description>namenode address</description>
          </property>
          <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
          </property>
          <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/hadoop/tmp</value>
          </property>
        </configuration>
      dn-core-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hdfs-namenode.yarn.svc.cluster.local:9000</value>
          </property>
          <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
          </property>
          <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/hadoop/tmp</value>
          </property>
        </configuration>
      hdfs-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
             <name>dfs.namenode.name.dir</name>
             <value>/data/hadoop/dfs/name</value>
          </property>
          <property>
             <name>dfs.namenode.http-address</name>
             <value>0.0.0.0:50070</value>
          </property>
          <property>
             <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
             <value>false</value>
          </property>
        </configuration>
      dn-hdfs-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
             <name>dfs.datanode.data.dir</name>
             <value>/data/hadoop/dfs/data</value>
          </property>
          <property>
            <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
            <value>false</value>
          </property>
          <property>
            <name>dfs.client.use.datanode.hostname</name>
            <value>false</value>
          </property>
          <property>
            <name>dfs.datanode.use.datanode.hostname</name>
            <value>false</value>
          </property>
        </configuration>
    

    The domain suffix depends on how the cluster's DNS is configured, e.g.:

    hdfs://hdfs-namenode.yarn.svc.cluster.local:9000

    hdfs://hdfs-namenode.yarn.svc.hadoop:9000
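
    The cluster domain can be checked from any pod's resolv.conf; one way to verify it is a throwaway busybox pod:

    # the "search" line shows <namespace>.svc.<cluster-domain>, e.g. yarn.svc.cluster.local
    kubectl -n yarn run dns-check --rm -it --restart=Never --image=busybox -- cat /etc/resolv.conf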

  • StatefulSet(namenode)

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: hdfs-namenode
      namespace: yarn
    spec:
      serviceName: hdfs-namenode
      replicas: 1
      selector:
        matchLabels:
          app: hdfs-namenode
      template:
        metadata:
          labels:
            app: hdfs-namenode
        spec:
          dnsPolicy: ClusterFirstWithHostNet
          initContainers:
          - name: format
            image: registry.cn-hangzhou.aliyuncs.com/davisgao/hadoop:3.2.4
            command: ["hdfs"]
            args:
            - "namenode"
            - "-format"
            - "-force"
            volumeMounts:
            - name: namenode-dir
              mountPath: /data/hadoop/dfs/name
          containers:
            - name: hdfs-namenode
              image: registry.cn-hangzhou.aliyuncs.com/davisgao/hadoop:3.2.4
              command: ["hdfs"]
              args:
                - "--config"
                - "/usr/local/hadoop/etc/hadoop"
                - "namenode"
              env:
              - name: HADOOP_LOG_DIR
                value: "/data/hadoop/name/logs"
              - name: HADOOP_CONF_DIR
                value: "/usr/local/hadoop/etc/hadoop"
              resources:
                limits:
                  cpu: "2"
                  memory: 6000Mi
                requests:
                  cpu: "1"
                  memory: 4Gi
              ports:
              - containerPort: 9000
                name: fs
              - containerPort: 50070
                name: webui
              volumeMounts:
              - name: hdfs-cm
                mountPath: /usr/local/hadoop/etc/hadoop/core-site.xml
                subPath: core-site.xml
              - name: hdfs-cm
                mountPath: /usr/local/hadoop/etc/hadoop/hdfs-site.xml
                subPath: hdfs-site.xml
              - name: namenode-dir
                mountPath: /data/hadoop/dfs/name
              - name: hadoop-log-dir
                mountPath: /data/hadoop/name/logs
          nodeSelector:
            hdfs-namenode: "true"
          restartPolicy: Always
          volumes:
            - name: hdfs-cm
              configMap:
                name: hdfs-cm
                items:
                - key: hdfs-site.xml
                  path: hdfs-site.xml
                - key: core-site.xml
                  path: core-site.xml
            - name: namenode-dir
              hostPath:
                path: /data/hadoop/dfs/name
                type: Directory
            - name: hadoop-log-dir
              hostPath:
                path: /data/hadoop/name/logs
                type: Directory
    

    For the namenode, a dedicated disk or PV-backed storage is recommended; the hostPath directories (type: Directory) must already exist on the node:

    mkdir -p /data/hadoop/dfs/name
    mkdir -p /data/hadoop/name/logs
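
    The nodeSelector above requires the target node to carry the hdfs-namenode=true label; a minimal sketch (node name and manifest filename are placeholders):

    kubectl label node <node-name> hdfs-namenode=true
    kubectl apply -f hdfs-namenode-statefulset.yaml
    kubectl -n yarn get pod -l app=hdfs-namenode -o wide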

    TODO:

    The blockpoolID=BP-2063912425-192.168.0.67-1680162780839 in current/VERSION under /data/hadoop/dfs/name is still tied to the node IP.
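
    The current value can be checked inside the running pod, for example:

    kubectl -n yarn exec hdfs-namenode-0 -- grep blockpoolID /data/hadoop/dfs/name/current/VERSION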

  • StatefulSet(datanode)

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: hdfs-datanode
      namespace: yarn
    spec:
      serviceName: hdfs-datanode
      replicas: 1
      selector:
        matchLabels:
          app: hdfs-datanode
      template:
        metadata:
          labels:
            app: hdfs-datanode
        spec:
          dnsPolicy: ClusterFirstWithHostNet
          containers:
            - name: hdfs-datanode
              image: registry.cn-hangzhou.aliyuncs.com/davisgao/hadoop:3.2.4
              command: ["hdfs"]
              args:
                - "--config"
                - "/usr/local/hadoop/etc/hadoop"
                - "datanode"
              env:
              - name: HADOOP_LOG_DIR
                value: "/data/hadoop/data/logs"
              - name: HADOOP_CONF_DIR
                value: "/usr/local/hadoop/etc/hadoop"
              resources:
                limits:
                  cpu: "2"
                  memory: 4Gi
                requests:
                  cpu: "1"
                  memory: 2Gi
              ports:
              - containerPort: 9866
                name: data
              - containerPort: 9864
                name: webui
              volumeMounts:
              - name: hdfs-cm
                mountPath: /usr/local/hadoop/etc/hadoop/core-site.xml
                subPath: core-site.xml
              - name: hdfs-cm
                mountPath: /usr/local/hadoop/etc/hadoop/hdfs-site.xml
                subPath: hdfs-site.xml
              - name: datanode-dir
                mountPath: /data/hadoop/dfs/data
              - name: hadoop-log-dir
                mountPath: /data/hadoop/data/logs
          nodeSelector:
            hdfs-datanode: "true"
          restartPolicy: Always
          volumes:
            - name: hdfs-cm
              configMap:
                name: hdfs-cm
                items:
                - key: dn-hdfs-site.xml
                  path: hdfs-site.xml
                - key: dn-core-site.xml
                  path: core-site.xml
            - name: datanode-dir
              hostPath:
                path: /data/hadoop/dfs/data
                type: Directory
            - name: hadoop-log-dir
              hostPath:
                path: /data/hadoop/data/logs
                type: Directory
    

    For the datanode, a dedicated disk is recommended; the hostPath directories must already exist on the node:

    mkdir -p /data/hadoop/dfs/data
    mkdir -p /data/hadoop/data/logs
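
    As with the namenode, the node must be labelled to match the nodeSelector, and registration can then be verified from the namenode pod; a sketch (node name and filename are placeholders):

    kubectl label node <node-name> hdfs-datanode=true
    kubectl apply -f hdfs-datanode-statefulset.yaml

    # the datanode should appear as a live node
    kubectl -n yarn exec hdfs-namenode-0 -- hdfs dfsadmin -report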

2. Using YARN for Resource Management

  • Service

    apiVersion: v1
    kind: Service
    metadata:
      name: yarn-rm
      namespace: yarn
      labels:
        app: yarn-rm
    spec:
      ports:
      - name: webui
        port: 8088
        protocol: TCP
        targetPort: 8088
      - name: rm
        port: 8032
        protocol: TCP
        targetPort: 8032
      - name: tracker
        port: 8031
        protocol: TCP
        targetPort: 8031
      selector:
        app: yarn-rm
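
    Once the resourcemanager below is running, its web UI can be reached locally through this Service, for example:

    kubectl -n yarn port-forward svc/yarn-rm 8088:8088
    # then open http://localhost:8088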
    
  • ConfigMap

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: yarn-cm
      namespace: yarn
    data:
      core-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hdfs-namenode.yarn.svc.cluster.local:9000</value>
            <description>namenode address</description>
          </property>
          <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
          </property>
          <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/hadoop/tmp</value>
          </property>
        </configuration>
      mapred-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
            <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
            </property>
        </configuration>
      yarn-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>0.0.0.0</value>
          </property>
          <property>
              <name>yarn.nodemanager.local-dirs</name>
              <value>/data/hadoop/yarn/local-dirs</value>
          </property>
          <property>
              <name>yarn.nodemanager.log-dirs</name>
              <value>/data/hadoop/yarn/log-dirs</value>
          </property>
          <property>
              <name>yarn.log-aggregation-enable</name>
              <value>true</value>
              <description>whether to enable log aggregation</description>
          </property>
          <property>
              <name>yarn.log-aggregation.retain-seconds</name>
              <value>10080</value>
              <description>how long to retain aggregated logs, in seconds</description>
          </property>
          <property>
              <name>yarn.nodemanager.remote-app-log-dir</name>
              <value>/yarn/app/logs</value>
          </property>
          <property>
              <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
              <value>logs</value>
          </property>
        </configuration>
      nm-yarn-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>yarn-rm.yarn.svc.cluster.local</value>
          </property>
          <property>
              <name>yarn.nodemanager.local-dirs</name>
              <value>/data/hadoop/yarn/local-dirs</value>
          </property>
          <property>
              <name>yarn.nodemanager.log-dirs</name>
              <value>/data/hadoop/yarn/log-dirs</value>
          </property>
          <property>
              <name>yarn.log-aggregation-enable</name>
              <value>true</value>
              <description>whether to enable log aggregation</description>
          </property>
          <property>
              <name>yarn.log-aggregation.retain-seconds</name>
              <value>10080</value>
              <description>how long to retain aggregated logs, in seconds</description>
          </property>
          <property>
              <name>yarn.nodemanager.remote-app-log-dir</name>
              <value>/yarn/app/logs</value>
          </property>
          <property>
              <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
              <value>logs</value>
          </property>
        </configuration>
    

    Configuration notes:

    yarn-site.xml:

    yarn.log-aggregation-enable: set to "true" to enable log aggregation; application logs are then collected into the HDFS directory configured by yarn.nodemanager.remote-app-log-dir.
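
    With aggregation enabled, the logs of a finished application can be fetched from HDFS with the yarn CLI; a sketch (the application id is a placeholder):

    kubectl -n yarn exec yarn-rm-0 -- yarn logs -applicationId application_1680000000000_0001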

  • resourcemanager

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: yarn-rm
      namespace: yarn
    spec:
      serviceName: "yarn-rm"
      replicas: 1
      selector:
        matchLabels:
          app: yarn-rm
      template:
        metadata:
          labels:
            app: yarn-rm
        spec:
          containers:
            - name: yarn-rm
              image: registry.cn-hangzhou.aliyuncs.com/davisgao/hadoop:3.2.4
              command: ["yarn"]
              args:
                - "--config"
                - "/usr/local/hadoop/etc/hadoop"
                - "resourcemanager"
              env:
              - name: HADOOP_LOG_DIR
                value: "/data/hadoop/yarn/logs"
              - name: HADOOP_CONF_DIR
                value: "/usr/local/hadoop/etc/hadoop"
              - name: SPARK_CONF_DIR
                value: "/usr/local/hadoop/etc/hadoop"
              ports:
              - containerPort: 8032
                name: rm
              - containerPort: 8088
                name: webui
              - containerPort: 8031
                name: tracker
              volumeMounts:
              - name: yarn-cm
                mountPath: /usr/local/hadoop/etc/hadoop/yarn-site.xml
                subPath: yarn-site.xml
              - name: yarn-cm
                mountPath: /usr/local/hadoop/etc/hadoop/core-site.xml
                subPath: core-site.xml
              - name: yarn-cm
                mountPath: /usr/local/hadoop/etc/hadoop/mapred-site.xml
                subPath: mapred-site.xml
              - name: yarn-local-dirs
                mountPath: /data/hadoop/yarn/local-dirs
              - name: yarn-log-dirs
                mountPath: /data/hadoop/yarn/log-dirs
              - name: hadoop-tmp
                mountPath: /data/hadoop/tmp
          volumes:
          - name: yarn-cm
            configMap:
              name: yarn-cm
              items:
              - key: yarn-site.xml
                path: yarn-site.xml
              - key: core-site.xml
                path: core-site.xml
              - key: mapred-site.xml
                path: mapred-site.xml
          - name: yarn-local-dirs
            hostPath:
              path: /data/hadoop/yarn/local-dirs
              type: Directory
          - name: yarn-log-dirs
            hostPath: 
              path: /data/hadoop/yarn/log-dirs
              type: Directory
          - name: hadoop-tmp
            hostPath: 
              path: /data/hadoop/tmp
              type: Directory
          nodeSelector:
            yarn-rm: "true"
          restartPolicy: Always
    

    For the resourcemanager, a dedicated disk is recommended; the hostPath directories must already exist on the node:

    mkdir -p /data/hadoop/yarn/local-dirs
    mkdir -p /data/hadoop/yarn/log-dirs
    mkdir -p /data/hadoop/tmp

    wget https://download.java.net/openjdk/jdk8u42/ri/openjdk-8u42-b03-linux-x64-14_jul_2022.tar.gz
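
    The nodeSelector expects the yarn-rm=true label on the target node; a minimal sketch (node name and filename are placeholders):

    kubectl label node <node-name> yarn-rm=true
    kubectl apply -f yarn-rm-statefulset.yaml
    kubectl -n yarn logs yarn-rm-0 | tail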

  • nodemanager

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: yarn-nm
      namespace: yarn
    spec:
      serviceName: "yarn-nm"
      replicas: 1
      selector:
        matchLabels:
          app: yarn-nm
      template:
        metadata:
          labels:
            app: yarn-nm
        spec:
          containers:
            - name: yarn-nm
              image: registry.cn-hangzhou.aliyuncs.com/davisgao/hadoop:3.2.4
              command: ["yarn"]
              args:
                - "--config"
                - "/usr/local/hadoop/etc/hadoop"
                - "nodemanager"
              env:
              - name: HADOOP_LOG_DIR
                value: "/data/hadoop/yarn/logs"
              - name: HADOOP_CONF_DIR
                value: "/usr/local/hadoop/etc/hadoop"
              - name: SPARK_CONF_DIR
                value: "/usr/local/hadoop/etc/hadoop"
              ports:
              - containerPort: 8042
                name: webui
              volumeMounts:
              - name: yarn-cm
                mountPath: /usr/local/hadoop/etc/hadoop/yarn-site.xml
                subPath: yarn-site.xml
              - name: yarn-cm
                mountPath: /usr/local/hadoop/etc/hadoop/core-site.xml
                subPath: core-site.xml
              - name: yarn-cm
                mountPath: /usr/local/hadoop/etc/hadoop/mapred-site.xml
                subPath: mapred-site.xml
              - name: yarn-local-dirs
                mountPath: /data/hadoop/yarn/local-dirs
              - name: yarn-log-dirs
                mountPath: /data/hadoop/yarn/log-dirs
          volumes:
          - name: yarn-cm
            configMap:
              name: yarn-cm
              items:
              - key: nm-yarn-site.xml
                path: yarn-site.xml
              - key: core-site.xml
                path: core-site.xml
              - key: mapred-site.xml
                path: mapred-site.xml
          - name: yarn-local-dirs
            hostPath:
              path: /data/hadoop/yarn/local-dirs
              type: Directory
          - name: yarn-log-dirs
            hostPath: 
              path: /data/hadoop/yarn/log-dirs
              type: Directory
          nodeSelector:
            yarn-nm: "true"
          restartPolicy: Always
    

    For the nodemanager, a dedicated disk is recommended; the hostPath directories must already exist on the node:

    mkdir -p /data/hadoop/yarn/local-dirs
    mkdir -p /data/hadoop/yarn/log-dirs
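
    After labelling the node, the whole stack can be smoke-tested by submitting the bundled MapReduce example from the resourcemanager pod; a sketch (node name and filename are placeholders, and the examples jar path assumes Hadoop is installed under /usr/local/hadoop in this image):

    kubectl label node <node-name> yarn-nm=true
    kubectl apply -f yarn-nm-statefulset.yaml

    # the nodemanager should register with the resourcemanager
    kubectl -n yarn exec yarn-rm-0 -- yarn node -list

    # run the pi example on YARN
    kubectl -n yarn exec yarn-rm-0 -- yarn jar \
      /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar pi 2 10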
