Supported Artifact Types
1. S3 (Amazon Simple Storage Service)
The S3 protocol is a data-storage API developed by Amazon, and it is also one of the most commonly supported protocols among open-source storage middleware. Many well-known open-source projects implement it, such as MinIO and Ceph, and nearly every major cloud vendor's storage service also claims S3 compatibility.
The example YAML from the official documentation is as follows:
- Note that before using MinIO, you need to deploy a MinIO cluster that is reachable from inside your Kubernetes cluster.
- To access MinIO you also need to create a Kubernetes Secret in the namespace where the Workflow runs (argo in this example). If you just want a quick test, you can create the Secret with the following command:
kubectl create secret generic my-s3-credentials -n argo --from-literal=accessKey=<YOUR-ACCESS-KEY> --from-literal=secretKey=<YOUR-SECRET-KEY>
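If you prefer a declarative manifest, the equivalent Secret would look roughly like this (a minimal sketch; the values are placeholders, and the key names must match what the Workflow references):

apiVersion: v1
kind: Secret
metadata:
  name: my-s3-credentials
  namespace: argo
type: Opaque
stringData:
  # plain-text values; Kubernetes base64-encodes them into .data on creation
  accessKey: <YOUR-ACCESS-KEY>
  secretKey: <YOUR-SECRET-KEY>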
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  namespace: argo
  generateName: input-artifact-s3-
spec:
  entrypoint: input-artifact-s3-example
  templates:
  - name: input-artifact-s3-example
    inputs:
      artifacts:
      - name: my-art
        path: /my-artifact/argo.yaml
        s3:
          endpoint: minio-svc.argo:9000
          insecure: true
          bucket: test
          key: argo.yaml
          region: us-west-2
          accessKeySecret:
            name: my-s3-credentials
            key: accessKey
          secretKeySecret:
            name: my-s3-credentials
            key: secretKey
    container:
      image: debian:latest
      command: [sh, -c]
      args: ["cat /my-artifact/argo.yaml"]
After the workflow finishes, the logs show that the file downloaded from MinIO has been printed by the cat command:
time="2022-07-26T13:05:42.551Z" level=info msg="capturing logs" argo=true
# This is an auto-generated file. DO NOT EDIT
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: clusterworkflowtemplates.argoproj.io
spec:
  group: argoproj.io
  names:
    kind: ClusterWorkflowTemplate
    listKind: ClusterWorkflowTemplateList
    plural: clusterworkflowtemplates
    shortNames:
    - clusterwftmpl
    - cwft
    singular: clusterworkflowtemplate
  scope: Cluster
  versions:
···
2. OSS
OSS (Object Storage Service) is Alibaba Cloud's object storage offering; as far as I know, it is the default choice on Alibaba Cloud (aliyun) in China. A sketch of the corresponding artifact entry is shown below.
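Argo's OSS artifact spec mirrors the S3 one. Below is a minimal sketch of what the artifact entry might look like, assuming a hypothetical bucket named test and a Secret named my-oss-credentials holding the AccessKey pair; it would replace the s3: block in the example above:

inputs:
  artifacts:
  - name: my-art
    path: /my-artifact/argo.yaml
    oss:
      # public endpoint of the OSS region; use the internal endpoint when running inside Alibaba Cloud
      endpoint: http://oss-cn-hangzhou.aliyuncs.com
      bucket: test
      key: argo.yaml
      accessKeySecret:
        name: my-oss-credentials
        key: accessKey
      secretKeySecret:
        name: my-oss-credentials
        key: secretKey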
3. HDFS
HDFS, the well-known Hadoop Distributed File System from the big-data ecosystem, is designed around block storage and is well suited to storing large files.
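For HDFS, the artifact entry points at one or more NameNode addresses and a path inside HDFS. A minimal sketch, assuming a hypothetical NameNode service address and file path:

inputs:
  artifacts:
  - name: my-art
    path: /my-artifact/argo.yaml
    hdfs:
      # one or more NameNode addresses reachable from the cluster
      addresses:
      - my-hdfs-namenode.default.svc.cluster.local:8020
      # location of the file inside HDFS
      path: /tmp/argo.yaml
      hdfsUser: root
      # copy the file even if the target already exists
      force: true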
4. GIT
You can use the Git protocol to pull code from a repository for CI/CD; this is currently one of the most widely used scenarios for Argo.
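A minimal sketch of a git input artifact that clones a public repository into the container (the repository URL and revision are just examples; for a private repository you would additionally reference username/password or SSH-key Secrets):

inputs:
  artifacts:
  - name: source
    # the repository is cloned to this path inside the container
    path: /src
    git:
      repo: https://github.com/argoproj/argo-workflows.git
      revision: "master"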
5. HTTP
Argo can also download any publicly accessible file over HTTP.
- The file in this example is the kubectl CLI binary, and the source is a Google-hosted URL, which may not be downloadable from inside China. If you cannot download it, rewrite the script to fetch a text file from an address you can reach and use the cat command to verify that the file was actually downloaded.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: arguments-artifacts-
spec:
  entrypoint: kubectl-input-artifact
  arguments:
    artifacts:
    - name: kubectl
      http:
        url: https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl
  templates:
  - name: kubectl-input-artifact
    inputs:
      artifacts:
      - name: kubectl
        path: /usr/local/bin/kubectl
        mode: 0755
    container:
      image: debian:9.4
      command: [sh, -c]
      args: ["kubectl version"]
6. GCS
GCS (Google Cloud Storage) is Google's object storage service; this protocol lets you download files stored on Google Cloud. I don't have an environment to test this type, so readers who need it can try it themselves.
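For reference, a GCS input artifact looks roughly like this (a minimal, untested sketch; the bucket name and the my-gcs-credentials Secret holding a service-account key JSON are hypothetical):

inputs:
  artifacts:
  - name: my-art
    path: /my-artifact/argo.yaml
    gcs:
      bucket: my-bucket
      key: argo.yaml
      # Secret containing the GCP service-account key (JSON) with access to the bucket
      serviceAccountKeySecret:
        name: my-gcs-credentials
        key: serviceAccountKey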
7. Azure
Argo also supports downloading the resource files you need from Microsoft Azure. Again, I don't have an environment to test this, so readers who need it can try it themselves.
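For reference, an Azure Blob Storage input artifact looks roughly like this (a minimal, untested sketch; the storage account, container, and my-azure-credentials Secret are hypothetical):

inputs:
  artifacts:
  - name: my-art
    path: /my-artifact/argo.yaml
    azure:
      # blob service endpoint of the storage account
      endpoint: https://<STORAGE-ACCOUNT>.blob.core.windows.net
      container: my-container
      blob: argo.yaml
      # Secret holding the storage account access key
      accountKeySecret:
        name: my-azure-credentials
        key: accountKey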
Usage Examples
1. Passing files between steps
This uses the example provided in the official documentation; the YAML is as follows:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind message to the hello-art artifact
          # generated by the generate-artifact step
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # generate hello-art artifact from /tmp/hello_world.txt
      # artifacts can be directories as well as files
      - name: hello-art
        path: /tmp/hello_world.txt
  - name: print-message
    inputs:
      artifacts:
      # unpack the message input artifact
      # and put it at /tmp/message
      - name: message
        path: /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]
2. Adding artifact arguments to your workflow
As mentioned in earlier chapters, the Inputs and Arguments objects hold similar kinds of values: both contain a parameters array and an artifacts array. This means you can also add artifact values to the arguments field of a Workflow.
Below is an example YAML file taken from the official documentation:
- This example downloads the file over HTTP; you can use any of the other protocols instead.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: arguments-artifacts-
spec:
  entrypoint: kubectl-input-artifact
  arguments:
    artifacts:
    - name: kubectl
      http:
        url: https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl
  templates:
  - name: kubectl-input-artifact
    inputs:
      artifacts:
      - name: kubectl
        path: /usr/local/bin/kubectl
        mode: 0755
    container:
      image: debian:9.4
      command: [sh, -c]
      args: ["kubectl version"]
3. Adding output artifacts to your workflow
If you want to output artifacts, you add them on the individual templates; there is no global output-artifact field analogous to Arguments. Define the files you want to output in a template's outputs section, and use any of the protocols described above to specify where the files should be uploaded.
The following YAML file is taken from the official reference:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: output-artifact-s3-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      - name: message
        path: /tmp
        # It is possible to disable tar.gz archiving by setting the archive strategy to 'none'.
        # Disabling archiving has the following limitations on S3: symbolic links will not be
        # uploaded, as S3 does not support the concept/file mode of symlinks.
        # archive:
        #   none: {}
        s3:
          # Use the corresponding endpoint depending on your S3 provider:
          # AWS: s3.amazonaws.com
          # GCS: storage.googleapis.com
          # Minio: my-minio-endpoint.default:9000
          endpoint: s3.amazonaws.com
          bucket: my-bucket
          # Specify the bucket region. Note that if you want Argo to figure this out automatically,
          # you can set an additional policy statement that allows the `s3:GetBucketLocation` action.
          # For details, check out: https://argoproj.github.io/argo-workflows/configure-artifact-repository/#configuring-aws-s3
          region: us-west-2
          # NOTE: by default, output artifacts are automatically tarred and gzipped before saving.
          # As a best practice, .tgz or .tar.gz should be suffixed into the key name so the
          # resulting object has an accurate file extension and mime-type. If archive is set to
          # 'none', then preserve the appropriate file extension for the key name.
          key: path/in/bucket/hello_world.txt.tgz
          # accessKeySecret and secretKeySecret are secret selectors. They reference the k8s secret
          # named 'my-s3-credentials'. This secret is expected to have the keys 'accessKey'
          # and 'secretKey', containing the base64 encoded credentials to the bucket.
          accessKeySecret:
            name: my-s3-credentials
            key: accessKey
          secretKeySecret:
            name: my-s3-credentials
            key: secretKey