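Alibaba Cloud TimeStream Series: The TimeStream Data Model

To simplify using Elasticsearch (ES) as a time series engine, TimeStream ships with a built-in time series data model.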

In time series scenarios, a record contains three kinds of fields:

Dimension fields
Metric fields
Time field

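By setting index.mode=time_series, ES automatically adds the time field to the mapping.

Fields can then be marked as dimensions or metrics with time_series_dimension and time_series_metric.

TimeStream builds on these capabilities and predefines the data model: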

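Field	Description
labels	Attributes of the metric: the metadata that uniquely identifies an individual entity. The time series ID can be generated from labels.
metrics	The set of metric values. A metric must be of type long or double.
@timestamp	The time of the metric record; by default a timestamp in milliseconds.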
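
To make the idea that a time series ID can be derived from labels more concrete, here is a minimal Python sketch. It is an illustration only: the function name series_key and the hashing scheme (sorted labels, SHA-256, truncated to 16 hex characters) are assumptions made for this example, not the algorithm Elasticsearch or TimeStream actually uses.

```python
import hashlib
import json


def series_key(labels: dict) -> str:
    """Return a stable identifier for the series described by `labels`.

    Illustration only: this is NOT the algorithm Elasticsearch uses for _tsid.
    """
    # Sort by label name so that key order in the document does not matter.
    canonical = json.dumps(sorted(labels.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


a = series_key({"nodeId": "node-1", "disk_type": "cloud_ssd"})
b = series_key({"disk_type": "cloud_ssd", "nodeId": "node-1"})
c = series_key({"nodeId": "node-2", "disk_type": "cloud_ssd"})
print(a == b)  # same labels in a different order
print(a == c)  # a different nodeId
```

The point of the sketch is that two records belong to the same time series exactly when their full set of label values is identical, regardless of the order in which the labels appear.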

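Users can also define their own model fields. If none are specified, the default fields above are used.

The simplest way to create a TimeStream is therefore the following API call: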

PUT _time_stream/test_stream

Let's look at what TimeStream does internally for this default creation. The following GET command shows the internal configuration of the time_stream:

GET _time_stream/test_stream
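
For readers who prefer to script these calls, the sketch below builds the same PUT and GET requests with Python's standard library. The address http://localhost:9200 and the absence of authentication are assumptions for this example, and the helper time_stream_request is a hypothetical name. The requests are only constructed, not sent, so the snippet runs without a cluster.

```python
import urllib.request

# Assumption: the cluster is reachable at this address without authentication.
BASE_URL = "http://localhost:9200"


def time_stream_request(method: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a request against the _time_stream endpoint."""
    return urllib.request.Request(f"{BASE_URL}/_time_stream/{name}", method=method)


create = time_stream_request("PUT", "test_stream")
describe = time_stream_request("GET", "test_stream")
print(create.get_method(), create.full_url)
print(describe.get_method(), describe.full_url)

# To actually send a request against a live cluster:
# with urllib.request.urlopen(create) as resp:
#     print(resp.read().decode("utf-8"))
```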

First, time_stream automatically creates an index template with the following content:

{
        "index_patterns" : [
          "test_stream"
        ],
        "template" : {
          "settings" : {
            "index" : {
              "mode" : "time_series",
              "codec" : "ali",
              "refresh_interval" : "10s",
              "ali_codec_service" : {
                "enabled" : "true",
                "source_reuse_doc_values" : {
                  "enabled" : "true"
                }
              },
              "translog" : {
                "durability" : "ASYNC"
              },
              "doc_value" : {
                "compression" : {
                  "default" : "zstd"
                }
              },
              "postings" : {
                "compression" : "zstd"
              },
              "source" : {
                "compression" : "zstd"
              },
              "routing_path" : [
                "labels.*"
              ]
            }
          },
          "mappings" : {
            "numeric_detection" : true,
            "dynamic_templates" : [
              {
                "labels_template_match_labels.*" : {
                  "path_match" : "labels.*",
                  "mapping" : {
                    "time_series_dimension" : "true",
                    "type" : "keyword"
                  },
                  "match_mapping_type" : "*"
                }
              },
              {
                "metrics_double_match_metrics.*" : {
                  "path_match" : "metrics.*",
                  "mapping" : {
                    "index" : "false",
                    "type" : "double"
                  },
                  "match_mapping_type" : "double"
                }
              },
              {
                "metrics_long_match_metrics.*" : {
                  "path_match" : "metrics.*",
                  "mapping" : {
                    "index" : "false",
                    "type" : "long"
                  },
                  "match_mapping_type" : "long"
                }
              }
            ],
            "properties" : {
              "@timestamp" : {
                "format" : "epoch_millis||strict_date_optional_time",
                "type" : "date"
              }
            }
          }
        },
        "composed_of" : [ ],
        "data_stream" : {
          "hidden" : false
        }
      }

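The index_template settings configure the following key parameters.

Parameter	Description
index.mode	Set to time_series, which creates the index as a time_series index. The system automatically applies the Elasticsearch best-practice configuration for time series scenarios.
index.codec	Set to ali, which enables the aliyun-codec index compression plugin. Combined with the following parameters, it can greatly reduce disk usage:
* index.ali_codec_service.enabled=true: enables codec compression.
* index.doc_value.compression.default=zstd: compresses doc_values with zstd.
* index.postings.compression=zstd: compresses the inverted index (postings) with zstd.
* index.ali_codec_service.source_reuse_doc_values.enabled=true: does not store _source; the source is reassembled from doc_values instead.
* index.source.compression=zstd: compresses stored (row-oriented) data with zstd.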

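
The dotted parameter names in the table are simply the flattened paths of the nested settings object in the template. The following Python sketch shows that correspondence on a subset of the settings. The helper name flatten is introduced here for illustration only.

```python
def flatten(settings: dict, prefix: str = "") -> dict:
    """Flatten nested settings into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Subset of the template settings shown earlier.
settings = {
    "index": {
        "mode": "time_series",
        "codec": "ali",
        "ali_codec_service": {
            "enabled": "true",
            "source_reuse_doc_values": {"enabled": "true"},
        },
        "doc_value": {"compression": {"default": "zstd"}},
        "postings": {"compression": "zstd"},
        "source": {"compression": "zstd"},
    }
}

# Print each setting in the name=value form used in the table.
for name, value in sorted(flatten(settings).items()):
    print(f"{name}={value}")
```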

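The index_template mappings configure the dynamic_templates that correspond to the time series model:

Dimension fields: mapped as keyword by default and marked as dimensions with time_series_dimension=true. With index.mode=time_series, all fields with time_series_dimension=true are combined into an internal time series ID field (_tsid).
Metric fields: support the double and long types; only doc_values are stored, and no index is built.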
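
To see how these dynamic templates decide the mapping of a new field, here is a rough Python simulation. It is a sketch under stated assumptions, not Elasticsearch's implementation: detected_type and resolve_mapping are hypothetical helpers, fnmatchcase stands in for the path_match wildcard matching, only native JSON numbers are considered (the effect of numeric_detection on strings is ignored), and the templates are simplified copies of the ones in the index template.

```python
from fnmatch import fnmatchcase

# Simplified copies of the three dynamic templates from the index template.
TEMPLATES = [
    {"path_match": "labels.*", "match_mapping_type": "*",
     "mapping": {"type": "keyword", "time_series_dimension": True}},
    {"path_match": "metrics.*", "match_mapping_type": "double",
     "mapping": {"type": "double", "index": False}},
    {"path_match": "metrics.*", "match_mapping_type": "long",
     "mapping": {"type": "long", "index": False}},
]


def detected_type(value) -> str:
    """Rough stand-in for the JSON type Elasticsearch detects for a value."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def resolve_mapping(path: str, value):
    """Return the mapping of the first template matching `path` and `value`, or None."""
    for template in TEMPLATES:
        type_ok = template["match_mapping_type"] in ("*", detected_type(value))
        if type_ok and fnmatchcase(path, template["path_match"]):
            return template["mapping"]
    return None


print(resolve_mapping("labels.nodeId", "node-xxx"))
print(resolve_mapping("metrics.cpu.idle", 10))
print(resolve_mapping("metrics.mem.free", 100.1))
```

In this simulation every field under labels becomes a keyword dimension, while fields under metrics become long or double depending on the detected number type, with indexing turned off.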

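As shown, with a single default creation call, TimeStream integrates the data model and the time series best practices directly, which greatly lowers the barrier to entry.

When writing data, follow the data model, for example: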

POST test_stream/_doc
{
  "labels": {
    "namespace": "cn-hangzhou",
    "clusterId": "cn-xxx-xxxxxx",
    "nodeId": "node-xxx",
    "label": "test-cluster",
    "disk_type": "cloud_ssd",
    "cluster_type": "normal"
  },
  "metrics": {
    "cpu.idle": 10,
    "mem.free": 100.1,
    "disk_ioutil": 5.2
  },
  "@timestamp": 1624873606000
}
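
When documents are generated by a program, it can help to validate them against the data model before sending. The sketch below assembles a document in the default labels/metrics/@timestamp layout. The helper build_point and its validation rules are assumptions made for this example, not part of TimeStream.

```python
import json
import time


def build_point(labels: dict, metrics: dict, timestamp_ms=None) -> dict:
    """Assemble one document in the default labels/metrics/@timestamp layout."""
    for name, value in metrics.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"metric {name!r} must be a long or double, got {value!r}")
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    return {
        # Label values are stringified because the default template maps them to keyword.
        "labels": {key: str(value) for key, value in labels.items()},
        "metrics": dict(metrics),
        "@timestamp": timestamp_ms,
    }


point = build_point(
    {"nodeId": "node-xxx", "disk_type": "cloud_ssd"},
    {"cpu.idle": 10, "mem.free": 100.1},
    timestamp_ms=1624873606000,
)
print(json.dumps(point, indent=2))
```

The printed JSON can be used as the body of the POST test_stream/_doc request shown above.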

Queries can use the native ES search API, or the Prometheus query interface with PromQL, which will be covered later.
