A look at Flink Table's Over Windows

This article takes a look at Over Windows in the Flink Table API.

Example

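// input: a streaming Table with fields a, b, c and an event-time attribute rowtime
Table table = input
  .window(Over.partitionBy("a").orderBy("rowtime").preceding("unbounded_range").as("w")) // define over window with alias w
  .select("a, b.sum over w, c.min over w"); // aggregate over the over window w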
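  • Over Windows are similar to the OVER clause in SQL and can be based on event time, processing time, or row count. They are built with the Over class, on which orderBy, preceding, and as must be called (partitionBy is optional). There are two kinds: Unbounded and Bounded.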
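To make the semantics concrete, here is a plain-Java model (not Flink code) of what select("a, b.sum over w, c.min over w") produces. Unlike a group window, an over window emits one result per input row. Each result aggregates that row together with the earlier rows of its partition. The class names are hypothetical. The model assumes an unbounded row-count frame over input that is already complete, and it ignores watermarks and late data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical input row with fields a, b, c and an event-time attribute rowtime.
final class OverInputRow {
    final String a;
    final long rowtime;
    final long b;
    final long c;

    OverInputRow(String a, long rowtime, long b, long c) {
        this.a = a;
        this.rowtime = rowtime;
        this.b = b;
        this.c = c;
    }
}

// Plain-Java model (not Flink code) of select("a, b.sum over w, c.min over w"),
// with w partitioned by a, ordered by rowtime, and an unbounded row-count frame
// (all earlier rows of the partition plus the current one).
final class UnboundedOverSketch {
    /** Emits one "a,sum(b),min(c)" line per input row, in rowtime order. */
    static List<String> evaluate(List<OverInputRow> input) {
        List<OverInputRow> sorted = new ArrayList<>(input);
        // List.sort is stable, so rows with equal rowtime keep their input order
        sorted.sort(Comparator.comparingLong((OverInputRow r) -> r.rowtime));
        // per partition key: {running sum of b, running min of c}
        Map<String, long[]> acc = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (OverInputRow r : sorted) {
            long[] state = acc.computeIfAbsent(r.a, k -> new long[] {0L, Long.MAX_VALUE});
            state[0] += r.b;
            state[1] = Math.min(state[1], r.c);
            out.add(r.a + "," + state[0] + "," + state[1]);
        }
        return out;
    }
}
```

Take the input rows (x, t=1, b=10, c=5), (y, t=2, b=7, c=9), and (x, t=3, b=4, c=2). The model yields x,10,5, then y,7,9, then x,14,2. Three inputs produce three outputs, each carrying its partition's running aggregates.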

Unbounded Over Windows example


// Unbounded Event-time over window (assuming an event-time attribute "rowtime")
.window(Over.partitionBy("a").orderBy("rowtime").preceding("unbounded_range").as("w"));

// Unbounded Processing-time over window (assuming a processing-time attribute "proctime")
.window(Over.partitionBy("a").orderBy("proctime").preceding("unbounded_range").as("w"));

// Unbounded Event-time Row-count over window (assuming an event-time attribute "rowtime")
.window(Over.partitionBy("a").orderBy("rowtime").preceding("unbounded_row").as("w"));
 
// Unbounded Processing-time Row-count over window (assuming a processing-time attribute "proctime")
.window(Over.partitionBy("a").orderBy("proctime").preceding("unbounded_row").as("w"));
  • For event-time and processing-time windows, unbounded_range denotes an unbounded window; for row-count windows, unbounded_row is used instead.
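The difference between the two keywords shows up when several rows share the same time value. The hypothetical sketch below models the two frames under standard SQL semantics:

- With unbounded_row, each row aggregates the rows before it plus itself.
- With unbounded_range, whose frame ends at CURRENT RANGE, each row also includes its peers that have the same time value.

Rows are assumed to be sorted by time already. How Flink's runtime orders and emits ties, especially for processing time, is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (standard SQL frame semantics, not Flink's runtime) of a SUM
// over rows that are already sorted by their time attribute.
final class UnboundedFrameSketch {
    /** unbounded_row: UNBOUNDED PRECEDING to CURRENT ROW, i.e. a plain running sum. */
    static List<Long> rowsSum(long[] values) {
        List<Long> out = new ArrayList<>();
        long sum = 0;
        for (long v : values) {
            sum += v;
            out.add(sum);
        }
        return out;
    }

    /** unbounded_range: UNBOUNDED PRECEDING to CURRENT RANGE, so peers with an equal time are included. */
    static List<Long> rangeSum(long[] times, long[] values) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            long sum = 0;
            for (int j = 0; j < values.length; j++) {
                if (times[j] <= times[i]) {
                    sum += values[j];
                }
            }
            out.add(sum);
        }
        return out;
    }
}
```

Take times 1, 2, 2, 3 and values 1, 10, 100, 1000. rowsSum gives 1, 11, 111, 1111, while rangeSum gives 1, 111, 111, 1111. The two rows at time 2 see each other only under the range frame.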

Bounded Over Windows example

// Bounded Event-time over window (assuming an event-time attribute "rowtime")
.window(Over.partitionBy("a").orderBy("rowtime").preceding("1.minutes").as("w"))

// Bounded Processing-time over window (assuming a processing-time attribute "proctime")
.window(Over.partitionBy("a").orderBy("proctime").preceding("1.minutes").as("w"))

// Bounded Event-time Row-count over window (assuming an event-time attribute "rowtime")
.window(Over.partitionBy("a").orderBy("rowtime").preceding("10.rows").as("w"))
 
// Bounded Processing-time Row-count over window (assuming a processing-time attribute "proctime")
.window(Over.partitionBy("a").orderBy("proctime").preceding("10.rows").as("w"))
  • For event-time and processing-time windows, a bounded window is expressed with a time interval such as 1.minutes; for row-count windows, with a row count such as 10.rows.
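The bounded case can be sketched the same way, assuming SQL-style frames:

- 10.rows is read as ROWS BETWEEN 10 PRECEDING AND CURRENT ROW, so up to 11 rows are aggregated.
- 1.minutes is read as a range whose inclusive lower bound is the current row's time minus 60,000 ms.

Class and method names are hypothetical. Times are in milliseconds, and the input is assumed to be sorted by time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two bounded frames, assuming SQL semantics:
//   preceding("10.rows")   -> ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
//   preceding("1.minutes") -> RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW
// Input rows are assumed to be sorted by their time attribute (milliseconds).
final class BoundedFrameSketch {
    /** Sum over the current row and at most n rows before it. */
    static List<Long> rowsSum(long[] values, int n) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            long sum = 0;
            for (int j = Math.max(0, i - n); j <= i; j++) {
                sum += values[j];
            }
            out.add(sum);
        }
        return out;
    }

    /** Sum over rows whose time lies in [t - rangeMillis, t], where t is the current row's time. */
    static List<Long> rangeSum(long[] times, long[] values, long rangeMillis) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            long sum = 0;
            for (int j = 0; j < values.length; j++) {
                if (times[j] >= times[i] - rangeMillis && times[j] <= times[i]) {
                    sum += values[j];
                }
            }
            out.add(sum);
        }
        return out;
    }
}
```

In this model, rowsSum(values, 10) plays the role of preceding("10.rows"), and rangeSum(times, values, 60000) plays the role of preceding("1.minutes").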

Table.window

flink-table_2.11-1.7.0-sources.jar!/org/apache/flink/table/api/table.scala

class Table(
    private[flink] val tableEnv: TableEnvironment,
    private[flink] val logicalPlan: LogicalNode) {

  //......  

  @varargs
  def window(overWindows: OverWindow*): OverWindowedTable = {

    if (tableEnv.isInstanceOf[BatchTableEnvironment]) {
      throw new TableException("Over-windows for batch tables are currently not supported.")
    }

    if (overWindows.size != 1) {
      throw new TableException("Over-Windows are currently only supported single window.")
    }

    new OverWindowedTable(this, overWindows.toArray)
  }

  //......

}    
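  • Table provides a window method that takes OverWindow arguments and returns an OverWindowedTable. In 1.7 it throws a TableException for batch tables, and also whenever it is given anything other than exactly one over window.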
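The two guard clauses in window() can be mirrored in a few lines of Java. This is a hypothetical stand-in rather than Flink code. IllegalStateException replaces TableException, and a boolean flag replaces the isInstanceOf[BatchTableEnvironment] check.

```java
// Hypothetical sketch (not Flink code) mirroring the two checks that
// Table.window(OverWindow*) performs in Flink 1.7 before building an
// OverWindowedTable. IllegalStateException stands in for Flink's TableException.
final class OverWindowGuard {
    static void check(boolean isBatchEnvironment, int windowCount) {
        // over windows are only supported on streaming tables
        if (isBatchEnvironment) {
            throw new IllegalStateException("Over-windows for batch tables are currently not supported.");
        }
        // exactly one over window per window() call
        if (windowCount != 1) {
            throw new IllegalStateException("Over-Windows are currently only supported single window.");
        }
    }
}
```

In practice, each window() call takes a single over window. Several aggregates can still share that window through its alias, as in the b.sum over w, c.min over w example.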

OverWindow

flink-table_2.11-1.7.0-sources.jar!/org/apache/flink/table/api/windows.scala

/**
  * Over window is similar to the traditional OVER SQL.
  */
case class OverWindow(
    private[flink] val alias: Expression,
    private[flink] val partitionBy: Seq[Expression],
    private[flink] val orderBy: Expression,
    private[flink] val preceding: Expression,
    private[flink] val following: Expression)
  • OverWindow defines the alias, partitionBy, orderBy, preceding, and following properties.
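Because OverWindow mirrors the SQL OVER clause, its fields map almost one-to-one onto SQL text. The hypothetical helper below renders them as a WINDOW clause, using plain strings instead of Flink Expressions. The frameUnit parameter (ROWS or RANGE) is an addition of this sketch. It stands in for what Flink infers from whether preceding is a row-count interval.

```java
// Hypothetical helper (not part of Flink): renders the OverWindow fields as the
// equivalent SQL window definition, using plain strings instead of Flink Expressions.
// frameUnit ("ROWS" or "RANGE") is an extra parameter of this sketch.
final class OverClauseSketch {
    static String toSql(String alias, String[] partitionBy, String orderBy,
                        String frameUnit, String preceding, String following) {
        StringBuilder sb = new StringBuilder("WINDOW ").append(alias).append(" AS (");
        // partitionBy is optional, just as Over.orderBy can be called without it
        if (partitionBy.length > 0) {
            sb.append("PARTITION BY ").append(String.join(", ", partitionBy)).append(' ');
        }
        sb.append("ORDER BY ").append(orderBy)
          .append(' ').append(frameUnit)
          .append(" BETWEEN ").append(preceding)
          .append(" AND ").append(following)
          .append(')');
        return sb.toString();
    }
}
```

For example, toSql("w", new String[] {"a"}, "rowtime", "ROWS", "10 PRECEDING", "CURRENT ROW") yields WINDOW w AS (PARTITION BY a ORDER BY rowtime ROWS BETWEEN 10 PRECEDING AND CURRENT ROW).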

Over

flink-table_2.11-1.7.0-sources.jar!/org/apache/flink/table/api/java/windows.scala

object Over {

  /**
    * Specifies the time attribute on which rows are grouped.
    *
    * For streaming tables call [[orderBy 'rowtime or orderBy 'proctime]] to specify time mode.
    *
    * For batch tables, refer to a timestamp or long attribute.
    */
  def orderBy(orderBy: String): OverWindowWithOrderBy = {
    val orderByExpr = ExpressionParser.parseExpression(orderBy)
    new OverWindowWithOrderBy(Array[Expression](), orderByExpr)
  }

  /**
    * Partitions the elements on some partition keys.
    *
    * @param partitionBy some partition keys.
    * @return A partitionedOver instance that only contains the orderBy method.
    */
  def partitionBy(partitionBy: String): PartitionedOver = {
    val partitionByExpr = ExpressionParser.parseExpressionList(partitionBy).toArray
    new PartitionedOver(partitionByExpr)
  }
}

class OverWindowWithOrderBy(
  private val partitionByExpr: Array[Expression],
  private val orderByExpr: Expression) {

  /**
    * Set the preceding offset (based on time or row-count intervals) for over window.
    *
    * @param preceding preceding offset relative to the current row.
    * @return this over window
    */
  def preceding(preceding: String): OverWindowWithPreceding = {
    val precedingExpr = ExpressionParser.parseExpression(preceding)
    new OverWindowWithPreceding(partitionByExpr, orderByExpr, precedingExpr)
  }

}

class PartitionedOver(private val partitionByExpr: Array[Expression]) {

  /**
    * Specifies the time attribute on which rows are grouped.
    *
    * For streaming tables call [[orderBy 'rowtime or orderBy 'proctime]] to specify time mode.
    *
    * For batch tables, refer to a timestamp or long attribute.
    */
  def orderBy(orderBy: String): OverWindowWithOrderBy = {
    val orderByExpr = ExpressionParser.parseExpression(orderBy)
    new OverWindowWithOrderBy(partitionByExpr, orderByExpr)
  }
}

class OverWindowWithPreceding(
    private val partitionBy: Seq[Expression],
    private val orderBy: Expression,
    private val preceding: Expression) {

  private[flink] var following: Expression = _

  /**
    * Assigns an alias for this window that the following `select()` clause can refer to.
    *
    * @param alias alias for this over window
    * @return over window
    */
  def as(alias: String): OverWindow = as(ExpressionParser.parseExpression(alias))

  /**
    * Assigns an alias for this window that the following `select()` clause can refer to.
    *
    * @param alias alias for this over window
    * @return over window
    */
  def as(alias: Expression): OverWindow = {

    // set following to CURRENT_ROW / CURRENT_RANGE if not defined
    if (null == following) {
      if (preceding.resultType.isInstanceOf[RowIntervalTypeInfo]) {
        following = CURRENT_ROW
      } else {
        following = CURRENT_RANGE
      }
    }
    OverWindow(alias, partitionBy, orderBy, preceding, following)
  }

  /**
    * Set the following offset (based on time or row-count intervals) for over window.
    *
    * @param following following offset that relative to the current row.
    * @return this over window
    */
  def following(following: String): OverWindowWithPreceding = {
    this.following(ExpressionParser.parseExpression(following))
  }

  /**
    * Set the following offset (based on time or row-count intervals) for over window.
    *
    * @param following following offset that relative to the current row.
    * @return this over window
    */
  def following(following: Expression): OverWindowWithPreceding = {
    this.following = following
    this
  }
}
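  • Over is a helper class for building over windows. It provides two methods, orderBy and partitionBy, which create an OverWindowWithOrderBy and a PartitionedOver respectively.
  • PartitionedOver provides an orderBy method that creates an OverWindowWithOrderBy; OverWindowWithOrderBy provides a preceding method that creates an OverWindowWithPreceding.
  • OverWindowWithPreceding holds the partitionBy, orderBy, and preceding properties. Its as method creates the OverWindow, and its following method sets the following offset. If following was never set, as defaults it to CURRENT_ROW for row-count intervals and to CURRENT_RANGE otherwise.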
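The staged builder can be modeled in plain Java to show the call order it enforces and the default that as() applies. This hypothetical sketch makes three simplifications:

- Expressions are plain strings.
- partitionBy splits its argument on commas, in place of ExpressionParser.parseExpressionList.
- A preceding value counts as a row interval when it is unbounded_row or ends in .rows, which simplifies the RowIntervalTypeInfo check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical Java model of the staged over-window builder in Flink 1.7
// (org/apache/flink/table/api/java/windows.scala); expressions are plain strings.
final class OverSpec {
    final String alias;
    final List<String> partitionBy;
    final String orderBy;
    final String preceding;
    final String following;

    OverSpec(String alias, List<String> partitionBy, String orderBy, String preceding, String following) {
        this.alias = alias;
        this.partitionBy = partitionBy;
        this.orderBy = orderBy;
        this.preceding = preceding;
        this.following = following;
    }
}

final class OverSketch {
    // mirrors Over.partitionBy: a comma-separated list of keys
    static PartitionedStage partitionBy(String keys) {
        return new PartitionedStage(Arrays.asList(keys.trim().split("\\s*,\\s*")));
    }

    // mirrors Over.orderBy: no partition keys
    static OrderedStage orderBy(String orderBy) {
        return new OrderedStage(Collections.<String>emptyList(), orderBy);
    }
}

final class PartitionedStage {
    private final List<String> keys;
    PartitionedStage(List<String> keys) { this.keys = keys; }
    OrderedStage orderBy(String orderBy) { return new OrderedStage(keys, orderBy); }
}

final class OrderedStage {
    private final List<String> keys;
    private final String orderBy;
    OrderedStage(List<String> keys, String orderBy) { this.keys = keys; this.orderBy = orderBy; }
    PrecedingStage preceding(String preceding) { return new PrecedingStage(keys, orderBy, preceding); }
}

final class PrecedingStage {
    private final List<String> keys;
    private final String orderBy;
    private final String preceding;
    private String following; // stays null unless following(...) is called

    PrecedingStage(List<String> keys, String orderBy, String preceding) {
        this.keys = keys;
        this.orderBy = orderBy;
        this.preceding = preceding;
    }

    PrecedingStage following(String following) {
        this.following = following;
        return this;
    }

    OverSpec as(String alias) {
        String f = following;
        if (f == null) {
            // simplified stand-in for the RowIntervalTypeInfo check
            boolean rowInterval = preceding.equals("unbounded_row") || preceding.endsWith(".rows");
            f = rowInterval ? "current_row" : "current_range";
        }
        return new OverSpec(alias, keys, orderBy, preceding, f);
    }
}
```

Each stage exposes only the next legal call. A mistake such as calling as() before preceding() therefore fails at compile time rather than at runtime.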

OverWindowedTable

flink-table_2.11-1.7.0-sources.jar!/org/apache/flink/table/api/table.scala

class OverWindowedTable(
    private[flink] val table: Table,
    private[flink] val overWindows: Array[OverWindow]) {

  def select(fields: Expression*): Table = {
    val expandedFields = expandProjectList(
      fields,
      table.logicalPlan,
      table.tableEnv)

    if(fields.exists(_.isInstanceOf[WindowProperty])){
      throw new ValidationException(
        "Window start and end properties are not available for Over windows.")
    }

    val expandedOverFields = resolveOverWindows(expandedFields, overWindows, table.tableEnv)

    new Table(
      table.tableEnv,
      Project(
        expandedOverFields.map(UnresolvedAlias),
        table.logicalPlan,
        // required for proper projection push down
        explicitAlias = true)
        .validate(table.tableEnv)
    )
  }

  def select(fields: String): Table = {
    val fieldExprs = ExpressionParser.parseExpressionList(fields)
    //get the correct expression for AggFunctionCall
    val withResolvedAggFunctionCall = fieldExprs.map(replaceAggFunctionCall(_, table.tableEnv))
    select(withResolvedAggFunctionCall: _*)
  }
}
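  • The OverWindowedTable constructor takes the overWindows array, and the class offers only a select operation. select accepts either a String or Expressions. The String variant parses its argument into Expressions and delegates to the Expression variant. That variant rejects window start/end properties and returns a new Table. The new Table's Project node uses expandedOverFields.map(UnresolvedAlias) as its project list, where expandedOverFields comes from resolveOverWindows(expandedFields, overWindows, table.tableEnv).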
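The string path of select can be approximated as below. This is a heavily simplified, hypothetical model:

- It splits on commas, so function calls with several arguments are not handled.
- It treats anything matching field.agg over alias as an over aggregation.
- It rejects window start/end properties with the same message as the Flink source.
- It fails on an unknown alias with a message invented for this sketch.

Flink itself works on parsed Expression trees via ExpressionParser and resolveOverWindows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, heavily simplified model (not Flink code) of the checks that
// OverWindowedTable.select applies to a string projection.
final class OverSelectSketch {
    // matches "<field>.<agg> over <alias>", e.g. "b.sum over w"
    private static final Pattern OVER_CALL = Pattern.compile("^(\\w+)\\.(\\w+)\\s+over\\s+(\\w+)$");

    static List<String> resolve(String fields, String windowAlias) {
        List<String> resolved = new ArrayList<>();
        for (String raw : fields.split(",")) {
            String field = raw.trim();
            // over windows have no window start/end properties
            if (field.endsWith(".start") || field.endsWith(".end")) {
                throw new IllegalArgumentException(
                    "Window start and end properties are not available for Over windows.");
            }
            Matcher m = OVER_CALL.matcher(field);
            if (m.matches()) {
                if (!m.group(3).equals(windowAlias)) {
                    // message invented for this sketch
                    throw new IllegalArgumentException("Unknown over window alias: " + m.group(3));
                }
                // rewrite into the shape of a resolved over call: agg(field) OVER alias
                resolved.add(m.group(2) + "(" + m.group(1) + ") OVER " + windowAlias);
            } else {
                resolved.add(field); // plain field reference, passed through
            }
        }
        return resolved;
    }
}
```

For the projection used at the start of this article, resolve("a, b.sum over w, c.min over w", "w") returns a, sum(b) OVER w, and min(c) OVER w.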

Summary

  • Over Windows are similar to SQL's OVER clause and can be based on event time, processing time, or row count. They are built with the Over class, on which orderBy, preceding, and as must be called. There are two kinds, Unbounded and Bounded. unbounded_range marks an unbounded event-time or processing-time window, and unbounded_row marks an unbounded row-count window. Intervals such as 1.minutes (time) or 10.rows (row count) mark bounded windows.
  • Table's window method takes OverWindow arguments and creates an OverWindowedTable. OverWindow defines the alias, partitionBy, orderBy, preceding, and following properties. Over is a helper class for building over windows. Its orderBy and partitionBy methods create an OverWindowWithOrderBy and a PartitionedOver respectively. PartitionedOver.orderBy creates an OverWindowWithOrderBy, and OverWindowWithOrderBy.preceding creates an OverWindowWithPreceding. OverWindowWithPreceding holds the partitionBy, orderBy, and preceding properties; it creates the OverWindow via as and sets the following offset via following.
  • OverWindowedTable's constructor takes the overWindows array, and the class offers only select. select accepts either a String or Expressions; the String form is parsed into Expressions and delegated to the Expression form. The Expression form creates a new Table whose Project node uses expandedOverFields.map(UnresolvedAlias) as its project list, with expandedOverFields obtained from resolveOverWindows(expandedFields, overWindows, table.tableEnv).

