Data Flow Transformations(2):sort,aggregate

Sort

  • Sorts input data in ascending or descending order.

  • Multiple sorts can be applied; each sort is identified by a numeral that determines the sort order.

  • Positive number denotes ascending order, and a negative number denotes descending order.

  • Column with the lowest number is sorted first, second lowest number is sorted next, and so on.

  • It can also remove duplicate rows as part of its sort.

  • It has one input and one output. It does not support error outputs.

  • full blocking


    image.png

how to sort the output
change the source output, false to true


image.png

image.png

Aggregate

  • Used to perform following operations on input columns:
    ▶Sum, Average, Count, Count Distinct, Minimum, Maximum
    ▶Group By
    ▶The comparison options of the aggregation

  • Handles null values in the same way as the SSMS does.

  • IsBig property can be set for handling big or high-precision numbers.

  • Performance Can be Improved by:
    ▶Set Keys or KeysScale properties When performing a Group by operation
    ▶Set CountDistinctKeys or CountDistinctScale properties while performing Distinct count operation.
    ▶because the tranformation is able to allocate adequate memory for the data that the transformation caches.

  • The Aggregate transformation has one input and one or more outputs. It does not support an error output.

  • full blocking

count distinct
ask more space from ram???

isbig

Union All

  • Combines multiple inputs into one output.
  • Inputs are added to the transformation output one after the other; no reordering of rows occurs.
  • At least one input must be mapped to each output column.
  • Metadata of the columns must match. dayatype must be matched
  • Columns that are not mapped, are set to null values in output.
  • This transformation has multiple inputs and one output. It does not support an error output.
  • semi-blocking

Merge

  • Combines two sorted datasets into a single dataset.
  • Merging is based on values in their key columns.
  • It can be used to perform:
    ▶Merge data from two data sources.
    ▶Create complex datasets by nesting Merge transformations.
    ▶Remerge rows after correcting errors in the data.
  • Inputs have matching metadata.
  • This transformation has two inputs and one output. It does not support an error output.
  • semi blocking

Merge Join

  • Combines two sorted datasets into a single dataset by using a FULL, LEFT, or INNER join.
  • It can configured as follows:
    ▶Specify the join: FULL, LEFT, or INNER join.
    ▶Specify the columns the join uses.
    ▶Specify whether the transformation handles null values as equal to other nulls.
  • Joining columns have matching metadata.
  • This transformation has two inputs and one output. It does not support an error output.
  • semi blocking
  • swap input

sorting in the property, it can improve performance(?)

Conditional Split

  • It can route data rows to different outputs depending on the content of the data.

  • It evaluates expressions, and based on the results, directs the data row to the specified output.

  • It can configured as follows:
    ▶Specify an expression that evaluates to a Boolean for each condition.
    ▶Specify the order in which the conditions are evaluated.
    ▶Specify the default output for the transformation.

  • This transformation has one input, one or more outputs and one error output.

  • like case statement in sql

  • non-blocking

  • 注意大小写

eg: [id]>1 && [id]<5

Multicast

  • Multicast transformation distributes its input to one or more outputs.
  • Multicast transformation directs every row to every output, unlike Conditional Split directs a row to a single output.
  • You configure the Multicast transformation by adding outputs.
  • This transformation has one input and multiple outputs. It does not support an error output.
  • non- blocking
  • like copy the dataset

区别: conditional的是符合条件的去,multicast是复制粘贴,不管条件

Row Count

  • Count the number of rows, as they pass through a data flow and stores the final count in a variable.
  • Variable used Must already exist.
  • Variable must be in the scope of Data Flow Task using Row Count.
  • Row count value is stored in the variable only after the last row has passed the transformation.
  • This transformation has one input and one output. It does not support an error output
  • non-blocking
  • https://www.c-sharpcorner.com/UploadFile/muralidharan.d/how-to-use-rowcount-in-ssis/

Audit

  • Audit transformation enables the data flow in a package to include data about the environment in which the package runs.
  • SSIS includes system variables that provide this information.
  • It can be configured as:
    ▶Provide the name of a output column.
    ▶Map the system variable to the output column.
  • Single system variable can be mapped to multiple columns.
  • This transformation has one input and one output. It does not support an error output.
  • non blocking

how to show the num

©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

  • rljs by sennchi Timeline of History Part One The Cognitiv...
    sennchi阅读 7,871评论 0 10
  • pyspark.sql模块 模块上下文 Spark SQL和DataFrames的重要类: pyspark.sql...
    mpro阅读 9,929评论 0 13
  • 今天晚上给烁烁画了一个蛋壳奥特曼,他很高兴,和我一起玩起了照相游戏,今天晚上写作业也写的很好,读《大学》一口气就读...
    885d352dfbfc阅读 240评论 0 1
  • 一年又一年 坐在图书馆里面透过玻璃看到了上一届的学生穿着学士服在拍毕业照 想想去年的我们拍毕业照不过是昨天转眼间的...
    乐水遥阅读 247评论 0 0
  • 红红鲜花深深爱,浓浓祝福最愉快。 美好愿望在心中,百年好合真期待。
    蛮力阅读 609评论 7 10

友情链接更多精彩内容