ADF
How to Remove Duplicate Records in Azure Data Factory | ADF Interview Questions & Answers
1. What are the types of integration runtime in Azure Data Factory?
IR: the integration runtime (IR) is the compute infrastructure that Azure Data Factory uses for data integration.
(1) Azure IR: fully managed in the Azure cloud.
(2) Self-hosted IR: use it to perform data integration in a private network environment that doesn't have a direct line of sight to the cloud environment, e.g. on-premises to Azure.
(3) Azure-SSIS IR: runs your SSIS packages in Azure.
2. SSIS
Need to study this.
3. Private endpoint
What is it used for, and how is it deployed?
4. Removing duplicate values from files (JSON and other file types)
- Using a pipeline
- Using a data flow
18. How to remove duplicate records in Azure Data Factory
Use a data flow to remove the duplicates.
The approach is modeled on the SQL query used to delete duplicates:
WITH DP AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY name, dep_id ORDER BY name) AS rn
    FROM emp_duplicate
)
-- rn > 1 marks the duplicate rows; rn = 1 is the row kept for each (name, dep_id)
SELECT * FROM DP WHERE rn > 1
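To make the row-numbering idea concrete, here is a minimal Python sketch of what the query above computes. The sample rows and the helper name `add_row_number` are hypothetical, invented for illustration. Because the query orders by `name`, which is also a partition column, the order inside a partition is arbitrary; the sketch simply uses input order.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the emp_duplicate table.
emp_duplicate = [
    {"name": "Ann", "dep_id": 1, "salary": 100},
    {"name": "Ann", "dep_id": 1, "salary": 100},
    {"name": "Bob", "dep_id": 2, "salary": 200},
    {"name": "Ann", "dep_id": 2, "salary": 150},
    {"name": "Bob", "dep_id": 2, "salary": 200},
]


def add_row_number(rows, partition_keys):
    """Emulate ROW_NUMBER() OVER (PARTITION BY ...): number rows within each partition."""
    counters = defaultdict(int)
    numbered = []
    for row in rows:
        key = tuple(row[k] for k in partition_keys)
        counters[key] += 1
        numbered.append({**row, "rn": counters[key]})
    return numbered


numbered = add_row_number(emp_duplicate, ["name", "dep_id"])
duplicates = [r for r in numbered if r["rn"] > 1]     # rows the query above returns
deduplicated = [r for r in numbered if r["rn"] == 1]  # one row per (name, dep_id)
```

Filtering on `rn == 1` instead of `rn > 1` gives the deduplicated set rather than the rows to delete.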
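- Method 1 (recommended): use the Window transformation, which mirrors the SQL directly.
- Here, Over corresponds to PARTITION BY; Sort corresponds to ORDER BY; Window columns are the expressions written before OVER (in this case, the row number rn).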
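- Method 2: requires understanding $$ and first($$). In a column pattern, $$ stands for each matched column, so first($$) takes the first value of that column within the group.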
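① Use a Sort transformation to sort by name.
② Use an Aggregate transformation: first group by the columns that identify a duplicate, name and dep_id; then, in the aggregates, add a column pattern that matches every column except name and dep_id and outputs the first value of each group.
Note: the matching condition can be written as `!in(['name','dep_id'],name)`
③ Use a Sink to save the data.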
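Steps ① to ③ can be sketched locally in Python to check the logic before building the data flow. This is only an analogy, not ADF code: the function `dedupe_first` and the sample rows are hypothetical, and the dictionary comprehension plays the role of the column pattern with `first($$)`.

```python
def dedupe_first(rows, group_keys, sort_key):
    """Sort, group by the key columns, and keep the first value of every other column."""
    result = {}
    for row in sorted(rows, key=lambda r: r[sort_key]):  # step 1: Sort
        key = tuple(row[k] for k in group_keys)           # step 2: group by
        if key not in result:
            # Column pattern: every column whose name is not a group key,
            # keeping the first value seen in the group, like first($$).
            firsts = {c: v for c, v in row.items() if c not in group_keys}
            result[key] = {**dict(zip(group_keys, key)), **firsts}
    return list(result.values())                          # step 3: rows for the sink


# Hypothetical sample data with two exact duplicates.
rows = [
    {"name": "Bob", "dep_id": 2, "city": "Oslo"},
    {"name": "Ann", "dep_id": 1, "city": "Rome"},
    {"name": "Bob", "dep_id": 2, "city": "Oslo"},
    {"name": "Ann", "dep_id": 1, "city": "Rome"},
]
unique_rows = dedupe_first(rows, ["name", "dep_id"], "name")
```

The `c not in group_keys` test mirrors the `!in(['name','dep_id'],name)` condition: the group-by columns already appear in the output, so the pattern only needs to cover the remaining columns.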
19. Which Activity Will You Use to Delete All Files in Azure Data Factory
Use the Delete activity directly.
20. What are Event-Based Triggers in Azure Data Factory
An event-based trigger can start a pipeline when a file arrives in a specified blob container. For example, when test.txt lands in the container raw, run pl_execute_sdc.
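The matching behind such a trigger can be pictured with a small Python sketch. It is a local simulation under stated assumptions, not the ADF API: the function names are hypothetical, and the filter values assume the trigger is configured with a "begins with" path of `/raw/blobs/` and an "ends with" path of `test.txt`.

```python
# Assumed filter values for the example in the notes (container "raw", file "test.txt").
BEGINS_WITH = "/raw/blobs/"
ENDS_WITH = "test.txt"


def trigger_fires(blob_path, begins_with="", ends_with=""):
    """Return True when a newly created blob's path passes both path filters."""
    return blob_path.startswith(begins_with) and blob_path.endswith(ends_with)


def pipelines_to_run(blob_path):
    """Return the pipelines a blob-created event would start under the assumed filters."""
    if trigger_fires(blob_path, BEGINS_WITH, ENDS_WITH):
        return ["pl_execute_sdc"]
    return []
```

With these assumed filters, a blob created at `/raw/blobs/test.txt` would start pl_execute_sdc, while a different file name or a different container would not.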