Reading Notes 8: EmbDI

http://www.eurecom.fr/en/publication/6231/download/data-publi-6231.pdf

—————————————————————————————————————————

Contributions:

1. Describe a compact graph-based representation that allows the specification of a rich set of relationships inherent in the relational world.

2. Propose how to derive sentences from such a graph that effectively "describe" the similarity across elements.

3. Introduce effective optimizations that improve the quality of the learned embeddings and the performance of integration tasks.

4. Propose a diverse collection of evaluation criteria.


Steps:

1. Define three node types: rows (RIDs), attributes/columns (CIDs), and tokens (cell values); add an edge between every pair of elements that co-occur.

2. Generate paths over the graph via random walks.

3. From Walks to Sentences: each walk is serialized into a "sentence" of node identifiers.

4. Embedding Construction: piggyback on the plethora of effective embedding algorithms such as word2vec, GloVe, fastText, and so on.
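The four steps above can be sketched end-to-end on a toy relation. This is a minimal pure-Python illustration, not the paper's implementation: the `R`/`A_`/`T_` id prefixes, the sample table, and the walk parameters are all assumptions.

```python
import random

# Toy relation: two columns, three rows (hypothetical data).
rows = [
    {"name": "apple", "color": "red"},
    {"name": "cherry", "color": "red"},
    {"name": "lime", "color": "green"},
]

# Step 1: build the tripartite graph. Ids are prefixed so that row
# nodes (R), attribute nodes (A_) and token nodes (T_) share one space.
graph = {}  # node id -> list of neighbour ids

def add_edge(u, v):
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

for i, row in enumerate(rows):
    rid = f"R{i}"
    for attr, value in row.items():
        if value is None:          # missing cells get no token node
            continue
        add_edge(rid, f"T_{value}")       # token co-occurs with its row
        add_edge(f"A_{attr}", f"T_{value}")  # and with its column

# Step 2: uniform random walks over the graph (repeated neighbours in
# the adjacency lists act as implicit edge weights).
def random_walk(start, length, rng):
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

# Step 3: each walk becomes one "sentence" of node identifiers.
rng = random.Random(0)
sentences = [random_walk(node, 8, rng) for node in graph for _ in range(5)]

# Step 4 would feed `sentences` to an off-the-shelf trainer,
# e.g. gensim's Word2Vec, exactly as with a text corpus.
print(sentences[0])
```

Because rows, columns, and cell values live in one id space, a single training run yields comparable embeddings for all three node types.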


Optimization: Improving Local Embeddings

1. Handling Imbalanced Relations

2. Handling Missing and Noisy Data
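For the missing/noisy-data point, one simple policy is to create no token node for NULL cells and to split noisy multi-word values into one token node per word. This is a hedged sketch of that idea, not the paper's exact procedure; the helper name and `T_` prefix are hypothetical.

```python
import re

def cell_to_tokens(value):
    """Turn one cell into zero or more token-node ids.

    Missing cells yield no tokens, so NULLs never enter the graph;
    multi-word values are split so each word gets its own node."""
    if value is None:
        return []
    words = re.findall(r"\w+", str(value).lower())
    return [f"T_{w}" for w in words]

print(cell_to_tokens(None))        # []
print(cell_to_tokens("Dark Red"))  # ['T_dark', 'T_red']
```

Splitting multi-word values lets "Dark Red" and "Red" share the `T_red` node, which is what allows noisy variants to land near each other in the embedding space.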


Criteria: once the embeddings for all nodes are ready, they can be used for the downstream tasks.

1. Schema Matching (SM)

    - Run a KNN search directly on the CID embeddings.

2. Entity Resolution (ER)

    - Run a KNN search directly on the RID embeddings.

3. Token Matching (TM)

    - Run a KNN search directly on the TID embeddings; the top-1 neighbor is the conceptual synonym.
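All three criteria reduce to the same KNN lookup, just over different node types. A minimal cosine-similarity sketch (the vectors below are made up for illustration; real ones come from the trained model):

```python
from math import sqrt

# Toy embeddings for a few node ids (hypothetical 3-d vectors).
emb = {
    "A_color":  [0.90, 0.10, 0.00],
    "A_colour": [0.88, 0.12, 0.05],
    "A_name":   [0.10, 0.90, 0.20],
    "R0":       [0.40, 0.50, 0.60],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def knn(query, k=1):
    """Return the k node ids nearest to `query`, excluding itself."""
    scores = [(cosine(emb[query], v), n) for n, v in emb.items() if n != query]
    return [n for _, n in sorted(scores, reverse=True)[:k]]

print(knn("A_color"))  # nearest attribute -> a schema-matching candidate
```

For SM you would restrict the candidate set to CIDs, for ER to RIDs, and for TM to TIDs; only the filter changes, not the lookup.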

Experiment

1. Evaluating Embeddings Quality

- MatchAttribute (MA), MatchRow (MR), MatchConcept (MC)


2. Data Integration Tasks

- Schema Matching (SM)


- Entity Resolution (ER)

    S/F/O: different strategies for handling multi-word values

    DeepERpl: the DeepER network fed with the local embeddings


For ER, different top-n cutoffs give different results.

- Token Matching
