自我中心网络-Egonetwork

一、ego network的概念和定义

当不研究网络的整体,而是侧重于研究单个节点的性质,就会用到ego network。ego network是整体网络结构的一部分,是给定某节点,通过广度优先搜索遍历搜索出的网络结构。

ego Network: 又称自我中心网络,网络节点由唯一的一个中心节点(ego),以及这个节点的邻居(alters)组成,边只包括ego与alter之间,以及alter与alter之间的边。

以社交网络为例,针对某个人而言,将此人以及其朋友看作是节点,只考虑这个人和他朋友,以及他朋友之间的连边,就可以得到一个以A为中心的网络,即自我中心网络。如下图:


图1. ego示例

图中每个alter和它自身的邻居构成一个ego network,所有节点的ego network合并起来,就可以组成真实的的social network了。假设全图如下:


图2. 全图示例

以节点1为中心的ego network为:
图3. 以节点1为中心的ego network

三、ego network的提取特征

基于ego network可以提取与业务相关的网络属性特征。表格来源于文献:Thomas Zimmermann et al. Predicting Defects using Network Analysis on Dependency Graphs;


图4. "Predicting defects using network analysis on dependency graphs"

图5. Ego network相关网络特征属性
  • Size of ego network: the number of nodes that one-step out neighbors of ego, plus ego itself.
  • Number of directed ties: the number of connections among all the nodes in the ego network;
  • Number of ordered pairs: the number of possible directed ties in each ego network;
  • Density: the number of ties divided by the number of pair;
  • Average geodesic distance : the mean of the shortest path lengths among all connected pairs in the ego network.;
  • Diameter of an ego network: the length of the longest path between connected actors (just as it is for any network).
  • Number of weak components.: A weak component is the largest number of actors who are connected, disregarding the direction of the ties ; If ego was connected to A and B (who are connected to one another), and ego is connected to C and D (who are connected to one another), but A and B are not connected in any way to C and D (except by way of everyone being connected to ego) then there would be two "weak components" in ego's neighborhood.
  • Number of weak components divided by size: normalize the count of components by size.
  • Two-step reach: goes beyond ego's one-step neighborhood to report the percentage of all actors in the whole network that are within two directed steps of ego. (2-hop connection);
  • Reach efficiency (two-step reach divided by size) : norms the two-step reach by dividing it by size.
  • Brokerage (number of pairs not directly connected): The idea of brokerage (more on this, below) is that ego is the "go-between" for pairs of other actors. In an ego network, ego is connected to every other actor (by definition). If these others are not connected directly to one another, ego may be a "broker" ego falls on a the paths between the others.
  • Normalized brokerage (brokerage divided by number of pairs) : assesses the extent to which ego's role is that of broker.
  • Betweenness: ego is "between" two other actors if ego lies on the shortest directed path from one to the other. The ego betweenness measure indexes the percentage of all geodesic paths from neighbor to neighbor that pass through ego.
  • Normalized Betweenness : compares the actual betweenness of ego to the maximum possible betweenness in neighborhood of the size and connectivity of ego's. The "maximum" value for betweenness would be achieved where ego is the center of a "star" network;
    -EI-index: Ego-Alter Homophily;

四、如何实现ego network的提取

随机选取图节点中任意一个作为种子节点,采用广度优先搜索的方式(走有限步数)获得该节点的Ego network。该部分内容将展示如何用R语言实现ego network的提取和指标分析。主要涉及的R包为igaph和egor包

1. igraph包生成ego network

使用方法:

ego_size(graph, order = 1, nodes = V(graph), mode = c("all", "out",
"in"), mindist = 0)
ego(graph, order = 1, nodes = V(graph), mode = c("all", "out", "in"),
mindist = 0)
make_ego_graph(graph, order = 1, nodes = V(graph), mode = c("all",
"out", "in"), mindist = 0)

参数说明:

graph:The input graph.
order: Integer giving the order of the neighborhood.
nodes: The vertices for which the calculation is performed.
mode: Character constant, it specifies how to use the direction of the edges if a directed graph is analyzed. For ‘out’ only the outgoing edges are followed, so all vertices reachable from the source vertex in at most order steps are counted. For ‘"in"’ all vertices from which the source vertex is reachable in at most order steps are counted. ‘"all"’ ignores the direction of the edges. This argument is ignored for undirected graphs.
mindist: The minimum distance to include the vertex in the result.

require(igraph)
g <- make_graph( ~ A-B-C-D-A, E-A:B:C:D,
                    F-G-H-I-F, J-F:G:H:I,
                    K-L-M-N-K, O-K:L:M:N,
                    P-Q-R-S-P, T-P:Q:R:S,
                    B-F, E-J, C-I, L-T, O-T, M-S,
                    C-P, C-L, I-L, I-P)
ego_size(g, order = 2, 1:3)  #节点1-3, order(连接度)为2的邻点数量
ego(g, order = 2, 1:3)  #节点1-3, order(连接度)为2的邻点列表
make_ego_graph(g, order = 2, 1:3) #节点1-3, order(连接度)为2的连接情况

输出显示:

> ego_size(g, order = 2, 1:3)
[1]  7 11 17
> ego(g, order = 2, 1:3)
[[1]]
+ 7/20 vertices, named, from 3e32cac:
[1] A B D E C F J

[[2]]
+ 11/20 vertices, named, from 3e32cac:
 [1] B A C E F D I L P J G

[[3]]
+ 17/20 vertices, named, from 3e32cac:
 [1] C B D E I L P A F J H K M O T Q S
> make_ego_graph(g, order = 2, 1:3)
[[1]]
IGRAPH b0112be UN-- 7 11 -- 
+ attr: name (v/c)
+ edges from b0112be (vertex names):
 [1] A--B B--C A--D C--D A--E B--E C--E D--E B--F E--J F--J

[[2]]
IGRAPH 0f2cef6 UN-- 11 20 -- 
+ attr: name (v/c)
+ edges from 0f2cef6 (vertex names):
 [1] A--B A--D A--E B--C B--E B--F C--D C--E C--I C--L C--P D--E E--J
[14] F--G F--I F--J G--J I--J I--L I--P

[[3]]
IGRAPH 65a3787 UN-- 17 33 -- 
+ attr: name (v/c)
+ edges from 65a3787 (vertex names):
 [1] A--B A--D A--E B--C B--E B--F C--D C--E C--I C--L C--P D--E E--J
[14] F--I F--J H--I H--J I--J I--L I--P K--L K--O L--M L--O L--T M--O
[27] M--S O--T P--Q P--S P--T Q--T S--T

2. igraph+egor

样本数据表格式:


alter 数据

ego 数据表

alter-alter数据
require(egor)
#加载数据
data("alters32") ## alters in ego net
data("egos32") ## ego list in ego net
data("edges32") ## alter-later ties (edges) in ego net
#生成egor object
e1 <- egor(alters.df = alters32,
           egos.df = egos32,
           aaties = edges32,
           ID.vars = list(
             ego = "egoID",
             alter = "alterID",
             source = "Source",
             target = "Target"))
e1
#可以用data frame的操作方式对e1进行查看
e1[e1$netsize > 15, ]
subset(e1, .alts$alter.sex=="w", unit="alter")
subset(e1, .aaties$weight > 1, unit="aatie")
summary(e1)
#计算density
ego_density(e1)
##可视化
data("egor32")
# Simplify networks to clustered graphs, stored as igraph objects
graphs <- clustered_graphs(egor32, "age") 
# Visualize
par(mar=c(0,0,0,0))
vis_clustered_graphs(graphs[1:3], 
                     node.size.multiplier = 10, 
                     edge.width.multiplier = 5,
                     label.size = 0.6)

graphs2 <- clustered_graphs(make_egor(400, 200)[1:4], "country") 

vis_clustered_graphs(graphs2[1:4], 
                     node.size.multiplier = 2, 
                     edge.width.multiplier = 15,
                     label.size = 0.6,
                     labels = TRUE)
#igraph & network plotting
#as_igraph() converts an egor object to a list of igraph objects.
#as_network() converts an egor object to a list of network objects.

par(mar=c(0,0,0,0))
purrr::walk(as_igraph(egor32)[1:4], plot)
purrr::walk(as_network(egor32)[1:4], plot)
#Shiny App for Visualization: egor_vis_app() starts a Shiny app that allows offers a graphical interface for adjusting the visualization parameters of the networks stored in an egor object.
egor_vis_app(egor32)

五、应用案例: 识别传销网络

通过分析用户电信网络通话记录,处理成无向图,用户可以分为普通用户、某大型企业员工、服务帐号、传销组织人员,提取不同类别用户的ego network。

图6. 某普通用户的ego network

图7. 某服务账号的ego network

图8. 某企业领导的ego network (局限在本企业员工)

图9. 某非法传销组织头目的ego network

特点
(1) 节点之间不跨级联系;
(2)同级不抢占下线;
(3)同级非同上下线节点不联系
网络属性特征提取
(1)非父子的连边数;
(2)同层同一父节点的连边边数;
(3)同层不同父节点的连边的边数;
(4)入度(in-degree):以某类顶点为目标节点,终止于该顶点的边的数目为该顶点的入度;
(5)出度(out-degree):以某类顶点为目标节点,起始于该顶点的边的数目为该顶点的出度。
示例如下:目标用户群指传销网络用户
图10. 出度在不同网络中分布情况示例

图11. 入度在不同网络中分布情况示例

五、 ego network相关的数据集

下载数据来玩一玩(TBD)
1.【Facebook ego network】:http://www.pkbigdata.com/common/share/53.html
2.【Twitter ego network】:http://www.pkbigdata.com/common/share/54.html
3.【Google+ ego network】:http://www.pkbigdata.com/common/share/56.html
4.【Communication ego network】:http://www.pkbigdata.com/common/share/57.html
5.【Employee ego network】:http://www.pkbigdata.com/common/share/59.html
6.【HighSchoolContact ego network】:http://www.pkbigdata.com/common/share/60.html
7.【Communication-1 ego network】:http://www.pkbigdata.com/common/share/61.html

参考资料

[1] 自我中心网络:科普+数据共享 : http://blog.sciencenet.cn/blog-3075-1072943.html ;
[2] Introduction to social network methods: http://faculty.ucr.edu/~hanneman/nettext/C9_Ego_networks.html
[3] 如何利用大数据发现非法传销网络?: https://zhuanlan.zhihu.com/p/28838581;
[4] Thomas Zimmermann et al. Predicting Defects using Network Analysis on Dependency Graphs;
[5] Using egor to analyse ego-centered network data: https://cran.r-project.org/web/packages/egor/vignettes/using_egor.html ;
[6] R graph manual pages: https://igraph.org/r/doc/ego.html

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 216,142评论 6 498
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,298评论 3 392
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 162,068评论 0 351
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,081评论 1 291
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,099评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,071评论 1 295
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,990评论 3 417
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,832评论 0 273
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,274评论 1 310
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,488评论 2 331
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,649评论 1 347
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,378评论 5 343
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,979评论 3 325
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,625评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,796评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,643评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,545评论 2 352

推荐阅读更多精彩内容