一、ego network的概念和定义
当不研究网络的整体,而是侧重于研究单个节点的性质,就会用到ego network。ego network是整体网络结构的一部分,是给定某节点,通过广度优先搜索遍历搜索出的网络结构。
ego Network: 又称自我中心网络,网络节点由唯一的一个中心节点(ego),以及这个节点的邻居(alters)组成,边只包括ego与alter之间,以及alter与alter之间的边。
以社交网络为例,针对某个人而言,将此人以及其朋友看作是节点,只考虑这个人和他朋友,以及他朋友之间的连边,就可以得到一个以A为中心的网络,即自我中心网络。如下图:
图中每个alter和它自身的邻居构成一个ego network,所有节点的ego network合并起来,就可以组成真实的的social network了。假设全图如下:
以节点1为中心的ego network为:
三、ego network的提取特征
基于ego network可以提取与业务相关的网络属性特征。表格来源于文献:Thomas Zimmermann et al. Predicting Defects using Network Analysis on Dependency Graphs;
- Size of ego network: the number of nodes that one-step out neighbors of ego, plus ego itself.
- Number of directed ties: the number of connections among all the nodes in the ego network;
- Number of ordered pairs: the number of possible directed ties in each ego network;
- Density: the number of ties divided by the number of pair;
- Average geodesic distance : the mean of the shortest path lengths among all connected pairs in the ego network.;
- Diameter of an ego network: the length of the longest path between connected actors (just as it is for any network).
- Number of weak components.: A weak component is the largest number of actors who are connected, disregarding the direction of the ties ; If ego was connected to A and B (who are connected to one another), and ego is connected to C and D (who are connected to one another), but A and B are not connected in any way to C and D (except by way of everyone being connected to ego) then there would be two "weak components" in ego's neighborhood.
- Number of weak components divided by size: normalize the count of components by size.
- Two-step reach: goes beyond ego's one-step neighborhood to report the percentage of all actors in the whole network that are within two directed steps of ego. (2-hop connection);
- Reach efficiency (two-step reach divided by size) : norms the two-step reach by dividing it by size.
- Brokerage (number of pairs not directly connected): The idea of brokerage (more on this, below) is that ego is the "go-between" for pairs of other actors. In an ego network, ego is connected to every other actor (by definition). If these others are not connected directly to one another, ego may be a "broker" ego falls on a the paths between the others.
- Normalized brokerage (brokerage divided by number of pairs) : assesses the extent to which ego's role is that of broker.
- Betweenness: ego is "between" two other actors if ego lies on the shortest directed path from one to the other. The ego betweenness measure indexes the percentage of all geodesic paths from neighbor to neighbor that pass through ego.
-
Normalized Betweenness : compares the actual betweenness of ego to the maximum possible betweenness in neighborhood of the size and connectivity of ego's. The "maximum" value for betweenness would be achieved where ego is the center of a "star" network;
-EI-index: Ego-Alter Homophily;
四、如何实现ego network的提取
随机选取图节点中任意一个作为种子节点,采用广度优先搜索的方式(走有限步数)获得该节点的Ego network。该部分内容将展示如何用R语言实现ego network的提取和指标分析。主要涉及的R包为igaph和egor包
1. igraph包生成ego network
使用方法:
ego_size(graph, order = 1, nodes = V(graph), mode = c("all", "out",
"in"), mindist = 0)
ego(graph, order = 1, nodes = V(graph), mode = c("all", "out", "in"),
mindist = 0)
make_ego_graph(graph, order = 1, nodes = V(graph), mode = c("all",
"out", "in"), mindist = 0)
参数说明:
graph:The input graph.
order: Integer giving the order of the neighborhood.
nodes: The vertices for which the calculation is performed.
mode: Character constant, it specifies how to use the direction of the edges if a directed graph is analyzed. For ‘out’ only the outgoing edges are followed, so all vertices reachable from the source vertex in at most order steps are counted. For ‘"in"’ all vertices from which the source vertex is reachable in at most order steps are counted. ‘"all"’ ignores the direction of the edges. This argument is ignored for undirected graphs.
mindist: The minimum distance to include the vertex in the result.
require(igraph)
g <- make_graph( ~ A-B-C-D-A, E-A:B:C:D,
F-G-H-I-F, J-F:G:H:I,
K-L-M-N-K, O-K:L:M:N,
P-Q-R-S-P, T-P:Q:R:S,
B-F, E-J, C-I, L-T, O-T, M-S,
C-P, C-L, I-L, I-P)
ego_size(g, order = 2, 1:3) #节点1-3, order(连接度)为2的邻点数量
ego(g, order = 2, 1:3) #节点1-3, order(连接度)为2的邻点列表
make_ego_graph(g, order = 2, 1:3) #节点1-3, order(连接度)为2的连接情况
输出显示:
> ego_size(g, order = 2, 1:3)
[1] 7 11 17
> ego(g, order = 2, 1:3)
[[1]]
+ 7/20 vertices, named, from 3e32cac:
[1] A B D E C F J
[[2]]
+ 11/20 vertices, named, from 3e32cac:
[1] B A C E F D I L P J G
[[3]]
+ 17/20 vertices, named, from 3e32cac:
[1] C B D E I L P A F J H K M O T Q S
> make_ego_graph(g, order = 2, 1:3)
[[1]]
IGRAPH b0112be UN-- 7 11 --
+ attr: name (v/c)
+ edges from b0112be (vertex names):
[1] A--B B--C A--D C--D A--E B--E C--E D--E B--F E--J F--J
[[2]]
IGRAPH 0f2cef6 UN-- 11 20 --
+ attr: name (v/c)
+ edges from 0f2cef6 (vertex names):
[1] A--B A--D A--E B--C B--E B--F C--D C--E C--I C--L C--P D--E E--J
[14] F--G F--I F--J G--J I--J I--L I--P
[[3]]
IGRAPH 65a3787 UN-- 17 33 --
+ attr: name (v/c)
+ edges from 65a3787 (vertex names):
[1] A--B A--D A--E B--C B--E B--F C--D C--E C--I C--L C--P D--E E--J
[14] F--I F--J H--I H--J I--J I--L I--P K--L K--O L--M L--O L--T M--O
[27] M--S O--T P--Q P--S P--T Q--T S--T
2. igraph+egor
样本数据表格式:
require(egor)
#加载数据
data("alters32") ## alters in ego net
data("egos32") ## ego list in ego net
data("edges32") ## alter-later ties (edges) in ego net
#生成egor object
e1 <- egor(alters.df = alters32,
egos.df = egos32,
aaties = edges32,
ID.vars = list(
ego = "egoID",
alter = "alterID",
source = "Source",
target = "Target"))
e1
#可以用data frame的操作方式对e1进行查看
e1[e1$netsize > 15, ]
subset(e1, .alts$alter.sex=="w", unit="alter")
subset(e1, .aaties$weight > 1, unit="aatie")
summary(e1)
#计算density
ego_density(e1)
##可视化
data("egor32")
# Simplify networks to clustered graphs, stored as igraph objects
graphs <- clustered_graphs(egor32, "age")
# Visualize
par(mar=c(0,0,0,0))
vis_clustered_graphs(graphs[1:3],
node.size.multiplier = 10,
edge.width.multiplier = 5,
label.size = 0.6)
graphs2 <- clustered_graphs(make_egor(400, 200)[1:4], "country")
vis_clustered_graphs(graphs2[1:4],
node.size.multiplier = 2,
edge.width.multiplier = 15,
label.size = 0.6,
labels = TRUE)
#igraph & network plotting
#as_igraph() converts an egor object to a list of igraph objects.
#as_network() converts an egor object to a list of network objects.
par(mar=c(0,0,0,0))
purrr::walk(as_igraph(egor32)[1:4], plot)
purrr::walk(as_network(egor32)[1:4], plot)
#Shiny App for Visualization: egor_vis_app() starts a Shiny app that allows offers a graphical interface for adjusting the visualization parameters of the networks stored in an egor object.
egor_vis_app(egor32)
五、应用案例: 识别传销网络
通过分析用户电信网络通话记录,处理成无向图,用户可以分为普通用户、某大型企业员工、服务帐号、传销组织人员,提取不同类别用户的ego network。
特点:
(1) 节点之间不跨级联系;
(2)同级不抢占下线;
(3)同级非同上下线节点不联系
网络属性特征提取:
(1)非父子的连边数;
(2)同层同一父节点的连边边数;
(3)同层不同父节点的连边的边数;
(4)入度(in-degree):以某类顶点为目标节点,终止于该顶点的边的数目为该顶点的入度;
(5)出度(out-degree):以某类顶点为目标节点,起始于该顶点的边的数目为该顶点的出度。
示例如下:目标用户群指传销网络用户
五、 ego network相关的数据集
下载数据来玩一玩(TBD)
1.【Facebook ego network】:http://www.pkbigdata.com/common/share/53.html
2.【Twitter ego network】:http://www.pkbigdata.com/common/share/54.html
3.【Google+ ego network】:http://www.pkbigdata.com/common/share/56.html
4.【Communication ego network】:http://www.pkbigdata.com/common/share/57.html
5.【Employee ego network】:http://www.pkbigdata.com/common/share/59.html
6.【HighSchoolContact ego network】:http://www.pkbigdata.com/common/share/60.html
7.【Communication-1 ego network】:http://www.pkbigdata.com/common/share/61.html
参考资料
[1] 自我中心网络:科普+数据共享 : http://blog.sciencenet.cn/blog-3075-1072943.html ;
[2] Introduction to social network methods: http://faculty.ucr.edu/~hanneman/nettext/C9_Ego_networks.html;
[3] 如何利用大数据发现非法传销网络?: https://zhuanlan.zhihu.com/p/28838581;
[4] Thomas Zimmermann et al. Predicting Defects using Network Analysis on Dependency Graphs;
[5] Using egor to analyse ego-centered network data: https://cran.r-project.org/web/packages/egor/vignettes/using_egor.html ;
[6] R graph manual pages: https://igraph.org/r/doc/ego.html