Notes on Statistical Analysis of Network Data || Visualizing Network Data

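Previously:
A minimal Gephi network graph tutorial
Applications of networks in single-cell transcriptome data analysis
Notes on Statistical Analysis of Network Data || Why study networks
Notes on Statistical Analysis of Network Data || Manipulating network data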

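The previous two chapters gave us a piecemeal understanding of graph structure. One of the main uses of a network graph, as a data structure, is visualization: showing the relationships between vertices and edges in a picture. Visualization is an art. This post introduces the basic concepts of network graph visualization and then shows how to decorate the resulting plots.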

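In what kind of space do we draw a network? The drawing elements are points and lines, and visualization is the process of finding a suitable position for each point. Arranging the drawing elements within a given drawing space can be treated, approximately, as a parameter optimization problem.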

Graph layout

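The core of graph visualization is the layout. Layout-centred graph visualization can be traced back to 1984, when Eades proposed modelling graph layout as a physical system of springs and steel rings, with vertices as the rings and edges as the springs.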

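The purpose of a layout is to arrange vertices and edges according to some strategy so that the drawing meets our needs. In general, a layout tries to satisfy four principles:

Vertices are distributed evenly within the drawing window.
Edge crossings and bends are avoided.
Edge lengths are uniform.
The overall layout reflects the characteristics of the graph.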

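Gephi provides 12 layout options in its layout panel. They fall broadly into two groups, one of which is the force-directed layouts, which mimic attraction and repulsion in the physical world. Force-directed layout is grounded in physics: the nodes of the graph are modelled as atoms, and the forces between the atoms are simulated to compute the node positions. The force-directed methods include Force Atlas, Force Atlas 2, Fruchterman Reingold, OpenOrd, Yifan Hu, and Yifan Hu Proportional.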

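We briefly introduce a few of these layouts. Note that the focus is on what the layouts look like, not on the details of the algorithms.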

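Force-directed layout, also called FDP (Force-Directed Placement), is currently the most widely used family of graph layout algorithms. Guided by a model of natural laws (springs or electric charges), it shows the overall structure of a graph in a form people find easy to read, and it is general enough to dominate graph layout in practice. The resulting layout depends on how the vertices are connected rather than on vertex attributes. Its drawbacks are that it is very sensitive to the initial state, that the process may get stuck in a local optimum, and that it is nondeterministic: repeated runs are not guaranteed to give the same result.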
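The nondeterminism mentioned above can be tamed by fixing the random seed before computing a layout, which is what the later examples in this post do with set.seed(42). A minimal sketch, assuming the igraph package is installed and using a small 3x3 lattice chosen only for illustration:

```r
library(igraph)

g <- make_lattice(c(3, 3))   # small 3x3 grid, 9 vertices

# The starting positions are random, so the seed decides the result.
set.seed(1); l1 <- layout_with_fr(g)
set.seed(1); l2 <- layout_with_fr(g)   # same seed as l1
set.seed(2); l3 <- layout_with_fr(g)   # different seed

identical(l1, l2)   # same seed reproduces the same coordinates
```

Storing the coordinate matrix (here l1) and passing it to every plot call is another way to keep vertex positions fixed across figures.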

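This layout method follows two simple principles: 1. connected nodes are pulled toward each other; 2. unconnected nodes repel each other.
The main parameters are:

Area: sets the extent of the drawing; the larger the value, the larger and sparser the graph.
Gravity: sets the strength of gravity; the larger the value, the more strongly nodes are pulled toward the centre.
Speed: sets the speed of the layout; the larger the value, the faster the layout runs, but the less precise it is.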
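To make the principles and parameters above concrete, here is a minimal force-directed sketch in base R. It is not Gephi's or igraph's implementation. It assumes the classic Fruchterman-Reingold force laws (attraction d^2/k along edges, repulsion k^2/d between every pair of vertices, not only unconnected ones), and the function name force_layout and its arguments area, gravity and speed are illustrative stand-ins for the parameters listed above.

```r
# Minimal force-directed sketch in base R (Fruchterman-Reingold flavour).
# Assumptions: attraction d^2/k along edges, repulsion k^2/d between all pairs,
# a pull of strength `gravity` toward the origin, and a step cap that cools
# linearly from `speed`. All names here are illustrative.
force_layout <- function(edges, n, area = 1, gravity = 0.1, speed = 0.1,
                         niter = 200, seed = 42) {
  set.seed(seed)
  side <- sqrt(area)
  pos <- matrix(runif(2 * n, -side / 2, side / 2), ncol = 2)  # random start
  k <- sqrt(area / n)                     # preferred distance between vertices
  for (it in seq_len(niter)) {
    disp <- matrix(0, n, 2)
    for (i in seq_len(n)) {               # repulsion from every other vertex
      delta <- matrix(pos[i, ], n, 2, byrow = TRUE) - pos
      d2 <- pmax(rowSums(delta^2), 1e-12)
      d2[i] <- Inf                        # no self-repulsion
      disp[i, ] <- colSums(delta * k^2 / d2)
    }
    for (e in seq_len(nrow(edges))) {     # attraction along each edge
      u <- edges[e, 1]; v <- edges[e, 2]
      delta <- pos[u, ] - pos[v, ]
      pull <- delta * sqrt(sum(delta^2)) / k   # direction times d^2 / k
      disp[u, ] <- disp[u, ] - pull
      disp[v, ] <- disp[v, ] + pull
    }
    disp <- disp - gravity * pos          # gravity pulls toward the centre
    temp <- speed * side * (1 - (it - 1) / niter)   # cooling step cap
    len <- pmax(sqrt(rowSums(disp^2)), 1e-12)
    pos <- pos + disp / len * pmin(len, temp)
  }
  pos
}

# Two triangles joined by one bridge edge (3-4).
edges <- matrix(c(1,2, 2,3, 1,3, 4,5, 5,6, 4,6, 3,4), ncol = 2, byrow = TRUE)
xy <- force_layout(edges, n = 6)
```

The returned two-column matrix has one row per vertex. In this sketch a larger area raises the preferred distance k and so spreads the drawing out, while a larger speed allows bigger steps per iteration at the cost of a rougher result.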

OpenOrd is an open-source algorithm for laying out large graphs, from the paper "OpenOrd: An Open-Source Toolbox for Large Graph Layout".

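Yifan Hu is the layout named after Yifan Hu. It suits very large graphs: it coarsens the graph to reduce the amount of computation, so it runs fairly fast. It is an improvement on force-directed layout that introduces the concept of super-nodes into a multilevel force-directed scheme. In this algorithm, the repulsion between a node and a distant cluster of nodes is computed as the repulsion between that node and a single super-node, which greatly reduces the amount of computation and lowers the overall complexity.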
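The super-node idea can be illustrated numerically. The sketch below is not Yifan Hu's implementation. It only assumes a repulsive force of magnitude k^2/d per point and compares the exact sum over a distant cluster with a single term computed from the cluster centroid, weighted by cluster size. The helper name repulsion and the constant k are hypothetical.

```r
# One vertex at the origin and a distant cluster of 50 vertices.
set.seed(1)
v <- c(0, 0)
cluster <- cbind(rnorm(50, mean = 100, sd = 1), rnorm(50, mean = 100, sd = 1))
k <- 1   # illustrative force constant

# Repulsive force on v from a set of points: magnitude k^2 / d per point,
# directed away from the point, optionally multiplied by a weight.
repulsion <- function(v, pts, weight = 1) {
  delta <- matrix(v, nrow(pts), 2, byrow = TRUE) - pts
  d2 <- rowSums(delta^2)
  colSums(weight * k^2 * delta / d2)
}

exact  <- repulsion(v, cluster)                          # 50 pairwise terms
centre <- matrix(colMeans(cluster), nrow = 1)            # the super-node
approx <- repulsion(v, centre, weight = nrow(cluster))   # 1 weighted term

# Relative error of the super-node approximation.
rel_err <- sqrt(sum((exact - approx)^2)) / sqrt(sum(exact^2))
```

Because the cluster is small compared with its distance from v, the single weighted term is very close to the exact sum while replacing 50 force evaluations with one.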

How many layout algorithms does igraph provide?

1.1. igraph_layout_random — Places the vertices uniform randomly on a plane.
1.2. igraph_layout_circle — Places the vertices uniformly on a circle, in the order of vertex ids.
1.3. igraph_layout_star — Generate a star-like layout
1.4. igraph_layout_grid — Places the vertices on a regular grid on the plane.
1.5. igraph_layout_graphopt — Optimizes vertex layout via the graphopt algorithm.
1.6. igraph_layout_bipartite — Simple layout for bipartite graphs
1.7. The DrL layout generator
1.8. igraph_layout_fruchterman_reingold — Places the vertices on a plane according to the Fruchterman-Reingold algorithm.
1.9. igraph_layout_kamada_kawai — Places the vertices on a plane according the Kamada-Kawai algorithm.
1.10. igraph_layout_gem — The GEM layout algorithm, as described in Arne Frick, Andreas Ludwig,
1.11. igraph_layout_davidson_harel — Davidson-Harel layout algorithm
1.12. igraph_layout_mds — Place the vertices on a plane using multidimensional scaling.
1.13. igraph_layout_lgl — Force based layout algorithm for large graphs.
1.14. igraph_layout_reingold_tilford — Reingold-Tilford layout for tree graphs
1.15. igraph_layout_reingold_tilford_circular — Circular Reingold-Tilford layout for trees
1.16. igraph_layout_sugiyama — Sugiyama layout algorithm for layered directed acyclic graphs.

library(pacman)
# list the igraph functions whose names start with "layout_"
grep('^layout_', p_functions(igraph), value = TRUE)
[1] "layout_"              "layout_as_bipartite"  "layout_as_star"       "layout_as_tree"       "layout_components"   
[6] "layout_in_circle"     "layout_nicely"        "layout_on_grid"       "layout_on_sphere"     "layout_randomly"     
[11] "layout_with_dh"       "layout_with_drl"      "layout_with_fr"       "layout_with_gem"      "layout_with_graphopt"
[16] "layout_with_kk"       "layout_with_lgl"      "layout_with_mds"      "layout_with_sugiyama"
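Whatever their names, these layout functions share one convention: each returns a numeric coordinate matrix with one row per vertex. A small sketch, assuming igraph is installed and using a six-vertex ring chosen only for illustration:

```r
library(igraph)

g <- make_ring(6)

# Each layout function returns a matrix: one row per vertex,
# one column per drawing dimension.
l.circle <- layout_in_circle(g)
l.grid   <- layout_on_grid(g)
dim(l.circle)

# layout_() with a layout specification is an alternative spelling.
l.spec <- layout_(g, in_circle())
```

Because the result is an ordinary matrix, it can be stored once and passed to plot() through the layout argument, so that several plots share the same vertex positions.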

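Layout algorithms are not the main topic today. Instead we illustrate how layouts are used with a few examples, based on a 5x5x5 lattice network and a blog network.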

library(sand)
g.l <- make_lattice(c(5, 5, 5))
 g.l
IGRAPH f14bc95 U--- 125 300 -- Lattice graph
+ attr: name (g/c), dimvector (g/n), nei (g/n), mutual (g/l), circular (g/l)
+ edges from f14bc95:
  [1]  1-- 2  1-- 6  1--26  2-- 3  2-- 7  2--27  3-- 4  3-- 8  3--28  4-- 5  4-- 9  4--29  5--10  5--30  6-- 7  6--11
 [17]  6--31  7-- 8  7--12  7--32  8-- 9  8--13  8--33  9--10  9--14  9--34 10--15 10--35 11--12 11--16 11--36 12--13
 [33] 12--17 12--37 13--14 13--18 13--38 14--15 14--19 14--39 15--20 15--40 16--17 16--21 16--41 17--18 17--22 17--42
 [49] 18--19 18--23 18--43 19--20 19--24 19--44 20--25 20--45 21--22 21--46 22--23 22--47 23--24 23--48 24--25 24--49
 [65] 25--50 26--27 26--31 26--51 27--28 27--32 27--52 28--29 28--33 28--53 29--30 29--34 29--54 30--35 30--55 31--32
 [81] 31--36 31--56 32--33 32--37 32--57 33--34 33--38 33--58 34--35 34--39 34--59 35--40 35--60 36--37 36--41 36--61
 [97] 37--38 37--42 37--62 38--39 38--43 38--63 39--40 39--44 39--64 40--45 40--65 41--42 41--46 41--66 42--43 42--47
[113] 42--67 43--44 43--48 43--68 44--45 44--49 44--69 45--50 45--70 46--47 46--71 47--48 47--72 48--49 48--73 49--50
+ ... omitted several edges

data(aidsblog)
summary(aidsblog)
IGRAPH 3e85e7e D--- 146 187 -- 
aidsblog
IGRAPH 3e85e7e D--- 146 187 -- 
+ edges from 3e85e7e:
  [1]   7->  1  37->  2  37->  3   7->  4 118->  5   7->  6 143->  7  73->  8 118->  9   7-> 10 118-> 11 143-> 12   7-> 13
 [14]   7-> 14  73-> 15 118-> 16 142-> 17  73-> 18   7-> 19 118-> 20  73-> 21 118-> 22 143-> 23 143-> 24   7-> 25   7-> 26
 [27]  12-> 27   7-> 28   7-> 29  73-> 30 143-> 31  73-> 32 143-> 33  12-> 34 143-> 35 118-> 36  37-> 37  12-> 38   7-> 39
 [40]   7-> 40  12-> 41   7-> 42 143-> 43 118-> 44  73-> 45  37-> 46   7-> 47 143-> 48   7-> 49 143-> 50   7-> 51 118-> 52
 [53]  73-> 53  37-> 54  12-> 55 143-> 56  12-> 57   7-> 58 143-> 59 143-> 60  37-> 61  73-> 62  37-> 63   7-> 64  37-> 65
 [66] 143-> 66 118-> 67  12-> 68  12-> 69  37-> 70  12-> 71 142-> 72 143-> 73  12-> 74   7-> 75  12-> 76  12-> 77   7-> 78
 [79] 142-> 79   7-> 80   7-> 81  12-> 82  12-> 83 143-> 84  73-> 85 143-> 86 118-> 87 118-> 88  37-> 89 142-> 90 142-> 91
 [92]   7-> 92 143-> 93  73-> 94  73-> 95 143-> 96   7-> 97   7-> 98  12-> 99 118->100  37->101  12->102 142->103   7->104
[105]  12->105 143->106   7->107  12->108  12->109  12->110  12->111  37->112   7->113   7->114  37->115 143->116  37->117
+ ... omitted several edges

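The blog network is directed: an edge points from one blog to another when the first blog links to the second. The simplest layout is the circular layout, in which all vertices are placed at (nearly) equal distances around a circle.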
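The geometry of a circular layout is simple enough to write by hand. The sketch below uses base R only. The helper name circle_coords is hypothetical, and it is an illustration of the idea rather than igraph's own code.

```r
# Place n vertices at equal angles on the unit circle (base R only).
circle_coords <- function(n) {
  theta <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = cos(theta), y = sin(theta))
}

xy <- circle_coords(8)

# Distance from each vertex to the next one around the circle.
nxt <- c(2:8, 1)
gap <- sqrt(rowSums((xy - xy[nxt, ])^2))
range(gap)   # the smallest and largest gaps agree up to rounding
```

A matrix like xy, with one row per vertex, can be supplied to plot() as the layout argument in place of a layout function.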

V(aidsblog)$color <- colors()[1:(vcount(aidsblog))]
E(aidsblog)$color <- colors()[1:(ecount(aidsblog))]
# CHUNK 3
igraph_options(vertex.size=3, vertex.label=NA,
   edge.arrow.size=0.5)
par(mfrow=c(1, 2))
plot(g.l, layout=layout_in_circle)
title("5x5x5 Lattice")
plot(aidsblog, layout=layout_in_circle)
title("Blog Network")

Layout based on a force model: the Fruchterman-Reingold layout algorithm

plot(g.l,layout=layout_with_fr)
title("5x5x5 Lattice")
plot(aidsblog,layout=layout_with_fr)
title("Blog Network")

To learn more, run ?layout_with_fr, or type layout_with_fr at the console to view its source:

layout_with_fr
function (graph, coords = NULL, dim = 2, niter = 500, start.temp = sqrt(vcount(graph)), 
    grid = c("auto", "grid", "nogrid"), weights = NULL, 
    minx = NULL, maxx = NULL, miny = NULL, maxy = NULL, minz = NULL, 
    maxz = NULL, coolexp, maxdelta, area, repulserad, maxiter) 
{
    if (!is_igraph(graph)) {
        stop("Not a graph object")
    }
    if (!is.null(coords)) {
        coords <- as.matrix(structure(as.double(coords), dim = dim(coords)))
    }
    dim <- as.integer(dim)
    if (dim != 2L && dim != 3L) {
        stop("Dimension must be two or three")
    }
    if (!missing(niter) && !missing(maxiter)) {
        stop("Both `niter' and `maxiter' are given, give only one of them")
    }
    if (!missing(maxiter)) 
        niter <- maxiter
    niter <- as.integer(niter)
    start.temp <- as.numeric(start.temp)
    grid <- igraph.match.arg(grid)
    grid <- switch(grid, grid = 0L, nogrid = 1L, auto = 2L)
    if (is.null(weights) && "weight" %in% edge_attr_names(graph)) {
        weights <- E(graph)$weight
    }
    if (!is.null(weights) && any(!is.na(weights))) {
        weights <- as.numeric(weights)
    }
    else {
        weights <- NULL
    }
    if (!is.null(minx)) 
        minx <- as.numeric(minx)
    if (!is.null(maxx)) 
        maxx <- as.numeric(maxx)
    if (!is.null(miny)) 
        miny <- as.numeric(miny)
    if (!is.null(maxy)) 
        maxy <- as.numeric(maxy)
    if (!is.null(minz)) 
        minz <- as.numeric(minz)
    if (!is.null(maxz)) 
        maxz <- as.numeric(maxz)
    if (!missing(coolexp)) {
        warning("Argument `coolexp' is deprecated and has no effect")
    }
    if (!missing(maxdelta)) {
        warning("Argument `maxdelta' is deprecated and has no effect")
    }
    if (!missing(area)) {
        warning("Argument `area' is deprecated and has no effect")
    }
    if (!missing(repulserad)) {
        warning("Argument `repulserad' is deprecated and has no effect")
    }
    on.exit(.Call(C_R_igraph_finalizer))
    if (dim == 2) {
        res <- .Call(C_R_igraph_layout_fruchterman_reingold, 
            graph, coords, niter, start.temp, weights, minx, 
            maxx, miny, maxy, grid)
    }
    else {
        res <- .Call(C_R_igraph_layout_fruchterman_reingold_3d, 
            graph, coords, niter, start.temp, weights, minx, 
            maxx, miny, maxy, minz, maxz)
    }
    res
}
<bytecode: 0x000001e2d8cbfec8>
<environment: namespace:igraph>

Energy-based layout (energy placement methods): the Kamada-Kawai layout algorithm.

plot(g.l, layout=layout_with_kk)
title("5x5x5 Lattice")
plot(aidsblog, layout=layout_with_kk)
title("Blog Network")
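The "energy" being minimized can be sketched in a few lines. The function below is a simplified Kamada-Kawai style energy, not igraph's implementation. It assumes the ideal drawn distance between two vertices equals their shortest-path distance and that the spring constant is 1/d^2, with all other constants set to 1. The name kk_energy is hypothetical.

```r
# Kamada-Kawai style energy: sum over vertex pairs of
#   (drawn distance - graph distance)^2 / (2 * graph distance^2).
# Assumption: all scaling constants are 1.
kk_energy <- function(xy, d) {
  e <- as.matrix(dist(xy))          # Euclidean distances in the drawing
  pairs <- upper.tri(d)             # count each pair once, skip the diagonal
  sum(((e - d)^2 / d^2)[pairs]) / 2
}

# Path graph 1 - 2 - 3: shortest-path distances written out by hand.
d <- matrix(c(0, 1, 2,
              1, 0, 1,
              2, 1, 0), nrow = 3, byrow = TRUE)

straight <- cbind(c(0, 1, 2), c(0, 0, 0))   # drawn as a straight line
folded   <- cbind(c(0, 1, 0), c(0, 0, 0))   # both ends on the same spot

kk_energy(straight, d)   # 0: every drawn distance matches the graph distance
kk_energy(folded, d)     # 0.5: vertices 1 and 3 are drawn too close
```

A layout algorithm of this kind moves the vertices to lower this energy, so drawings in which geometric distance tracks graph distance are preferred.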

In some cases we want to emphasize a special structure in the network. Trees are one such structure. For example:

g.tree <- graph_from_literal(1-+2,1-+3,1-+4,2-+5,2-+6,
                         2-+7,3-+8,3-+9,4-+10)
par(mfrow=c(1, 3))
igraph_options(vertex.size=30, edge.arrow.size=0.5,
   vertex.label=NULL)
plot(g.tree, layout=layout_in_circle)
plot(g.tree, layout=layout_as_tree(g.tree, circular=T))
plot(g.tree, layout=layout_as_tree)

With many vertices, the structure becomes hard to make out:

par(mfrow=c(1, 2))

plot(g.l, layout=layout_as_tree)
title("5x5x5 Lattice")
plot(aidsblog, layout=layout_as_tree)
title("Blog Network")

Bipartite graphs are usually drawn with a layout that resembles railroad tracks.

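# Assumption: g.bip is the actor/movie bipartite graph from the previous
# chapter; it is rebuilt here so that this chunk runs on its own.
g.bip <- graph_from_literal(actor1:actor2:actor3,
   movie1:movie2, actor1:actor2 - movie1,
   actor2:actor3 - movie2)
V(g.bip)$type <- grepl("^movie", V(g.bip)$name)
plot(g.bip, layout= -layout_as_bipartite(g.bip)[,2:1], 
   vertex.size=60, vertex.shape=ifelse(V(g.bip)$type, 
      "rectangle", "circle"),
   vertex.label.cex=1.75,
   vertex.color=ifelse(V(g.bip)$type, "red", "cyan"))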

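Decorating graph layouts

In network visualization, the relative positions of the vertices and the placement of the edges matter a great deal. When additional information about vertices and edges is available, it can be worked into the picture by varying vertex size and colour and edge width. This is what we call decorating the graph.

We demonstrate decoration with a club network (Zachary's karate club), in which vertices represent club members and edges represent ties between pairs of members. The club has an administrator and an instructor, around whom two factions formed. How can we bring out this structure in the plot?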

library(igraphdata)
data(karate)
# Reproducible layout
set.seed(42)
l <- layout_with_kk(karate)
# Plot undecorated first.
igraph_options(vertex.size=10)

karate
IGRAPH 4b458a1 UNW- 34 78 -- Zachary's karate club network
+ attr: name (g/c), Citation (g/c), Author (g/c), Faction (v/n), name (v/c), label (v/c), color
| (v/n), weight (e/n)
+ edges from 4b458a1 (vertex names):
 [1] Mr Hi  --Actor 2  Mr Hi  --Actor 3  Mr Hi  --Actor 4  Mr Hi  --Actor 5  Mr Hi  --Actor 6 
 [6] Mr Hi  --Actor 7  Mr Hi  --Actor 8  Mr Hi  --Actor 9  Mr Hi  --Actor 11 Mr Hi  --Actor 12
[11] Mr Hi  --Actor 13 Mr Hi  --Actor 14 Mr Hi  --Actor 18 Mr Hi  --Actor 20 Mr Hi  --Actor 22
[16] Mr Hi  --Actor 32 Actor 2--Actor 3  Actor 2--Actor 4  Actor 2--Actor 8  Actor 2--Actor 14
[21] Actor 2--Actor 18 Actor 2--Actor 20 Actor 2--Actor 22 Actor 2--Actor 31 Actor 3--Actor 4 
[26] Actor 3--Actor 8  Actor 3--Actor 9  Actor 3--Actor 10 Actor 3--Actor 14 Actor 3--Actor 28
[31] Actor 3--Actor 29 Actor 3--Actor 33 Actor 4--Actor 8  Actor 4--Actor 13 Actor 4--Actor 14
+ ... omitted several edges

par(mfrow=c(1,1))
plot(karate, layout=l, vertex.label=V(karate), 
   vertex.color=NA)

The undecorated graph

# Now decorate, starting with labels.
V(karate)$label <- sub("Actor ", "", V(karate)$name)
# Two leaders get shapes different from club members.
V(karate)$shape <- "circle"
V(karate)[c("Mr Hi", "John A")]$shape <- "rectangle"
# Differentiate two factions by color.
V(karate)[Faction == 1]$color <- "red"
V(karate)[Faction == 2]$color <- "dodgerblue"
# Vertex area proportional to vertex strength
# (i.e., total weight of incident edges).
V(karate)$size <- 4*sqrt(strength(karate))
V(karate)$size2 <- V(karate)$size * .5
# Weight edges by number of common activities
E(karate)$width <- E(karate)$weight
# Color edges by within/between faction.
F1 <- V(karate)[Faction==1]
F2 <- V(karate)[Faction==2]
E(karate)[ F1 %--% F1 ]$color <- "pink"
E(karate)[ F2 %--% F2 ]$color <- "lightblue"
E(karate)[ F1 %--% F2 ]$color <- "yellow"
# Offset vertex labels for smaller points (default=0).
V(karate)$label.dist <- 
   ifelse(V(karate)$size >= 9.0, 0, 1.0)
# Plot decorated graph, using same layout.
plot(karate, layout=l)

Decorating another network: the lawyer network

# CHUNK 9
par(mfrow=c(1,2))
library(sand)
data(lazega)
lazega
IGRAPH 3e8b2bf UN-- 36 115 -- 
+ attr: name (v/c), Seniority (v/n), Status (v/n), Gender (v/n), Office (v/n), Years (v/n), Age
| (v/n), Practice (v/n), School (v/n)
+ edges from 3e8b2bf (vertex names):
 [1] V1 --V17 V2 --V7  V2 --V16 V2 --V17 V2 --V22 V2 --V26 V2 --V29 V3 --V18 V3 --V25 V3 --V28 V4 --V12
[12] V4 --V17 V4 --V19 V4 --V20 V4 --V22 V4 --V26 V4 --V28 V4 --V29 V4 --V31 V5 --V18 V5 --V24 V5 --V28
[23] V5 --V31 V5 --V32 V5 --V33 V6 --V24 V6 --V28 V6 --V30 V6 --V31 V6 --V32 V7 --V18 V9 --V12 V9 --V16
[34] V9 --V29 V10--V24 V10--V26 V10--V29 V10--V31 V10--V34 V11--V17 V12--V15 V12--V16 V12--V17 V12--V19
[45] V12--V26 V12--V29 V12--V34 V13--V31 V13--V33 V14--V16 V14--V17 V14--V25 V14--V28 V14--V30 V14--V32
[56] V15--V16 V15--V19 V15--V20 V15--V22 V15--V24 V15--V26 V15--V29 V15--V32 V15--V35 V15--V36 V16--V17
[67] V16--V22 V16--V26 V16--V27 V16--V29 V16--V32 V16--V34 V16--V36 V17--V19 V17--V22 V17--V24 V17--V25
+ ... omitted several edges

plot(lazega)
# Office location indicated by color.
colbar <- c("red", "dodgerblue", "goldenrod")
v.colors <- colbar[V(lazega)$Office]
# Type of practice indicated by vertex shape.
v.shapes <- c("circle", "square")[V(lazega)$Practice]
# Vertex size proportional to years with firm.
v.size <- 3.5*sqrt(V(lazega)$Years)
# Label vertices according to seniority.
v.label <- V(lazega)$Seniority
# Reproducible layout.
set.seed(42)
l <- layout_with_fr(lazega)
plot(lazega, layout=l, vertex.color=v.colors,
   vertex.shape=v.shapes, vertex.size=v.size,
   vertex.label=v.label)

Visualizing large networks

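The graphs we have decorated so far are small. Once the number of vertices approaches or exceeds about 100, as with the 125-vertex lattice and the 146-vertex blog network above, drawings tend to look cluttered, and different layout algorithms often convey different information. There are also layout algorithms designed specifically for large networks, such as VxOrd.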

# CHUNK 12
data(fblog)   # French political blog network from the sand package
set.seed(42)
l <- layout_with_kk(fblog)
party.nums.f <- as.factor(V(fblog)$PolParty)
party.nums <- as.numeric(party.nums.f)
# igraph color palette has 8 colors
# 9 colors needed (for 9 political parties)
library(RColorBrewer)
colrs <- brewer.pal(9,"Set1")
V(fblog)$color <- colrs[party.nums]
plot(fblog, layout=l, vertex.label=NA,
      vertex.size=3)

# CHUNK 13
set.seed(42)
l <- layout_with_drl(fblog)
plot(fblog, layout=l, vertex.size=5, vertex.label=NA)

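When a network shows clustering that can be clearly characterized, whether through an observed variable or through groups inferred by a graph partitioning algorithm, we can replace each group of vertices with a single meta-vertex before visualizing. This coarsens the graph.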

# CHUNK 14
fblog.c <- contract(fblog, party.nums)
E(fblog.c)$weight <- 1
fblog.c <- simplify(fblog.c)
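To see what this contract-then-simplify step does, here is the same recipe on a toy graph. The sketch assumes igraph is installed. The four-vertex graph and the grouping vector groups are made up for illustration.

```r
library(igraph)

# Toy graph: vertices 1 and 2 form group 1, vertices 3 and 4 form group 2.
g <- make_graph(c(1,2, 1,3, 2,3, 2,4, 3,4), directed = FALSE)
groups <- c(1, 1, 2, 2)

g.c <- contract(g, groups)   # merge each group into one meta-vertex
E(g.c)$weight <- 1           # every original edge counts once
g.c <- simplify(g.c)         # drop loops, merge parallel edges (weights summed)

vcount(g.c)        # 2 meta-vertices
E(g.c)$weight      # 3: three original edges ran between the two groups
```

Edges inside a group become loops and are dropped, while the edges between groups collapse into one edge whose weight counts them. This is why the edge width in the coarsened blog plot can encode how strongly two parties are linked.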

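# CHUNK 15
# party names, sorted so they line up with the factor levels behind party.nums
party.names <- sort(unique(V(fblog)$PolParty))
party.size <- as.vector(table(V(fblog)$PolParty))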
plot(fblog.c, vertex.size=5*sqrt(party.size),
      vertex.label=party.names, vertex.color=colrs,
      edge.width=sqrt(E(fblog.c)$weight),
      vertex.label.dist=3.5, edge.arrow.size=0)

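Sometimes we want to highlight specific information. Another approach is then to show only the relevant subgraph. Local structure can be described with egocentric networks.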

# CHUNK 16
data(karate)
k.nbhds <- make_ego_graph(karate, order=1)

# CHUNK 17
sapply(k.nbhds, vcount)
# ---
##  [1] 17 10 11  7  4  5  5  5  6  3  4  2  3  6  3  3  3
## [18]  3  3  4  3  3  3  6  4  4  3  5  4  5  5  7 13 18
# ---

# CHUNK 18
k.1 <- k.nbhds[[1]]
k.34 <- k.nbhds[[34]]
par(mfrow=c(1,2))
plot(k.1, vertex.label=NA,
   vertex.color=c("red", rep("lightblue", 16)))
plot(k.34, vertex.label=NA,
   vertex.color=c(rep("lightblue", 17), "red"))
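What make_ego_graph returns is easiest to see on a graph whose neighbourhoods are known in advance. A minimal sketch, assuming igraph is installed and using a five-vertex star chosen only for illustration:

```r
library(igraph)

g <- make_star(5, mode = "undirected")   # vertex 1 is the hub
egos <- make_ego_graph(g, order = 1)     # one subgraph per vertex

# Each ego graph holds the vertex itself plus its direct neighbours.
sapply(egos, vcount)   # 5 2 2 2 2
```

The hub's ego network is the whole star, while each leaf sees only itself and the hub. The karate club output above reads the same way: the two largest neighbourhoods, with 17 and 18 vertices, belong to vertices 1 and 34.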

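Other visualization tools

R has many advantages, but it does not offer real-time, interactive adjustment of a drawing. Many other tools, most of them with a graphical interface, are also available: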

Graphviz
Pajek
Cytoscape
Gephi
qgraph

and so on.

https://blog.csdn.net/u012935646/article/details/107808259
https://photo.wondershare.com/graph-maker/best-tool-for-graph-visualization-online.html
EADES P. A heuristic for graph drawing[J]. Congressus numerantium, 1984, 42: 149-160.
https://blog.csdn.net/xyisv/article/details/104135708
https://cloud.tencent.com/developer/article/1483027
Peking University graph visual analytics series
 https://blog.csdn.net/hei653779919/article/details/106861213
https://www.yuque.com/antv/g6/fvuhbz
Implementation of the LEACH clustering algorithm and an energy control algorithm
 http://www.bu.edu/cs/files/2014/05/Kolaczyk.pdf
https://kateto.net/network-visualization