基本概念
社区(community)
网络图内部连接比较紧密的节点子集合对应的子图叫做社区。各社区节点集合彼此没有交集的称为非重叠型社区,有交集的称为重叠型社区。
社区发现(Community Detection)
对给定的网络图寻找其社区结构的过程称为社区发现。是一种聚类的过程。
lpa算法概述
LPA标签传播算法是社区发现算法中最简单易懂的一种,核心思想是近朱者赤。通过图中一个点所在的圈子来判断该点的特征性质,连接紧密的点被划分到同一个社区中,达到物以类聚的效果。
大体流程:在一个由点和边组成的图结构中,通过一个点V的所有邻居对V进行投票,每个邻居把自己社区标签投给V,票数占多数的标签作为V的社区标签(如果最大票数中多个标签持平,则随机选择一个)。使用上述方法对所有的点迭代一轮,则每个点都会被分配到一个社区中(打上了标签)。迭代多轮后、或者在某一轮迭代后社区保持稳定时,连接紧密的点将有共同的社区标签,算法结束。
基本思想
将一个节点的邻居节点的标签中数量最多的标签作为该节点自身的标签,给每个节点添加标签(label)以代表它所属的社区,并通过标签的传播形成同一标签的社区结构。
第一步: 为所有节点指定一个唯一的标签;
第二步: 逐轮刷新所有节点的标签,直到达到收敛要求为止。对于每一轮刷新,节点标签刷新的规则如下:
对于某一个节点,考察其所有邻居节点的标签,并进行统计,将出现个数最多的标签作为自身的标签。当个数最多的标签不唯一时,随机选一个。
优点
收敛周期短,无需任何先验参数(不需要指定社区个数和大小),算法执行过程中不需要计算任何社区指标。
评价标准
社区发现的主要评价指标有Jaccard指数,fsame指数、NMI(规范化交互信息)以及Modularity(模块度)等
Jaccard指数
衡量社区分割正确性的指标,在已知正确划分的情况下通过正确分类的节点对的数量来计量
NMI
依然是已知划分情况下与真实结果差异度的比较指标,其标准差可以衡量算法的稳定性
Modularity(模块度)
网络中连接社区内部边所占的比例与另一网络中的内部边的期望值之间的差值
伪代码:
输入:无向图邻接矩阵AdjacentMatrix,节点个数VerticeNum
输出:存储节点标签的分类数组Community
//初始化每个节点的标签
For i <- 0 to VerticeNum Do
Community[i] <- i
//寻找i节点的所有邻居存入Neighbor[i]
FindMaetexNonZero(i,AdjacentMatrix,NeighBor[i])
while 未达到分类标准 or 未超出迭代阈值 then
RandomSort(SS)//生成随机序数队列SS
For i <- 0 to VerticeNum Do
//统计节点i邻居中数量最多的标签
VectorFrequency(Neighbor[i], lable)
//若只有一个数量最多则直接赋值
if lable.size() = 1 then
Community[i] <- lable[0]
//若有多个相同数量的标签则随机选择一个
else then
Community[i] <- lable[random]
return Community
java实现
1.下载安然公司数据集
下载链接:http://snap.stanford.edu/data/email-Enron.html
数据格式如下
0 1
1 0
1 2
1 3
1 4
1 5
1 6
1 7
2.findDominantLabel.java
import java.util.Collections;
import java.util.Vector;
import java.util.concurrent.Callable;
import java.util.Random;
public class findDominantLabel implements Callable<Boolean> {
private Vector<Integer> dominant_label;
private Vector<Integer> label_count;
private int node_id;
private Vector<NODE> node_list;
private Random randGen;
public findDominantLabel(Vector<NODE> node_list) {
dominant_label = new Vector<Integer>();
label_count = new Vector<Integer>(node_list.size()); //vector of size nodelist
for (int i = 0; i < node_list.size(); i++) { // to prevent Index out of bounds exception
label_count.add(Integer.valueOf(0));
}
randGen = new Random(); // to prevent Null point exception
this.node_list = node_list;
}
public void link_node_to_process(int node_id) {
this.node_id = node_id;
}
@Override
public Boolean call() {
if (node_id == -1) // if node Id not present (no node left)
return Boolean.FALSE;
boolean run = false;
Collections.fill(label_count, Integer.valueOf(0)); // Initializing label_count with same label as the node
dominant_label.clear();
NODE curr_node = node_list.get(node_id);
int maximum_count = 0;
for (Integer neighborId : curr_node.get_neighbors()) { // for all neighbors
int neighbor_label = node_list.get(neighborId).get_label_name(); // get label of the neighbor from node list
if (neighbor_label == 0)
continue; // no label assignment yet, just the initialization done so far
int neighbor_label_count = label_count.get(neighbor_label) + 1; //total neighbors wih same label (neighbor_label)+ itself (1)
label_count.set(neighbor_label, neighbor_label_count); //replace the neighbor_label(index_num) with its total count in the neighborhood
if (maximum_count < neighbor_label_count) { //some other neighbor label has max count
maximum_count = neighbor_label_count; //replace max count by that neighbor label count
dominant_label.clear(); //clear the previous dominant label
dominant_label.add(neighbor_label); //make neighbor_label as the new dominant label (since it has max count)
} else if (maximum_count == neighbor_label_count) {
dominant_label.add(neighbor_label);
}
}
if (dominant_label.size() > 0) { // more than 1 dominant label among the neighbors
//int rand = randGen.nextInt(dominant_label.size());
//rand = dominant_label.get(rand);
Integer maximum_label = Collections.max(dominant_label); // choose the maximum label
if (label_count.get(curr_node.get_label_name()) != maximum_count) { //is the current label dominant label
run = true; //current label not dominant
}
curr_node.set_label_name(maximum_label);
}
return Boolean.valueOf(run);
}
}
3.LP.java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
class NODE {
int label_name;
int id;
Set<Integer> neighbors;
public NODE(int id, int label_name) {
this.label_name = label_name;
this.id = id;
this.neighbors = new HashSet<Integer>();
}
/*-************************* Node methods defined ***********************-*/
public int get_id() {
return id;
}
public void set_label_name(int label_name) {
this.label_name = label_name;
}
public int get_label_name() {
return label_name;
}
public void set_neighbors(Set<Integer> neighbors) {
this.neighbors = neighbors;
}
public void append_neighbors(int id) {
this.neighbors.add(Integer.valueOf(id));
}
public Set<Integer> get_neighbors() {
return neighbors;
}
}
/*-****************************** main class *****************************/
/* Read input file, write output file, Community Detection Algorithm */
/*-***********************************************************************/
public class LP {
Vector<NODE> node_list;
Vector<Integer> ordered_nodes;
public LP() { //default constructor
}
/*-************************** Read input file ***************************-*/
/*Format : "node_id node_id" and input file has edge list of the graph. */
/*-**********************************************************************-*/
public void readinput(int total_nodes, String input_file) throws IOException {
FileReader input = new FileReader(input_file);
BufferedReader br = new BufferedReader(input);
node_list = new Vector<NODE>(total_nodes);
ordered_nodes = new Vector<Integer>(total_nodes);
for (int i = 0; i < total_nodes; i++) {
node_list.add(new NODE(i, i)); //adding all the nodes to the node list
ordered_nodes.add(i); //Preserving order of nodes
}
System.out.println(total_nodes + " nodes added.");
String input_lines = br.readLine();
while (input_lines != null) {
String[] split = input_lines.split(" "); //spliting each lines by spaces
int v1 = Integer.valueOf(split[0]); //first half be node v1
int v2 = Integer.valueOf(split[1]); //second half be node v2
node_list.get(v1).append_neighbors(v2); //add v2 in the neighborlist of v1
node_list.get(v2).append_neighbors(v1); //add v1 in the neighborlist of v2
input_lines = br.readLine(); //next line
}
/*-**************************Neighbor list complete ***********************/
br.close();
}
/*-************************** Write output file ***************************-*/
/* Format : "node_id community label" */
/*-************************************************************************-*/
public void final_communities(String file) throws IOException {
Map<Integer, Integer> assign_label = new HashMap<Integer, Integer>();
int label_count = 0;
for (int i = 0; i < node_list.size(); i++) {
int label = node_list.get(i).get_label_name();
Integer r = assign_label.get(Integer.valueOf(label));
if (r == null) {
label_count++;
assign_label.put(Integer.valueOf(label), Integer.valueOf(label_count));
}
}
System.out.println("communities = " + label_count);
/* label_count communities found */
FileOutputStream fso = new FileOutputStream(file);
OutputStreamWriter fileWriter = new OutputStreamWriter(fso, Charset.forName("UTF-8"));
NODE node;
for (int i = 0; i < node_list.size(); i++) {
node = node_list.get(i);
fileWriter.write(node.get_id() + "--" + assign_label.get(Integer.valueOf(node.get_label_name())).intValue() + "\n");
}
System.out.println("DONE");
fileWriter.close();
fso.close();
}
/*-************************** Community Detection ***********************-*/
/*Multi-Threading can also be adapted to multi-processor architecture */
/*-**********************************************************************-*/
public void communityDetection(int total_threads) throws IOException, ExecutionException, InterruptedException {
ExecutorService threadPool = Executors.newFixedThreadPool(total_threads);
Vector<findDominantLabel> dominantLabel_Calc = new Vector<findDominantLabel>(total_threads);
for (int i = 0; i < total_threads; i++) {
dominantLabel_Calc.add(new findDominantLabel(node_list));
}
//int iteration = 0;
int label_change = 100; //number of nodes change labels (unstable configuration)
while (label_change > 0) {
label_change = 0;
Collections.shuffle(ordered_nodes);
//PARALLELISM
for (int i = 0; i < node_list.size(); i += total_threads) { //for all nodes
for (int j = 0; j < total_threads; j++) { //blocks of total threads number of nodes run paralley together
if ((i + j) < node_list.size()) // if there are enough nodes(= number of threads) to run parallely
/*pull each of the j threads from vector (dominantLabel_Calc) and link
each node from the ordered list to a thread (one to one mapping) */
dominantLabel_Calc.get(j).link_node_to_process(ordered_nodes.get(i + j).intValue());
else
dominantLabel_Calc.get(j).link_node_to_process(-1);
}
List<Future<Boolean>> result = threadPool.invokeAll(dominantLabel_Calc);
for (int k = 0; k < result.size(); k++) {
Boolean b = result.get(k).get();
if (b != null && b.booleanValue() == true) {
label_change++;
if (label_change == 1)
// System.out.print("once more");
break;
}
}
}
}
System.out.println("Communities found");
threadPool.shutdown();
}
/*-************************** Main***************************************-*/
/*Change number of threads here, (also other parameters) */
/*-**********************************************************************-*/
public static void main(String[] args) throws IOException, InterruptedException, ExecutionException {
LP algo = new LP();
int total_nodes = 37000; // Total nodes in the input graph
int total_threads = 12; // Total threads to use
algo.readinput(total_nodes, "E:\\yishuheWorkspace\\graph-ysh\\src\\main\\java\\com\\graph\\service\\lpa\\Enron_email_dataset");
long startTime = System.currentTimeMillis();
algo.communityDetection(total_threads);
long endTime = System.currentTimeMillis();
long totalTime = endTime - startTime;
System.out.println("Time_Taken" + totalTime / 1000.0);
algo.final_communities("Pooya.txt");
}
}