Network Embedding

Outline of this article

  • LINE
  • M-NMF
  • LANE

What is Network Embedding?

[Figure: real-world data is linked]
[Figure: low-dimensional representations]

LINE

[Figure: measuring first-order and second-order proximity]
  • [Information Network]
    An information network is defined as G = (V, E), where V is the set of vertices, each representing a data object, and E is the set of edges between the vertices, each representing a relationship between two data objects. Each edge e \in E is an ordered pair e = (u,v) and is associated with a weight w_{uv} > 0, which indicates the strength of the relation. If G is undirected, we have (u,v) \equiv (v,u) and w_{uv} \equiv w_{vu}; if G is directed, we have (u,v) \neq (v,u) and w_{uv} \neq w_{vu}.
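
    As a minimal illustration (a sketch with my own names and toy values, not from the paper), such a network can be stored as a weighted edge list:

    case class Edge(u: Int, v: Int, w: Double) // ordered pair (u, v) with weight w > 0

    // An undirected network stores each edge in both directions with equal weight (toy values).
    val edges = List(Edge(0, 1, 1.0), Edge(1, 0, 1.0),
                     Edge(1, 2, 0.5), Edge(2, 1, 0.5))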

  • [First-order Proximity] The first-order proximity in a network is the local pairwise proximity between two vertices. For each pair of vertices linked by an edge (u,v), the weight on that edge, w_{uv}, indicates the first-order proximity between u and v. If no edge is observed between u and v, their first-order proximity is 0. The first-order proximity usually implies the similarity of two nodes in a real-world network.

    LINE with First-order Proximity: the first-order proximity refers to the local pairwise proximity between the vertices in the network. For each undirected edge (i,j), the joint probability between vertices v_{i} and v_{j} is defined as:
    p_{1}(v_{i},v_{j})=\frac{1}{1+\exp(-\vec{u}_{i}^{T} \cdot \vec{u}_{j})}
    where \vec{u}_{i} \in R^{d} is the low-dimensional vector representation of vertex v_{i}. Its empirical counterpart is defined as \hat{p}_{1}(i,j)=\frac{w_{ij}}{W}, where W=\sum_{(i,j)\in E}w_{ij}.

    To preserve the first-order proximity, we can minimize the following objective function:
    O_{1}=d(\hat{p}_{1}(\cdot,\cdot),p_{1}(\cdot,\cdot))
    where d(\cdot,\cdot) is the distance between two distributions. We choose to minimize the KL-divergence of two probability distributions. Replacing d(\cdot,\cdot) with KL-divergence and omitting some constants, we have:
    O_{1}=-\sum_{(i,j)\in E}^{ }w_{ij}\log p_{1}(v_{i},v_{j})
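
    As a quick illustration, here is a minimal breeze sketch (variable names and toy values are mine, not from the LINE paper) that evaluates this objective on a small graph:

    import breeze.linalg._

    val U = DenseMatrix.rand(4, 2)             // 4 vertices, d = 2, random embeddings
    val edges = Seq((0, 1, 1.0), (1, 2, 2.0))  // (i, j, w_ij)
    val O1 = -edges.map { case (i, j, w) =>
        val score = U(i, ::).t dot U(j, ::).t  // u_i^T . u_j
        w * math.log(1.0 / (1.0 + math.exp(-score)))
    }.sum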

  • [Second-order Proximity] The second-order proximity between a pair of vertices (u,v) in a network is the similarity between their neighborhood network structures. Mathematically, let p_{u} = (w_{u,1},...,w_{u,|V|}) denote the first-order proximity of u with all the other vertices; then the second-order proximity between u and v is determined by the similarity between p_{u} and p_{v}. If no vertex is linked from/to both u and v, the second-order proximity between u and v is 0.

    The second-order proximity assumes that vertices sharing many connections to other vertices are similar to each other. In this case, each vertex is also treated as a specific “context” and vertices with similar distributions over the “contexts” are assumed to be similar.
    Therefore, each vertex plays two roles: the vertex itself and a specific “context” of other vertices. We introduce two vectors \vec{u}_{i} and \vec{u}_{i}^{'}, where \vec{u}_{i} is the representation of v_{i} when it is treated as a vertex while \vec{u}_{i}^{'} is the representation of v_{i} when it is treated as a specific “context”. For each directed edge (i,j), we first define the probability of “context” v_{j} generated by vertex v_{i} as:

    p_{2}(v_{j}|v_{i})=\frac{\exp(\vec{u}_{j}^{'T} \cdot \vec{u}_{i})}{\sum_{k=1}^{|V|}\exp(\vec{u}_{k}^{'T} \cdot \vec{u}_{i})}
    where |V| is the number of vertices or “contexts”.

    The second-order proximity assumes that vertices with similar distributions over the contexts are similar to each other. To preserve the second-order proximity, we should make the conditional distribution of the contexts p_{2}(\cdot|v_{i}) specified by the low-dimensional representations close to the empirical distribution \hat{p}_{2}(\cdot|v_{i}). Therefore, we minimize the following objective function:

    O_{2}=\sum_{i \in V}^{ }\lambda_{i}d(\hat{p}_{2}(\cdot | v_{i}),p_{2}(\cdot | v_{i}))

    where d(\cdot,\cdot) is the distance between two distributions, and \lambda_{i} represents the prestige of vertex i in the network, which can be measured by its degree or estimated through algorithms such as PageRank. For simplicity, we set \lambda_{i} = d_{i}.

    The empirical distribution \hat{p}_{2}(\cdot|v_{i}) is defined as \hat{p}_{2}(v_{j}|v_{i})=\frac{w_{ij}}{d_{i}}, where w_{ij} is the weight of the edge (i,j) and d_{i}=\sum_{k \in N(i)}w_{ik} is the out-degree of vertex i, with N(i) the set of out-neighbors of v_{i}. Adopting KL-divergence as the distance function and omitting constants, we obtain:

    O_{2}=-\sum_{(i,j)\in E}^{ }w_{ij}\log p_{2}(v_{j}|v_{i})

    By minimizing this objective O_{2}, we are able to represent every vertex v_{i} with a d-dimensional vector \vec{u}_{i}.
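
    To make the computational cost explicit, here is a minimal breeze sketch (names and sizes are mine, not from the paper) of the full softmax p_{2}(\cdot|v_{i}): the normalizer sums over all |V| vertices, which is exactly the cost that negative sampling (introduced below) avoids:

    import breeze.linalg._
    import breeze.numerics._

    val Uv = DenseMatrix.rand(5, 2)         // vertex vectors u_i, |V| = 5, d = 2
    val Uc = DenseMatrix.rand(5, 2)         // context vectors u_i'
    def p2(i: Int): DenseVector[Double] = {
        val scores = exp(Uc * Uv(i, ::).t)  // exp(u_k'^T . u_i) for every k: O(|V|)
        scores / sum(scores)
    }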

  • [Large-scale Information Network Embedding] Given a large network G = (V,E), the problem of Large-scale Information Network Embedding aims to represent each vertex v \in V in a low-dimensional space R^{d}, i.e., to learn a function f_{G}: V \rightarrow R^{d}, where |V| \gg d. In the space R^{d}, both the first-order proximity and the second-order proximity between the vertices are preserved.

    We adopt the asynchronous stochastic gradient algorithm (ASGD) for optimizing O_{2}. In each step, the ASGD algorithm samples a mini-batch of edges and then updates the model parameters. If an edge (i,j) is sampled, the gradient with respect to the embedding vector \vec{u}_{i} of vertex i is calculated as:

    \frac{\partial O_{2}}{\partial \vec{u}_{i}}=w_{ij}\frac{\partial \log p_{2}(v_{j}|v_{i})}{\partial \vec{u}_{i}}

    Optimizing this objective directly is computationally expensive: calculating the conditional probability p_{2}(\cdot|v_{i}) requires a summation over the entire set of vertices. To address this problem, we adopt the negative sampling approach proposed for word2vec, which yields the following objective:

    \min_{U,U'} O_{2} = -\sum_{(i,j)\in E}w_{ij}\left[\log \sigma(\vec{u}_{j}^{'T}\cdot \vec{u}_{i}) + \sum_{k=1}^{K}E_{v_{k}\sim P_{n}(v)}[\log \sigma(-\vec{u}_{k}^{'T}\cdot \vec{u}_{i})]\right]
    where \sigma(x)=\frac{1}{1+\exp(-x)}, K is the number of negative samples, and P_{n}(v)\propto d_{v}^{3/4} is the noise distribution over vertices. The per-edge gradients are:

\frac{\partial O_{2}}{\partial \vec{u}_{i}} = -w_{ij}[\vec{u}_{j}^{'}(1-\sigma(\vec{u}_{j}^{'T}\cdot \vec{u}_{i})) -\sum_{k=1}^{K}\vec{u}_{k}^{'}\sigma(\vec{u}_{k}^{'T}\cdot \vec{u}_{i})]

\frac{\partial O_{2}}{\partial \vec{u}_{j}^{'}} = -w_{ij}\vec{u}_{i}[1-\sigma(\vec{u}_{j}^{'T}\cdot \vec{u}_{i})]

\frac{\partial O_{2}}{\partial \vec{u}_{k}^{'}} = w_{ij}\vec{u}_{i}\sigma(\vec{u}_{k}^{'T}\cdot \vec{u}_{i})

Update the parameters \vec{u}_{i}, \vec{u}_{j}^{'}, \vec{u}_{k}^{'} by gradient descent with learning rate \rho:

\vec{u}_{i} = \vec{u}_{i} - \rho \frac{\partial O_{2}}{\partial \vec{u}_{i}}

\vec{u}_{j}^{'} = \vec{u}_{j}^{'} - \rho \frac{\partial O_{2}}{\partial \vec{u}_{j}^{'}}

\vec{u}_{k}^{'} = \vec{u}_{k}^{'} - \rho \frac{\partial O_{2}}{\partial \vec{u}_{k}^{'}}

The above is the result of optimizing O_{2}; the resulting U is the set of second-order embeddings. The optimization of O_{1} is analogous, except that only the single set of vectors U needs to be updated: simply replace the context vector \vec{u}_{j}^{'} with \vec{u}_{j} in the formulas above. A single-edge SGD step implementing these updates is sketched below.
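
Putting the gradients and update rules together, one SGD step for a sampled edge might look as follows. This is only a sketch under simplifying assumptions (uniform negative sampling instead of the noise distribution P_{n}(v) \propto d_{v}^{3/4}; all names are mine), not a reference implementation:

import breeze.linalg._
import scala.util.Random

def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

// One negative-sampling SGD step for edge (i, j) with weight w, K negatives, learning rate rho.
def sgdStep(Uv: DenseMatrix[Double], Uc: DenseMatrix[Double],
            i: Int, j: Int, w: Double, K: Int, rho: Double): Unit = {
    val ui = Uv(i, ::).t.copy
    val uj = Uc(j, ::).t.copy
    val gPos = 1.0 - sigmoid(uj dot ui)                  // 1 - sigma(u_j'^T . u_i)
    val gUi = uj * (-w * gPos)                           // -w_ij (1 - sigma) u_j'
    for (k <- Seq.fill(K)(Random.nextInt(Uv.rows))) {    // uniform P_n(v), for brevity
        val uk = Uc(k, ::).t.copy
        val gNeg = sigmoid(uk dot ui)                    // sigma(u_k'^T . u_i)
        gUi += uk * (w * gNeg)                           // + w_ij sigma(...) u_k'
        Uc(k, ::) := (uk - ui * (rho * w * gNeg)).t      // u_k' <- u_k' - rho dO2/du_k'
    }
    Uc(j, ::) := (uj + ui * (rho * w * gPos)).t          // u_j' <- u_j' - rho dO2/du_j'
    Uv(i, ::) := (ui - gUi * rho).t                      // u_i  <- u_i  - rho dO2/du_i
}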

M-NMF

The objective function is not convex, so we separate the optimization into four subproblems and optimize them iteratively, which guarantees that each subproblem converges to a local minimum.

The objective function is:

\min_{M,U,H,C}||S-MU^{T}||_{F}^{2}+\alpha||H-UC^{T}||_{F}^{2}-\beta\,tr(H^{T}BH)

s.t.\; M\geq 0,\; U\geq 0,\; H\geq 0,\; C\geq 0,\; tr(H^{T}H)=n

where S is the vertex similarity matrix, M the basis matrix, U the vertex representation matrix, H the community membership matrix, C the community representation matrix, and B the modularity matrix; \alpha and \beta are positive trade-off parameters.

M-subproblem: with the other parameters in the objective function fixed, this reduces to a standard NMF formulation; the updating rule for M is (\odot and the fraction bar denote element-wise multiplication and division):

M \leftarrow M \odot \frac{SU}{MU^{T}U}

U-subproblem: updating U with the other parameters fixed leads to a joint NMF problem; the updating rule is:

U \leftarrow U \odot \frac{S^{T}M+\alpha HC}{U(M^{T}M+\alpha C^{T}C)}

C-subproblem: updating C with the other parameters fixed also leads to a standard NMF formulation; the updating rule for C is:

C \leftarrow C \odot \frac{H^{T}U}{CU^{T}U}

H-subproblem: at convergence the solution must satisfy a fixed-point equation. Given an initial value of H, the successive updating rule for H is:

H \leftarrow H \odot \sqrt{\frac{-2\beta B_{1}H+\sqrt{\Delta}}{8\lambda HH^{T}H}}

where \Delta = (2\beta B_{1}H) \odot (2\beta B_{1}H) + 16\lambda(HH^{T}H)\odot(2\beta AH+2\alpha UC^{T}+(4\lambda - 2\alpha)H), A is the adjacency matrix, and \lambda is the penalty coefficient of the constraint tr(H^{T}H)=n.
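
For the M-, U-, and C-subproblems, a rough breeze sketch of the multiplicative updates follows (the helper names are mine, and the small eps guarding against division by zero is my addition; shapes assume S is n×n, M and U are n×m, H is n×k, C is k×m):

import breeze.linalg._

val eps = 1e-10

def updateM(S: DenseMatrix[Double], M: DenseMatrix[Double], U: DenseMatrix[Double]): DenseMatrix[Double] =
    M *:* ((S * U) /:/ (M * (U.t * U) + eps))       // M <- M . SU / (M U^T U)

def updateU(S: DenseMatrix[Double], M: DenseMatrix[Double], U: DenseMatrix[Double],
            H: DenseMatrix[Double], C: DenseMatrix[Double], alpha: Double): DenseMatrix[Double] =
    U *:* ((S.t * M + (H * C) * alpha) /:/ (U * (M.t * M + (C.t * C) * alpha) + eps))

def updateC(H: DenseMatrix[Double], U: DenseMatrix[Double], C: DenseMatrix[Double]): DenseMatrix[Double] =
    C *:* ((H.t * U) /:/ (C * (U.t * U) + eps))     // C <- C . H^T U / (C U^T U)

The H update follows the square-root rule above and additionally needs A, B_{1}, \beta and \lambda, so it is omitted here.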

LANE

See another article on my blog:
[论文阅读——LANE-Label Informed Attributed Network Embedding原理即实现](https://www.jianshu.com/p/1abb24bb8a04)

LINE-O2 Spark Implementation

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import breeze.linalg._
import breeze.numerics._
import breeze.stats.distributions.Rand
import scala.math._

object LINE {

    //Generate a list of num distinct random integers in [0, Range)
    def RandList(Range:Int,num:Int) : List[Int] = {
        var resultList:List[Int]=Nil
        while (resultList.length < num){
            val randomNum = (new util.Random).nextInt(Range)
            if(!resultList.exists(s => s==randomNum )){
                resultList=resultList:::List(randomNum)
            }
        }
        return resultList
    }

    def RandNumber(Range:Int) : Int = {
        val randomNum = (new util.Random).nextInt(Range)
        return randomNum
    }

    def Sigmoid(In:Double): Double = {
        var Out:Double = 1.0/(math.exp(-1.0*In)+1)
        return Out
    }

    def main(args: Array[String]) {
        if (args.length < 4) {
            System.err.println("Usage: LINE <Adjacent Matrix> <Adjacent Edge> <Negative Sample> <dimension>")
            System.exit(1)
        }

        //Number of negative samples
        val NS = args(2).toInt
        println("Negative Sample: "+NS)

        //Embedding dimension
        val Dim = args(3).toInt
        println("Embedding dimension: "+Dim)

        //Spark configuration and context
        val conf = new SparkConf().setAppName("LINE")
        val sc = new SparkContext(conf)

        //Input file: adjacency matrix
        val InputFile = sc.textFile(args(0),3)
        //Input file: edge list
        val EgdeFile = sc.textFile(args(1),3)

        //Number of lines in the input (= number of vertices)
        val InputFileCount = InputFile.count().toInt
        println("InputFileCount(number of lines): "+InputFileCount)

        //Random edge sampling rate per iteration
        val sample_rate : Double = 0.1

        //Size of the hash table used for negative sampling
        val HashTableSize: Int = 50000
        println("HashTableSize: "+HashTableSize)


        //Vertex and context embedding matrices for LINE's second-order objective O_2
        var U_vertex = DenseMatrix.rand(InputFileCount, Dim, Rand.uniform)
        var U_context = DenseMatrix.rand(InputFileCount, Dim, Rand.uniform)

        //Adjacency matrix and edge set RDDs
        val Adjacent = InputFile.map(line => line.split(",")).map(splitline => splitline.map(word => word.toDouble))
        val EgdeSet = EgdeFile.map(line => line.split(",")).map(splitline => splitline.map(word => word.toDouble))
        

        //collect() will not scale to large graphs; optimization point 1
        val AdjacentCollect = Adjacent.collect()

        //Rows and columns of the adjacency matrix
        val rows = AdjacentCollect.length
        val cols = AdjacentCollect(0).length

        //Flatten the adjacency matrix into a one-dimensional array
        val flattenAdjacent = AdjacentCollect.flatten

        //Convert to a breeze matrix (the constructor is column-major, hence the transpose)
        val AdjacentMatrix = new DenseMatrix(cols,rows,flattenAdjacent).t
        
        //println(Adjacent.take(10).toList)
        // Adjacent.foreach{
        //  rdd => println(rdd.toList)
        // }

        //Degree of each vertex (RDD)
        val VertexDegree = Adjacent.map(line => line.reduce((x,y) => x+y))

        //Sum of the degrees of all vertices
        var SumOfDegree = VertexDegree.reduce((x,y)=>x+y)

        //var SumOfDegree = sc.accumulator(0)
        //VertexDegree.foreach(x => SumOfDegree += x)  

        //Smooth the vertex probabilities with the 3/4 power, as in word2vec
        val SmoothProbability = VertexDegree.map(degree => degree/SumOfDegree).map(math.pow(_,0.75))

        //Compute the cumulative distribution of SmoothProbability with a running sum
        val p : Array[Double] = SmoothProbability.collect()
        val CumulativeProbability : Array[Double] = new Array[Double](InputFileCount)
        var running_sum : Double = 0.0
        for(i <- 0 to InputFileCount-1) {
            running_sum = running_sum + p(i)
            CumulativeProbability(i) = running_sum
        }

        //Normalize the cumulative probabilities and scale by HashTableSize to get integers in [0, HashTableSize]
        val HashProbability : Array[Int] = new Array[Int](InputFileCount)
        //Largest (final) cumulative probability
        var max_cpro = CumulativeProbability(InputFileCount-1)
        for(i <- 0  to InputFileCount-1)
        {
            HashProbability(i) = ((CumulativeProbability(i)/max_cpro)*HashTableSize).toInt
        }
        //Hash table mapping each slot to a vertex id
        val HashTable : Array[Int] = new Array[Int](HashTableSize+1)

        //Fill the hash table: each vertex occupies a range of slots proportional to its probability
        for(i <- 0 to InputFileCount-1) {
            val start : Int = if (i == 0) 0 else HashProbability(i-1)
            val end : Int = HashProbability(i)
            for(j <- start to end) {
                HashTable(j) = i
            }
        }

        println("HashTable(HashTableSize):"+HashTable(HashTableSize))


        val sample_num = (sample_rate*InputFileCount).toInt
        println("sample_num "+sample_num)
        //Objective value of each of the 100 iterations
        var O2_Array: Array[Double] = new Array[Double](100)
        for(iterator <- 0 to 99)
        {
            //println("the iterator is "+iterator)
            var learningrate = 0.1
            var O_2 = 0.0
            
            //Sample without replacement (false) the preselected number of edges
            var sampling = EgdeSet.takeSample(false,sample_num)
            for(i <- 0 to sample_num-1)
            {
                var objective = 0.0
                //println("i is " + i)
                var row:Int = sampling(i)(0).toInt
                var col:Int = sampling(i)(1).toInt
                //println("row:"+row)
                //println("col:"+col)
                var u_j_context = U_context(col,::).t
                var u_j_context_t = U_context(col,::)
                var u_i_vertex = U_vertex(row,::).t
                
                var part1=(-1)*sampling(i)(2)*u_j_context*(1-Sigmoid((u_j_context_t*u_i_vertex).toDouble))
                //println("part1: "+part1)

                //Draw NS random slots in [0, HashTableSize) to pick negative samples
                var negativeSampleSum = DenseVector.zeros[Double](Dim)
                var RandomSet : List[Int] = RandList(HashTableSize,NS)
                //println("RandomSet is:"+RandomSet)
                for(j <- 0 to RandomSet.length-1){
                    //println(RandomSet(j))
                    var u_k_context = U_context(HashTable(RandomSet(j)),::).t
                    var u_k_context_t = U_context(HashTable(RandomSet(j)),::)
                    negativeSampleSum = negativeSampleSum + u_k_context*Sigmoid((u_k_context_t*u_i_vertex).toDouble)
                }
                //println("negativeSampleSum: "+negativeSampleSum)
                
                var part2 = sampling(i)(2)*negativeSampleSum
                //println("part2: "+part2)
                
                var d_O2_ui = part1+part2
                //println("d_O2_ui: "+d_O2_ui)

                //Update u_i
                var tmp1 = u_i_vertex - learningrate*(d_O2_ui)
                //println(tmp1(0)+" "+tmp1(1))

                // println("previous U_context(row,::): "+U_context(row,::))
                for(k1 <- 0 to Dim-1){
                    U_vertex(row,k1) = tmp1(k1)
                }
                //println("after U_context(row,::): "+U_context(row,::))

                var d_O2_uj_context = (-1)*sampling(i)(2)*u_i_vertex*(1-Sigmoid((u_j_context_t*u_i_vertex).toDouble))

                //Update u_j'
                var tmp2 = u_j_context - learningrate*(d_O2_uj_context)
                for(k2 <- 0 to Dim-1){
                    U_context(col,k2) = tmp2(k2)
                }


                //Update u_k' for each negative sample
                var negative_cal = 0.0
                for(j <- 0 to RandomSet.length-1){
                    
                    var u_k_context = U_context(HashTable(RandomSet(j)),::).t
                    var u_k_context_t = U_context(HashTable(RandomSet(j)),::)

                    //These two lines accumulate the objective value
                    var sigmoid_uk_ui = Sigmoid((u_k_context_t*u_i_vertex).toDouble)
                    negative_cal = negative_cal + math.log(1.0 - sigmoid_uk_ui)

                    //Gradient with respect to u_k'
                    var d_O2_uk_context = sampling(i)(2)*u_i_vertex*sigmoid_uk_ui

                    var tmp3 = u_k_context - learningrate*d_O2_uk_context
                    for(k3 <- 0 to Dim-1){
                        U_context(HashTable(RandomSet(j)),k3) = tmp3(k3)
                    }
                    
                }

                //Accumulate the negative log-likelihood objective for this edge
                objective = (-1)*sampling(i)(2)*(math.log(Sigmoid((u_j_context_t*u_i_vertex).toDouble)) + negative_cal)
                O_2 = O_2 + objective
            }

            O2_Array(iterator) = O_2
                
        }
        
        
        val U2_HDFS = sc.parallelize(U_vertex.toArray,3)
        val O2_HDFS = sc.parallelize(O2_Array,3)

        
        println("======================")
        //println(formZeroToOneRandomMatrix)
        //VertexDegree.saveAsTextFile("file:///usr/local/data/line")
        //IndexSmoothProbability.saveAsTextFile("file:///usr/local/data/line")
        //HashProbability.saveAsTextFile("file:///usr/local/data/line")
        U2_HDFS.saveAsTextFile("file:///usr/local/data/U2")
        O2_HDFS.saveAsTextFile("file:///usr/local/data/O2")
        println("======================")
        sc.stop()
    }
}
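
A hedged usage example (the jar name, paths, master setting, and argument values are assumptions; the two inputs are a comma-separated adjacency matrix and a comma-separated edge list of src,dst,weight rows):

spark-submit --class LINE --master local[3] line.jar adjacent.csv edges.csv 5 2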