[Paper Notes] Speedup Matrix Completion with Side Information: Application to Multi-Label Learning

[keywords]: matrix completion, side information, multi-label learning

  • Matrix completion: recover a low-rank matrix from a limited number of observed entries
    (1) \displaystyle\min_{\tilde{M}\in\mathbb{R}^{n\times m}} \parallel\tilde{M}\parallel_{tr}\quad s.t.\ \mathcal{R}_\Omega(\tilde{M})=\mathcal{R}_\Omega(M)
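The constraint in (1) only touches the observed entries. A minimal sketch of the sampling operator \mathcal{R}_\Omega (the function name `R_omega` is mine, not the paper's):

```python
import numpy as np

def R_omega(X, mask):
    """Sampling operator: keep observed entries (mask True), zero the rest."""
    return np.where(mask, X, 0.0)

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
mask = np.array([[True, False],
                 [True, True]])      # Omega: 3 of the 4 entries observed
print(R_omega(M, mask))              # [[1. 0.] [3. 4.]]
```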

  • Application: collaborative filtering, dimensionality reduction, multi-class learning, clustering, etc.

  • Side information:

    • A\in \mathbb{R}^{n\times r_a} and B\in \mathbb{R}^{m\times r_b} with observed matrix M\in \mathbb{R}^{n\times m}, \text{rank}(M)=r\leq r_a\ll n, r\leq r_b\ll m
    • e.g. textual description of items, attributes of users, correlation matrix
  • Assumption:

    • r\ll \min\{n,m\}
    • the target matrix and the side information matrices share the same latent information, i.e. the column and row vectors in M lie in the subspaces spanned by the column and row vectors in A and B (so the optimization problem is reduced to searching for an optimal matrix of size r_a\times r_b).
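A quick numeric check of this assumption (sizes and names below are illustrative): if M=AZB^\top, projecting M onto the column space of A leaves it unchanged, and \text{rank}(M)=\text{rank}(Z)\leq r.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, ra, rb, r = 50, 40, 8, 6, 4

A = rng.standard_normal((n, ra))    # row-side information
B = rng.standard_normal((m, rb))    # column-side information
# Rank-r core matrix, so rank(M) = r <= ra, rb.
Z = rng.standard_normal((ra, r)) @ rng.standard_normal((r, rb))
M = A @ Z @ B.T

P_A = A @ np.linalg.pinv(A)         # projector onto span of A's columns
assert np.allclose(P_A @ M, M)      # columns of M lie in span(A)
print(np.linalg.matrix_rank(M))     # 4
```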
  • Other Approaches and limitations:

    • Optimization problem of MC: require to update the full matrix at each iteration
    • Matrix Factorization (low rank): cannot guarantee perfect recovery of the target matrix
    • With side information and based on graphical models by assuming the special distribution of latent factors: non-convex and no theoretical guarantee on matrix recovery
    • MC with infinite dimensional side information: no guarantee of perfect recovery
    • MC-b and MC-1: high computational cost
  • Techniques:

    • transductive incomplete multi-label learning (aims to assign multiple labels to individual instances in a transductive learning setting)
      M=AZB^\top (A: feature vectors of instances; B: label correlation matrix)
      (2) \displaystyle\min_{Z\in\mathbb{R}^{r_a\times r_b}} \parallel Z\parallel_{tr}\quad s.t.\ \mathcal{R}_\Omega(AZB^\top)=\mathcal{R}_\Omega(M)
    • SVT (Singular Value Thresholding) (approximate (2) by an unconstrained optimization problem)
      (3) \displaystyle\min_{Z\in\mathbb{R}^{r_a\times r_b}} \mathcal{L}(Z)=\lambda\parallel Z\parallel_{tr} + \frac{1}{2}\parallel\mathcal{R}_\Omega(AZB^\top-M)\parallel^2_F
    • Maxide: Supplementary
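The SVT step used to solve (3) is the proximal operator of the nuclear norm: take an SVD and soft-threshold the singular values. A minimal sketch, where the threshold \tau plays the role of \lambda/L:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: prox of tau * nuclear norm at X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)     # soft-threshold the singular values
    return (U * s) @ Vt

# Thresholding shrinks large singular values and zeroes out small ones.
X = np.diag([3.0, 1.0, 0.5])
Y = svt(X, 1.0)
print(np.round(np.diag(Y), 2))       # [2. 0. 0.]
```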

Algorithm Maxide (Matrix Completion with Side Information)

  1. Initialization: \theta_1=\theta_2\in(0,1], Z_1=Z_2, step-size parameter L>0, \gamma>1, and stopping criterion \epsilon
  2. k=2
  3. while \mathcal{L}(Z_{k+1})\leq(1-\epsilon)\mathcal{L}(Z_k) do
  4. \quad Y_k=Z_k+\theta_k(\theta^{-1}_{k-1}-1)(Z_k-Z_{k-1})
  5. \quad Z_{k+1}=\arg\min_Z \lambda\parallel Z\parallel_{tr}+Q_k(Z) #solved by SVT
  6. \quad while \mathcal{E}(Z_{k+1})-\mathcal{E}(Y_k)\geq H_k(L) do
  7. \qquad L=L*\gamma
  8. \qquad Z_{k+1}=\arg\min_Z \lambda\parallel Z\parallel_{tr}+Q_k(Z)
  9. \quadend while
  10. \quad \theta_{k+1}=(\sqrt{\theta_k^4+4\theta_k^2}-\theta_k^2)/2
  11. \quad k=k+1
  12. end while

\mathcal{E}(Z)=\frac{1}{2}\parallel\mathcal{R}_\Omega(AZB^\top-M)\parallel_F^2
Q_k(Z)=\frac{L}{2}\parallel Z-(Y_k-\frac{1}{L}A^\top\mathcal{R}_\Omega(AY_kB^\top-M)B)\parallel_F^2
H_k(L)=\text{Tr}((Z_{k+1}-Y_k)^\top A^\top\mathcal{R}_\Omega(AY_kB^\top-M)B)+\frac{L}{2}\parallel Z_{k+1}-Y_k\parallel_F^2
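Putting the pieces together, a compact NumPy sketch of the loop above (the defaults, toy problem, and sizes are illustrative, not the paper's settings):

```python
import numpy as np

def maxide(M, mask, A, B, lam=1e-4, gamma=2.0, eps=1e-6, max_iter=500):
    """Sketch of Maxide: accelerated proximal gradient over Z with SVT steps."""
    R = lambda X: np.where(mask, X, 0.0)                 # sampling operator
    E = lambda Z: 0.5 * np.linalg.norm(R(A @ Z @ B.T - M)) ** 2
    obj = lambda Z: lam * np.linalg.svd(Z, compute_uv=False).sum() + E(Z)

    def svt(X, tau):                                     # prox of tau*||.||_tr
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    L = 1.0
    Z_prev = Z = np.zeros((A.shape[1], B.shape[1]))
    th_prev = th = 1.0
    for _ in range(max_iter):
        Y = Z + th * (1.0 / th_prev - 1.0) * (Z - Z_prev)
        G = A.T @ R(A @ Y @ B.T - M) @ B                 # gradient of E at Y
        Z_new = svt(Y - G / L, lam / L)
        # Line search: grow L until the quadratic bound H_k(L) holds.
        while E(Z_new) - E(Y) > np.vdot(Z_new - Y, G) \
                + 0.5 * L * np.linalg.norm(Z_new - Y) ** 2:
            L *= gamma
            Z_new = svt(Y - G / L, lam / L)
        if obj(Z_new) > (1.0 - eps) * obj(Z):            # stopping criterion
            break
        Z_prev, Z = Z, Z_new
        th_prev, th = th, (np.sqrt(th ** 4 + 4 * th ** 2) - th ** 2) / 2.0
    return Z

# Toy recovery: the target's factors live in the side-information subspaces.
rng = np.random.default_rng(0)
A = np.linalg.qr(rng.standard_normal((30, 5)))[0]
B = np.linalg.qr(rng.standard_normal((20, 4)))[0]
M = A @ rng.standard_normal((5, 4)) @ B.T
mask = rng.random(M.shape) < 0.6                         # 60% observed
Z_hat = maxide(M, mask, A, B)
print(np.linalg.norm(A @ Z_hat @ B.T - M) / np.linalg.norm(M))  # small
```

Note the iterate updates only the r_a\times r_b matrix Z, never the full n\times m matrix, which is the source of the speedup.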


  • Experiments (Matlab+C)

    • Synthetic dataset
    • Real dataset
  • Advantages:

    • sample complexity: the number of required observed entries drops from O(r(n+m)\ln^2 (n+m)) to O(r(r_a+r_b)\ln(r_a+r_b)\ln (n+m))
    • convex optimization problem, i.e. unique optimizer
    • guarantee on perfect recovery
  • Disadvantages:

    • no linear convergence rate