[Paper Notes] Speedup Matrix Completion with Side Information: Application to Multi-Label Learning

[keywords]: matrix completion, side information, multi-label learning

  • Matrix completion: recover a low-rank matrix from a limited number of observed entries
    (1) \displaystyle\min_{\tilde{M}\in\mathbb{R}^{n\times m}} \parallel\tilde{M}\parallel_{tr}\quad s.t.\ \mathcal{R}_\Omega(\tilde{M})=\mathcal{R}_\Omega(M)
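The constraint in (1) only touches the observed entries. A minimal sketch of the sampling operator \mathcal{R}_\Omega (the function name `R_omega` is mine, not the paper's):

```python
import numpy as np

def R_omega(X, mask):
    """Sampling operator: keep observed entries (mask True), zero the rest."""
    return np.where(mask, X, 0.0)

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
mask = np.array([[True, False],
                 [True, True]])      # Omega: 3 of the 4 entries observed
print(R_omega(M, mask))              # [[1. 0.] [3. 4.]]
```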

  • Application: collaborative filtering, dimensionality reduction, multi-class learning, clustering, etc.

  • Side information:

    • A\in \mathbb{R}^{n\times r_a} and B\in \mathbb{R}^{m\times r_b} with observed matrix M\in \mathbb{R}^{n\times m}, \text{rank}(M)=r\leq r_a\ll n, r\leq r_b\ll m
    • e.g. textual description of items, attributes of users, correlation matrix
  • Assumption:

    • r\ll \min\{n,m\}
    • the target matrix and the side information matrices share the same latent information, i.e. the column and row vectors in M lie in the subspaces spanned by the column and row vectors in A and B (so the optimization problem is reduced to searching for an optimal matrix of size r_a\times r_b).
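A quick numeric check of this assumption (sizes and names below are illustrative): if M=AZB^\top, projecting M onto the column space of A leaves it unchanged, and \text{rank}(M)=\text{rank}(Z)\leq r.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, ra, rb, r = 50, 40, 8, 6, 4

A = rng.standard_normal((n, ra))    # row-side information
B = rng.standard_normal((m, rb))    # column-side information
# Rank-r core matrix, so rank(M) = r <= ra, rb.
Z = rng.standard_normal((ra, r)) @ rng.standard_normal((r, rb))
M = A @ Z @ B.T

P_A = A @ np.linalg.pinv(A)         # projector onto span of A's columns
assert np.allclose(P_A @ M, M)      # columns of M lie in span(A)
print(np.linalg.matrix_rank(M))     # 4
```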
  • Other Approaches and limitations:

    • Optimization problem of MC: require to update the full matrix at each iteration
    • Matrix Factorization (low rank): cannot guarantee perfect recovery of the target matrix
    • With side information and based on graphical models by assuming the special distribution of latent factors: non-convex and no theoretical guarantee on matrix recovery
    • MC with infinite dimensional side information: no guarantee of perfect recovery
    • MC-b and MC-1: high computational cost
  • Techniques:

    • transductive incomplete multi-label learning (aims to assign multiple labels to individual instances in a transductive learning setting)
      M=AZB^\top (A: feature vectors of instances; B: label correlation matrix)
      (2) \displaystyle\min_{Z\in\mathbb{R}^{r_a\times r_b}} \parallel Z\parallel_{tr}\quad s.t.\ \mathcal{R}_\Omega(AZB^\top)=\mathcal{R}_\Omega(M)
    • SVT (Singular Value Thresholding) (approximate (2) by an unconstrained optimization problem)
      (3) \displaystyle\min_{Z\in\mathbb{R}^{r_a\times r_b}} \mathcal{L}(Z)=\lambda\parallel Z\parallel_{tr} + \frac{1}{2}\parallel\mathcal{R}_\Omega(AZB^\top-M)\parallel^2_F
    • Maxide: Supplementary
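The SVT step used to solve (3) is the proximal operator of the nuclear norm: take an SVD and soft-threshold the singular values. A minimal sketch, where the threshold \tau plays the role of \lambda/L:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: prox of tau * nuclear norm at X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)     # soft-threshold the singular values
    return (U * s) @ Vt

# Thresholding shrinks large singular values and zeroes out small ones.
X = np.diag([3.0, 1.0, 0.5])
Y = svt(X, 1.0)
print(np.round(np.diag(Y), 2))       # [2. 0. 0.]
```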

Algorithm Maxide (Matrix Completion with Side Information)

  1. Initialization: \theta_1=\theta_2\in(0,1], Z_1=Z_2, step-size parameter L>0, \gamma>1, and stopping criterion \epsilon
  2. k=2
  3. while \mathcal{L}(Z_{k+1})\leq(1-\epsilon)\mathcal{L}(Z_k) do
  4. \quad Y_k=Z_k+\theta_k(\theta^{-1}_{k-1}-1)(Z_k-Z_{k-1})
  5. \quad Z_{k+1}=\arg\min_Z \lambda\parallel Z\parallel_{tr}+Q_k(Z) #solved by SVT
  6. \quad while \mathcal{E}(Z_{k+1})-\mathcal{E}(Y_k)\geq H_k(L) do
  7. \qquad L=L*\gamma
  8. \qquad Z_{k+1}=\arg\min_Z \lambda\parallel Z\parallel_{tr}+Q_k(Z)
  9. \quadend while
  10. \quad \theta_{k+1}=(\sqrt{\theta_k^4+4\theta_k^2}-\theta_k^2)/2
  11. \quad k=k+1
  12. end while

\mathcal{E}(Z)=\frac{1}{2}\parallel\mathcal{R}_\Omega(AZB^\top-M)\parallel_F^2
Q_k(Z)=\frac{L}{2}\parallel Z-(Y_k-\frac{1}{L}A^\top\mathcal{R}_\Omega(AY_kB^\top-M)B)\parallel_F^2
H_k(L)=\text{Tr}((Z_{k+1}-Y_k)^\top A^\top\mathcal{R}_\Omega(AY_kB^\top-M)B)+\frac{L}{2}\parallel Z_{k+1}-Y_k\parallel_F^2
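Putting the pieces together, a compact NumPy sketch of the loop above (the defaults, toy problem, and sizes are illustrative, not the paper's settings):

```python
import numpy as np

def maxide(M, mask, A, B, lam=1e-4, gamma=2.0, eps=1e-6, max_iter=500):
    """Sketch of Maxide: accelerated proximal gradient over Z with SVT steps."""
    R = lambda X: np.where(mask, X, 0.0)                 # sampling operator
    E = lambda Z: 0.5 * np.linalg.norm(R(A @ Z @ B.T - M)) ** 2
    obj = lambda Z: lam * np.linalg.svd(Z, compute_uv=False).sum() + E(Z)

    def svt(X, tau):                                     # prox of tau*||.||_tr
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    L = 1.0
    Z_prev = Z = np.zeros((A.shape[1], B.shape[1]))
    th_prev = th = 1.0
    for _ in range(max_iter):
        Y = Z + th * (1.0 / th_prev - 1.0) * (Z - Z_prev)
        G = A.T @ R(A @ Y @ B.T - M) @ B                 # gradient of E at Y
        Z_new = svt(Y - G / L, lam / L)
        # Line search: grow L until the quadratic bound H_k(L) holds.
        while E(Z_new) - E(Y) > np.vdot(Z_new - Y, G) \
                + 0.5 * L * np.linalg.norm(Z_new - Y) ** 2:
            L *= gamma
            Z_new = svt(Y - G / L, lam / L)
        if obj(Z_new) > (1.0 - eps) * obj(Z):            # stopping criterion
            break
        Z_prev, Z = Z, Z_new
        th_prev, th = th, (np.sqrt(th ** 4 + 4 * th ** 2) - th ** 2) / 2.0
    return Z

# Toy recovery: the target's factors live in the side-information subspaces.
rng = np.random.default_rng(0)
A = np.linalg.qr(rng.standard_normal((30, 5)))[0]
B = np.linalg.qr(rng.standard_normal((20, 4)))[0]
M = A @ rng.standard_normal((5, 4)) @ B.T
mask = rng.random(M.shape) < 0.6                         # 60% observed
Z_hat = maxide(M, mask, A, B)
print(np.linalg.norm(A @ Z_hat @ B.T - M) / np.linalg.norm(M))  # small
```

Note the iterate updates only the r_a\times r_b matrix Z, never the full n\times m matrix, which is the source of the speedup.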


  • Experiments (Matlab+C)

    • Synthetic dataset
    • Real dataset
  • Advantages:

    • sample complexity: the number of required observed entries drops from O(r(n+m)\ln^2 (n+m)) to O(r(r_a+r_b)\ln(r_a+r_b)\ln (n+m))
    • convex optimization problem, i.e. unique optimizer
    • guarantee on perfect recovery
  • Disadvantages:

    • no linear convergence rate