Papers:
[1] Bell N, Garland M. Efficient Sparse Matrix-Vector Multiplication on CUDA. NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation, 2008.
[2] Bell N, Garland M. Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09), ACM, 2009.
[3] Saad Y. Iterative Methods for Sparse Linear Systems. SIAM, 2003.
Related link: https://www.bu.edu/pasi/materials/, which includes the slides "Iterative Methods for Sparse Linear Systems on GPU".
Institution: NVIDIA
[TOC]
1. Sparse Matrix
A matrix in which the vast majority of elements are zero.
2. Sparse Representations (Static Storage Formats)
2.1. Coordinate (COO)
Three arrays of equal length (row indices, column indices, values); simple to represent and easy to manipulate (see the sketch below).
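A minimal sketch of the COO layout, using a small example matrix made up for illustration (the struct and names are mine, not from the papers):

```cuda
// COO layout sketch (plain C++ host code). Example matrix, chosen for illustration:
//     [ 1 0 2 ]
//     [ 0 3 0 ]
//     [ 4 0 5 ]
#include <vector>

struct CooMatrix {
    std::vector<int>   row_indices;  // one entry per non-zero
    std::vector<int>   col_indices;  // one entry per non-zero
    std::vector<float> values;       // one entry per non-zero
};

int main() {
    // The three arrays all have the same length, nnz = 5.
    CooMatrix A;
    A.row_indices = {0, 0, 1, 2, 2};
    A.col_indices = {0, 2, 1, 0, 2};
    A.values      = {1, 2, 3, 4, 5};
    return 0;
}
```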
2.2. Compressed Sparse Row (CSR)
- The row indices in COO contain many repeated values, so CSR compresses them:
-- record the running total of non-zero elements at the end of each row (the row offsets)
-- e.g., to read the elements of row 3, the row offsets give 4 and 7, and you then access the column indices c and values v between those two positions (see the sketch after this list)
-- this is the most popular of these formats
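A minimal sketch of the same example matrix in CSR, showing how the row offsets replace the repeated COO row indices (array names are mine):

```cuda
// CSR layout sketch (plain C++ host code) for the same example matrix:
//     [ 1 0 2 ]
//     [ 0 3 0 ]
//     [ 4 0 5 ]
#include <cstdio>
#include <vector>

int main() {
    // The COO row_indices {0, 0, 1, 2, 2} compress into rows + 1 offsets.
    std::vector<int>   row_offsets = {0, 2, 3, 5};
    std::vector<int>   col_indices = {0, 2, 1, 0, 2};
    std::vector<float> values      = {1, 2, 3, 4, 5};

    // The non-zeros of row i live between row_offsets[i] and row_offsets[i + 1].
    int i = 2;
    for (int k = row_offsets[i]; k < row_offsets[i + 1]; ++k)
        std::printf("A[%d][%d] = %g\n", i, col_indices[k], values[k]);
    return 0;
}
```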
2.3. ELLPACK (ELL)
- Assumption: the number of non-zeros per row is bounded, with a maximum of N
-- a #rows x N array stores the values, each row filled in order and padded up to N
-- a #rows x N array stores the column indices, in one-to-one correspondence with the values array
-- suited to matrices where every row has a small number of non-zeros (see the sketch after this list)
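A minimal sketch of the ELL layout for the same example matrix, assuming at most N = 2 non-zeros per row; the padding convention (value 0, column index -1) is one common choice, not fixed by the format:

```cuda
// ELL layout sketch for the same example matrix, with N = 2:
//     [ 1 0 2 ]
//     [ 0 3 0 ]
//     [ 4 0 5 ]
const int rows = 3, N = 2;

// Each row is padded up to N entries; padded slots use value 0 and column -1.
float ell_values[rows][N]      = { {1, 2}, {3, 0}, {4, 5} };
int   ell_col_indices[rows][N] = { {0, 2}, {1, -1}, {0, 2} };
// Bell & Garland [1] store these two arrays in column-major order on the GPU
// so that consecutive threads (one per row) make coalesced memory accesses.
```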
2.4. Diagonal (DIA)
- Assumption: the non-zeros are concentrated on a few diagonals (banded structure)
-- values are stored along the diagonals
-- each diagonal is identified by its offset from the main diagonal (see the sketch after this list)
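A minimal sketch of the DIA layout, using a small banded example matrix made up for illustration:

```cuda
// DIA layout sketch: one column of dia_values per occupied diagonal,
// identified by its offset from the main diagonal. Example matrix:
//     [ 1 7 0 ]
//     [ 5 2 8 ]
//     [ 0 6 3 ]
const int rows = 3, num_diags = 3;

int   dia_offsets[num_diags]      = { -1, 0, +1 };   // sub-, main, super-diagonal
float dia_values[rows][num_diags] = { { 0, 1, 7 },   // row 0: no sub-diagonal entry (padded)
                                      { 5, 2, 8 },
                                      { 6, 3, 0 } }; // row 2: no super-diagonal entry (padded)
// dia_values[i][d] holds A[i][i + dia_offsets[d]] whenever that column is in range.
```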
2.5. Hybrid (HYB) ELL + COO
- Assumption: a few rows have many non-zeros, while the rest have few
-- store the bulk in ELL and the leftover entries in COO (see the sketch below)
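A minimal sketch of one possible HYB split; the ELL width K is a tuning parameter, and both the example matrix and the threshold here are made up for illustration:

```cuda
// HYB layout sketch: keep up to K non-zeros per row in an ELL part and spill
// the excess into a COO part. Example matrix with one unusually long row, K = 2:
//     [ 1 2 0 0 ]
//     [ 3 4 5 6 ]   <- 4 non-zeros: 2 stay in ELL, 2 spill into COO
//     [ 7 0 0 0 ]
//     [ 0 8 0 9 ]
const int rows = 4, K = 2;

// ELL part (padded with value 0 / column index -1):
float ell_values[rows][K] = { {1, 2}, {3, 4}, {7, 0}, {8, 9} };
int   ell_cols[rows][K]   = { {0, 1}, {0, 1}, {0, -1}, {1, 3} };

// COO part holding the spill-over from row 1:
int   coo_rows[] = { 1, 1 };
int   coo_cols[] = { 2, 3 };
float coo_vals[] = { 5.0f, 6.0f };
```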
2.6. Other Formats
- Block Compressed Sparse Row (BSR)
-- SLAM_LocalBA
- Skyline
-- well suited for Cholesky or LU decomposition when no pivoting is required
- ……
Learn more:
[1] https://software.intel.com/en-us/mkl-developer-reference-c-sparse-matrix-storage-formats
2.7. Format Comparison
Which format is best depends on the matrix shape (its sparsity pattern) and on the algorithm consuming it.
3. Sparse Solvers
3.1. Direct Solvers
- Solution steps (a small worked example follows this list)
-- Solve: Ax = b
-- LU decomposition: A = LU
-- Forward substitution: Ly = b
-- Back substitution: Ux = y
- Sparse matrix factorization
-- Try to reduce fill-in of the factors
-- Use sophisticated reordering schemes (finding the optimal ordering is NP-hard)
-- LU factorization
-- Cholesky factorization (symmetric positive definite)
- Popular codes
-- PARDISO, UMFPACK, SuperLU
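A tiny worked example of the steps above, on a dense 2x2 system made up for illustration:

```latex
\begin{aligned}
Ax &= b, &
A &= \begin{pmatrix} 4 & 3 \\ 6 & 3 \end{pmatrix}, &
b &= \begin{pmatrix} 10 \\ 12 \end{pmatrix} \\
A &= LU =
  \begin{pmatrix} 1 & 0 \\ 1.5 & 1 \end{pmatrix}
  \begin{pmatrix} 4 & 3 \\ 0 & -1.5 \end{pmatrix}, &
Ly &= b \Rightarrow y = \begin{pmatrix} 10 \\ -3 \end{pmatrix}, &
Ux &= y \Rightarrow x = \begin{pmatrix} 1 \\ 2 \end{pmatrix}
\end{aligned}
```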
3.2. Iterative Solvers
- Sequence of approximate solutions
-- Converge to exact solution(会多次运算Ax) - Measure error through residual
--
--
-- Stop when - Wide variety of methods
-- Jacobi
-- Richardson
-- Gauss-Sidel
-- Conjugate-Gradient (CG)
-- Generalized Minimum-Residual (GMRES)
3.2.1. Jacobi Iterative Method
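Jacobi updates every component from the previous iterate, x_i^(k+1) = (b_i - sum_{j != i} a_ij * x_j^(k)) / a_ii, and repeats sweeps until the residual is small. The CSR-based sweep below is my own illustrative sketch, not code from the papers:

```cuda
// One Jacobi sweep over a CSR matrix (plain C++ host code, illustrative names).
#include <vector>

struct CsrMatrix {
    int rows = 0;
    std::vector<int>   row_offsets;  // size rows + 1
    std::vector<int>   col_indices;  // size nnz
    std::vector<float> values;       // size nnz
};

// x_new[i] = (b[i] - sum_{j != i} A[i][j] * x_old[j]) / A[i][i]
void jacobi_sweep(const CsrMatrix& A, const std::vector<float>& b,
                  const std::vector<float>& x_old, std::vector<float>& x_new) {
    for (int i = 0; i < A.rows; ++i) {
        float diag = 0.0f, sum = 0.0f;
        for (int k = A.row_offsets[i]; k < A.row_offsets[i + 1]; ++k) {
            int j = A.col_indices[k];
            if (j == i) diag = A.values[k];
            else        sum += A.values[k] * x_old[j];
        }
        x_new[i] = (b[i] - sum) / diag;  // assumes a non-zero diagonal entry
    }
}
```

Each component only reads the previous iterate, so all rows can be updated independently; that is why Jacobi parallelizes naturally on GPUs, unlike Gauss-Seidel, whose updates depend on each other within a sweep. An outer loop repeats sweeps until ||b - A x_k|| <= eps * ||b||.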
3.2.2. Krylov Iterative Methods
Haven't fully figured these out yet.
3.3. Solvers Comparison
| Direct Solvers | Iterative Solvers |
|---|---|
| Robust | Breakdown issues |
| Black-box operation: just use it! | Lots of parameters to tune |
| Difficult to parallelize | Amenable to parallelism |
| High memory consumption (fill-in during factorization) | Low memory footprint |
| Limited scalability | Scalable |
4. Sparse Matrix-Vector Multiplication (SpMV)
These boil down to the various GPU kernels, one per storage format; see the papers [1][2] for the details. A sketch of the simplest one follows.
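For reference, a sketch in the spirit of the "scalar CSR" kernel from Bell & Garland [1] (one thread per row); the variable names and launch configuration here are my own, not the paper's exact code:

```cuda
#include <cuda_runtime.h>

__global__ void spmv_csr_scalar(int num_rows,
                                const int*   row_offsets,  // size num_rows + 1
                                const int*   col_indices,  // size nnz
                                const float* values,       // size nnz
                                const float* x,            // dense input vector
                                float*       y) {          // dense output vector
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float dot = 0.0f;
        // y[row] = dot product of row 'row' of A with x.
        for (int k = row_offsets[row]; k < row_offsets[row + 1]; ++k)
            dot += values[k] * x[col_indices[k]];
        y[row] = dot;
    }
}

// Launch sketch: one thread per row.
// int threads = 256;
// int blocks  = (num_rows + threads - 1) / threads;
// spmv_csr_scalar<<<blocks, threads>>>(num_rows, d_row_offsets, d_col_indices,
//                                      d_values, d_x, d_y);
```

The papers also describe a vector CSR kernel (one warp per row, so a row's non-zeros are read with coalesced accesses) as well as kernels for COO, ELL, DIA, and HYB; see [1][2] for those.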