plasma_2.8.0
This library is built and installed with a Makefile, and we use OpenBLAS as the underlying BLAS.
Edit the make.inc file as follows:
###
#
# @file make.inc.example
#
# PLASMA is a software package provided by Univ. of Tennessee,
# Univ. of California Berkeley and Univ. of Colorado Denver
#
# @version 2.8.0
# @author Julie Langou
# @author Mathieu Faverge
# @date 2010-11-15
#
###
# Those variables have to be changed accordingly!
# Compilers, linker/loaders, the archiver, and their options.
# Install directory
prefix = /opt/local/Nov_plasma
# To speed up the compilation
MAKE = make -j
CC = gcc
FC = gfortran
LOADER = $(FC)
ARCH = ar
ARCHFLAGS = cr
RANLIB = ranlib
CFLAGS = -O2 -DADD_
FFLAGS = -O2
LDFLAGS = -O2
# To compile Fortran 90 interface
PLASMA_F90 = 1
# To compile Plasma with EZTrace library in order to trace events
# (Don't forget to set correctly PKG_CONFIG_PATH to make `pkg-config --libs eztrace` works)
PLASMA_TRACE= 0
# By sequential kernel
# CFLAGS += -DTRACE_BY_KERNEL
# By parallel plasma function (Works only with dynamic scheduler)
# CFLAGS += -DTRACE_BY_FUNCTION
# Blas Library
LIBBLAS = -lopenblas
# CBlas library
# LIBCBLAS = -L/path/to/externallibs/lib -lcblas
# lapack and tmg library (lapack is included in acml)
LIBLAPACK = -llapack
INCCLAPACK =
LIBCLAPACK = -llapacke
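PLASMA's task-level parallelism is driven by QUARK, its own thread-pool runtime, which requires the underlying OpenBLAS to run single-threaded at execution time: export OMP_NUM_THREADS=1. This version of PLASMA is not hard to install and has essentially no pitfalls.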
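As a minimal sketch of the runtime setting described above, the variable can be exported in the shell or job script before the PLASMA program is launched. The echo line is only an illustrative addition so that a log records the value in effect.

```shell
# Pin OpenBLAS to a single thread per call; QUARK's own worker
# threads are what provide the parallelism in PLASMA 2.8.0.
export OMP_NUM_THREADS=1

# Record the setting so the run log shows which value was in effect.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

Any program started from the same shell after this point inherits the setting.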
plasma-19.8.1
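This newer version replaces QUARK with OpenMP for parallelism, so it has quite a few pitfalls, and the overall code structure differs from 2.8.0.
1. CMake cannot find OpenBLAS
I tried writing an openblas.pc file in the pkg-config directory, but that did not work; OpenBLAS still could not be found.
So I decided to build with the Makefile approach instead.
First, run configure.py with python2 to configure the dependency libraries and generate the make.inc file.
The make.inc file I ended up with: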
# PLASMA make.inc template, processed by configure.py
#
# PLASMA is a software package provided by:
# University of Tennessee, US,
# University of Manchester, UK.
# ------------------------------------------------------------------------------
# programs and flags
CC = gcc
FC = gfortran
RANLIB = ranlib
AR = ar
# Use -fPIC to make shared (.so) and static (.a) libraries;
# can be commented out if making only static libraries.
FPIC = -fPIC
CFLAGS = ${FPIC} -std=c99 -fopenmp -DHAVE_OPENMP_DEPEND -DHAVE_OPENMP_PRIORITY -DHAVE_OPENBLAS -DCBLAS_ADD_TYPEDEF -DHAVE_LAPACKE_DLASCL -DHAVE_LAPACKE_DLANTR -DHAVE_LAPACKE_DLASSQ
FCFLAGS = ${FPIC} @FCFLAGS@ @OPENMP_FCFLAGS@
LDFLAGS = ${FPIC} -fopenmp
LIBS = /home/lsl/OpenBLAS-0.3.9/lib/libopenblas_tsv110_omp-r0.3.9.a -lm -lgfortran
# LIBS was originally -lopenblas -lm, but with that linkage only OMP_NUM_THREADS=1 works
# Enable or disable compiling Fortran 2008 interfaces into PLASMA library
# 0 - disabled
# 1 - enabled; build Fortran interfaces and examples
fortran ?= 0
# where to install PLASMA
prefix ?= /usr/local/plasma
# one of: aix bsd c89 freebsd generic linux macosx mingw posix solaris
# usually generic is fine
lua_platform ?= generic
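2. make lib is missing a script
I had to download lua-5.3.6.tar.gz from the website myself and extract it under the tools/ directory before the build could continue. (Thumbs down: the package depends on a module and does not ship it.)
3. make test fails
On inspection, gfortran was not being linked; appending -lgfortran to LIBS fixed it.
4. Performance testing
I used the test routines provided in the test directory for performance testing and found that only OMP_NUM_THREADS=1 worked. With multiple threads, it reported: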
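The Lua step can be sketched as a small helper, assuming the tarball has already been downloaded. The function name unpack_into_tools is hypothetical, and whether the PLASMA Makefile expects exactly this layout is not confirmed here.

```shell
# Hypothetical helper: extract an already-downloaded tarball under tools/.
# $1 = path to the tarball, $2 = destination directory (defaults to "tools").
unpack_into_tools() {
    tarball="$1"
    tools_dir="${2:-tools}"
    mkdir -p "$tools_dir"
    tar -xzf "$tarball" -C "$tools_dir"
}

# Usage, from the PLASMA source root:
#   unpack_into_tools lua-5.3.6.tar.gz tools
```

After this, tools/ should contain a lua-5.3.6 directory and make lib can be retried.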
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.
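So I replaced -lopenblas with an absolute path, linking against an OpenMP build of the OpenBLAS static library that I had compiled earlier. That solved the problem, and both 32 threads and 128 threads gave good performance.
The whole process took an afternoon; I can only lament how weak I am!
And how am I supposed to optimize these two libraries?!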
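Two small helpers can make this check repeatable; both are sketches under assumptions, not part of PLASMA or OpenBLAS. links_openmp assumes that a GCC OpenMP build of a static library leaves GOMP_* or omp_get_* symbol references visible to nm. run_with_threads simply sets OMP_NUM_THREADS for one command. The echo in the loop stands in for the real test binary, whose name is not given here.

```shell
# Succeeds if the static library references OpenMP runtime symbols.
# Assumption: GCC OpenMP builds expose GOMP_* / omp_get_* references to nm.
links_openmp() {
    nm "$1" 2>/dev/null | grep -Eq 'GOMP_|omp_get_'
}

# Run a command with OMP_NUM_THREADS set only for that command.
# $1 = thread count, remaining arguments = the command to run.
run_with_threads() {
    threads="$1"
    shift
    OMP_NUM_THREADS="$threads" "$@"
}

# Example sweep; replace the sh -c placeholder with the real test command.
for t in 1 32 128; do
    run_with_threads "$t" sh -c 'echo "would run with OMP_NUM_THREADS=$OMP_NUM_THREADS"'
done
```

For instance, links_openmp /home/lsl/OpenBLAS-0.3.9/lib/libopenblas_tsv110_omp-r0.3.9.a would be expected to succeed for the OpenMP build linked above, if the symbol assumption holds.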