Building and Installing PLASMA 2.8.0 and 19.8.1

plasma_2.8.0

This version is built and installed with a Makefile; we use OpenBLAS as the underlying BLAS.
Edit make.inc as follows:

###
#
# @file make.inc.example
#
#  PLASMA is a software package provided by Univ. of Tennessee,
#  Univ. of California Berkeley and Univ. of Colorado Denver
#
# @version 2.8.0
# @author Julie Langou
# @author Mathieu Faverge
# @date 2010-11-15
#
###

# Those variables have to be changed accordingly!
# Compilers, linker/loaders, the archiver, and their options.

# Install directory
prefix    = /opt/local/Nov_plasma

# To speed up the compilation
MAKE      = make -j

CC        = gcc
FC        = gfortran
LOADER    = $(FC)

ARCH      = ar
ARCHFLAGS = cr
RANLIB    = ranlib

CFLAGS    = -O2 -DADD_
FFLAGS    = -O2
LDFLAGS   = -O2 

# To compile Fortran 90 interface
PLASMA_F90 =  1

# To compile Plasma with EZTrace library in order to trace events
# (Don't forget to set correctly PKG_CONFIG_PATH to make `pkg-config --libs eztrace` works)
PLASMA_TRACE= 0

# By sequential kernel
# CFLAGS += -DTRACE_BY_KERNEL

# By parallel plasma function (Works only with dynamic scheduler)
# CFLAGS += -DTRACE_BY_FUNCTION

# Blas Library
LIBBLAS     = -lopenblas
# CBlas library
# LIBCBLAS    = -L/path/to/externallibs/lib -lcblas
# lapack and tmg library (lapack is included in acml)
LIBLAPACK   = -llapack
INCCLAPACK  = 
LIBCLAPACK  = -llapacke

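PLASMA implements task-level parallelism with QUARK, its own thread pool, so the underlying OpenBLAS must run single-threaded at execution time: export OMP_NUM_THREADS=1. This version of PLASMA is easy to install, with essentially no pitfalls.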
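
A minimal sketch of the single-thread setting described above. The variable can be exported for the whole shell session or scoped to one command; the program name in the comment is hypothetical.

```shell
# Force the BLAS layer to run single-threaded so that it does not
# compete with QUARK's own worker threads.
export OMP_NUM_THREADS=1

# Confirm what child processes will see.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# Alternatively, scope the setting to a single run
# (./my_plasma_app is a hypothetical program name):
#   OMP_NUM_THREADS=1 ./my_plasma_app
```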

plasma-19.8.1

This newer version replaces QUARK with OpenMP for parallelism, so there are quite a few pitfalls, and the overall code structure differs from 2.8.0.

1. CMake cannot find OpenBLAS

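I tried writing an openblas.pc file in the pkg-config directory, but that did not work; OpenBLAS was still not found.
I then decided to build with the Makefile instead.
First, run configure.py with python2 to configure the dependencies and generate make.inc.
The make.inc that was finally generated: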

# PLASMA make.inc template, processed by configure.py
#
# PLASMA is a software package provided by:
# University of Tennessee, US,
# University of Manchester, UK.

# ------------------------------------------------------------------------------
# programs and flags

CC       = gcc
FC       = gfortran
RANLIB   = ranlib
AR       = ar

# Use -fPIC to make shared (.so) and static (.a) libraries;
# can be commented out if making only static libraries.
FPIC     = -fPIC

CFLAGS   = ${FPIC} -std=c99 -fopenmp -DHAVE_OPENMP_DEPEND -DHAVE_OPENMP_PRIORITY -DHAVE_OPENBLAS -DCBLAS_ADD_TYPEDEF -DHAVE_LAPACKE_DLASCL -DHAVE_LAPACKE_DLANTR -DHAVE_LAPACKE_DLASSQ
FCFLAGS  = ${FPIC} @FCFLAGS@ @OPENMP_FCFLAGS@
LDFLAGS  = ${FPIC}  -fopenmp
LIBS     = /home/lsl/OpenBLAS-0.3.9/lib/libopenblas_tsv110_omp-r0.3.9.a -lm -lgfortran
# LIBS was originally -lopenblas -lm, but with that linkage only OMP_NUM_THREADS=1 works
# Enable or disable compiling Fortran 2008 interfaces into PLASMA library
# 0 - disabled
# 1 - enabled; build Fortran interfaces and examples
fortran ?= 0

# where to install PLASMA
prefix ?= /usr/local/plasma

# one of: aix bsd c89 freebsd generic linux macosx mingw posix solaris
# usually generic is fine
lua_platform ?= generic

2. make lib fails on a missing script dependency (Lua)

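I had to download lua-5.3.6.tar.gz myself, place it in the tools/ directory and extract it there before the build could continue. (Thumbs down: the package depends on a component it does not ship.)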
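
The manual step above could be scripted roughly as follows. This is a sketch under assumptions: the post does not say where the tarball was downloaded from, so the lua.org URL is assumed from that site's usual layout, and the script expects to be run from the top of the PLASMA source tree.

```shell
# Lua release the author used; the URL is an assumption based on
# lua.org's usual download layout.
LUA_TARBALL="lua-5.3.6.tar.gz"
LUA_URL="https://www.lua.org/ftp/${LUA_TARBALL}"

# Only act when run from the top of the PLASMA source tree.
if [ -d tools ]; then
    # Download into tools/ unless the tarball is already there.
    [ -f "tools/${LUA_TARBALL}" ] || curl -fL -o "tools/${LUA_TARBALL}" "${LUA_URL}"
    # Unpack next to the tarball, producing tools/lua-5.3.6/.
    tar -xzf "tools/${LUA_TARBALL}" -C tools
else
    echo "tools/ not found: run this from the PLASMA source root" >&2
fi
```

After this, make lib can be run again.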

3. make test reports an error

It turned out that the gfortran runtime was not being linked; appending -lgfortran to LIBS fixed it.
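
A sketch of that one-line change done with sed instead of by hand. To stay safe to run anywhere, it works on a throwaway file containing the original LIBS line; for a real build, point it at make.inc instead.

```shell
# Work on a throwaway copy so the sketch is safe to run anywhere.
tmp_inc="$(mktemp)"
printf 'LIBS     = -lopenblas -lm\n' > "$tmp_inc"

# Append -lgfortran to the LIBS line unless it is already present.
if ! grep -q '^LIBS.*-lgfortran' "$tmp_inc"; then
    sed -i.bak 's/^\(LIBS[[:space:]]*=.*\)$/\1 -lgfortran/' "$tmp_inc"
fi

# Show the resulting line, then clean up the temporary files.
new_libs="$(grep '^LIBS' "$tmp_inc")"
echo "$new_libs"
rm -f "$tmp_inc" "$tmp_inc.bak"
```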

4. Performance testing

Using the test routines provided in the test directory as a benchmark, I found that only OMP_NUM_THREADS=1 worked; with multiple threads the run failed with:

OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.

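So I replaced -lopenblas with the absolute path to an OpenMP build of the OpenBLAS static library that I had compiled earlier. That resolved the problem, and performance was good with both 32 and 128 threads.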
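
The post does not show how that OpenMP build of OpenBLAS was produced. One possible way, following the USE_OPENMP=1 hint in the warning message, is sketched below. Both directory names are hypothetical defaults, and the script does nothing unless an OpenBLAS source tree is found.

```shell
# Hypothetical locations; override via the environment as needed.
OPENBLAS_SRC="${OPENBLAS_SRC:-$HOME/OpenBLAS-0.3.9}"
OPENBLAS_PREFIX="${OPENBLAS_PREFIX:-$HOME/opt/openblas-omp}"

if [ -f "$OPENBLAS_SRC/Makefile.rule" ]; then
    # Build with OpenMP threading, as the warning message asks.
    make -C "$OPENBLAS_SRC" USE_OPENMP=1
    # Install headers and libraries under the chosen prefix.
    make -C "$OPENBLAS_SRC" USE_OPENMP=1 PREFIX="$OPENBLAS_PREFIX" install
    # List the static archive(s) to reference from LIBS in make.inc.
    ls "$OPENBLAS_PREFIX"/lib/libopenblas*.a
    status="built"
else
    echo "OpenBLAS source tree not found at $OPENBLAS_SRC" >&2
    status="skipped"
fi
```

The absolute path of the resulting .a file is what then replaces -lopenblas in LIBS.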

The whole process took an afternoon. I clearly still have a lot to learn!
And how should I go about optimizing these two libraries?!
