Repository:
https://github.com/vetter/shoc
Environment:
CPU: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
GPU: NVIDIA GeForce RTX 3090
OS: Ubuntu 20.04.6 LTS
Kernel: 5.15.0-56-generic
Compiler: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)
CUDA: release 11.6, V11.6.55
Make: GNU Make 4.2.1
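The values above can be collected with standard tools. The following is a minimal sketch, assuming a Linux host; `probe` and `print_env` are hypothetical helper names, and any tool that is not installed is reported as "not found" rather than causing an error.

```shell
# Hypothetical helper: run a command if its tool is installed and print the
# first line of its output after the given label.
probe() {
    label=$1; tool=$2; shift 2
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$label: $("$@" 2>/dev/null | head -n 1)"
    else
        echo "$label: not found"
    fi
}

# Print one line per item, in the same order as the list above.
print_env() {
    probe CPU sed sed -n 's/^model name[^:]*: //p' /proc/cpuinfo
    probe GPU nvidia-smi nvidia-smi --query-gpu=name --format=csv,noheader
    probe OS lsb_release lsb_release -ds
    echo "Kernel: $(uname -r)"
    probe Compiler gcc gcc --version
    probe CUDA nvcc sh -c 'nvcc --version | grep release'
    probe Make make make --version
}
```

Calling `print_env` before a build makes it easier to match a later compile error to the toolchain versions that produced it.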
Download, build, and install:
cd shoc/
mkdir build
cd build
sh ../configure \
    CPPFLAGS="-I/usr/local/cuda/include" \
    LIBS="-lcuda -lcudart" \
    LDFLAGS="-L/usr/local/cuda/lib64"
Build error 1:
Unsupported gpu architecture 'compute_30'
Making all in cuda
make[2]: Entering directory '/home/ryy/benchmark/shoc/build/src/cuda'
Making all in level0
make[3]: Entering directory '/home/ryy/benchmark/shoc/build/src/cuda/level0'
Making all in epmpi
make[4]: Entering directory '/home/ryy/benchmark/shoc/build/src/cuda/level0/epmpi'
/usr/local/bin/mpicxx -DHAVE_CONFIG_H -I. -I../../../../../src/cuda/level0/epmpi -I../../../../config -I/usr/local/cuda/include -I../../../../../src/cuda/common -DPARALLEL -I../../../../../src/mpi/common -I/usr/local/cuda/include -I../../../../../src/common -I../../../../config -g -O2 -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o ../../../../../src/cuda/level0/epmpi/../../common/main.cpp
mv -f .deps/main.Tpo .deps/main.Po
/usr/local/cuda/bin/nvcc -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_32,code=sm_32 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_37,code=sm_37 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_53,code=sm_53 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_62,code=sm_62 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -I../../../../../src/cuda/common -I/usr/local/cuda/include -I../../../../../src/common -I../../../../config -g -O2 -c ../../../../../src/cuda/level0/epmpi/../BusSpeedDownload.cu
nvcc fatal : Unsupported gpu architecture 'compute_30'
make[4]: *** [../../../../config/targets.mk:24: BusSpeedDownload.o] Error 1
make[4]: Leaving directory '/home/ryy/benchmark/shoc/build/src/cuda/level0/epmpi'
make[3]: *** [Makefile:370: all-recursive] Error 1
make[3]: Leaving directory '/home/ryy/benchmark/shoc/build/src/cuda/level0'
make[2]: *** [Makefile:250: all-recursive] Error 1
make[2]: Leaving directory '/home/ryy/benchmark/shoc/build/src/cuda'
make[1]: *** [Makefile:254: all-recursive] Error 1
make[1]: Leaving directory '/home/ryy/benchmark/shoc/build/src'
make: *** [Makefile:299: all-recursive] Error 1
Build error 2:
Undefined reference to symbol 'pthread_join@@GLIBC_2.2.5'
/usr/local/bin/mpicxx -g -O2 -L../../../../src/opencl/common -L../../../../src/mpi/common -L/usr/local/cuda/lib64 -L../../../../src/common -o MTBusCont OCLDriver.o MTBusCont.o mtbcmain.o ../../../../src/opencl/level0/BusSpeedDownload.o -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lcuda -lcudart -lrt -lcuda -lcudart -lrt
/usr/bin/ld: mtbcmain.o: undefined reference to symbol 'pthread_join@@GLIBC_2.2.5'
/usr/bin/ld: /lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:279: MTBusCont] Error 1
make[4]: Leaving directory '/home/ryy/benchmark/shoc/build/src/mpi/contention-mt/opencl'
make[3]: *** [Makefile:252: all-recursive] Error 1
make[3]: Leaving directory '/home/ryy/benchmark/shoc/build/src/mpi/contention-mt'
make[2]: *** [Makefile:250: all-recursive] Error 1
make[2]: Leaving directory '/home/ryy/benchmark/shoc/build/src/mpi'
make[1]: *** [Makefile:254: all-recursive] Error 1
make[1]: Leaving directory '/home/ryy/benchmark/shoc/build/src'
make: *** [Makefile:299: all-recursive] Error 1
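"DSO missing from command line" means the linker found the symbol in libpthread, but that library was only pulled in indirectly and never named on the link line. To confirm which pthread symbols an object file needs, the undefined-symbol list from `nm -u` can be filtered. This is a minimal sketch; `undefined_pthread_symbols` is a hypothetical helper name, and the symbol version suffix (for example `@GLIBC_2.2.5`) is stripped if nm prints one.

```shell
# Hypothetical helper: read `nm -u` output on stdin and print the undefined
# pthread_* symbols it contains, one per line, sorted and without duplicates.
undefined_pthread_symbols() {
    awk '$1 == "U" && $2 ~ /^pthread_/ {
        sub(/@.*/, "", $2)   # drop a trailing @VERSION suffix, if present
        print $2
    }' | sort -u
}

# Usage (not run here), from the directory that holds the object file:
#   nm -u mtbcmain.o | undefined_pthread_symbols
```

If the function prints anything, the final link of that object needs `-pthread` (or `-lpthread`) on the command line.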
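Solution:
Set CUDA_CPPFLAGS explicitly, and add the pthread library directory and -pthread to LDFLAGS. Overriding CUDA_CPPFLAGS appears to replace the default -gencode list, which includes compute_30 (no longer supported by CUDA 11); -pthread puts libpthread on the link line.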
sh ../configure \
    CPPFLAGS="-I/usr/local/cuda/include" \
    CUDA_CPPFLAGS="-I/usr/local/cuda/include" \
    LIBS="-lcuda -lcudart" \
    LDFLAGS="-L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu/ -pthread"
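Instead of dropping the -gencode list entirely, one could generate it from the architectures the installed nvcc accepts. This is a sketch under stated assumptions: `gencode_flags` is a hypothetical helper, `nvcc --list-gpu-arch` is assumed to be available in the installed toolkit (check `nvcc --help`), and CUDA_CPPFLAGS is assumed to be the variable from which the SHOC build takes its -gencode options.

```shell
# Hypothetical helper: read architecture names (compute_XX, one per line) on
# stdin and print the matching -gencode options on a single line.
gencode_flags() {
    flags=""
    while read -r arch || [ -n "$arch" ]; do
        case "$arch" in
            compute_*) ;;        # keep virtual architecture names only
            *) continue ;;       # skip anything else (blank lines, headers)
        esac
        sm="sm_${arch#compute_}" # compute_86 -> sm_86
        flags="${flags:+$flags }-gencode=arch=${arch},code=${sm}"
    done
    printf '%s\n' "$flags"
}

# Usage (not run here): build the list from the local toolkit, then pass it
# to configure together with the include path.
#   GENCODE="$(nvcc --list-gpu-arch | gencode_flags)"
#   sh ../configure CUDA_CPPFLAGS="$GENCODE -I/usr/local/cuda/include" ...
```

The RTX 3090 has compute capability 8.6, so on this machine a single `-gencode=arch=compute_86,code=sm_86` would also be enough and would shorten the build.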
Results:
ryy@amax:~/benchmark/shoc/build/bin$ ./shocdriver -s 1 -d 1 -cuda
--- Welcome To The SHOC Benchmark Suite version 1.1.5 ---
Hostname: amax
Platform selection not specified, default to platform #0
Number of available platforms: 1
Number of available devices on platform 0 : 1
Device 0: 'NVIDIA GeForce RTX 3090'
Specified 1 device IDs: 1
Using size class: 1
--- Starting Benchmarks ---
Running benchmark BusSpeedDownload
result for bspeed_download: 12.3484 GB/sec
Running benchmark BusSpeedReadback
result for bspeed_readback: 13.2010 GB/sec
Running benchmark MaxFlops
result for maxspflops: 36674.7000 GFLOPS
result for maxdpflops: 636.9900 GFLOPS
Running benchmark DeviceMemory
result for gmem_readbw: 891.1060 GB/s
result for gmem_readbw_strided: 333.5060 GB/s
result for gmem_writebw: 793.3380 GB/s
result for gmem_writebw_strided: 139.7440 GB/s
result for lmem_readbw: 8522.5300 GB/s
result for lmem_writebw: 8545.6500 GB/s
result for tex_readbw: 2013.6800 GB/sec
Skipping non-cuda benchmark KernelCompile
Skipping non-cuda benchmark QueueDelay
Running benchmark BFS
result for bfs: 0.1659 GB/s
result for bfs_pcie: 0.1154 GB/s
result for bfs_teps: 3713090.0000 Edges/s
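To compare runs, the "result for" lines can be turned into CSV. This is a minimal sketch, assuming the output format shown above (name, colon, value, unit); `shoc_results_to_csv` is a hypothetical helper name and is not part of SHOC.

```shell
# Hypothetical helper: read shocdriver output on stdin and print one CSV row
# per "result for <name>: <value> <unit>" line.
shoc_results_to_csv() {
    awk '
        BEGIN { print "benchmark,value,unit" }
        $1 == "result" && $2 == "for" {
            name = $3
            sub(/:$/, "", name)      # strip the trailing colon from the name
            print name "," $4 "," $5
        }
    '
}

# Usage (not run here):
#   ./shocdriver -s 1 -d 1 -cuda | shoc_results_to_csv > results.csv
```

Lines such as "Running benchmark ..." and "Skipping non-cuda benchmark ..." are ignored, so the full driver output can be piped in unchanged.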