容器切换不同用户启动后,mpi4py使用异常

问题记录

镜像内部安装了mpi4py,通过容器启动后,执行如下命令

python3 -c "from mpi4py import MPI"

提示如下错误:

--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    opal_init:startup:internal-failure
But I couldn't open the help file:
    /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/help-opal-runtime.txt: No such file or directory.  Sorry!
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    orte_init:startup:internal-failure
But I couldn't open the help file:
    /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/help-orte-runtime: No such file or directory.  Sorry!
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    mpi_init:startup:internal-failure
But I couldn't open the help file:
    /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/help-mpi-runtime.txt: No such file or directory.  Sorry!
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)

此类错误往往是环境变量改变导致,如果环境变量是依附镜像,在起容器时,用不同用户会出现无对应环境变量情况,所在在导包后会提示相关文件不存在。

解决办法

添加对应环境变量

export LD_LIBRARY_PATH=/usr/local/mpi/lib:LD_LIBRARY_PATH
export OPAL_PREFIX=/opt/hpcx/ompi/

此类问题,最好的解决办法是把相关环境变量配置到镜像内部的 ~/.bashrc 文件中

©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容