DeepSpeed reports the following error when running a large model:
```python
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f4892b5a020>
Traceback (most recent call last):
File "/home/conda/envs/dsp/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f7692a2e020>
Traceback (most recent call last):
File "/home/conda/envs/dsp/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
```
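This AttributeError is most likely a secondary symptom: `__del__` runs on an object whose `__init__` stopped before `ds_opt_adam` was assigned, so the real failure happened earlier, while the op was being loaded. A minimal sketch of that pattern follows. `FakeOptimizer`, `native_op` and `try_construct` are hypothetical names for illustration, not DeepSpeed code.

```python
class FakeOptimizer:
    """Hypothetical stand-in that mimics the failure pattern; not DeepSpeed code."""

    def __init__(self, fail_to_load: bool):
        if fail_to_load:
            # Simulates the native op failing to build or load.
            raise RuntimeError("op failed to load")
        # Only reached when loading succeeded.
        self.native_op = []

    def __del__(self):
        # Uses an attribute that exists only if __init__ ran to completion,
        # so a failed __init__ leads to an AttributeError here.
        self.native_op.clear()


def try_construct() -> str:
    """Return the message of the original error raised by the constructor."""
    try:
        FakeOptimizer(fail_to_load=True)
    except RuntimeError as exc:
        return str(exc)
    return "constructed"
```

The error worth chasing is the one raised inside the constructor, not the message printed by `__del__`.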
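Troubleshooting steps:
1. Reproduce the error by running the following on the command line:
python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'
and check whether it fails. If the CUDA version torch was built with does not match the installed CUDA version, it raises:
deepspeed.ops.op_builder.CUDAMismatchException: xxxx
2. Fixes (two options)
a. Set DS_SKIP_CUDA_CHECK=1 in the environment before running the code, for example by prefixing the launch command with it.
b. Edit the DeepSpeed source at the point where the exception is raised so that it no longer checks whether the torch and CUDA versions match.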
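Option (a) can also be applied from inside the training script instead of the shell. The sketch below assumes the variable is read from the process environment when the op is built, so it has to be set before DeepSpeed loads any op. `skip_cuda_check` is a hypothetical helper name.

```python
import os


def skip_cuda_check(env=os.environ):
    """Set DS_SKIP_CUDA_CHECK=1 in the given environment mapping.

    Hypothetical helper; call it before DeepSpeed builds or loads any op.
    """
    env["DS_SKIP_CUDA_CHECK"] = "1"
    return env


skip_cuda_check()

# import deepspeed  # import and initialize DeepSpeed only after this point
```

Skipping the check hides the mismatch rather than resolving it. Installing a torch build that matches the local CUDA version is the more durable fix.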