使用 Ollama 运行本地 deepseek 模型

ollama 的官网，https://ollama.com/download/linux ，执行 curl -fsSL https://ollama.com/install.sh | sh 安装 ollama 工具

root@gpu-01:~# curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA GPU installed.
root@gpu-01:~# systemctl status ollama.service 
● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2025-02-20 17:28:09 CST; 42s ago
   Main PID: 2587049 (ollama)
      Tasks: 16 (limit: 629145)
     Memory: 30.3M
        CPU: 1.187s
     CGroup: /system.slice/ollama.service
             └─2587049 /usr/local/bin/ollama serve

Feb 20 17:28:09 gpu-01 ollama[2587049]: time=2025-02-20T17:28:09.479+08:00 level=INFO source=routes.go:1237 msg="Listening on 127.0.0.1:11434 (version 0.5.11)"
Feb 20 17:28:09 gpu-01 ollama[2587049]: time=2025-02-20T17:28:09.480+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Feb 20 17:28:10 gpu-01 ollama[2587049]: time=2025-02-20T17:28:10.639+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-a6c24381-350f-1bbf-ddcb-73134a32f102 library=cuda variant=v12 compute=8.9 driver=12.4 name>
Feb 20 17:28:10 gpu-01 ollama[2587049]: time=2025-02-20T17:28:10.639+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-b8d55e89-ad1f-174f-e0b1-3a1f8f2de308 library=cuda variant=v12 compute=8.9 driver=12.4 name>
Feb 20 17:28:10 gpu-01 ollama[2587049]: time=2025-02-20T17:28:10.639+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-d379c82c-0587-adde-abde-457b161eea9a library=cuda variant=v12 compute=8.9 driver=12.4 name>
Feb 20 17:28:10 gpu-01 ollama[2587049]: time=2025-02-20T17:28:10.639+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-300527ee-a553-e108-8f70-7255143c77e8 library=cuda variant=v12 compute=8.9 driver=12.4 name>
Feb 20 17:28:10 gpu-01 ollama[2587049]: time=2025-02-20T17:28:10.639+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-f0eef8c7-621a-6bbf-39d3-9258c5653ddb library=cuda variant=v12 compute=8.9 driver=12.4 name>
Feb 20 17:28:10 gpu-01 ollama[2587049]: time=2025-02-20T17:28:10.639+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-604933dd-5b25-190e-3e3a-1371d6f0b745 library=cuda variant=v12 compute=8.9 driver=12.4 name>
Feb 20 17:28:10 gpu-01 ollama[2587049]: time=2025-02-20T17:28:10.639+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-e76e16fd-2ced-a768-1371-8203afd42b36 library=cuda variant=v12 compute=8.9 driver=12.4 name>
Feb 20 17:28:10 gpu-01 ollama[2587049]: time=2025-02-20T17:28:10.639+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-0b6b5e0c-994d-1d6a-378c-ef015bf9029d library=cuda variant=v12 compute=8.9 driver=12.4 name>
root@gpu-01:~# ollama -v
ollama version is 0.5.11

更改模型默认存放路径到其他目录

root@gpu-01:~# systemctl stop ollama.service 
root@gpu-01:~# cd /usr/local/share/
root@gpu-01:/usr/share# mv ollama/ /nvme1/
root@gpu-01:/usr/share# ln -s /nvme1/ollama/ .
root@gpu-01:~# systemctl start ollama.service

下载运行模型

root@gpu-01:~# ollama run llama3.3
pulling manifest 
pulling 4824460d29f2... 100% ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  42 GB                         
pulling 948af2743fc7... 100% ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? 1.5 KB                         
pulling bc371a43ce90... 100% ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? 7.6 KB                         
pulling 53a87df39647... 100% ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? 5.6 KB                         
pulling 56bb8bd477a5... 100% ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????   96 B                         
pulling c7091aa45e9b... 100% ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  562 B                         
verifying sha256 digest 
writing manifest 
success 
>>> 你好
很高兴见到你！今天我能为您做什么？

>>> 模型推理有哪些工具？
模型推理是一个用于分析和解释机器学习（ML）模型的过程。以下是一些流行的工具：

1. **LIME（局部可解释模型-agnostic解释）**：一个开源库，用于生成可解释的模型，以帮助理解预测。
2. **SHAP（SHapley 加值解释）**：另一个开源库，它将每个特征对预测结果的贡献分配给一个值，称为SHAP值。
3. **TreeExplainer**：一个用于树基模型（如随机森林或梯度提升）的模型解释工具。
4. **Anchor**：一种模型推理技术，它提供了可解释的“锚点”，以帮助理解预测。
5. **Model Interpretability** 由 **H2O.ai** 提供：一个库，用于解释 H2O.ai 机器学习模型中的结果，包括部分依赖图和特征重要性。
6. **TensorFlow Model Analysis**：一个 TensorFlow 工具，用于分析和可视化机器学习模型的性能和偏差。
7. **Scikit-learn 的 permutation_importance**：Scikit-learn 库中用于计算每个功能对模型预测结果贡献的工具。
8. **ELI5（第五层解释）**：一个库，用于可视化和解释 Scikit-learn 和其他机器学习模型中的特征重要性和部分依赖图。
9. **ModelTree**：一种技术，用于将复杂的机器学习模型分解为一组树基模型，使其更容易解释。
10. **Google 的 What-If 工具**：一个基于网络的工具，用于可视化和分析机器学习模型的性能、偏差和公平性。
11. **IBM 的 AI Explainability 360**：一个开源库，用于生成和评估机器学习模型中的解释。
12. **Microsoft 的 Interpret-ML**：一个库，用于解释机器学习模型并提供模型可解释性的见解。

这些只是可用于模型推理的许多工具中的几个例子。选择合适的工具取决于具体用例、所使用的机器学习框架以及需要分析的模型类型。

查看模型

// 命令行直接调用
root@gpu-01:~# ollama list
NAME               ID              SIZE     MODIFIED    
llama3.3:latest    a6eb4748fd29    42 GB    2 hours ago    

// 通过 ollama serve api 调用
root@gpu-01:~# curl -s http://localhost:11434/api/tags | jq
{
  "models": [
    {
      "name": "llama3.3:latest",
      "model": "llama3.3:latest",
      "modified_at": "2025-02-21T11:14:42.543258402+08:00",
      "size": 42520413916,
      "digest": "a6eb4748fd2990ad2952b2335a95a7f952d1a06119a0aa6a2df6cd052a93a3fa",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": [
          "llama"
        ],
        "parameter_size": "70.6B",
        "quantization_level": "Q4_K_M"
      }
    }
  ]
}

通过 python 与 ollama api 交互

root@gpu-01:~# pip install ollama
root@gpu-01:~# cat demo.py 
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='llama3.3:latest', messages=[
  {
    'role': 'user',
    'content': '说明下时间复杂度？',
  },
])
print(response['message']['content'])

root@gpu-01:~# python3 demo.py 
时间复杂度是指算法的运行时间与输入数据大小之间的关系。通常用“大O”符号表示，记为T(n) = O(f(n))，其中n是输入数据的大小，f(n)是函数，它描述了算法的运行时间随着输入数据大小的增长而变化的趋势。

常见的时间复杂度有：

1. **O(1)**：常数时间复杂度，表示算法的运行时间与输入数据大小无关。
2. **O(log n)**：对数时间复杂度，表示算法的运行时间随着输入数据大小的增长而呈现对数级别的增加。
3. **O(n)**：线性时间复杂度，表示算法的运行时间随着输入数据大小的增长而呈现线性级别的增加。
4. **O(n log n)**：线性对数时间复杂度，表示算法的运行时间随着输入数据大小的增长而呈现线性对数级别的增加。
5. **O(n^2)**：平方时间复杂度，表示算法的运行时间随着输入数据大小的增长而呈现平方级别的增加。
6. **O(2^n)**：指数时间复杂度，表示算法的运行时间随着输入数据大小的增长而呈现指数级别的增加。

通常，我们希望算法的时间复杂度尽可能低，以保证其能够高效地处理大规模的输入数据。

例如：

* 二分查找算法的时间复杂度为O(log n)，因为它每次可以将搜索空间减半。
* 冒泡排序算法的时间复杂度为O(n^2)，因为它需要比较每一对元素。
* 快速排序算法的平均时间复杂度为O(n log n)，但在最坏情况下可能达到O(n^2)。

总之，理解时间复杂度有助于我们评估算法的效率和可扩展性，并指导我们选择合适的算法来解决实际问题。

查看模型的显存占用和停止运行中的模型

root@gpu-01:~# nvidia-smi 
Fri Feb 21 13:42:28 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:16:00.0 Off |                  Off |
| 31%   35C    P8             16W /  450W |    7678MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        On  |   00000000:6A:00.0 Off |                  Off |
| 31%   35C    P8              7W /  450W |    6774MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        On  |   00000000:94:00.0 Off |                  Off |
| 31%   36C    P8             28W /  450W |    6832MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        On  |   00000000:BE:00.0 Off |                  Off |
| 31%   36C    P8             22W /  450W |    6774MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA GeForce RTX 4090        On  |   00000001:16:00.0 Off |                  Off |
| 31%   34C    P8              8W /  450W |    6774MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA GeForce RTX 4090        On  |   00000001:6A:00.0 Off |                  Off |
| 31%   35C    P8             19W /  450W |    6832MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA GeForce RTX 4090        On  |   00000001:94:00.0 Off |                  Off |
| 30%   36C    P8             18W /  450W |    6832MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA GeForce RTX 4090        On  |   00000001:BE:00.0 Off |                  Off |
| 31%   35C    P8             16W /  450W |    7452MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    375637      C   /usr/local/bin/ollama                        7670MiB |
|    1   N/A  N/A    375637      C   /usr/local/bin/ollama                        6766MiB |
|    2   N/A  N/A    375637      C   /usr/local/bin/ollama                        6824MiB |
|    3   N/A  N/A    375637      C   /usr/local/bin/ollama                        6766MiB |
|    4   N/A  N/A    375637      C   /usr/local/bin/ollama                        6766MiB |
|    5   N/A  N/A    375637      C   /usr/local/bin/ollama                        6824MiB |
|    6   N/A  N/A    375637      C   /usr/local/bin/ollama                        6824MiB |
|    7   N/A  N/A    375637      C   /usr/local/bin/ollama                        7444MiB |
+-----------------------------------------------------------------------------------------+


root@gpu-01:~# ollama stop llama3.3:latest

deepseek-r1:70b 模型使用

root@gpu-01:~# ollama pull deepseek-r1:70b
pulling manifest 
pulling 4cd576d9aa16... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  42 GB                         
pulling 369ca498f347... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  387 B                         
pulling 6e4c38e1172f... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 KB                         
pulling f4d24e9138dd... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  148 B                         
pulling 5e9a45d7d8b9... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  488 B                         
verifying sha256 digest 
writing manifest 
success 
root@gpu-01:~# ollama list
NAME               ID              SIZE     MODIFIED      
deepseek-r1:70b    0c1615a8ca32    42 GB    2 minutes ago    
llama3.3:latest    a6eb4748fd29    42 GB    3 hours ago      

root@gpu-01:~# ollama run deepseek-r1:70b
>>> udp有没有拥塞控制？
<think>

</think>

UDP（用户数据报协议）本身不具有拥塞控制机制。以下是详细原因及相关信息：

### 1. **UDP 的特点**
- **无连接**: UDP 是无连接的协议，数据报的发送和接收不需要建立连接。
- **尽力而为_delivery**: UDP 不保证数据报的可靠递送，可能会出现丢包、乱序等情况。
- **轻量级**: UDP 的头部较小，只有8个字节，处理开销低。

### 2. **拥塞控制的缺失**
- **无内置机制**: UDP 协议本身不包含拥塞控制的机制，因此发送方不会根据网络状况调整数据传输速率。
- **可能导致网络拥塞**: 如果发送方以高速度持续发送数据，可能会导致网络带宽被耗尽，尤其在网络环境复杂或带宽有限的情况下。

### 3. **应用场景**
尽管 UDP 缺乏拥塞控制，但它在某些场景下仍然是合适的选择：
- **实时性要求高**: 如视频会议、在线游戏等，对延迟敏感但可容忍部分数据丢失。
- **广播或多播**: UDP 支持一对多或多对多的通信，适用于流媒体和在线直播。

### 4. **解决方案**
如果需要在 UDP 通信中实现拥塞控制，可以采取以下措施：
- **应用层实现**: 在应用程序中手动控制发送速率，根据反馈调整数据传输速度。
- **使用其他协议**: 如果需要可靠传输和拥塞控制，可以选择 TCP 或其他支持拥塞控制的协议。

### 总结
UDP 不具备内置的拥塞控制机制，适用于对实时性要求高但可容忍数据丢失的场景。对于需要可靠传输和拥塞控制的应用，建议使用 TCP 或在应用层实现相关功能。

使用 Ollama 运行本地 deepseek 模型

推荐阅读更多精彩内容