InternLM (书生) Large Model: Fun Demos

Introduction to Large Models and LLMs

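Large parameter count: hundreds of millions of parameters and up.
InternLM is a lightweight training framework. The pretrained models InternLM-7B and InternLM-20B have been released.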

InternLM-7B

In this section we use an A100(1/4) machine in InternStudio and the InternLM-Chat-7B model to deploy a chat demo.

Environment setup

Select the NVIDIA CUDA 11.7 clean image, which is based on Ubuntu and has Conda preinstalled.

# Create the conda virtual environment
/root/share/install_conda_env_internlm_base.sh internlm-demo
# Activate the conda environment
conda activate internlm-demo
# Upgrade pip
python -m pip install --upgrade pip
# Install dependencies
pip install modelscope==1.9.5
pip install transformers==4.35.2
pip install streamlit==1.24.0
pip install sentencepiece==0.1.99
pip install accelerate==0.24.1
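
After installing, it can help to confirm that the environment really contains the pinned versions. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` is hypothetical, and the pins are copied from the pip commands above.

```python
from importlib import metadata

# Pinned versions copied from the pip commands above.
PINS = {
    "modelscope": "1.9.5",
    "transformers": "4.35.2",
    "streamlit": "1.24.0",
    "sentencepiece": "0.1.99",
    "accelerate": "0.24.1",
}


def find_mismatches(pins):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Run it inside the activated `internlm-demo` environment; no output means every pin matches.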

Download the model

The model is 14 GB; downloading it takes roughly 10 to 20 minutes.

from modelscope import snapshot_download

# Download the chat model into /root/model, pinned to revision v1.0.3
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b', cache_dir='/root/model', revision='v1.0.3')

Model file list (PyTorch format):

(base) root@intern-studio:~# ls /root/model/Shanghai_AI_Laboratory/internlm-chat-7b/ 
README.md                         pytorch_model-00002-of-00008.bin  pytorch_model.bin.index.json
config.json                       pytorch_model-00003-of-00008.bin  special_tokens_map.json
configuration.json                pytorch_model-00004-of-00008.bin  tokenization_internlm.py
configuration_internlm.py         pytorch_model-00005-of-00008.bin  tokenizer.model
generation_config.json            pytorch_model-00006-of-00008.bin  tokenizer_config.json
modeling_internlm.py              pytorch_model-00007-of-00008.bin
pytorch_model-00001-of-00008.bin  pytorch_model-00008-of-00008.bin
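
Since the download takes a while, an interrupted transfer can leave shards missing. As a rough check, the sketch below reads `pytorch_model.bin.index.json` from the listing above and reports any shard file it names that is not on disk. The function name `missing_shards` is hypothetical, and the sketch assumes the index follows the usual sharded-checkpoint layout with a `weight_map` entry.

```python
import json
import os


def missing_shards(model_dir):
    """Return shard files named in the index that are absent from model_dir."""
    index_path = os.path.join(model_dir, "pytorch_model.bin.index.json")
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    # "weight_map" maps each parameter name to the shard file that stores it.
    expected = set(index["weight_map"].values())
    return sorted(
        name for name in expected
        if not os.path.isfile(os.path.join(model_dir, name))
    )
```

For example, `missing_shards("/root/model/Shanghai_AI_Laboratory/internlm-chat-7b")` should return an empty list when all eight shards are present.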

View the model's configuration:

# cat /root/model/Shanghai_AI_Laboratory/internlm-chat-7b/config.json

Contents of config.json:

{
  "architectures": [
    "InternLMForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_internlm.InternLMConfig",
    "AutoModel": "modeling_internlm.InternLMForCausalLM",
    "AutoModelForCausalLM": "modeling_internlm.InternLMForCausalLM"
  },
  "bias": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "internlm",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "pad_token_id": 2,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.33.2",
  "use_cache": true,
  "vocab_size": 103168
}
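
The configuration above is enough to estimate the parameter count and to see where the 14 GB download size comes from. The sketch below is a back-of-the-envelope calculation, not the model's own accounting: it assumes a LLaMA-style layout (four attention projections with bias, three bias-free MLP projections, two RMSNorm weights per layer, plus a final norm), which is inferred from the config fields rather than stated in it.

```python
# Values copied from config.json above.
cfg = {
    "vocab_size": 103168,
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_hidden_layers": 32,
}


def estimate_params(cfg):
    """Estimate the parameter count under the assumed LLaMA-style layout."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    v = cfg["vocab_size"]
    n = cfg["num_hidden_layers"]
    embed = v * h            # token embedding table
    lm_head = v * h          # separate output projection (tie_word_embeddings is false)
    attn = 4 * (h * h + h)   # q, k, v, o projections, bias assumed from "bias": true
    mlp = 3 * h * i          # gate, up, down projections, assumed bias-free
    norms = 2 * h            # two RMSNorm weight vectors per layer
    return embed + lm_head + n * (attn + mlp + norms) + h  # + final norm


total = estimate_params(cfg)
fp16_bytes = total * 2  # 2 bytes per parameter for float16 / bfloat16
print(f"{total / 1e9:.2f} B parameters, {fp16_bytes / 1e9:.1f} GB in 16-bit weights")
```

Under these assumptions the estimate is about 7.3 billion parameters and about 14.6 GB of 16-bit weights, in line with the 14 GB download. It also means the weights alone take most of a 20 GiB GPU slice.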

Command-line demo:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


model_name_or_path = "/root/model/Shanghai_AI_Laboratory/internlm-chat-7b"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='auto')
model = model.eval()

system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.
"""

messages = [(system_prompt, '')]

print("=============Welcome to InternLM chatbot, type 'exit' to exit.=============")

while True:
    input_text = input("User  >>> ")
    input_text = input_text.strip()  # trim surrounding whitespace only; removing every space would garble English input
    if input_text == "exit":
        break
    response, history = model.chat(tokenizer, input_text, history=messages)
    messages.append((input_text, response))
    print(f"robot >>> {response}")

Run python cli_demo.py in the terminal to start chatting; type exit to quit.

Web demo

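Switch to VS Code and run the web_demo.py file in the /root/code/InternLM directory. After entering the commands below, follow section 5.2 of this tutorial to configure local port forwarding and map the port to your local machine. Then open http://127.0.0.1:6006 in your local browser.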

bash
conda activate internlm-demo  # VS Code starts in the base environment on first entry, so switch environments first
cd /root/code/InternLM
streamlit run web_demo.py --server.address 127.0.0.1 --server.port 6006
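
If the browser cannot reach the page, a quick way to tell whether the port mapping is working is to test the TCP port from the local machine. This is a minimal sketch with the standard library; `is_port_open` is a hypothetical helper, and 6006 is the port used in the command above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling `is_port_open("127.0.0.1", 6006)` on the local machine returns `True` once Streamlit is running and the port is mapped. A `False` result suggests checking the port mapping or the Streamlit process first.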

Lagent agent tool-calling demo

A lightweight agent framework
In this section we use an A100(1/4) machine in InternStudio, the InternLM-Chat-7B model, and the Lagent framework to deploy a tool-calling demo.

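Lagent is a lightweight, open-source agent framework built on large language models. It lets users quickly turn a large language model into several types of agent, and it provides a set of typical tools that extend what the model can do. With Lagent, InternLM's capabilities can be put to fuller use.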


Error message

  File "/root/.conda/envs/internlm-demo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3870, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/.conda/envs/internlm-demo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 743, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/.conda/envs/internlm-demo/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 19.99 GiB total capacity; 19.42 GiB already allocated; 36.00 MiB free; 19.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

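On a retest the model loaded and ran successfully (Loading checkpoint shards: 100%|███████████████| 8/8 [00:27<00:00, 3.38s/it]) and the demo test completed. The machine's monitoring metrics at that point were:
CPU: 4.02%
Memory: 2.92 / 56 GB (5.21%)
GPU: NVIDIA A100(1/4), 0%
GPU memory: 15166 / 20470 MiB (74.09%)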
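
The same GPU memory figures can be read from a terminal before loading the model, which helps spot leftover usage that would lead to the out-of-memory error shown earlier. The sketch below parses one line of output from `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`; the helper name `parse_gpu_memory` is hypothetical, and the output on a partitioned A100(1/4) instance may differ from a plain GPU.

```python
def parse_gpu_memory(csv_line):
    """Parse one 'used, total' line (values in MiB) into (used, total, percent_used)."""
    used_str, total_str = csv_line.split(",")
    used, total = int(used_str), int(total_str)
    return used, total, round(100 * used / total, 2)


# Figures taken from the monitoring panel above.
used, total, percent = parse_gpu_memory("15166, 20470")
print(f"{used} / {total} MiB ({percent}%)")
```

With the numbers from the panel this gives 74.09%, matching the reported value. If usage is already near the total before the model is loaded, another process is probably still holding the memory.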

Example 1

Given 2x+3=10, solve for x.
The tool call succeeded.
The answer was correct.

Example 2

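Given 2x+3y=10 and x+5y=20, solve for x and y.
The tool call succeeded. Execution result: [{x: -10/7, y: 30/7}]
The model's final answer was wrong, even though the execution result is correct: "From the system 2x+3y=10 and x+5y=20, elimination gives x=2, y=4."
When asked to answer again, it was still wrong:
"From the system 2x+3y=10 and x+5y=20, elimination gives x=5, y=4."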
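
To check which result in Example 2 is right, the system can be solved exactly by hand or in a few lines of code. The sketch below uses Cramer's rule with exact fractions; it is an independent check, not the code that Lagent ran, and `solve_2x2` is a hypothetical helper.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly with Cramer's rule."""
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1) / det
    y = Fraction(a1 * c2 - a2 * c1) / det
    return x, y


# 2x + 3y = 10 and x + 5y = 20
x, y = solve_2x2(2, 3, 10, 1, 5, 20)
print(x, y)  # -10/7 30/7
```

This confirms the tool's execution result, x = -10/7 and y = 30/7. Neither of the model's final answers satisfies the first equation: x=2, y=4 gives 2x+3y=16, and x=5, y=4 gives 22.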


Image-text demo

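InternLM-XComposer (浦语·灵笔) image-text understanding and composition demo
In this section we use an A100(1/4) * 2 machine in InternStudio and the internlm-xcomposer-7b model to deploy an image-text understanding and composition demo.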

InternLM-XComposer-7B model

Environment configuration

Switching the pip mirror

Switching the conda mirror

Model download

Three methods:
huggingface-cli
OpenXLab
ModelScope
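
As an illustration of the first method, the sketch below builds a `huggingface-cli download` command from Python. The repository id `internlm/internlm-xcomposer-7b` and the target directory are assumptions chosen for the example, and `hf_cli_download_cmd` is a hypothetical helper. The ModelScope route works the same way as the `snapshot_download` call shown earlier for InternLM-Chat-7B.

```python
import shlex


def hf_cli_download_cmd(repo_id, local_dir):
    """Build the argument list for `huggingface-cli download`."""
    return ["huggingface-cli", "download", repo_id, "--local-dir", local_dir]


# Repository id and directory are assumed for illustration.
cmd = hf_cli_download_cmd(
    "internlm/internlm-xcomposer-7b",
    "/root/model/internlm-xcomposer-7b",
)
print(shlex.join(cmd))
# To actually start the download: subprocess.run(cmd, check=True)
```

Building the command as a list keeps the arguments unambiguous if it is later passed to `subprocess.run`.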

Hands-on practice

A100 data center GPU
