Background
We have many deployed LLM models and therefore many endpoints, and on the intranet the hardware and models change frequently, which sometimes forces the LLM URLs to change. A unified entry point makes the configuration much easier to maintain.
litellm proxy is a very lightweight solution: adding a model is just a config-file edit. Official docs: https://docs.litellm.ai/docs/proxy/docker_quick_start
Alternatives
I compared oneapi, nginx, and litellm: oneapi is somewhat heavyweight, and nginx cannot serve a model list, so I went with litellm.
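For instance, once the proxy is running (see the run command below), litellm's OpenAI-compatible model list endpoint returns every model in the config, which a plain nginx reverse proxy cannot synthesize. The host and port here assume the docker run command later in this post:

# list all models configured in the proxy
curl http://localhost:4000/v1/models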
Installation
docker pull ghcr.io/berriai/litellm:main-latest
Config file
# https://docs.litellm.ai/docs/proxy/configs
# model prefixes: openai (api_base must end in /v1), hosted_vllm (rerank models served by vLLM; api_base must NOT end in /v1), ollama (api_base must NOT end in /v1)
model_list:
- model_name: qwen3:32b ### RECEIVED MODEL NAME ###
litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
model: ollama/qwen3:32b ### MODEL NAME sent to `litellm.completion()` ###
api_base: http://192.168.1.252:11434
# api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
api_key: none
# rpm: 6 # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
- model_name: bge-m3:latest
litellm_params:
model: openai/bge-m3:latest
api_base: http://192.168.1.252:11434/v1
api_key: none
- model_name: bge-reranker-v2-m3
litellm_params:
model: hosted_vllm/bge-reranker-v2-m3
api_base: http://192.168.1.213:11436
api_key: none
- model_name: Qwen3-Embedding-8B
litellm_params:
model: openai/Qwen3-Embedding-8B
api_base: http://192.168.1.252:11435/v1
api_key: none
  # all other model names are forwarded here
- model_name: "*"
litellm_params:
model: ollama/*
api_base: http://192.168.1.213:11434
api_key: none
litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
drop_params: True
# success_callback: ["langfuse"] # OPTIONAL - if you want to start sending LLM Logs to Langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your env
general_settings:
  # master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
# alerting: ["slack"] # [OPTIONAL] If you want Slack Alerts for Hanging LLM requests, Slow llm responses, Budget Alerts. Make sure to set `SLACK_WEBHOOK_URL` in your env
How to run
docker run \
-v $(pwd)/litellm_config.yaml:/app/config.yaml \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-latest \
--config /app/config.yaml --detailed_debug
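Once the container is up, any OpenAI-compatible client can talk to the proxy. A minimal smoke test with curl, assuming the port mapping above (no Authorization header is needed because master_key is commented out in the config):

# chat completion, routed to ollama/qwen3:32b per the model_list
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:32b",
    "messages": [{"role": "user", "content": "hello"}]
  }'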
Common pitfalls
Note that model requires a prefix; three have come up so far (curl sketches exercising them follow this list):
- openai: used for chat and embedding models served by vLLM or Ollama; api_base must end in /v1
- ollama: Ollama backends can also use the ollama prefix; api_base must not end in /v1
- hosted_vllm: used for rerank models served by vLLM; api_base must not end in /v1
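Two hedged sketches that exercise the openai and hosted_vllm prefixes through the proxy, assuming the config and docker run above. The embeddings call uses the standard OpenAI-compatible route; the rerank call uses LiteLLM's /rerank route, and the query/documents below are made-up test data:

# embedding via the openai-prefixed bge-m3 deployment
curl http://localhost:4000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "bge-m3:latest", "input": ["hello world"]}'

# rerank via the hosted_vllm-prefixed deployment
curl http://localhost:4000/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "what is litellm",
    "documents": ["litellm is a proxy for many LLM backends", "nginx is a web server"]
  }'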