Contents
- Follow-up articles
- Overall structure
- Required changes
- genie-tool project
- genie-backend project
- Configuration changes
- genie-backend
- genie-tool
- Starting everything
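Follow-up articles
Overall structure

[Figure: overall structure]
- genie-backend is the backend service, genie-tool provides the built-in tools, and genie-client handles external calls to MCP tools.
[Figure: structure diagram, taken from the reference article]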
Required changes
- With the changes below, the project runs correctly on Windows and displays answers in Chinese.
genie-tool project
- In deepsearch.py, the call to search_reasoning must be awaited; without await the code does not run.
    # Use reasoning to check whether the search needs to continue
    reasoning_result = await search_reasoning(
        request_id=request_id,
        query=query,
        content=self.search_docs_str(os.getenv("SEARCH_REASONING_MODEL")),
    )
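To see why the missing await matters, here is a minimal standalone sketch. The search_reasoning below is a hypothetical stand-in, not the genie-tool function, and its return value is made up for illustration. It only shows that calling a coroutine function without await yields a coroutine object rather than the result.

```python
import asyncio
import inspect


async def search_reasoning(query: str) -> dict:
    """Hypothetical stand-in for genie-tool's search_reasoning coroutine."""
    await asyncio.sleep(0)  # simulate an asynchronous LLM call
    return {"query": query, "need_more_search": False}


async def call_without_await() -> bool:
    result = search_reasoning("demo")  # a coroutine object, not a dict
    is_coroutine = inspect.iscoroutine(result)
    result.close()  # close it to avoid a "never awaited" warning
    return is_coroutine


async def call_with_await() -> dict:
    return await search_reasoning("demo")  # the actual return value


if __name__ == "__main__":
    print(asyncio.run(call_without_await()))
    print(asyncio.run(call_with_await()))
```

Any code that then treats the un-awaited value as a dict fails, which matches the behaviour described in the bullet above.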
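- In code_interpreter.py, detect the file encoding before reading the file.
    import chardet

    def detect_encoding(file_path: str) -> str:
        """Detect the file encoding automatically."""
        with open(file_path, 'rb') as f:
            # Sample the first 10000 bytes for detection
            result = chardet.detect(f.read(10000))
        # Fall back to UTF-8 when chardet cannot determine an encoding
        return result['encoding'] or 'utf-8'
  Then, in the code_interpreter_agent method (around line 60), pass the detected encoding when reading the file:
    encoding = detect_encoding(file_path)
    df = (
        pd.read_csv(file_path, encoding=encoding)
        if file_name.endswith(".csv")
        else pd.read_excel(file_path)
    )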
- In file_table_op.py, filter special characters so that paths are valid on Windows.
  Add the following to the _FileDB class (make sure re is imported):
    # Regex matching characters that are illegal in Windows file names
    _ILLEGAL_CHARS_PATTERN = re.compile(r'[\\/:*?"<>|]')

    @classmethod
    def _sanitize_path_component(cls, component: str) -> str:
        """
        Clean a path component (file or directory name) by removing characters that are illegal on Windows.
        """
        # Replace illegal characters with underscores
        sanitized = cls._ILLEGAL_CHARS_PATTERN.sub('_', component)
        # Strip leading/trailing spaces and dots (Windows does not allow names ending in a dot)
        sanitized = sanitized.strip().strip('.')
        # Make sure the result is not an empty string
        return sanitized if sanitized else "unnamed"
  Rewrite the save method as follows:
    async def save(self, file_name, content, scope) -> str:
        if "." in file_name:
            file_name = os.path.basename(file_name)
        else:
            file_name = f"{file_name}.txt"
        # Only the scope (directory name) is sanitized here
        clean_scope = self._sanitize_path_component(scope)
        save_path = os.path.join(self._work_dir, clean_scope)
        if not os.path.exists(save_path):
            os.makedirs(save_path)
        with open(f"{save_path}/{file_name}", "w", encoding="utf-8") as f:
            f.write(content)
        return f"{save_path}/{file_name}"
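The sanitizing logic is easy to try outside the class. The sketch below is a standalone copy of the same regex and stripping steps; the function name and the sample inputs are chosen only for illustration.

```python
import re

# Same character class as in the post: characters Windows rejects in file names
ILLEGAL_CHARS_PATTERN = re.compile(r'[\\/:*?"<>|]')


def sanitize_path_component(component: str) -> str:
    """Standalone copy of the sanitizing steps, for experimentation."""
    sanitized = ILLEGAL_CHARS_PATTERN.sub("_", component)  # illegal chars -> "_"
    sanitized = sanitized.strip().strip(".")  # trim spaces, then dots
    return sanitized if sanitized else "unnamed"  # never return an empty name


if __name__ == "__main__":
    for raw in ["req:2024/08*1", "  report.  ", "...", 'a"b<c>d|e']:
        print(repr(raw), "->", repr(sanitize_path_component(raw)))
```

A scope such as a request ID containing a colon would otherwise make os.makedirs fail on Windows, which is why the scope is cleaned before it is joined onto the work directory.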
- In file_util, rework get_file_content so that file content is decoded through a fallback chain (this relies on aiohttp and chardet being imported).
    async def get_file_content(file_name: str) -> str:
        # local file
        if file_name.startswith("/"):
            with open(file_name, "r") as rf:
                return rf.read()
        # file server
        else:
            b_content = b""
            async with aiohttp.ClientSession() as session:
                async with session.get(file_name, timeout=10) as response:
                    while True:
                        chunk = await response.content.read(1024)
                        if not chunk:
                            break
                        b_content += chunk
            try:
                # 1. Try UTF-8 first
                return b_content.decode("utf-8")
            except UnicodeDecodeError:
                # 2. Detect the encoding automatically
                detected = chardet.detect(b_content)
                encoding = detected["encoding"] or "gbk"
                try:
                    return b_content.decode(encoding)
                except (UnicodeDecodeError, LookupError):
                    # 3. Last resort: decode as GBK, replacing undecodable bytes
                    return b_content.decode("gbk", errors="replace")
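The decode order can be checked in isolation. The following is a simplified sketch using only the standard library: it skips the chardet detection step and goes straight from UTF-8 to a fixed fallback encoding, so it is not equivalent to the function above. The name decode_with_fallback is hypothetical.

```python
def decode_with_fallback(raw: bytes, fallback: str = "gbk") -> str:
    """Decode as UTF-8 first, then as the fallback encoding, without raising."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode(fallback)
    except UnicodeDecodeError:
        # Undecodable bytes are replaced with U+FFFD instead of raising
        return raw.decode(fallback, errors="replace")


if __name__ == "__main__":
    print(decode_with_fallback("中文".encode("utf-8")))
    print(decode_with_fallback("中文".encode("gbk")))
```

Trying UTF-8 first is the safer order, because GBK-encoded Chinese text is usually rejected by a strict UTF-8 decoder, while many UTF-8 byte sequences would decode as GBK without error and produce garbage.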
- I use a Qwen model (model='qwen-plus') through an OpenAI-compatible API, so the LLM call in llm_util needs custom_llm_provider="openai" to work.
    response = await acompletion(
        messages=messages,
        model=model,
        temperature=temperature,
        top_p=top_p,
        stream=stream,
        custom_llm_provider="openai",
        extra_headers=extra_headers,
        **kwargs
    )
- In prompt_util.py, specify UTF-8 explicitly when reading:
    .read_text(encoding="utf-8")
- I use the free tier of SerpApi for web search, so SerperSearch needs some changes as well.
  Set SERPER_SEARCH_URL=https://serpapi.com/search.json
    async def search(self, query: str, request_id: str = None, *args, **kwargs) -> List[Doc]:
        body = self.construct_body(query, request_id)
        async with aiohttp.ClientSession() as session:
            async with session.post(self._url, json=body, headers=self.headers, timeout=self._timeout) as response:
                result = json.loads(await response.text())
                return [
                    Doc(
                        doc_type="web_page",
                        content=item.get("snippet", ""),
                        title=item.get("title", ""),
                        link=item.get("link", ""),
                        data={"search_engine": self._engine},
                    ) for item in result.get("organic", [])
                ]
genie-backend project
- If ES is enabled, update the ES version in the pom to match your ES server version.
- The vector dimension depends on which embedding model you downloaded for use with Qdrant, so adjust the dimension in DataAgentInitRunner accordingly.
    @Override
    public void run(String... args) throws Exception {
        log.info("dataAgent config:{}", dataAgentConfig);
        QdrantConfig qdrantConfig = dataAgentConfig.getQdrantConfig();
        if (qdrantConfig.getEnable()) {
            // Note: if your embedding model outputs 512-dimensional vectors, change 1024 to 512 here
            qdrantService.createCosineCollection(DataAgentConstants.SCHEMA_COLLECTION_NAME, 1024);
            log.info("qdrant collection init success");
        }
        EsConfig esConfig = dataAgentConfig.getEsConfig();
        if (esConfig.getEnable()) {
            columnValueSyncService.initColumnValueIndex();
            log.info("column value es index init success");
        }
        chatModelInfoService.initModelInfo(dataAgentConfig);
    }
- The names under which ES and Qdrant store their data can be changed in DataAgentConstants. If you change them here, genie-tool must be changed to match.
    // Name of the Qdrant collection that stores the schema
    public static final String SCHEMA_COLLECTION_NAME = "genie_model_schema";
    // Name of the ES index that stores column values
    public static final String COLUMN_VALUE_ES_INDEX = "genie_model_column_value";
Configuration changes
genie-backend
- The llm section of application.yaml. I use Alibaba Cloud's Bailian platform on pay-as-you-go billing; apply for your own API key.
    llm:
      default:
        base_url: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
        apikey: 'xxxx'
        interface_url: '/chat/completions'
        model: qwen-plus
        max_tokens: 32000
- The data_agent section of application.yaml. To enable TableRag-enhanced data querying, change the configuration below.
    data-agent:
      agent-url: http://127.0.0.1:1601
      es-config:
        enable: true
        host: 127.0.0.1:9200
        user: user
        password: pwd
      qdrantConfig:
        enable: true
        embeddingUrl: http://127.0.0.1:8282/embed
        host: 127.0.0.1
        port: 6334
        apiKey:
genie-tool
- In the project root, copy .env_template to .env
- Edit .env
# Fill in the API key you obtained from Bailian
OPENAI_API_KEY=xxxx
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
# Qwen is used here
DEFAULT_MODEL=qwen-plus
# Used for deep search; you can apply at https://serpapi.com/, which includes a free monthly quota
SERPER_SEARCH_URL=https://serpapi.com/search.json?
SERPER_SEARCH_API_KEY=xxx
# Set to true to enable Qdrant; otherwise Qdrant is disabled
TR_QDRANT_ENABLE=true
#TR_QDRANT_URL=127.0.0.1:6333
TR_QDRANT_HOST=127.0.0.1
TR_QDRANT_PORT=6334
TR_EMBEDDING_URL=http://127.0.0.1:8282/embed
# TR_QDRANT_COLLECTION_NAME must match the collection name in genie-backend
TR_QDRANT_COLLECTION_NAME=genie_model_schema
# Qwen is used here
TR_EXTRACT_SYS_WSD_MODEL_NAME=${DEFAULT_MODEL}
TR_TABLE_FILTER_MODEL_NAME=${DEFAULT_MODEL}
TR_COLUMN_FILTER_MODEL_NAME=${DEFAULT_MODEL}
# Enable ES search (used by TableRag)
TR_ES_CONFIGS_HOST=127.0.0.1:9200
# Using the empty host below instead disables ES
#TR_ES_CONFIGS_HOST=
TR_ES_CONFIGS_USER=
TR_ES_CONFIGS_PASSWORD=
TR_ES_CONFIGS_SCHEME=http
TR_ES_CONFIGS_INDEX=genie_model_column_value
# Tuning options
TR_QD_THRESHHOLD=0.55
TR_QD_RECALL_TOP_K=20
# Default model
# cal engine configuration
CAL_ENGINE_MODEL=${DEFAULT_MODEL}
# Analysis
ANA_SCHEMA_URL=http://localhost:8080/data/queryModelInfo
ANA_DATA_URL=http://localhost:8080/data/apiChatQuery
ANALYSIS_MODEL=${DEFAULT_MODEL}
# NL2SQL
NL2SQL_MODEL_NAME=${DEFAULT_MODEL}
REWRITE_MODEL_NAME=${DEFAULT_MODEL}
THINK_MODEL_NAME=${DEFAULT_MODEL}
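Because the collection and index names have to agree between genie-backend and genie-tool, a small consistency check can help. The sketch below is illustrative only: parse_env is a simplified parser written for this example (it is an assumption that the real loader expands ${NAME} the same way), and SAMPLE copies a few lines from the .env above.

```python
import re

# Values of the DataAgentConstants fields in genie-backend, as quoted in the post
BACKEND_CONSTANTS = {
    "TR_QDRANT_COLLECTION_NAME": "genie_model_schema",
    "TR_ES_CONFIGS_INDEX": "genie_model_column_value",
}

SAMPLE = """
DEFAULT_MODEL=qwen-plus
TR_QDRANT_COLLECTION_NAME=genie_model_schema
TR_TABLE_FILTER_MODEL_NAME=${DEFAULT_MODEL}
TR_ES_CONFIGS_INDEX=genie_model_column_value
"""


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skip comments, and expand ${NAME} references."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Replace ${NAME} with an earlier value; unknown names become empty
        value = re.sub(r"\$\{(\w+)\}", lambda m: values.get(m.group(1), ""), value.strip())
        values[key.strip()] = value
    return values


def find_mismatches(env: dict) -> list:
    """Return the keys whose .env value differs from the backend constant."""
    return [k for k, expected in BACKEND_CONSTANTS.items() if env.get(k) != expected]


if __name__ == "__main__":
    env = parse_env(SAMPLE)
    print(env["TR_TABLE_FILTER_MODEL_NAME"])
    print(find_mismatches(env) or "names are consistent")
```

To check a real file, read its text and pass it to parse_env in place of SAMPLE.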
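Starting everything
- I open genie-backend in IDEA; genie-tool, genie-client, and the extra embedding script in PyCharm; and the ui in WebStorm.
- Start Qdrant (look up how to do this yourself).
- Start ES (look up how to do this yourself).
- Write a separate Python script to serve embeddings (this is only for testing, with a simple 512-dimensional model) and run it with Python.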
    from fastembed import TextEmbedding
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    # Pass only the model name; fastembed loads it from the local cache
    embedding_model = TextEmbedding("BAAI/bge-small-zh-v1.5")

    @app.route('/embed', methods=['POST'])
    def embed():
        data = request.get_json(force=True)
        # Note: the request field is named inputs
        texts = data.get('inputs')
        if texts is None:
            return jsonify({"error": "field 'inputs' must not be empty"}), 400
        embeddings = list(embedding_model.embed(texts))
        return jsonify([emb.tolist() for emb in embeddings])

    if __name__ == '__main__':
        app.run(host="127.0.0.1", port=8282)
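Since the Qdrant collection is created with a fixed dimension, it is worth confirming what the embedding service actually returns before starting genie-backend. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the service responds with a JSON list of vectors, as the script above does.

```python
import json
import urllib.request


def vector_dimensions(vectors: list) -> set:
    """Return the set of distinct lengths among the embedding vectors."""
    return {len(v) for v in vectors}


def matches_collection(vectors: list, collection_dim: int) -> bool:
    """True when every vector has exactly collection_dim components."""
    return vector_dimensions(vectors) == {collection_dim}


def fetch_embeddings(url: str, texts: list) -> list:
    """POST {"inputs": [...]} to the embedding service and return the JSON list."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, passing the result of fetch_embeddings("http://127.0.0.1:8282/embed", ["hello"]) to matches_collection together with the dimension used in createCosineCollection shows whether the two agree.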
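- Start genie-client
  [Figure: starting genie-client]
- Start genie-tool
  [Figure: starting genie-tool]
- Start genie-backend
  [Figure: starting genie-backend]
- Open the ui in WebStorm, install it as described in the readme, and start it.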



