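Background
The boss asked for a FAQ system: automatic replies driven by AI natural-language search, plus support for preset questions. The goal is to replace the Zendesk SDK that the app currently uses (from a company valued at 1 billion US dollars), on the theory that building everything in-house is always best. What I eventually received was two FAQ.pdf files exported by the customer support team. My manager said: "Start your show with these two files." Me: What????
Showtime: overall design
As a senior iOS developer, how would I design a FAQ system? First, get the overall framework straight: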
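- I only had the two FAQ.pdf files. I found the source link for the PDFs in a related email and exported the content as a Word document and as Markdown
- The Word document contains a few hundred FAQ questions and several thousand images in total; the questions, answers, subheadings, subtitles, images, and other resources need to be exported as JSON
- The several thousand images must be inserted into the JSON in question-then-answer order
- Answers are exported as HTML so that rich text is supported
- The several thousand images must be exported from the Word document in order, named, and uploaded to cloud OSS
- The uploaded OSS links are then batch-written into the .md file
- Check the layout of the md file: #, ##, ###, the custom keywords and eventType markers, and the image placeholder keys
- Write Python scripts:
- Export the Word images and name them in order
- Batch-replace the image placeholder names in the md file
- NLTK natural language model
- Deploy the backend with Flask and WSGIServer
- Write an HTML page that calls the Python backend, used to debug the NLTK model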
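1. Export the FAQ document to Word and md
The FAQ is exported in both Word and md formats. The Word file is used to export the image resources, named in order. The md file is what the script converts to JSON.
1.1 word2Png.py
The hard part is exporting the images from the Word document in order and naming them.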
import io
import os
import sys

from docx import Document
from PIL import Image


def extract_images_from_docx(docx_file):
    # Create the output folder, named after the DOCX file without its extension
    output_folder = os.path.splitext(docx_file)[0]
    os.makedirs(output_folder, exist_ok=True)

    # Open the DOCX document
    doc = Document(docx_file)

    # Collect every image reference (blip) in the document body, in document order
    image_objects = doc.element.findall('.//{http://schemas.openxmlformats.org/drawingml/2006/main}blip')

    # Walk the image references; image_index only advances after a successful save
    image_index = 1
    for i, img_obj in enumerate(image_objects):
        try:
            # Relationship id of the embedded image
            image_id = img_obj.get('{http://schemas.openxmlformats.org/officeDocument/2006/relationships}embed')
            # Raw image bytes
            image_data = doc.part.related_parts[image_id].blob
            # Decode the image
            image = Image.open(io.BytesIO(image_data))
            # Save it as imageN.png
            image_filename = f"image{image_index}.png"
            image_path = os.path.join(output_folder, image_filename)
            image.save(image_path)
            print(f"Saved image {image_filename} at position {i+1}")
            image_index += 1
        except (KeyError, IndexError, OSError) as e:
            print(f"Unable to extract image at position {i+1}: {e}")
            continue


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Please provide the DOCX file as a command line argument.")
    else:
        docx_file = sys.argv[1]
        extract_images_from_docx(docx_file)
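The ordering matters because the file names stored inside the .docx package (word/media/...) need not follow reading order. As a rough cross-check that does not depend on python-docx, the sketch below reads the package with the standard library only and lists the image targets in the order their references appear in the body. The function name `image_targets_in_order` is made up for this illustration, and it only looks at word/document.xml (headers and footers are ignored).

```python
import zipfile
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def image_targets_in_order(docx_file):
    """Return image targets in the order their blips appear in the body.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        rels_root = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        doc_root = ET.fromstring(zf.read("word/document.xml"))

    # Map each relationship id (e.g. "rId5") to its target (e.g. "media/image3.png").
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels_root.iter(f"{{{REL_NS}}}Relationship")
    }

    ordered = []
    # iter() walks the XML tree in document order.
    for blip in doc_root.iter(f"{{{A_NS}}}blip"):
        rel_id = blip.get(f"{{{R_NS}}}embed")
        if rel_id in targets:
            ordered.append(targets[rel_id])
    return ordered
```

Comparing this list against the saved image1.png, image2.png, ... is a quick way to spot a skipped or failed image before uploading anything.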
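1.2 Batch-replace the exported image names in the md file
The hard part is using a regular expression to replace every @img[image*] placeholder in the md file.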
import re

# Read the content of the Markdown file
with open('iot.md', 'r', encoding='utf-8') as file:
    file_content = file.read()

# Use a regular expression to replace each @img[imageN] with its OSS URL
file_content = re.sub(r'@img\[image(\d+)\]', r'@img[https://xxx.oss-xxx.aliyuncs.com/xx/xx/image\1.png]', file_content)

# Write the updated content to the output file
with open('Brand_YI_ioT.md', 'w', encoding='utf-8') as file:
    file.write(file_content)

print('Replacement complete!')
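With thousands of images, it can help to confirm that no placeholder was missed, for example one that was typed slightly differently from @img[imageN]. The sketch below wraps the same substitution in a function and adds a check for leftovers. The helper names and the example base URL are illustrative only; they are not part of the original script.

```python
import re

PLACEHOLDER = re.compile(r"@img\[image(\d+)\]")


def replace_placeholders(text, base_url):
    """Swap every @img[imageN] for @img[<base_url>/imageN.png]."""
    base = base_url.rstrip("/")
    # A function replacement avoids backslash-escape surprises in base_url.
    return PLACEHOLDER.sub(lambda m: f"@img[{base}/image{m.group(1)}.png]", text)


def unresolved_placeholders(text):
    """List @img[...] targets that still do not look like http(s) URLs."""
    return [
        target
        for target in re.findall(r"@img\[([^\]]*)\]", text)
        if not target.startswith(("http://", "https://"))
    ]
```

Running `unresolved_placeholders` on the output should return an empty list; anything it does return is a placeholder to fix by hand in the md file.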
1.3 Md2Json.py
- Define the JSON:
{
"answer":"",
"id":102,
"isLast":0,
"keywords":"",
"parentId":0,
"platform":3,
"question":"Subscription",
"type":0
}
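- Group entries according to the markers used in the md file
--#
First-level question list
--##
Second-level question list
--###
Third-level question + answer
--[^eventType]:
Click type of the current question: 0->FAQ, 1->specific jump, 2->...
--[^keywords]:
Question keywords; keyword matches are looked up first when searching
--@img[]
Link to the image used in the answer
--isLast
Whether this is the last level of the question list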
import html
import json
import os
import sys

import markdown


def strip_prefix(line, prefix):
    # Remove the exact marker; str.lstrip(prefix) would strip a set of characters instead
    return line[len(prefix):].strip()


def md_to_json(md_file):
    with open(md_file, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    json_array = []
    current_section = None
    platform = 2

    # The id ranges and the platform are chosen from the product name in the file name
    if "kami" in md_file.lower():
        section_id_l1 = 1000  # Start id for first-level headings
        section_id_l2 = 2000  # Start id for second-level headings
        section_id_l3 = 3000  # Start id for third-level headings
        platform = 2
    elif "iot" in md_file.lower():
        section_id_l1 = 100
        section_id_l2 = 200
        section_id_l3 = 300
        platform = 3
    else:
        section_id_l1 = 400
        section_id_l2 = 500
        section_id_l3 = 600

    for line in lines:
        line = line.strip()
        tmp_section = {"id": "", "parentId": 0, "question": "", "answer": "", "keywords": "", "isLast": 0, "platform": platform, "type": 0}

        if line.startswith('#') and not line.startswith('##') and not line.startswith('#Type'):
            # First-level heading: top-level question list
            section_id_l1 += 1
            if current_section:
                json_array.append(current_section)
            tmp_section["id"] = section_id_l1
            tmp_section["question"] = line.lstrip('#').strip()
            current_section = tmp_section
        elif line.startswith('[^eventType]:'):
            if current_section:
                value = strip_prefix(line, '[^eventType]:')
                # Store numeric types as integers, matching the JSON definition
                current_section["type"] = int(value) if value.isdigit() else value
        elif line.startswith('##') and not line.startswith('###'):
            # Second-level heading: child of the latest first-level heading
            section_id_l2 += 1
            if current_section:
                json_array.append(current_section)
            tmp_section["id"] = section_id_l2
            tmp_section["parentId"] = section_id_l1
            tmp_section["question"] = line.lstrip('#').strip()
            current_section = tmp_section
        elif line.startswith('###') and not line.startswith('######'):
            # Third-level heading: question with an answer, child of the latest second-level heading
            section_id_l3 += 1
            if current_section:
                json_array.append(current_section)
            tmp_section["id"] = section_id_l3
            tmp_section["parentId"] = section_id_l2
            tmp_section["question"] = line.lstrip('#').strip()
            tmp_section["isLast"] = 1
            current_section = tmp_section
        elif line.startswith('[^keywords]:'):
            if current_section:
                current_section["keywords"] = strip_prefix(line, '[^keywords]:')
        elif current_section:
            if "@img[" in line:
                img_tag_content = line.split('@img[')[1].split(']')[0]
                current_section["answer"] += f"<img src='{img_tag_content}' />\n"
            else:
                # html.unescape turns entities such as &amp; back into plain characters
                current_section["answer"] += html.unescape(markdown.markdown(line, output_format='html5')) + "\n"
            # A section counts as a leaf (isLast = 1) only once it has answer content
            if current_section["answer"] in ("", "\n"):
                current_section["isLast"] = 0
                current_section["answer"] = ""
            else:
                current_section["isLast"] = 1

    if current_section:
        json_array.append(current_section)
    return json_array


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Please provide the Markdown file as a command line argument.")
        sys.exit(1)
    md_file = sys.argv[1]
    json_data = md_to_json(md_file)
    # Write the result next to the input, swapping the extension for .json
    output_file = os.path.splitext(md_file)[0] + '.json'
    with open(output_file, 'w', encoding='utf-8') as json_file:
        json.dump(json_data, json_file, ensure_ascii=False, indent=4)
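The script writes a flat array in which each record points at its parent through parentId, and top-level records use parentId 0. To eyeball the hierarchy, or to feed a nested menu, the flat list can be regrouped. The sketch below is one possible way to do that; the `children` key and the function name are assumptions made for illustration and are not part of the exported format.

```python
from collections import defaultdict


def build_tree(records):
    """Nest flat FAQ records under their parents using id / parentId."""
    children = defaultdict(list)
    for rec in records:
        children[rec["parentId"]].append(rec)

    def attach(parent_id):
        nodes = []
        for rec in children.get(parent_id, []):
            node = dict(rec)  # shallow copy so the input list is left untouched
            node["children"] = attach(rec["id"])
            nodes.append(node)
        return nodes

    # Top-level sections are the records written with parentId 0.
    return attach(0)
```

This assumes ids are unique and the parent links contain no cycles, which holds for the id ranges used in md_to_json as long as each level stays within its own range.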
Exported JSON:
[ {
"id": 3001,
"parentId": 2001,
"question": "Pairing Your Fall Detect Camera",
"answer": "<p><strong>1. Check a few things to avoid issues with pairing</strong></p>\n\n\n\n<p>1. Please check your Wi-Fi settings before pairing up and make sure that you are connected to a 2.4GHz.</p>\n\n<p>2. Make sure that you've entered your correct Wi-Fi password.</p>\n\n<p>3. While connecting/adding your camera sensor, it's suggested to place the camera sensor near your Wi-Fi router.</p>\n\n<p>Once you move your camera sensor near the Wi-Fi router, there should be no obstructions or any objects that may block the signal from the router to your camera sensor.</p>\n\n<p>4. Ensure the Location or GPS on your phone is enabled, and Allow the permissions for your App.</p>\n\n\n\n\n\n<p><strong>2. Steps to pair your camera</strong></p>\n\n\n\n<p>Here are the steps to pair your camera sensor to the app:</p>\n\n\n\n<p>1. Connect the camera sensor with the USB cable and adapter, and plug the adapter into a power outlet. The camera sensor can also be powered by only the USB cable if it's plugged into a device that provides power through USB.</p>\n\n\n\n<p>2. Open the App and log in to your account.</p>\n\n\n\n<p>3. To add your camera sensor, select the '+' icon.</p>\n\n\n\n<img src='https://publicfiles-us.oss-us-west-1.aliyuncs.com/MD/YIHome/image1.png' />\n\n\n\n\n<p>4. Under Select Device(s), choose the camera sensor that is going to be paired with the app.</p>\n\n\n\n<img src='https://publicfiles-us.oss-us-west-1.aliyuncs.com/MD/YIHome/image2.png' />\n\n\n\n<p>5. Then select, [I heard “Waiting to connect”] at the bottom of the screen. If you don’t hear waiting to connect click on the link called, \"if you did not hear anything, tap here\".</p>\n\n\n\n<img src='https://publicfiles-us.oss-us-west-1.aliyuncs.com/MD/YIHome/image3.png' />\n\n<p>6. Under Connect to Wi-Fi enter the password for the Wi-Fi router. Tap Connect to Wi-Fi.</p>\n\n\n\n<img src='https://publicfiles-us.oss-us-west-1.aliyuncs.com/MD/YIHome/image4.png' />\n\n\n\n<p>7. Then face the QR code directly at the camera sensor. If you hear “QR code scan is successful,” press next. If you did not hear anything, click on the link at the bottom of the screen.</p>\n\n\n\n<img src='https://publicfiles-us.oss-us-west-1.aliyuncs.com/MD/YIHome/image5.png' />\n\n\n\n<p></p>\n\n<p>8. The camera sensor will go into retrieving pairing status which can take 1-2 minutes. You'll hear \"You can start using your camera sensor now\"</p>\n\n\n\n<img src='https://publicfiles-us.oss-us-west-1.aliyuncs.com/MD/YIHome/image6.png' />\n\n\n\n<p><strong>3. Fall Detect Camera Setup Verification</strong></p>\n\n\n\n<p>In order for the Fall Detect service to work correctly, the installation will need to be verified by our Kami Vision experts for proper fall detection accuracy. Please visit the link below for more details:</p>\n\n<p><a href=\"https://help.yitechnology.com/hc/en-us/articles/10214789670427-Camera-Sensor-Setup-Verification\">Camera Sensor Setup Verification</a></p>\n\n\n\n\n\n<p><strong>More information</strong></p>\n\n\n\n<p>For more info or updates, kindly subscribe to:</p>\n\n\n\n<p><a href=\"https://www.youtube.com/channel/UCD2j0wUuFp9ixlul1qocphQ\">Kami YouTube</a></p>\n\n<p><a href=\"https://www.youtube.com/channel/UCIrTFKMVjT82cz85Nekg-OQ\">YI YouTube</a></p>\n\n\n\n",
"keywords": "",
"isLast": 1,
"platform": 2,
"type": 0
},
{
"id": 2002,
"parentId": 1001,
"question": "Security and privacy of my camera videos and data in my YI/Kami Home app?",
"answer": "",
"keywords": "",
"isLast": 0,
"platform": 2,
"type": 0
},...]
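The keywords field exists so that keyword matches take priority in search. As a rough illustration of that idea over the exported records, and explicitly not the NLTK model mentioned in the design list, the sketch below ranks records by keyword hits first and by overlap with the question text second. It uses plain word matching from the standard library; the function names are hypothetical, and it assumes keywords is a plain string of words in any delimiter style.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(records, query, limit=5):
    """Rank answerable records: keyword hits first, then question-word overlap."""
    query_words = tokenize(query)
    scored = []
    for rec in records:
        if not rec.get("answer"):
            continue  # skip pure category nodes that carry no answer
        keyword_hits = len(query_words & tokenize(rec.get("keywords", "")))
        question_hits = len(query_words & tokenize(rec.get("question", "")))
        if keyword_hits or question_hits:
            scored.append((keyword_hits, question_hits, rec))
    # Sort by keyword hits, then by question overlap, both descending.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [rec for _, _, rec in scored[:limit]]
```

A real deployment would likely add stemming, stop-word removal, or similarity scoring on top of this; the point here is only the ordering rule that puts keyword matches ahead of everything else.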