Building an Auto-Reply Chatbot for Online Communities

Git repository: https://github.com/liupingw/CASS-Framework

This is a framework for building an auto-reply chatbot for online communities. It includes a text classification module, a text generation module, and a deployment script. To build your own community auto-reply chatbot, download this repo, configure the environment on your machine, import your own datasets, and fill in your community's API.

The workflow of the framework

  1. Fetch the latest posts through the community API
  2. Classify the type of each post (the type can be used in the next step to decide whether to reply)
  3. Generate comments
  4. Reply automatically in the community through the community API
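
The four steps above can be sketched as a single pass over the latest posts. This is only an illustration: run_once, its four callables, and the "needs_support" label are hypothetical names, not functions from this repo, and the toy lambdas stand in for the real API calls and models.

```python
from typing import Callable, Dict, List

Post = Dict[str, str]


def run_once(fetch_posts: Callable[[], List[Post]],
             classify: Callable[[str], str],
             generate: Callable[[str], str],
             send_comment: Callable[[str, str], None],
             reply_types: frozenset = frozenset({"needs_support"})) -> int:
    """Run one fetch -> classify -> generate -> reply pass; return the number of replies."""
    replied = 0
    for post in fetch_posts():                    # step 1: fetch the latest posts
        post_type = classify(post["content"])     # step 2: classify the post
        if post_type not in reply_types:          # the type decides whether to reply
            continue
        comment = generate(post["content"])       # step 3: generate a comment
        send_comment(post["id"], comment)         # step 4: reply through the API
        replied += 1
    return replied


# Toy stand-ins for the community API and the two models (all hypothetical).
posts = [{"id": "1", "content": "I feel anxious today"},
         {"id": "2", "content": "Selling a stroller"}]
sent = []
count = run_once(
    fetch_posts=lambda: posts,
    classify=lambda text: "needs_support" if "anxious" in text else "other",
    generate=lambda text: "Take care, you are not alone.",
    send_comment=lambda post_id, comment: sent.append((post_id, comment)),
)
print(count, sent)
```

Passing the steps in as callables keeps the loop independent of any particular community API, which matters because each community exposes different endpoints.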

Paper: CASS: Towards Building a Social-Support Chatbot for Online Health Community

BibTeX citation:

@misc{wang2021cass,
      title={CASS: Towards Building a Social-Support Chatbot for Online Health Community},       
      author={Liuping Wang and Dakuo Wang and Feng Tian and Zhenhui Peng and Xiangmin Fan and Zhan Zhang and Shuai Ma and Mo Yu and Xiaojuan Ma and Hongan Wang},      
      year={2021},
      booktitle = {Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing},
      numpages = {31},
      keywords = {chatbot, bot; pregnancy, healthcare, AI deployment, online community, social support, peer support, emotional support, machine learning, neural network,
      system building, conversational agent, human AI collaboration, human AI interaction, explainable AI, trustworthy AI},
      series = {CSCW ’21}
      }

[Figure: workflow of the framework (work flow.png)]

References

OpenNMT: https://github.com/OpenNMT/OpenNMT-py

CNN Classifier: https://github.com/gaussic/text-classification-cnn-rnn

Step 1: Setup

Requirements:

  • Python >= 3.5
  • Torch == 1.0.0
  • Torchvision == 0.2.1
  • Torchtext == 0.4.0

Install the onmt package by running setup.py in the OpenNMT/ directory:

python setup.py install

Step 2: Prepare Your Dataset

1. For the text classification model

Prepare four files in Classifier/data/cnews/:

  • data.train.txt
  • data.val.txt
  • data.test.txt
  • data.pred.txt
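
One way to produce the train, validation, and test files from a single labeled collection is a shuffled split. The sketch below assumes one example per line in the form label<TAB>text, which is the convention of the referenced CNN classifier project; check Classifier/ before relying on it. split_examples and to_line are hypothetical helpers, not part of this repo.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (label, text)


def split_examples(examples: List[Example], val_ratio: float = 0.1,
                   test_ratio: float = 0.1, seed: int = 0) -> Dict[str, List[Example]]:
    """Shuffle the examples reproducibly and split them into train/val/test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    n_test = int(len(shuffled) * test_ratio)
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


def to_line(label: str, text: str) -> str:
    """Serialize one example as 'label<TAB>text' (assumed format), collapsing whitespace."""
    return "{}\t{}".format(label, " ".join(text.split()))


examples = [("support", "post number {}".format(i)) for i in range(10)]
parts = split_examples(examples)
print({name: len(rows) for name, rows in parts.items()})
```

Each part can then be written line by line with to_line to the matching data.*.txt file.
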

2. For the text generation model

Prepare the following files in OpenNMT/data/:

  • src-train.txt
  • src-val.txt
  • src-test.txt
  • tgt-train.txt
  • tgt-val.txt
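
OpenNMT reads the source and target files as parallel text: one sentence per line, tokens separated by spaces, with line i of a src file paired with line i of the matching tgt file. A quick alignment check before preprocessing can catch mismatched files early. check_parallel is a hypothetical helper written for this illustration.

```python
from typing import Iterable, List, Tuple


def check_parallel(src_lines: Iterable[str], tgt_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs, raising ValueError if the files are misaligned."""
    src = [line.strip() for line in src_lines]
    tgt = [line.strip() for line in tgt_lines]
    if len(src) != len(tgt):
        raise ValueError("source has {} lines but target has {}".format(len(src), len(tgt)))
    pairs = []
    for number, (s, t) in enumerate(zip(src, tgt), start=1):
        if not s or not t:
            raise ValueError("empty sentence at line {}".format(number))
        pairs.append((s, t))
    return pairs


pairs = check_parallel(["i feel tired", "any advice ?"],
                       ["take a rest", "talk to your doctor"])
print(len(pairs))

# With real files (paths from the list above):
# with open("OpenNMT/data/src-train.txt", encoding="utf-8") as src, \
#         open("OpenNMT/data/tgt-train.txt", encoding="utf-8") as tgt:
#     check_parallel(src, tgt)
```

The test source file has no target counterpart in the list above, since translation only needs the source side.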

Step 3: Train the Classification Model

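1. CNN parameters in Classifier/cnn_model.py

class TCNNConfig(object):
    """CNN configuration parameters."""

    embedding_dim = 64    # dimension of the word embeddings
    seq_length = 600      # length of each input sequence
    num_classes = 2       # number of post types to predict
    num_filters = 256     # number of convolution filters
    kernel_size = 5       # width of each convolution kernel
    vocab_size = 5000     # size of the vocabulary

    hidden_dim = 128      # units in the fully connected layer

    dropout_keep_prob = 0.5  # probability of keeping a unit in dropout
    learning_rate = 1e-3     # optimizer learning rate

    batch_size = 64       # examples per training batch
    num_epochs = 100      # total number of training epochs

    print_per_batch = 10  # print results every 10 batches
    save_per_batch = 10   # save a summary every 10 batches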
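
Because seq_length is fixed, every post has to be brought to the same length before it reaches the CNN. The sketch below shows one simple convention (truncate at the end, pad at the end with id 0); this is an assumption for illustration, and the repo's own preprocessing may pad or truncate differently. pad_or_truncate is a hypothetical helper.

```python
from typing import List


def pad_or_truncate(token_ids: List[int], seq_length: int, pad_id: int = 0) -> List[int]:
    """Return exactly seq_length ids: cut long inputs, pad short ones with pad_id."""
    if len(token_ids) >= seq_length:
        return token_ids[:seq_length]
    return token_ids + [pad_id] * (seq_length - len(token_ids))


print(pad_or_truncate([7, 3, 9], 5))
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 4))
```
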

2. Train the model

In the Classifier/ directory, run python run_cnn.py train to start training.

After training, the following file is generated in Classifier/data/cnews/:

  • data.vocab.txt

3. Test the model

In the Classifier/ directory, run python run_cnn.py test to evaluate the model on data.test.txt.

4. Predict

Classifier/predict.py provides the prediction function of the CNN model. Run predict.py to predict the sentences in Classifier/data/cnews/data.predict.txt. The predictions are written to Classifier/predict.txt.

Step 4: Train the Generation Model

1. Preprocess the data

Run OpenNMT/preprocess.py:

python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo

Validation files are required; they are used to evaluate the convergence of training. They usually contain no more than 5000 sentences.

After running the preprocessing, the following files are generated in OpenNMT/data/:

  • demo.train.pt: serialized PyTorch file containing training data
  • demo.valid.pt: serialized PyTorch file containing validation data
  • demo.vocab.pt: serialized PyTorch file containing vocabulary data

Internally, the system never touches the words themselves; it works with the word indices stored in the vocabulary.
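
The idea of replacing words with indices can be sketched in a few lines. This is a generic illustration rather than OpenNMT's actual vocabulary code; build_vocab, numericalize, and the two special tokens are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List

UNK, PAD = "<unk>", "<blank>"  # illustrative special tokens


def build_vocab(sentences: Iterable[str], max_size: int = 50000) -> Dict[str, int]:
    """Map the most frequent tokens to integer indices; ids 0 and 1 are reserved."""
    counts = Counter(token for sentence in sentences for token in sentence.split())
    vocab = {UNK: 0, PAD: 1}
    # Sort by descending frequency, then alphabetically, so the ids are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for token, _ in ranked[:max_size]:
        vocab[token] = len(vocab)
    return vocab


def numericalize(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a tokenized sentence to indices; unknown words map to UNK."""
    return [vocab.get(token, vocab[UNK]) for token in sentence.split()]


vocab = build_vocab(["a b a", "b c a"])
ids = numericalize("a c z", vocab)
print(vocab, ids)
```

Words that were not seen during preprocessing map to the unknown index.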

2. Train the model

Run OpenNMT/train.py:

python train.py -data data/demo -save_model demo-model

The main train command is quite simple: minimally, it takes a data file and a save file. This runs the default model, which consists of a 2-layer LSTM with 500 hidden units in both the encoder and the decoder. To train on GPUs, set the CUDA_VISIBLE_DEVICES environment variable and pass -world_size and -gpu_ranks to train.py; for example, CUDA_VISIBLE_DEVICES=1,3 with -world_size 2 -gpu_ranks 0 1 uses GPUs 1 and 3 on this node only. For more on distributed training on single or multiple nodes, see the FAQ section of the OpenNMT-py documentation.
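
If training is launched from a script, the environment variable and the two GPU options can be assembled together so they stay consistent. The sketch below only builds the command; build_train_command is a hypothetical helper, and it uses only the options mentioned above.

```python
import os
from typing import Dict, List, Sequence, Tuple


def build_train_command(data_prefix: str, save_model: str,
                        gpu_ids: Sequence[int] = ()) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for train.py; GPU options are added only when gpu_ids is given."""
    argv = ["python", "train.py", "-data", data_prefix, "-save_model", save_model]
    env = dict(os.environ)
    if gpu_ids:
        # Expose only the chosen physical GPUs; ranks then count from 0 within that set.
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(gpu) for gpu in gpu_ids)
        argv += ["-world_size", str(len(gpu_ids))]
        argv += ["-gpu_ranks"] + [str(rank) for rank in range(len(gpu_ids))]
    return argv, env


argv, env = build_train_command("data/demo", "demo-model", gpu_ids=[1, 3])
print(" ".join(argv))

# To actually start training from the OpenNMT/ directory:
# import subprocess
# subprocess.run(argv, env=env, cwd="OpenNMT", check=True)
```

Deriving -world_size and -gpu_ranks from the same list as CUDA_VISIBLE_DEVICES avoids the common mistake of listing physical GPU ids as ranks.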

3. Translate

Run OpenNMT/translate_original.py:

python translate_original.py -model demo-model_acc_XX.XX_ppl_XXX.XX_eX.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose

You now have a model that you can use to predict on new data. Prediction runs beam search and writes the results to pred.txt.
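
For readers unfamiliar with beam search, here is a minimal, generic sketch on a toy next-token table. It is not OpenNMT's implementation: beam_search, toy_model, TABLE, and the probabilities are all made up for illustration, and there is no length normalization.

```python
import math
from typing import Callable, Dict, List


def beam_search(next_log_probs: Callable[[List[str]], Dict[str, float]],
                beam_size: int = 2, max_len: int = 10, eos: str = "</s>") -> List[str]:
    """Keep the beam_size best partial sentences at each step; return the best one."""
    beams = [(0.0, [])]  # (total log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, log_p in next_log_probs(tokens).items():
                candidates.append((score + log_p, tokens + [token]))
        candidates.sort(key=lambda cand: cand[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((score, tokens[:-1]))  # drop the end marker
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    if not finished:
        return []
    return max(finished, key=lambda cand: cand[0])[1]


# Toy model: next-token probabilities given the tokens generated so far.
TABLE = {
    (): {"take": 0.6, "stay": 0.4},
    ("take",): {"care": 0.5, "it": 0.5},
    ("stay",): {"strong": 0.9, "</s>": 0.1},
}


def toy_model(prefix: List[str]) -> Dict[str, float]:
    probs = TABLE.get(tuple(prefix), {"</s>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


best = beam_search(toy_model, beam_size=2)
greedy = beam_search(toy_model, beam_size=1)
print(best, greedy)
```

With a beam of 2 the search finds "stay strong" (probability 0.36), while a beam of 1 commits to "take" first and ends with a sentence of probability 0.30. That gap is why beam search is used instead of picking the single best word at each step.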

Step 5: Run the Deployment Script

1. Set the API and parameters

In the OpenNMT/Deployment.py file, fill in your own URLs, parameters, and simulated user information:

#################################################################################################
##############You should fill in your community API and simulated user information###############
#########################and modify the time parameters if you want##############################

THRESHOLD = 10  # threshold for deciding whether the chatbot should respond to an overlooked post
STUDY_TIME = 60 * 24 * 7  # length of the whole deployment period
OBSERVE_INTERVAL = 9  # interval between fetches of the latest posts
COMMENT_INTERVAL = 2  # interval between checks of whether observed posts have been replied to

Community_getLatestPost_Url = ""
Community_toComment_Url = ""
Community_getPostDetail_Url = ""


AI_auth_list = [["username1", "<authorization1>"],
                ["username2", "<authorization2>"],
                ["username3", "<authorization3>"]]
# e.g.
# username = "saltone"
# authorization = "XDS 7.fIC1Fkcg6-Qa6--o9qUP-FyrhLkyLLZOMN6r7Jxxx"

#################################################################################################
##############You should fill in your community API and simulated user information###############
#########################and modify the time parameters if you want##############################
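
How the authorization string is sent depends entirely on your community's API. As one possible shape, the sketch below builds a POST request that carries the value in an Authorization header with a JSON body. The header name, the JSON field names (post_id, content), the example URL, and build_comment_request itself are assumptions for illustration and are not taken from Deployment.py.

```python
import json
import urllib.request


def build_comment_request(url: str, authorization: str,
                          post_id: str, comment: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a comment on a post."""
    payload = json.dumps({"post_id": post_id, "content": comment}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Authorization": authorization,
                 "Content-Type": "application/json"},
        method="POST",
    )


request = build_comment_request("https://example.invalid/comment",
                                "<authorization1>", "42", "This is a comment example")
print(request.get_method(), request.full_url)

# Sending it would be: urllib.request.urlopen(request, timeout=10)
```
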

2. Run the deployment script

Run OpenNMT/Deployment.py:

In the console, you will see the following log if you confirm without typing anything:

content: This is a post example
comment: This is a comment example
Do you agree to Comment? Input nothing to confirm or input an appropriate sentence:
 Agree to comment
chatbot will comment on this sentence: This is a comment example

If you enter a new sentence, it replaces the generated comment:

content: This is a post example
comment: This is a comment example
Do you agree to Comment? Input nothing to confirm or input an appropriate sentence: Fighting!!!
chatbot will comment on this sentence: Fighting!!!
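
The confirm-or-override behavior in the two logs above comes down to one rule: empty input keeps the generated comment, and any other input replaces it. A minimal sketch of that rule, where choose_comment is a hypothetical helper rather than a function from Deployment.py:

```python
def choose_comment(generated: str, operator_input: str) -> str:
    """Return the operator's sentence if one was typed, otherwise the generated comment."""
    edited = operator_input.strip()
    return edited if edited else generated


print(choose_comment("This is a comment example", ""))
print(choose_comment("This is a comment example", "Fighting!!!"))

# Interactive use would look like:
# final = choose_comment(generated, input("Do you agree to Comment? "))
```

Keeping a human in the loop this way lets an operator correct an inappropriate generated reply before it is posted.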

Notes

1. Different online communities have different APIs and require different parameters, so the API-related part of the deployment script must be adapted to your community.

2. OpenNMT has since been updated to version 1.7, which is not compatible with the version (1.0.0) used in this repo.

3. If you have any questions, please contact me by email: wangliuping17@mails.ucas.ac.cn
