Abstract
We investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents with minimal feature engineering.
We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task.
Two DNNs are used: the first generates word embeddings, and the second performs entity recognition.
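To make the second network concrete, here is a minimal pure-Python sketch of a window-approach tagger in the style of Collobert et al. It treats each Chinese character as a token, concatenates the embeddings of a fixed window around the target character, applies one hardtanh hidden layer, and scores each tag. This is an assumption about the architecture, not the paper's confirmed design. The tag set (`B-DISEASE` etc.), dimensions, and class and method names are hypothetical. The embedding table is random here, whereas in the two-step method it would be initialised from the first, unsupervised network. Decoding is a greedy per-character argmax; a sentence-level decoder with tag-transition scores is not shown.

```python
import random

# Hypothetical tag set and special tokens; not taken from the paper.
PAD, UNK = "<PAD>", "<UNK>"
TAGS = ["O", "B-DISEASE", "I-DISEASE"]


def init_matrix(rows, cols, rng):
    """Small random weights in [-0.1, 0.1]."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class WindowTagger:
    """Window-approach tagger: embed a fixed window of characters around the
    target, concatenate, apply one hardtanh hidden layer, then score each tag."""

    def __init__(self, vocab, emb_dim=4, window=3, hidden=8, seed=0):
        assert window % 2 == 1, "window must be odd so it is centred on the target"
        rng = random.Random(seed)
        self.index = {w: i for i, w in enumerate([PAD, UNK] + sorted(vocab))}
        self.window = window
        # Lookup table: in the two-step method this would be initialised from the
        # embeddings learned on unlabeled text; here it is random for illustration.
        self.emb = init_matrix(len(self.index), emb_dim, rng)
        self.w1 = init_matrix(hidden, emb_dim * window, rng)  # hidden layer (no bias)
        self.w2 = init_matrix(len(TAGS), hidden, rng)         # tag-score layer

    def window_features(self, chars, i):
        """Concatenate embeddings of chars[i-half .. i+half], padding at the edges."""
        half = self.window // 2
        feats = []
        for j in range(i - half, i + half + 1):
            tok = chars[j] if 0 <= j < len(chars) else PAD
            feats.extend(self.emb[self.index.get(tok, self.index[UNK])])
        return feats

    def tag_scores(self, chars, i):
        x = self.window_features(chars, i)
        # hardtanh(W1 x): clip each hidden unit to [-1, 1]
        h = [max(-1.0, min(1.0, sum(w * v for w, v in zip(row, x)))) for row in self.w1]
        return [sum(w * v for w, v in zip(row, h)) for row in self.w2]

    def predict(self, chars):
        """Greedy decoding: highest-scoring tag for each character independently."""
        tags = []
        for i in range(len(chars)):
            scores = self.tag_scores(chars, i)
            tags.append(TAGS[scores.index(max(scores))])
        return tags


if __name__ == "__main__":
    tagger = WindowTagger(vocab=set("患者高血压病史"))
    print(tagger.predict(list("患者有高血压")))  # untrained weights, so tags are arbitrary
```

Characters outside the vocabulary (such as 有 above) fall back to the `<UNK>` embedding, and positions past either end of the sentence use `<PAD>`.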
Introduction
Discusses the value of electronic health records (EHRs) and the challenge of recognizing clinical entities in them.
Many existing clinical NLP systems, such as MedLEE, MetaMap, and cTAKES, use dictionaries and rule-based methods to identify clinical concepts.
More recently, a number of shared-task challenges on NER in clinical text have been organized, including the 2009 i2b2, the 2010 i2b2, the 2013 ShARe/CLEF, and the 2014 Semantic Evaluation (SemEval) challenges. (Worth studying in more detail later.)
Conventional ML-based methods have been applied to Chinese clinical NER tasks.
In summary, current efforts on NER in Chinese clinical text primarily focus on investigating different machine learning algorithms or optimizing combinations of different types of features via human engineering.
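Recently, there has been growing interest in deep learning-based NLP systems, which can learn useful feature representations from large unlabeled corpora through unsupervised learning. Deep learning is a field of machine learning that learns high-level feature representations with deep neural networks, and it has achieved state-of-the-art performance in image processing, automatic speech recognition, and machine translation. NLP researchers have developed DNNs that learn useful features from large amounts of unlabeled data, removing the need to spend a great deal of time hand-crafting task-specific features. Dr. Ronan Collobert's system achieved state-of-the-art performance on many NLP tasks with a single deep neural network.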
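These notes do not say exactly how the embeddings are learned from unlabeled text. One well-known unsupervised option, used by Collobert et al., is a pairwise ranking criterion: a real text window should score at least a margin higher than the same window with its centre word replaced by a random word, i.e. minimise max(0, 1 - f(x) + f(x')). The sketch below assumes that objective but replaces Collobert's neural scorer with a linear one (no hidden layer) to stay short, and draws a single corrupted window per step. The function names, dimensions, and sample text are hypothetical.

```python
import random

# Minimal sketch of a pairwise ranking objective for unsupervised embedding
# training. The linear scorer and all names are hypothetical simplifications.


def score_window(window, emb, weights):
    """Linear score: sum over positions of dot(emb[token], weights[position])."""
    return sum(
        sum(e * w for e, w in zip(emb[tok], weights[pos]))
        for pos, tok in enumerate(window)
    )


def corrupt(window, vocab, rng):
    """Copy of window with the centre token replaced by a different random token."""
    centre = len(window) // 2
    out = list(window)
    out[centre] = rng.choice([w for w in vocab if w != window[centre]])
    return out


def ranking_loss(score_true, score_corrupt, margin=1.0):
    """Hinge loss: zero once the real window outscores the corrupted one by `margin`."""
    return max(0.0, margin - score_true + score_corrupt)


def sgd_step(window, emb, weights, vocab, rng, lr=0.1):
    """One stochastic step on a single (real, corrupted) pair; returns the loss."""
    bad = corrupt(window, vocab, rng)
    loss = ranking_loss(score_window(window, emb, weights),
                        score_window(bad, emb, weights))
    if loss > 0.0:
        dim = len(weights[0])
        grad_e = {}
        grad_w = [[0.0] * dim for _ in weights]
        # d loss / d score_true = -1, d loss / d score_corrupt = +1
        for sign, win in ((-1.0, window), (1.0, bad)):
            for pos, tok in enumerate(win):
                ge = grad_e.setdefault(tok, [0.0] * dim)
                for k in range(dim):
                    ge[k] += sign * weights[pos][k]
                    grad_w[pos][k] += sign * emb[tok][k]
        # Apply gradients only after all are computed, so every term uses the same parameters.
        for tok, g in grad_e.items():
            emb[tok] = [e - lr * gk for e, gk in zip(emb[tok], g)]
        weights[:] = [[w - lr * gk for w, gk in zip(row, g)]
                      for row, g in zip(weights, grad_w)]
    return loss


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = list("患者有高血压病史")
    emb = {w: [rng.uniform(-0.1, 0.1) for _ in range(4)] for w in vocab}
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(3)]
    for _ in range(200):
        loss = sgd_step(list("高血压"), emb, weights, vocab, rng)
    print("last loss:", loss)
```

No labels are needed: the corrupted windows act as negative examples, which is what lets this kind of objective run on a large unlabeled corpus.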
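The paper presents itself as the first study to apply DNNs to NER in Chinese clinical records, and it compares them against the conventional CRF method.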