PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_text_analyzer": {
          "tokenizer": "standard",
          "char_filter": ["html_char_filter"]
        }
      },
      "char_filter": {
        "html_char_filter": {
          "type": "html_strip"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "html_text_analyzer"
      }
    }
  }
}
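When creating the index, assign the html_text_analyzer analyzer to the rich-text field. Its html_strip character filter removes HTML markup before tokenization, so the tags in rich-text content are filtered out and do not take part in matching.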
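
To check that the analyzer behaves as intended, the _analyze API can be run against the index. This is a minimal sketch: the sample text is made up for illustration, and it assumes my_index was created with the settings above.

POST my_index/_analyze
{
  "analyzer": "html_text_analyzer",
  "text": "<p>Hello <b>world</b></p>"
}

The response should list only the tokens Hello and world, with no tag names among them. Case is preserved because this analyzer defines no lowercase token filter, so matching on the content field is case-sensitive unless one is added.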