- There are roughly three ways to do this:
- scan and scroll all documents
- _update_by_query api
- re-index into a new index and add the new fields along the way, using the /_reindex API
- First, allow scripts to run by editing elasticsearch.yml:
script.engine.groovy.inline.aggs: on
script.engine.groovy.inline.search: on
script.engine.groovy.inline.update: on
or, to enable inline scripting globally:
script.inline: true
- scan and scroll all documents
- use /_search?scroll to fetch the docs
- perform your operation
- send /_bulk update requests
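Each update applies a script such as the following (shown here as a single-document _update; `<id>` is the document id, which the _update endpoint requires):
POST bt/bt/<id>/_update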
{
"script" : "ctx._source.new_field = \"value_of_new_field\""
}
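The scroll-and-bulk steps above can be sketched in Python using only the standard library. This is a minimal sketch, assuming a cluster reachable at `http://localhost:9200` with no authentication; `es_request`, `build_bulk_update` and `add_field_with_scroll` are hypothetical helper names, not part of any Elasticsearch client. It sorts by `_doc`, the usual replacement for the deprecated `search_type=scan`, and sends one `/_bulk` request per scroll page.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumption: local, unauthenticated cluster


def es_request(method, path, body=None, ndjson=False):
    """Send one HTTP request to Elasticsearch and return the decoded JSON reply."""
    data, headers = None, {}
    if body is not None:
        text = body if isinstance(body, str) else json.dumps(body)
        data = text.encode("utf-8")
        headers["Content-Type"] = "application/x-ndjson" if ndjson else "application/json"
    req = urllib.request.Request(ES + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def build_bulk_update(hits, script):
    """Turn scrolled hits into an NDJSON /_bulk body with one scripted update per doc."""
    lines = []
    for hit in hits:
        action = {"_index": hit["_index"], "_id": hit["_id"]}
        if "_type" in hit:  # older (2.x-era) clusters still return mapping types
            action["_type"] = hit["_type"]
        lines.append(json.dumps({"update": action}))
        lines.append(json.dumps({"script": script}))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


def add_field_with_scroll(index, script, size=500):
    """Scroll over every document in `index` and apply `script` to each via /_bulk."""
    page = es_request("POST", f"/{index}/_search?scroll=1m",
                      {"size": size, "sort": ["_doc"], "query": {"match_all": {}}})
    scroll_id = page["_scroll_id"]
    hits = page["hits"]["hits"]
    try:
        while hits:
            result = es_request("POST", "/_bulk", build_bulk_update(hits, script), ndjson=True)
            if result.get("errors"):
                raise RuntimeError("some bulk updates failed")
            page = es_request("POST", "/_search/scroll",
                              {"scroll": "1m", "scroll_id": scroll_id})
            scroll_id = page["_scroll_id"]
            hits = page["hits"]["hits"]
    finally:
        # release the scroll context instead of waiting for it to time out
        es_request("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

For example, `add_field_with_scroll("bt", 'ctx._source.new_field = "value_of_new_field"')` would apply the script shown above to every document in `bt`, one bulk request per page of 500 hits.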
- _update_by_query API
POST bt/bt/_update_by_query
{
"script": {
"inline": "if (ctx._source.bt0 == null || ctx._source.bt1==null) { ctx._source.btNew1=null } else { ctx._source.btNew1 = ctx._source.bt0 + ctx._source.bt1 }"
}
}
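The `_update_by_query` call above can also be built from Python. A minimal sketch, assuming the same local, unauthenticated cluster; `sum_fields_script` and `update_by_query_request` are hypothetical names. It adds `conflicts=proceed` so documents changed concurrently are counted as version conflicts instead of aborting the run, and it keeps the `inline` key used in this note (newer Elasticsearch releases call it `source`).

```python
import json
import urllib.request


def sum_fields_script(a, b, target):
    """Rebuild the script above: target = a + b, or null when either field is missing."""
    return (
        f"if (ctx._source.{a} == null || ctx._source.{b} == null) "
        f"{{ ctx._source.{target} = null }} "
        f"else {{ ctx._source.{target} = ctx._source.{a} + ctx._source.{b} }}"
    )


def update_by_query_request(index, doc_type, script_src, base="http://localhost:9200"):
    """Build (but do not send) the POST /index/type/_update_by_query request."""
    # conflicts=proceed: count version conflicts instead of aborting the whole run
    url = f"{base}/{index}/{doc_type}/_update_by_query?conflicts=proceed"
    body = {"script": {"inline": script_src}}  # "inline" is "source" in newer releases
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(update_by_query_request("bt", "bt", sum_fields_script("bt0", "bt1", "btNew1")))`; the JSON reply reports counters such as `updated` and `version_conflicts`.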
- _reindex API
POST _reindex
{
"source": {
"index": "bt",
"type": "bt"
},
"dest": {
"index": "new_bt1"
},
"script": {
"inline": "if (ctx._source.bt0 == null || ctx._source.bt1==null) { ctx._source.btPlus=null } else { ctx._source.btPlus = ctx._source.bt0 + ctx._source.bt1 }"
}
}
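A comparable Python sketch for the `_reindex` request, under the same assumptions (local cluster, hypothetical helper name `reindex_request`). Passing `wait_for_completion=false` makes Elasticsearch reply immediately with a task id instead of blocking until the copy finishes, so the job can then be checked with the Task API.

```python
import json
import urllib.request


def reindex_request(src_index, dest_index, script_src, src_type=None,
                    base="http://localhost:9200"):
    """Build (but do not send) a POST /_reindex request that runs `script_src` per doc."""
    source = {"index": src_index}
    if src_type is not None:  # mapping types only exist on older clusters
        source["type"] = src_type
    body = {
        "source": source,
        "dest": {"index": dest_index},
        "script": {"inline": script_src},  # "source" instead of "inline" in newer releases
    }
    # wait_for_completion=false: the reply is {"task": "<node>:<id>"} right away
    url = f"{base}/_reindex?wait_for_completion=false"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The task id would then be read from the reply, e.g. `json.loads(urllib.request.urlopen(req).read())["task"]`.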
- Difference between _update_by_query and _reindex
- Just like _update_by_query, _reindex gets a snapshot of the source index but its target must be a different index so version conflicts are unlikely.
- Unlike _update_by_query, the script is allowed to modify the document’s metadata.
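In fact, _update_by_query and _reindex are implemented in much the same way. I have not benchmarked them, but their performance should be similar. For both, Elasticsearch provides the Task API, which can be used to monitor how these tasks are progressing.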
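Polling the Task API for such a background job might look like the following. A minimal sketch, assuming Elasticsearch 5.x or later, where `GET /_tasks/<task_id>` returns a `completed` flag and a `task.status` object with counters such as `total`, `created` and `updated`; `wait_for_task` is a hypothetical name, and the task id is assumed to come from a request sent with `wait_for_completion=false`. If the id is not at hand, `GET /_tasks?detailed=true&actions=*reindex` lists running reindex tasks.

```python
import json
import time
import urllib.request


def get_json(url):
    """GET `url` and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_task(task_id, base="http://localhost:9200", interval=5.0,
                  fetch=get_json, sleep=time.sleep):
    """Poll GET /_tasks/<task_id> until Elasticsearch reports the task as completed."""
    url = f"{base}/_tasks/{task_id}"
    while True:
        info = fetch(url)
        status = info.get("task", {}).get("status", {})
        done = status.get("created", 0) + status.get("updated", 0)
        print(f"{done}/{status.get('total', '?')} documents processed")
        if info.get("completed"):
            return info  # a finished task's summary, when stored, is under "response"
        sleep(interval)
```

`fetch` and `sleep` are parameters only so the loop can be exercised without a live cluster; in normal use `wait_for_task(task_id)` is enough.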