Elasticsearch Java High Level REST Client: Delete By Query API

Delete By Query Request

A DeleteByQueryRequest can be used to delete documents from an index. It requires an existing index (or a set of indices) on which deletion is to be performed.

The simplest form of a DeleteByQueryRequest looks like this and deletes all documents in the given indices:

DeleteByQueryRequest request =
        new DeleteByQueryRequest("source1", "source2");

Creates the DeleteByQueryRequest on a set of indices.
By default, version conflicts abort the DeleteByQueryRequest process, but you can just count them instead with:

request.setConflicts("proceed");

Set proceed on version conflict
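The difference between the default behaviour and "proceed" can be pictured with a small in-memory model. This is a sketch under assumptions, not the Elasticsearch implementation: the class ConflictModel and its Result record are hypothetical, the run is assumed to check for conflicts after each batch, and the boolean list stands in for documents that were modified concurrently. It does reflect one property of the real request: it is not transactional, so documents deleted before an abort stay deleted.

```java
import java.util.List;

/**
 * Toy model (NOT the Elasticsearch implementation) of how the "conflicts"
 * setting affects a delete-by-query run that processes documents in batches.
 */
class ConflictModel {

    /** Hypothetical counters, loosely mirroring getDeleted() and getVersionConflicts(). */
    record Result(long deleted, long versionConflicts, boolean aborted) {}

    /**
     * @param conflictFlags one entry per matching document; true means that
     *                      document hit a version conflict (assumption of the model)
     * @param batchSize     documents per batch
     * @param proceed       true models setConflicts("proceed"), false the default abort
     */
    static Result run(List<Boolean> conflictFlags, int batchSize, boolean proceed) {
        long deleted = 0;
        long conflicts = 0;
        for (int start = 0; start < conflictFlags.size(); start += batchSize) {
            int end = Math.min(start + batchSize, conflictFlags.size());
            long batchConflicts = 0;
            // Every document in the batch is attempted; conflicting ones are only counted.
            for (boolean conflict : conflictFlags.subList(start, end)) {
                if (conflict) {
                    batchConflicts++;
                } else {
                    deleted++;
                }
            }
            conflicts += batchConflicts;
            // Assumed rule: in abort mode the run stops after a batch with a conflict.
            // Documents already deleted are not restored.
            if (!proceed && batchConflicts > 0) {
                return new Result(deleted, conflicts, true);
            }
        }
        return new Result(deleted, conflicts, false);
    }

    public static void main(String[] args) {
        List<Boolean> docs = List.of(false, false, true, false, false, false);
        System.out.println(run(docs, 3, false)); // Result[deleted=2, versionConflicts=1, aborted=true]
        System.out.println(run(docs, 3, true));  // Result[deleted=5, versionConflicts=1, aborted=false]
    }
}
```

With "proceed" the conflict is simply counted and the remaining batches still run; with the default, the second batch is never reached.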
You can limit the documents to delete by adding a query:

request.setQuery(new TermQueryBuilder("user", "kimchy"));

Only delete documents whose field user is set to kimchy

It’s also possible to limit the number of processed documents by setting size.

request.setSize(10);

Only delete 10 documents

By default, DeleteByQueryRequest uses batches of 1000 documents. You can change the batch size with setBatchSize:

request.setBatchSize(100);

Use batches of 100 documents
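As a rough guide to how size and batch size interact, the number of batches for a run without conflicts, failures or retries is about the number of documents to process divided by the batch size, rounded up. The helper below is a hypothetical back-of-the-envelope estimate, not a reproduction of how getBatches() is computed; using -1 for "no size limit" is an assumption of this sketch.

```java
/** Hypothetical estimate, assuming no conflicts, failures, or retries. */
class BatchEstimate {

    /**
     * @param matchingDocs documents matched by the query
     * @param size         value passed to setSize, or -1 for "no limit" (sketch convention)
     * @param batchSize    value passed to setBatchSize (1000 by default)
     */
    static long estimateBatches(long matchingDocs, long size, int batchSize) {
        long toProcess = size < 0 ? matchingDocs : Math.min(matchingDocs, size);
        // Ceiling division: a partially filled last batch still counts as one batch.
        return (toProcess + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(estimateBatches(2500, -1, 1000)); // 3
        System.out.println(estimateBatches(2500, -1, 100));  // 25
        System.out.println(estimateBatches(2500, 10, 100));  // 1
    }
}
```

Smaller batches mean more, lighter round trips; a size limit below the batch size collapses the work into a single batch.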
DeleteByQueryRequest can also be parallelized using sliced-scroll with setSlices:

request.setSlices(2);

Set the number of slices to use
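Sliced scroll splits the matching documents into disjoint slices that can be worked on independently. The toy partitioner below illustrates the idea only: it assumes each document is assigned to a slice by hashing its id modulo the slice count, and String.hashCode() stands in for whatever hash Elasticsearch actually uses. SliceModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of sliced scroll partitioning (NOT Elasticsearch's hashing). */
class SliceModel {

    /** Assigns each document id to exactly one of {@code slices} partitions. */
    static List<List<String>> partition(List<String> ids, int slices) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < slices; i++) {
            result.add(new ArrayList<>());
        }
        for (String id : ids) {
            // floorMod keeps the slice index non-negative even for negative hash codes.
            result.get(Math.floorMod(id.hashCode(), slices)).add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc-1", "doc-2", "doc-3", "doc-4", "doc-5");
        // Each inner list could be handed to a separate worker.
        System.out.println(partition(ids, 2));
    }
}
```

Because every id lands in exactly one slice, two workers never touch the same document, which is what allows the slices to run in parallel.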
DeleteByQueryRequest uses the scroll parameter to control how long it keeps the "search context" alive.

request.setScroll(TimeValue.timeValueMinutes(10));

Set the scroll time
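The keep-alive does not have to cover the whole delete-by-query run: each scroll request sets a new expiry time, so it only needs to be long enough to process one batch before the next scroll request. The hypothetical check below models that rule with java.time.Duration, assuming the per-batch processing times are known in advance.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical model of the scroll keep-alive rule. */
class ScrollKeepAliveModel {

    /**
     * Returns true if no gap between consecutive scroll requests exceeds the keep-alive.
     * Each element of batchTimes is the assumed time spent on one batch.
     */
    static boolean contextStaysAlive(List<Duration> batchTimes, Duration keepAlive) {
        for (Duration t : batchTimes) {
            // Each scroll request renews the keep-alive, so only a single gap matters.
            if (t.compareTo(keepAlive) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Duration> times = List.of(Duration.ofSeconds(30), Duration.ofMinutes(2));
        System.out.println(contextStaysAlive(times, Duration.ofMinutes(10))); // true
        System.out.println(contextStaysAlive(times, Duration.ofMinutes(1)));  // false
    }
}
```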

If you provide routing then the routing is copied to the scroll query, limiting the process to the shards that match that routing value.

request.setRouting("=cat");

Set the routing
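A toy model of why routing narrows the work: a routing value is hashed to pick a primary shard, so a routed scroll only has to visit the shards those values map to. In this sketch String.hashCode() stands in for Elasticsearch's real routing hash, RoutingModel is a hypothetical class, and splitting on commas assumes the routing parameter holds a comma-separated list of values.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model of routed shard selection (NOT Elasticsearch's routing function). */
class RoutingModel {

    /** Picks a shard for one routing value; String.hashCode() is a stand-in hash. */
    static int shardFor(String routing, int primaryShards) {
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    /** Shards a routed scroll query would need to visit for the given routing values. */
    static Set<Integer> shardsToSearch(String routingParam, int primaryShards) {
        Set<Integer> shards = new TreeSet<>();
        for (String value : routingParam.split(",")) {
            shards.add(shardFor(value, primaryShards));
        }
        return shards;
    }

    public static void main(String[] args) {
        // One routing value: only one of the five shards has to be searched.
        System.out.println(shardsToSearch("=cat", 5).size()); // 1
    }
}
```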

Optional arguments

In addition to the options above, the following optional arguments can also be provided:

request.setTimeout(TimeValue.timeValueMinutes(2));

Timeout to wait for the delete by query request to be performed as a TimeValue

request.setRefresh(true);

Refresh index after calling delete by query

request.setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN);

Set indices options

Synchronous execution
When executing a DeleteByQueryRequest in the following manner, the client waits for the BulkByScrollResponse to be returned before continuing with code execution:

BulkByScrollResponse bulkResponse =
        client.deleteByQuery(request, RequestOptions.DEFAULT);

Synchronous calls may throw an IOException if the high-level REST client fails to parse the REST response, if the request times out, or in similar cases where no response comes back from the server.

In cases where the server returns a 4xx or 5xx error code, the high-level client tries to parse the response body error details instead and then throws a generic ElasticsearchException and adds the original ResponseException as a suppressed exception to it.
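The error-reporting pattern described above relies on Java's suppressed exceptions (Throwable.addSuppressed and getSuppressed). The model below shows how a caller could tell the two failure modes apart. ServerErrorException and RawHttpException are hypothetical stand-ins for ElasticsearchException (an unchecked exception) and ResponseException (an IOException), and simulatedCall only pretends to talk to a server.

```java
import java.io.IOException;

/** Hypothetical model of the synchronous error-handling pattern; no real client involved. */
class ErrorHandlingModel {

    /** Stand-in for ResponseException, which is an IOException. */
    static class RawHttpException extends IOException {
        final int status;

        RawHttpException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    /** Stand-in for ElasticsearchException, which is unchecked. */
    static class ServerErrorException extends RuntimeException {
        ServerErrorException(String message) {
            super(message);
        }
    }

    /** Simulates a call that received a 4xx/5xx status and parsed its error body. */
    static void simulatedCall(int status) throws IOException {
        if (status >= 400) {
            ServerErrorException parsed = new ServerErrorException("parsed error body");
            // The original low-level exception is attached as suppressed, not as the cause.
            parsed.addSuppressed(new RawHttpException(status));
            throw parsed;
        }
    }

    static String handle(int status) {
        try {
            simulatedCall(status);
            return "ok";
        } catch (ServerErrorException e) {
            // Recover the original HTTP-level exception from the suppressed list.
            for (Throwable s : e.getSuppressed()) {
                if (s instanceof RawHttpException raw) {
                    return "server error, HTTP " + raw.status;
                }
            }
            return "server error";
        } catch (IOException e) {
            // No usable response: parse failure, timeout, connection problem.
            return "no response: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(200)); // ok
        System.out.println(handle(404)); // server error, HTTP 404
    }
}
```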

Asynchronous execution
Executing a DeleteByQueryRequest can also be done in an asynchronous fashion so that the client can return directly. Users need to specify how the response or potential failures will be handled by passing the request and a listener to the asynchronous delete-by-query method:

client.deleteByQueryAsync(request, RequestOptions.DEFAULT, listener);

The DeleteByQueryRequest to execute and the ActionListener to use when the execution completes

The asynchronous method does not block and returns immediately. Once it completes, the ActionListener is called back using the onResponse method if the execution completed successfully, or using the onFailure method if it failed. Failure scenarios and expected exceptions are the same as in the synchronous execution case.

A typical listener for delete-by-query looks like:

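ActionListener<BulkByScrollResponse> listener = new ActionListener<BulkByScrollResponse>() {
    @Override
    public void onResponse(BulkByScrollResponse bulkResponse) {
        // ① handle the successful response here
    }

    @Override
    public void onFailure(Exception e) {
        // ② handle the failure here
    }
};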

①Called when the execution is successfully completed.

②Called when the whole DeleteByQueryRequest fails.
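Callback-style APIs such as deleteByQueryAsync are often adapted to a CompletableFuture so the result can be composed or awaited. The sketch below shows that adaptation with a hypothetical Listener interface mirroring the onResponse/onFailure contract of ActionListener, and a hypothetical deleteAsync operation that merely echoes a count; it does not call Elasticsearch.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of adapting a listener-based async call to a CompletableFuture. */
class ListenerModel {

    /** Stand-in for ActionListener<T>. */
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    /** Hypothetical async operation: returns immediately and reports the outcome later. */
    static void deleteAsync(long matchingDocs, Listener<Long> listener) {
        CompletableFuture.runAsync(() -> {
            if (matchingDocs < 0) {
                listener.onFailure(new IllegalArgumentException("negative count"));
            } else {
                listener.onResponse(matchingDocs); // pretend every match was deleted
            }
        });
    }

    /** Completes the future from whichever callback fires. */
    static CompletableFuture<Long> deleteAsFuture(long matchingDocs) {
        CompletableFuture<Long> future = new CompletableFuture<>();
        deleteAsync(matchingDocs, new Listener<>() {
            @Override
            public void onResponse(Long deleted) {
                future.complete(deleted);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        });
        return future;
    }

    public static void main(String[] args) {
        System.out.println(deleteAsFuture(42).join()); // 42
    }
}
```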

Delete By Query Response

The returned BulkByScrollResponse contains information about the executed operations, which can be retrieved as follows:

TimeValue timeTaken = bulkResponse.getTook(); 
boolean timedOut = bulkResponse.isTimedOut(); 
long totalDocs = bulkResponse.getTotal(); 
long deletedDocs = bulkResponse.getDeleted(); 
long batches = bulkResponse.getBatches(); 
long noops = bulkResponse.getNoops(); 
long versionConflicts = bulkResponse.getVersionConflicts(); 
long bulkRetries = bulkResponse.getBulkRetries(); 
long searchRetries = bulkResponse.getSearchRetries(); 
TimeValue throttledMillis = bulkResponse.getStatus().getThrottled(); 
TimeValue throttledUntilMillis =
        bulkResponse.getStatus().getThrottledUntil(); 
List<ScrollableHitSource.SearchFailure> searchFailures =
        bulkResponse.getSearchFailures(); 
List<BulkItemResponse.Failure> bulkFailures =
        bulkResponse.getBulkFailures();

From top to bottom, the lines above correspond to:
Get total time taken
Check if the request timed out
Get total number of docs processed
Number of docs that were deleted
Number of batches that were executed
Number of skipped docs
Number of version conflicts
Number of times the request had to retry bulk index operations
Number of times the request had to retry search operations
The total time this request has throttled itself, not including the current throttle time if it is currently sleeping
Remaining delay of any current throttle sleep or 0 if not sleeping
Failures during search phase
Failures during bulk index operation
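A common next step is deciding whether the run actually removed everything it matched. The helper below is one possible policy, written against a hypothetical Summary record holding values you would read with the getters above (the failure lists are simplified to strings); it is a sketch, not part of the client API.

```java
import java.util.List;

/** Hypothetical post-processing of delete-by-query counters. */
class ResponseCheck {

    /** Stand-in for values read from a BulkByScrollResponse. */
    record Summary(boolean timedOut, long total, long deleted, long versionConflicts,
                   List<String> searchFailures, List<String> bulkFailures) {}

    /**
     * One possible policy: complete only if the run did not time out, reported no
     * search or bulk failures or version conflicts, and deleted every processed doc.
     */
    static boolean fullyDeleted(Summary s) {
        return !s.timedOut()
                && s.searchFailures().isEmpty()
                && s.bulkFailures().isEmpty()
                && s.versionConflicts() == 0
                && s.deleted() == s.total();
    }

    public static void main(String[] args) {
        Summary clean = new Summary(false, 120, 120, 0, List.of(), List.of());
        Summary conflicted = new Summary(false, 120, 117, 3, List.of(), List.of());
        System.out.println(fullyDeleted(clean));      // true
        System.out.println(fullyDeleted(conflicted)); // false
    }
}
```

With conflicts set to "proceed", a run can finish without failures yet still leave documents behind, which is why the sketch also checks the conflict count.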