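I. Overview
eureka-server needs to synchronize registry information between its nodes. Rather than relying on ZooKeeper, Eureka implements its own replication algorithm. There are two main reasons:
- Service discovery should keep working even when a ZooKeeper cluster would be unavailable.
- ZooKeeper is a CP system, whereas service discovery can tolerate data being inconsistent for a while; what we actually need is AP.
The rest of this post focuses on how data is synchronized between nodes.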
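II. Analysis
1. Environment setup
We started two eureka-server instances, listening on ports 8080 and 8082.
2. Traffic monitoring
In the capture, a replication request between the two nodes is sent every 30s.
The request looks like this: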
Hypertext Transfer Protocol
POST /eureka/v2/peerreplication/batch/ HTTP/1.1\r\n
Accept: application/json\r\n
Content-Type: application/json\r\n
DiscoveryIdentity-Name: DefaultServer\r\n
DiscoveryIdentity-Version: 1.0\r\n
DiscoveryIdentity-Id: 172.19.10.230\r\n
Accept-Encoding: gzip\r\n
Transfer-Encoding: chunked\r\n
Host: localhost:8082\r\n
Connection: Keep-Alive\r\n
User-Agent: Java-EurekaClient-Replication/v1.4.13-SNAPSHOT\r\n
\r\n
[Full request URI: http://localhost:8082/eureka/v2/peerreplication/batch/]
[HTTP request 1/1]
[Response in frame: 257]
HTTP chunked response
Data chunk (247 octets)
Chunk size: 247 octets
Data (247 bytes)
Data: 7b227265706c69636174696f6e4c697374223a5b7b226170…
[Length: 247]
Chunk boundary: 0d0a
End of chunked encoding
Chunk size: 0 octets
\r\n
File Data: 247 bytes
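The Data field in the capture is the hex dump of the request body. As a quick check, the prefix visible above can be decoded to see what kind of payload is being sent. This is only a minimal sketch: the class and method names (HexPayloadDecoder, decodeHex) are made up for illustration, and only the 24 bytes shown in the capture are decoded, because the rest of the payload is truncated there.

```java
import java.nio.charset.StandardCharsets;

class HexPayloadDecoder {

    // Converts a string of hex digit pairs into the UTF-8 text it encodes.
    static String decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Only the prefix visible in the capture; the remainder is elided there.
        String visiblePrefix = "7b227265706c69636174696f6e4c697374223a5b7b226170";
        System.out.println(decodeHex(visiblePrefix));
    }
}
```

The visible prefix decodes to `{"replicationList":[{"ap`, which suggests the body is a JSON object whose replicationList array holds the replicated events.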
3. Log analysis
I added some extra log statements to make the whole exchange easier to follow.
2019-04-15 23:41:48,607 INFO com.netflix.eureka.resources.InstanceResource:113 [http-bio-8082-exec-2] [renewLease] eureka renewLease...isFromReplicaNode:true,name:EUREKA-1,id:201709-07262
2019-04-15 23:41:48,607 INFO com.netflix.eureka.registry.AbstractInstanceRegistry:346 [http-bio-8082-exec-2] [renew] renew...appName:EUREKA-1,id:201709-07262,isReplicattion:true
2019-04-15 23:41:48,607 DEBUG com.netflix.eureka.registry.AbstractInstanceRegistry:1321 [http-bio-8082-exec-2] [getOverriddenInstanceStatus] Processing override status using rule: [com.netflix.eureka.registry.rule.DownOrStartingRule, com.netflix.eureka.registry.rule.OverrideExistsRule, com.netflix.eureka.registry.rule.LeaseExistsRule, com.netflix.eureka.registry.rule.AlwaysMatchInstanceStatusRule]
2019-04-15 23:41:48,607 DEBUG com.netflix.eureka.registry.rule.AlwaysMatchInstanceStatusRule:20 [http-bio-8082-exec-2] [apply] Returning the default instance status UP for instance 201709-07262
2019-04-15 23:41:48,608 DEBUG com.netflix.eureka.resources.InstanceResource:136 [http-bio-8082-exec-2] [renewLease] Found (Renew): EUREKA-1 - 201709-07262; reply status=200
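The DEBUG lines above show the override status being resolved by trying a list of rules in order, with AlwaysMatchInstanceStatusRule acting as the fallback that returns UP. The sketch below models that "first matching rule wins" idea. It is a simplified illustration, not Eureka's code: the types (Status, Context, Rule) are hypothetical, and the condition inside each rule is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class OverrideRuleChainSketch {

    enum Status { UP, DOWN, STARTING, OUT_OF_SERVICE, UNKNOWN }

    // Hypothetical, simplified view of what a rule can inspect.
    static class Context {
        final Status reported;       // status carried by the incoming request
        final Status override;       // operator-set override, or null if none
        final Status existingLease;  // status stored with the current lease, or null
        final boolean isReplication; // true if the request came from a peer node

        Context(Status reported, Status override, Status existingLease, boolean isReplication) {
            this.reported = reported;
            this.override = override;
            this.existingLease = existingLease;
            this.isReplication = isReplication;
        }
    }

    // A rule either decides the status or declines by returning empty.
    interface Rule {
        Optional<Status> apply(Context ctx);
    }

    // Assumed condition: trust an instance that reports itself DOWN or STARTING.
    static Optional<Status> downOrStarting(Context ctx) {
        if (ctx.reported == Status.DOWN || ctx.reported == Status.STARTING) {
            return Optional.of(ctx.reported);
        }
        return Optional.empty();
    }

    // Assumed condition: an explicit override wins if one exists.
    static Optional<Status> overrideExists(Context ctx) {
        return Optional.ofNullable(ctx.override);
    }

    // Assumed condition: for non-replicated requests, keep the stored lease status.
    static Optional<Status> leaseExists(Context ctx) {
        if (!ctx.isReplication && ctx.existingLease != null) {
            return Optional.of(ctx.existingLease);
        }
        return Optional.empty();
    }

    // Fallback that always matches, mirroring the "default instance status UP" log line.
    static Optional<Status> alwaysMatch(Context ctx) {
        return Optional.of(Status.UP);
    }

    static final List<Rule> CHAIN = Arrays.<Rule>asList(
            OverrideRuleChainSketch::downOrStarting,
            OverrideRuleChainSketch::overrideExists,
            OverrideRuleChainSketch::leaseExists,
            OverrideRuleChainSketch::alwaysMatch);

    // Walks the rules in order and returns the first decision.
    static Status resolve(List<Rule> rules, Context ctx) {
        for (Rule rule : rules) {
            Optional<Status> result = rule.apply(ctx);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return Status.UNKNOWN;
    }

    public static void main(String[] args) {
        // A replicated renewal with no override, similar to the logged request.
        Context replicatedRenew = new Context(Status.UP, null, Status.UP, true);
        System.out.println(resolve(CHAIN, replicatedRenew));
    }
}
```

Ordering the rules from most specific to a catch-all keeps each rule small, and the fallback guarantees that the chain always produces a status.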
The entry point for the /eureka/v2/peerreplication/batch/ endpoint is the PeerReplicationResource.batchReplication method.
/**
 * Process batched replication events from peer eureka nodes.
 *
 * <p>
 * The batched events are delegated to underlying resources to generate a
 * {@link ReplicationListResponse} containing the individual responses to the batched events
 * </p>
 *
 * @param replicationList
 *            The List of replication events from peer eureka nodes
 * @return A batched response containing the information about the responses of individual events
 */
@Path("batch")
@POST
public Response batchReplication(ReplicationList replicationList) {
    // ... (method body not shown in this excerpt)
}
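The Javadoc describes the shape of the processing: each event in the batch is handed to a handler, and the individual results are collected into one batched response. The sketch below illustrates that pattern only. Event, Action, Handler and processBatch are hypothetical stand-ins for ReplicationList and ReplicationListResponse, and answering 500 for a failed event is an assumption of this example rather than something taken from the Eureka source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchReplicationSketch {

    enum Action { REGISTER, HEARTBEAT, CANCEL }

    // Hypothetical stand-in for one replicated event.
    static class Event {
        final String appName;
        final String id;
        final Action action;

        Event(String appName, String id, Action action) {
            this.appName = appName;
            this.id = id;
            this.action = action;
        }
    }

    // Handles a single event and returns an HTTP-like status code.
    interface Handler {
        int handle(Event event) throws Exception;
    }

    // Processes every event in order and returns one status code per event.
    static List<Integer> processBatch(List<Event> batch, Handler handler) {
        List<Integer> responses = new ArrayList<>();
        for (Event event : batch) {
            try {
                responses.add(handler.handle(event));
            } catch (Exception e) {
                // Assumption: a failing event is reported individually
                // and does not abort the rest of the batch.
                responses.add(500);
            }
        }
        return responses;
    }

    public static void main(String[] args) {
        List<Event> batch = Arrays.asList(
                new Event("EUREKA-1", "201709-07262", Action.HEARTBEAT),
                new Event("EUREKA-1", "unknown-instance", Action.HEARTBEAT));
        // Toy handler: only the instance id from the log above is considered known.
        Handler handler = event -> "201709-07262".equals(event.id) ? 200 : 404;
        System.out.println(processBatch(batch, handler));
    }
}
```

Returning one result per event lets the sender tell which replicated operations succeeded, instead of treating the whole batch as a single success or failure.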
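The core logic lives in AbstractInstanceRegistry.renew.
In essence it updates the entry held in registry; Eureka keeps instance information in memory. There is also a strategy for resolving status conflicts (when the old and new status of an instance disagree); see the implementations of InstanceStatusOverrideRule for details.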
/**
 * Marks the given instance of the given app name as renewed, and also marks whether it originated from
 * replication.
 *
 * @see com.netflix.eureka.lease.LeaseManager#renew(java.lang.String, java.lang.String, boolean)
 */
public boolean renew(String appName, String id, boolean isReplication) {
    logger.info("renew...appName:{},id:{},isReplicattion:{}",appName,id,isReplication);
    RENEW.increment(isReplication);
    Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
    Lease<InstanceInfo> leaseToRenew = null;
    if (gMap != null) {
        leaseToRenew = gMap.get(id);
    }
    if (leaseToRenew == null) {
        RENEW_NOT_FOUND.increment(isReplication);
        logger.warn("DS: Registry: lease doesn't exist, registering resource: {} - {}", appName, id);
        return false;
    } else {
        InstanceInfo instanceInfo = leaseToRenew.getHolder();
        if (instanceInfo != null) {
            // touchASGCache(instanceInfo.getASGName());
            InstanceStatus overriddenInstanceStatus = this.getOverriddenInstanceStatus(
                    instanceInfo, leaseToRenew, isReplication);
            if (overriddenInstanceStatus == InstanceStatus.UNKNOWN) {
                logger.info("Instance status UNKNOWN possibly due to deleted override for instance {}"
                        + "; re-register required", instanceInfo.getId());
                RENEW_NOT_FOUND.increment(isReplication);
                return false;
            }
            if (!instanceInfo.getStatus().equals(overriddenInstanceStatus)) {
                Object[] args = {
                        instanceInfo.getStatus().name(),
                        instanceInfo.getOverriddenStatus().name(),
                        instanceInfo.getId()
                };
                logger.info(
                        "The instance status {} is different from overridden instance status {} for instance {}. "
                                + "Hence setting the status to overridden status", args);
                instanceInfo.setStatus(overriddenInstanceStatus);
            }
        }
        renewsLastMin.increment();
        leaseToRenew.renew();
        return true;
    }
}
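Stripped of metrics, logging and status overrides, the method above does two things: it looks up the lease in a nested in-memory map (application name, then instance id) and refreshes it, or it reports that the lease is missing so the caller can re-register. The sketch below is a minimal model of that idea. InMemoryRegistrySketch, its Lease class and the explicit nowMs parameters are assumptions made for this example and do not mirror Eureka's actual Lease implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InMemoryRegistrySketch {

    // Hypothetical lease: remembers when it was last renewed and how long it is valid.
    static class Lease {
        final long durationMs;
        volatile long lastRenewedAtMs;

        Lease(long durationMs, long nowMs) {
            this.durationMs = durationMs;
            this.lastRenewedAtMs = nowMs;
        }

        void renew(long nowMs) {
            lastRenewedAtMs = nowMs;
        }

        boolean isExpired(long nowMs) {
            return nowMs > lastRenewedAtMs + durationMs;
        }
    }

    // appName -> (instanceId -> lease); everything lives in memory only.
    private final Map<String, Map<String, Lease>> registry = new ConcurrentHashMap<>();

    void register(String appName, String id, long durationMs, long nowMs) {
        registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>())
                .put(id, new Lease(durationMs, nowMs));
    }

    // Returns false when no lease exists, signalling that re-registration is needed.
    boolean renew(String appName, String id, long nowMs) {
        Map<String, Lease> leases = registry.get(appName);
        Lease lease = (leases == null) ? null : leases.get(id);
        if (lease == null) {
            return false;
        }
        lease.renew(nowMs);
        return true;
    }

    // Unknown instances are treated as expired.
    boolean isExpired(String appName, String id, long nowMs) {
        Map<String, Lease> leases = registry.get(appName);
        Lease lease = (leases == null) ? null : leases.get(id);
        return lease == null || lease.isExpired(nowMs);
    }

    public static void main(String[] args) {
        InMemoryRegistrySketch registry = new InMemoryRegistrySketch();
        registry.register("EUREKA-1", "201709-07262", 90_000L, 0L);
        System.out.println(registry.renew("EUREKA-1", "201709-07262", 30_000L));
        System.out.println(registry.renew("EUREKA-1", "not-registered", 30_000L));
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real registry would read the system time instead.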
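III. Conclusion
Judging from the capture and logs above, Eureka nodes replicate data by periodically POSTing batches of replication events to their peers; the 247-byte batch seen here appears to carry individual events (a replicated renewal) rather than a full copy of the registry. Eureka does not persist this information and keeps it in memory only. This raises a few questions.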
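- Traffic. Every node replicates to every other node, so the replication traffic each node sends grows roughly linearly with the number of peers and the volume of data.
- Self-preservation. When two nodes hold inconsistent data, whose data wins? How do we make sure correct data is not overwritten by a faulty node's information?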
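To put rough numbers on the traffic concern, the sketch below counts replication requests per interval. It assumes that every node sends exactly one batch to every other node per interval, which is a simplification for illustration; the class and method names are hypothetical.

```java
class ReplicationTrafficSketch {

    // Requests one node sends per interval: one batch to each peer.
    static long requestsPerNode(int nodes) {
        return Math.max(0, nodes - 1);
    }

    // Requests sent across the whole cluster per interval.
    static long requestsPerCluster(int nodes) {
        return (long) nodes * requestsPerNode(nodes);
    }

    public static void main(String[] args) {
        int[] clusterSizes = {2, 3, 5, 10};
        for (int n : clusterSizes) {
            System.out.println(String.format(
                    "nodes=%d perNode=%d cluster=%d",
                    n, requestsPerNode(n), requestsPerCluster(n)));
        }
    }
}
```

Under this assumption each node sends n - 1 requests per interval, so the per-node cost is linear in the cluster size while the cluster-wide total is n(n - 1).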
The next post will dig further into the source code to see how Eureka addresses these questions.