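Over the past few months, every time we migrated ActiveMQ, the worker processes failed to connect to ActiveMQ after they restarted. I debugged the problem today and finally found the cause.
Debugging environment
- ActiveMQ 5.8, deployed as two instances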
- stompest 2.1.6
Symptoms
The failover URI looks roughly like this:
failover:(tcp://10.153.75.143:61613,tcp://10.15.227.106:61613)?randomize=false
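Here, the ActiveMQ instance at 10.153.75.143 is not running, while the ActiveMQ instance at 10.15.227.106 is running normally.
When the stompest client connects to the ActiveMQ servers with the failover configuration above, the following exception is raised: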
Connecting to 10.153.75.143:61613 ...
Could not connect to 10.153.75.143:61613 [Could not establish connection [[Errno 61] Connection refused]]
Reconnect failed [Reconnect timeout: 0 attempts]
Traceback (most recent call last):
  File "main.py", line 14, in <module>
    client = BaseMQClient.factory(config)
  File "/Users/liuxiong/src/maslino/test/mq_client.py", line 67, in factory
    return cls(mq_config)
  File "/Users/liuxiong/src/maslino/test/mq_client.py", line 81, in __init__
    self.connect()
  File "/Users/liuxiong/src/maslino/test/mq_client.py", line 88, in connect
    self.client.connect(heartBeats=(0, 10 * 1000), connectTimeout=60, connectedTimeout=60)
  File "/Users/liuxiong/virtualenvs/test/lib/python2.7/site-packages/stompest/sync/client.py", line 85, in connect
    for (broker, connectDelay) in self._failover:
  File "/Users/liuxiong/virtualenvs/test/lib/python2.7/site-packages/stompest/protocol/failover.py", line 50, in __iter__
    yield broker, self._delay()
  File "/Users/liuxiong/virtualenvs/test/lib/python2.7/site-packages/stompest/protocol/failover.py", line 85, in _delay
    raise StompConnectTimeout('Reconnect timeout: %d attempts' % self._maxReconnectAttempts)
stompest.error.StompConnectTimeout: Reconnect timeout: 0 attempts
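The exception above shows that the stompest client tried to connect to the server 10.153.75.143, found that the connection was refused, and then raised a connect timeout exception.
But we configured failover, didn't we? When the stompest client found that 10.153.75.143 was unreachable, why did it not go on to try 10.15.227.106?
Solution
After some searching, I finally found the cause.
The ActiveMQ documentation at http://activemq.apache.org/failover-transport-reference.html explains the connection parameter startupMaxReconnectAttempts as follows: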
A value of **-1** denotes that the number of connection attempts at startup should be unlimited.
A value of **>=0** denotes the number of reconnect attempts at startup that will be made after which an error is sent back to the client when the client makes a subsequent reconnect attempt.
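Roughly speaking: when startupMaxReconnectAttempts is -1, the number of initial connection attempts is unlimited; when it is 0, no retry is made; when it is greater than 0, the client retries up to startupMaxReconnectAttempts times. The ActiveMQ documentation also lists -1 as the default value.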
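To make these three cases concrete, here is a minimal sketch of the semantics as described above. It is a toy model, not code from ActiveMQ or stompest: the function name `brokers_to_try` and the `limit` parameter are hypothetical, and every connection attempt is assumed to fail.

```python
from itertools import cycle


def brokers_to_try(brokers, startup_max_reconnect_attempts, limit=10):
    """Return the brokers contacted at startup, assuming every attempt fails.

    Hypothetical model: attempt 0 is the initial connection, and every later
    attempt counts as one reconnect attempt. A value of -1 means unlimited,
    so `limit` caps the demonstration at a fixed number of attempts.
    """
    tried = []
    for attempt, broker in enumerate(cycle(brokers)):
        if attempt >= limit:
            break
        unlimited = startup_max_reconnect_attempts == -1
        if not unlimited and attempt > startup_max_reconnect_attempts:
            break  # the retry budget is used up; a real client would raise here
        tried.append(broker)
    return tried
```

In this model a value of 0 gives up after the first broker, which matches the "Reconnect timeout: 0 attempts" message in the log, while -1 keeps cycling through the broker list.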
Next, let's see which default stompest uses when the failover URI does not specify startupMaxReconnectAttempts. The stompest source contains this snippet:
_SUPPORTED_OPTIONS = {
    'initialReconnectDelay': _configurationOption(int, 10)
    , 'maxReconnectDelay': _configurationOption(int, 30000)
    , 'useExponentialBackOff': _configurationOption(_bool, True)
    , 'backOffMultiplier': _configurationOption(float, 2.0)
    , 'maxReconnectAttempts': _configurationOption(int, -1)
    , 'startupMaxReconnectAttempts': _configurationOption(int, 0)
    , 'reconnectDelayJitter': _configurationOption(int, 0)
    , 'randomize': _configurationOption(_bool, True)
    , 'priorityBackup': _configurationOption(_bool, False)
    #, 'backup': _configurationOption(_bool, False), # initialize and hold a second transport connection - to enable fast failover
    #, 'timeout': _configurationOption(int, -1), # enables timeout on send operations (in miliseconds) without interruption of reconnection process
    #, 'trackMessages': _configurationOption(_bool, False), # keep a cache of in-flight messages that will flushed to a broker on reconnect
    #, 'maxCacheSize': _configurationOption(int, 131072), # size in bytes for the cache, if trackMessages is enabled
    #, 'updateURIsSupported': _configurationOption(_bool, True), # determines whether the client should accept updates to its list of known URIs from the connected broker
}
So stompest sets the default value of startupMaxReconnectAttempts to 0, which means that stompest does not retry at all when it first connects to the ActiveMQ servers.
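The effect of that default can be illustrated with a simplified parser for the failover URI format used in this post. This is only a sketch under stated assumptions, not stompest's actual parser: `parse_failover_uri` and `DEFAULTS` are hypothetical names, and only three of the defaults from the snippet above are copied in.

```python
import re

# Subset of the defaults shown in the stompest snippet above.
DEFAULTS = {
    "maxReconnectAttempts": -1,
    "startupMaxReconnectAttempts": 0,
    "randomize": True,
}

_FAILOVER_RE = re.compile(r"^failover:\((?P<brokers>[^)]*)\)(?:\?(?P<options>.*))?$")


def parse_failover_uri(uri):
    """Split a failover URI into its broker list and its effective options."""
    match = _FAILOVER_RE.match(uri)
    if match is None:
        raise ValueError("not a failover URI: %r" % uri)
    brokers = [b.strip() for b in match.group("brokers").split(",") if b.strip()]
    options = dict(DEFAULTS)  # start from the defaults, then apply overrides
    for pair in filter(None, (match.group("options") or "").split(",")):
        key, _, value = pair.partition("=")
        default = DEFAULTS.get(key)
        if isinstance(default, bool):  # check bool first: bool is a subclass of int
            options[key] = value.lower() == "true"
        elif isinstance(default, int):
            options[key] = int(value)
        else:
            options[key] = value  # unknown option: kept as a raw string in this sketch
    return brokers, options
```

Parsing the URI from the Symptoms section with this sketch leaves startupMaxReconnectAttempts at 0, because the URI only sets randomize.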
Now that the cause is known, change the failover URI to the following and try again:
failover:(tcp://10.153.75.143:61613,tcp://10.15.227.106:61613)?randomize=false,startupMaxReconnectAttempts=-1
The connection log is as follows:
Connecting to 10.153.75.143:61613 ...
Could not connect to 10.153.75.143:61613 [Could not establish connection [[Errno 61] Connection refused]]
Delaying connect attempt for 10 ms
Connecting to 10.15.227.106:61613 ...
Connection established
Sending CONNECT frame [headers={u'passcode': 'transcode', u'login': 'vtc', u'heart-beat': '0,10000', u'host': '', u'accept-version': '1.0,1.1'}, version=1.0]
Received CONNECTED frame [headers={u'session': u'ID:liuxiong-test-dev001-bjdxt9.qiyi.virtual-49666-1479094899264-2:3', u'heart-beat': u'10000,0', u'version': u'1.1', u'server': u'ActiveMQ/5.10.0'}, version=1.0]
Connected to stomp broker [session=ID:liuxiong-test-dev001-bjdxt9.qiyi.virtual-49666-1479094899264-2:3, version=1.1]
As the log shows, once startupMaxReconnectAttempts=-1 is added, the stompest client tries the second ActiveMQ instance after it fails to connect to the first one.
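To avoid forgetting the parameter in other workers, the URI can be assembled in one place. The helper below is a hypothetical sketch (`build_failover_uri` is not part of stompest); it always sets startupMaxReconnectAttempts unless the caller overrides it, and it joins the options with commas as in the URIs above.

```python
def build_failover_uri(brokers, **options):
    """Assemble a failover URI, defaulting startupMaxReconnectAttempts to -1."""
    options.setdefault("startupMaxReconnectAttempts", -1)

    def render(value):
        # Booleans are written in lowercase, as in randomize=false.
        return str(value).lower() if isinstance(value, bool) else str(value)

    # Sort the keys so the result does not depend on keyword argument order.
    query = ",".join("%s=%s" % (key, render(options[key])) for key in sorted(options))
    return "failover:(%s)?%s" % (",".join(brokers), query)


uri = build_failover_uri(
    ["tcp://10.153.75.143:61613", "tcp://10.15.227.106:61613"],
    randomize=False,
)

# Usage with stompest (requires stompest and a reachable broker, so it is
# left commented out here; the connect arguments are those from the traceback):
# from stompest.config import StompConfig
# from stompest.sync import Stomp
# client = Stomp(StompConfig(uri))
# client.connect(heartBeats=(0, 10 * 1000), connectTimeout=60, connectedTimeout=60)
```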