4 - Fescar Source Code Analysis: Transaction Commit & Rollback
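I. Official description
1. The TM asks the TC to make a global commit or rollback decision for the XID.
2. The TC drives all branch transactions under that XID to complete the commit or rollback.
The previous article analyzed how branch transactions are registered.
This article analyzes how fescar commits a global transaction: after the TM sends the commit request, how does the TC handle it, and how does the RM carry out the commit of its branch transactions?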
--
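II. Source code analysis (principles)
Picking up from the previous article, where the RM registered its branch with the TC, we again start from the example diagram on the official site.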
2.1 demo
-
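Here is the architecture diagram from the official site again:
The project contains an official example module that simulates the flow shown in the diagram. Now back to the topic of this section: **committing the transaction**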
--
III. commit
3.1. TM triggers commit
-
Trigger logic

    // #TransactionalTemplate
    public Object execute(TransactionalExecutor business) throws TransactionalExecutor.ExecutionException {
        ...
        tx.commit();
        ...
        return rs;
    }

    // #DefaultGlobalTransaction
    @Override
    public void commit() throws TransactionException {
        check();
        RootContext.unbind();
        if (role == GlobalTransactionRole.Participant) {
            // Participant has no responsibility of committing
            return;
        }
        status = transactionManager.commit(xid);
    }

    // #DefaultTransactionManager
    @Override
    public GlobalStatus commit(String xid) throws TransactionException {
        long txId = XID.getTransactionId(xid);
        GlobalCommitRequest globalCommit = new GlobalCommitRequest();
        globalCommit.setTransactionId(txId);
        GlobalCommitResponse response = (GlobalCommitResponse) syncCall(globalCommit);
        return response.getGlobalStatus();
    }

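The TM side is simple:
- 1. Remove the XID from RootContext
- 2. Build the GlobalCommitRequest
- 3. Send the commit message to the TC
The main logic lies in how the TC handles a commit message, so we follow it there.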
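The TM steps above can be sketched in a few lines. This is a minimal, hypothetical model rather than Fescar's real classes: `TmCommitSketch`, `Role`, `CONTEXT` and `SENT` are names invented for illustration, and the "send" is simulated by appending to a list. It shows that the XID is always unbound, while only the transaction launcher actually sends a commit request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the TM-side commit path; not Fescar's real API. */
class TmCommitSketch {
    enum Role { LAUNCHER, PARTICIPANT }

    /** Stand-in for RootContext: the XID bound to the current thread. */
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /** Records the requests that would be sent to the TC. */
    static final List<String> SENT = new ArrayList<>();

    static void commit(String xid, Role role) {
        CONTEXT.remove();                          // step 1: unbind the XID
        if (role == Role.PARTICIPANT) {
            return;                                // a participant does not decide the outcome
        }
        SENT.add("GlobalCommitRequest:" + xid);    // steps 2-3: build and "send" the request
    }
}
```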
--
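3.2. TC handles commit
-
As before, we skip the details of receiving the message, which work the same way as for the begin message, and go straight to the core handling: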

    // #DefaultCoordinator
    @Override
    protected void doGlobalCommit(GlobalCommitRequest request, GlobalCommitResponse response, RpcContext rpcContext)
        throws TransactionException {
        response.setGlobalStatus(core.commit(XID.generateXID(request.getTransactionId())));
    }

    // #DefaultCore
    @Override
    public GlobalStatus commit(String xid) throws TransactionException {
        GlobalSession globalSession = SessionHolder.findGlobalSession(XID.getTransactionId(xid));
        if (globalSession == null) {
            return GlobalStatus.Finished;
        }
        GlobalStatus status = globalSession.getStatus();
        globalSession.closeAndClean(); // Highlight: Firstly, close the session, then no more branch can be registered.
        if (status == GlobalStatus.Begin) {
            if (globalSession.canBeCommittedAsync()) {
                asyncCommit(globalSession);
            } else {
                doGlobalCommit(globalSession, false);
            }
        }
        return globalSession.getStatus();
    }

    // #DefaultCore
    private void asyncCommit(GlobalSession globalSession) throws TransactionException {
        globalSession.addSessionLifecycleListener(SessionHolder.getAsyncCommittingSessionManager());
        SessionHolder.getAsyncCommittingSessionManager().addGlobalSession(globalSession);
    }

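That is the TC-side logic:
- 1. Close the session and its related resources
- 2. Release the locks held by its branchSessions
- 3. If the session can be committed asynchronously, register the DefaultSessionManager behind ASYNC_COMMITTING_SESSION_MANAGER as a lifecycle listener of the GlobalSession
- 4. Add the current GlobalSession to that same DefaultSessionManager
None of this seems to have much to do with actually committing the transaction. Don't panic: remember the initialization logic introduced at the very beginning?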

    public void init() {
        retryRollbacking.scheduleAtFixedRate(new Runnable() {
        ...
        asyncCommitting.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    handleAsyncCommitting();
                } catch (Exception e) {
                    LOGGER.info("Exception async committing ... ", e);
                }
            }
        }, 0, 10, TimeUnit.MILLISECONDS);
        ...
    }

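So a thread was already started at startup to poll for commit tasks asynchronously (every 10 ms, per the schedule above). Let's see what it actually does.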
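The "enqueue on the request thread, finish in a background poller" pattern described above can be sketched as follows. This is a simplified, hypothetical model: `AsyncCommitQueueSketch`, the `"QUEUED"` return value and the string XIDs are assumptions for illustration, and unlike the real TC it drains a plain queue instead of keeping sessions in a session manager until they end.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of asynchronous phase-two commit on the coordinator side. */
class AsyncCommitQueueSketch {
    private final Queue<String> asyncCommitting = new ConcurrentLinkedQueue<>();
    private final List<String> committed = new CopyOnWriteArrayList<>();

    /** Called on the request thread: only enqueues, so the caller gets a fast reply. */
    String commit(String xid) {
        asyncCommitting.add(xid);
        return "QUEUED";
    }

    /** One polling pass: drains everything queued so far and marks it committed. */
    int pollOnce() {
        int handled = 0;
        String xid;
        while ((xid = asyncCommitting.poll()) != null) {
            committed.add(xid);
            handled++;
        }
        return handled;
    }

    /** Starts a fixed-rate poller; the caller is responsible for shutting it down. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::pollOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }

    List<String> committed() {
        return committed;
    }
}
```

The real polling task, `handleAsyncCommitting`, is shown next.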

    // #DefaultCoordinator
    private void handleAsyncCommitting() {
        Collection<GlobalSession> asyncCommittingSessions = SessionHolder.getAsyncCommittingSessionManager().allSessions();
        for (GlobalSession asyncCommittingSession : asyncCommittingSessions) {
            try {
                core.doGlobalCommit(asyncCommittingSession, true);
            } catch (TransactionException ex) {
                LOGGER.info("Failed to async committing [{}] {} {}",
                    asyncCommittingSession.getTransactionId(), ex.getCode(), ex.getMessage());
            }
        }
    }

    // #DefaultCore
    @Override
    public void doGlobalCommit(GlobalSession globalSession, boolean retrying) throws TransactionException {
        for (BranchSession branchSession : globalSession.getSortedBranches()) {
            BranchStatus currentStatus = branchSession.getStatus();
            if (currentStatus == BranchStatus.PhaseOne_Failed) {
                continue;
            }
            try {
                BranchStatus branchStatus = resourceManagerInbound.branchCommit(
                    XID.generateXID(branchSession.getTransactionId()), branchSession.getBranchId(),
                    branchSession.getResourceId(), branchSession.getApplicationData());
                switch (branchStatus) {
                    case PhaseTwo_Committed:
                        globalSession.removeBranch(branchSession);
                        continue;
                    case PhaseTwo_CommitFailed_Unretryable:
                        if (globalSession.canBeCommittedAsync()) {
                            LOGGER.error("By [{}], failed to commit branch {}", branchStatus, branchSession);
                            continue;
                        } else {
                            globalSession.changeStatus(GlobalStatus.CommitFailed);
                            globalSession.end();
                            LOGGER.error("Finally, failed to commit global[{}] since branch[{}] commit failed",
                                globalSession.getTransactionId(), branchSession.getBranchId());
                            return;
                        }
                    default:
                        if (!retrying) {
                            queueToRetryCommit(globalSession);
                            return;
                        }
                        if (globalSession.canBeCommittedAsync()) {
                            LOGGER.error("By [{}], failed to commit branch {}", branchStatus, branchSession);
                            continue;
                        } else {
                            LOGGER.error("Failed to commit global[{}] since branch[{}] commit failed, will retry later.",
                                globalSession.getTransactionId(), branchSession.getBranchId());
                            return;
                        }
                }
            } catch (Exception ex) {
                ...
        }
        if (globalSession.hasBranch()) {
            LOGGER.info("Global[{}] committing is NOT done.", globalSession.getTransactionId());
            return;
        }
        globalSession.changeStatus(GlobalStatus.Committed);
        globalSession.end();
        LOGGER.info("Global[{}] committing is successfully done.", globalSession.getTransactionId());
    }

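This loop iterates over the GlobalSessions returned by SessionHolder.getAsyncCommittingSessionManager().allSessions(), i.e. the sessions queued by asyncCommit earlier.
For each GlobalSession it then iterates over the branchSessions and calls resourceManagerInbound.branchCommit to commit each branch transaction. The logic is becoming clear.
Branches that commit successfully are removed, and the global transaction status is updated according to the statuses the branches return. Next we trace into branchCommit.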
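The essence of one commit pass (try every branch, remove the ones that succeed, and only mark the global transaction committed when none remain) can be sketched as below. It is a deliberately reduced, hypothetical model: `GlobalCommitLoopSketch`, `GlobalState` and the `Predicate` standing in for the RM call are assumptions, and the unretryable and phase-one-failed cases from the real code are left out.

```java
import java.util.Iterator;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical reduction of one commit pass over the branches of a global session. */
class GlobalCommitLoopSketch {
    enum GlobalState { COMMITTING, COMMITTED }

    /**
     * Asks the RM (modelled as a predicate) to commit each remaining branch.
     * Branches that succeed are removed; leftovers wait for the next polling pass.
     */
    static GlobalState commitPass(Set<Long> branches, Predicate<Long> rmCommit) {
        Iterator<Long> it = branches.iterator();
        while (it.hasNext()) {
            Long branchId = it.next();
            if (rmCommit.test(branchId)) {
                it.remove();
            }
        }
        return branches.isEmpty() ? GlobalState.COMMITTED : GlobalState.COMMITTING;
    }
}
```

Because failed branches simply stay in the set, calling `commitPass` again on the next tick retries exactly the unfinished ones. The `branchCommit` call made for each branch is traced next.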

    @Override
    public BranchStatus branchCommit(String xid, long branchId, String resourceId, String applicationData)
        throws TransactionException {
        try {
            BranchCommitRequest request = new BranchCommitRequest();
            request.setXid(xid);
            request.setBranchId(branchId);
            request.setResourceId(resourceId);
            request.setApplicationData(applicationData);
            GlobalSession globalSession = SessionHolder.findGlobalSession(XID.getTransactionId(xid));
            BranchSession branchSession = globalSession.getBranch(branchId);
            BranchCommitResponse response = (BranchCommitResponse)messageSender.sendSyncRequest(resourceId, branchSession.getClientId(), request);
            return response.getBranchStatus();
        } catch (IOException e) {
            throw new TransactionException(FailedToSendBranchCommitRequest, branchId + "/" + xid, e);
        } catch (TimeoutException e) {
            throw new TransactionException(FailedToSendBranchCommitRequest, branchId + "/" + xid, e);
        }
    }

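A BranchCommitRequest is built and sent here. Since this is the TC side, dispatching a branch transaction simply means notifying the RM to do the corresponding work.
After the RM finishes, the TC updates the state of the branch transaction, which completes the TC-side handling. The actual execution of the branch happens in the RM, so we move on to how the RM handles commit.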
--
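3.3. RM handles the commit sent by the TC
-
The RM still receives messages from the TC in channelRead of AbstractRpcRemoting, which then dispatches them. Because the RPC layer was initialized as an AbstractRpcRemotingClient at startup, execution enters the following logic: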

    @Override
    public void dispatch(long msgId, ChannelHandlerContext ctx, Object msg) {
        if (clientMessageListener != null) {
            String remoteAddress = NetUtil.toStringAddress(ctx.channel().remoteAddress());
            clientMessageListener.onMessage(msgId, remoteAddress, msg, this);
        }
    }

Next, the message listener logic:

    @Override
    public void onMessage(long msgId, String serverAddress, Object msg, ClientMessageSender sender) {
        if (LOGGER.isInfoEnabled()) {
            LOGGER.info("onMessage:" + msg);
        }
        if (msg instanceof BranchCommitRequest) {
            handleBranchCommit(msgId, serverAddress, (BranchCommitRequest)msg, sender);
        } else if (msg instanceof BranchRollbackRequest) {
            handleBranchRollback(msgId, serverAddress, (BranchRollbackRequest)msg, sender);
        }
    }

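The listener above routes purely on the runtime type of the incoming message. A minimal, hypothetical sketch of that routing follows; `RmDispatchSketch`, the two message classes and the string results are invented for illustration and are not Fescar's types.

```java
/** Hypothetical model of type-based routing of TC messages on the RM side. */
class RmDispatchSketch {
    static final class BranchCommit {
        final long branchId;
        BranchCommit(long branchId) { this.branchId = branchId; }
    }

    static final class BranchRollback {
        final long branchId;
        BranchRollback(long branchId) { this.branchId = branchId; }
    }

    /** Returns a description of the handler that would run for this message. */
    static String onMessage(Object msg) {
        if (msg instanceof BranchCommit) {
            return "commit:" + ((BranchCommit) msg).branchId;
        } else if (msg instanceof BranchRollback) {
            return "rollback:" + ((BranchRollback) msg).branchId;
        }
        return "ignored";   // any other message type is not handled by this listener
    }
}
```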
Eventually we reach the handler: RMHandlerAT

    @Override
    protected void doBranchCommit(BranchCommitRequest request, BranchCommitResponse response)
        throws TransactionException {
        String xid = request.getXid();
        long branchId = request.getBranchId();
        String resourceId = request.getResourceId();
        String applicationData = request.getApplicationData();
        LOGGER.info("AT Branch committing: " + xid + " " + branchId + " " + resourceId + " " + applicationData);
        BranchStatus status = dataSourceManager.branchCommit(xid, branchId, resourceId, applicationData);
        response.setBranchStatus(status);
        LOGGER.info("AT Branch commit result: " + status);
    }

    @Override
    public BranchStatus branchCommit(String xid, long branchId, String resourceId, String applicationData)
        throws TransactionException {
        return asyncWorker.branchCommit(xid, branchId, resourceId, applicationData);
    }

So what does AsyncWorker do?

    /**
     * Used to asynchronously delete the undo SQL records after a branch transaction commits
     */
    @Override
    public BranchStatus branchCommit(String xid, long branchId, String resourceId, String applicationData)
        throws TransactionException {
        if (ASYNC_COMMIT_BUFFER.size() < ASYNC_COMMIT_BUFFER_LIMIT) {
            ASYNC_COMMIT_BUFFER.add(new Phase2Context(xid, branchId, resourceId, applicationData));
        } else {
            LOGGER.warn("Async commit buffer is FULL. Rejected branch [" + branchId + "/" + xid
                + "] will be handled by housekeeping later.");
        }
        return BranchStatus.PhaseTwo_Committed;
    }

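A Phase2Context is built from the committed branch and added to ASYNC_COMMIT_BUFFER. If the buffer is full, the branch is only logged and left for later housekeeping; either way PhaseTwo_Committed is returned.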
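The behaviour just described, a bounded buffer that always answers "committed" even when it rejects an entry, can be illustrated with a small hypothetical model. `CommitBufferSketch`, its `limit` parameter and the string keys are assumptions for illustration; only the returned status name is taken from the code above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of a bounded phase-two buffer on the RM side. */
class CommitBufferSketch {
    private final int limit;
    private final List<String> buffer = Collections.synchronizedList(new ArrayList<>());

    CommitBufferSketch(int limit) {
        this.limit = limit;
    }

    /** Buffers the branch for later cleanup if there is room; reports success either way. */
    String branchCommit(String xid, long branchId) {
        if (buffer.size() < limit) {
            buffer.add(xid + "/" + branchId);
        }
        // The local transaction already committed in phase one, so the answer
        // does not depend on whether the cleanup work could be queued.
        return "PhaseTwo_Committed";
    }

    int buffered() {
        return buffer.size();
    }
}
```

Returning success unconditionally is safe in this model because the buffered work is only cleanup; a dropped entry costs a leftover undo record, not data correctness.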
Remember that AsyncWorker's polling logic is initialized when the RPC service starts?

    public synchronized void init() {
        LOGGER.info("Async Commit Buffer Limit: " + ASYNC_COMMIT_BUFFER_LIMIT);
        timerExecutor = new ScheduledThreadPoolExecutor(1, new NamedThreadFactory("AsyncWorker", 1, true));
        timerExecutor.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    doBranchCommits();
                } catch (Throwable e) {
                    LOGGER.info("Failed at async committing ... " + e.getMessage());
                }
            }
        }, 10, 1000 * 1, TimeUnit.MILLISECONDS);
    }

    private void doBranchCommits() {
        if (ASYNC_COMMIT_BUFFER.size() == 0) {
            return;
        }
        Map<String, List<Phase2Context>> mappedContexts = new HashMap<>();
        Iterator<Phase2Context> iterator = ASYNC_COMMIT_BUFFER.iterator();
        while (iterator.hasNext()) {
            ...
            List<Phase2Context> contextsGroupedByResourceId = mappedContexts.get(resourceId);
            for (Phase2Context commitContext : contextsGroupedByResourceId) {
                try {
                    UndoLogManager.deleteUndoLog(commitContext.xid, commitContext.branchId, conn);
                    ...
                }
            } finally {
                ...
            }
        }
    }

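This is where the rollback scripts (the undo logs) are actually deleted, and that ends the commit logic. In other words, commit merely releases the session and deletes the undo logs; the actual data changes were already completed during execute. There is also the release of locks, which will be analyzed separately in a later section.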
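The cleanup pass groups buffered contexts by resource so that each data source is visited once, then deletes the matching undo records. A minimal sketch under stated assumptions: `UndoCleanupSketch` and `Ctx` are hypothetical, and each undo log table is modelled as an in-memory set of `"xid/branchId"` keys instead of a real database connection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of draining the phase-two buffer and deleting undo records. */
class UndoCleanupSketch {
    static final class Ctx {
        final String xid;
        final long branchId;
        final String resourceId;

        Ctx(String xid, long branchId, String resourceId) {
            this.xid = xid;
            this.branchId = branchId;
            this.resourceId = resourceId;
        }
    }

    /** Groups contexts by resource so each resource is handled in one batch. */
    static Map<String, List<Ctx>> groupByResource(List<Ctx> buffer) {
        Map<String, List<Ctx>> grouped = new LinkedHashMap<>();
        for (Ctx c : buffer) {
            grouped.computeIfAbsent(c.resourceId, k -> new ArrayList<>()).add(c);
        }
        return grouped;
    }

    /** Deletes the undo keys of all buffered branches, clears the buffer, returns the delete count. */
    static int drain(List<Ctx> buffer, Map<String, Set<String>> undoLogs) {
        int deleted = 0;
        for (Map.Entry<String, List<Ctx>> entry : groupByResource(buffer).entrySet()) {
            Set<String> table = undoLogs.get(entry.getKey());
            if (table == null) {
                continue;   // unknown resource: nothing to clean up in this model
            }
            for (Ctx c : entry.getValue()) {
                if (table.remove(c.xid + "/" + c.branchId)) {
                    deleted++;
                }
            }
        }
        buffer.clear();
        return deleted;
    }
}
```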
--
IV. rollback
4.1. TM triggers rollback
-
Trigger logic

    // #TransactionalTemplate
    public Object execute(TransactionalExecutor business) throws TransactionalExecutor.ExecutionException {
        ...
        tx.rollback();
        ...
        return rs;
    }

    // #DefaultGlobalTransaction
    @Override
    public void rollback() throws TransactionException {
        check();
        RootContext.unbind();
        if (role == GlobalTransactionRole.Participant) {
            // Participant has no responsibility of committing
            return;
        }
        /**
         * Rolls back the given transaction by passing its ID to the TM;
         * Fescar also provides a default implementation, DefaultTransactionManager
         */
        status = transactionManager.rollback(xid);
    }

    // #DefaultTransactionManager
    /**
     * A synchronous call is made here: a GlobalRollbackRequest is built from the transaction's XID
     * and sent to Fescar-Server to request a global rollback of that transaction.
     * The call blocks until Fescar-Server replies that it has finished, which ends the caller's logic.
     */
    @Override
    public GlobalStatus rollback(String xid) throws TransactionException {
        long txId = XID.getTransactionId(xid);
        GlobalRollbackRequest globalRollback = new GlobalRollbackRequest();
        globalRollback.setTransactionId(txId);
        GlobalRollbackResponse response = (GlobalRollbackResponse) syncCall(globalRollback);
        return response.getGlobalStatus();
    }

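The TM side is simple:
- 1. Remove the XID from RootContext
- 2. Build the GlobalRollbackRequest
- 3. Send the rollback message to the TC
The main logic lies in how the TC handles a rollback message, so we follow it there.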
--
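4.2. TC handles rollback
-
Again we skip the details of receiving the message, which work the same way as for the commit message, and go straight to the core handling: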

    // #DefaultCoordinator
    @Override
    protected void doGlobalRollback(GlobalRollbackRequest request, GlobalRollbackResponse response, RpcContext rpcContext)
        throws TransactionException {
        response.setGlobalStatus(core.rollback(XID.generateXID(request.getTransactionId())));
    }

    // #DefaultCore
    @Override
    public GlobalStatus rollback(String xid) throws TransactionException {
        GlobalSession globalSession = SessionHolder.findGlobalSession(XID.getTransactionId(xid));
        if (globalSession == null) {
            return GlobalStatus.Finished;
        }
        GlobalStatus status = globalSession.getStatus();
        globalSession.close(); // Highlight: Firstly, close the session, then no more branch can be registered.
        if (status == GlobalStatus.Begin) {
            globalSession.changeStatus(GlobalStatus.Rollbacking);
            doGlobalRollback(globalSession, false);
        }
        return globalSession.getStatus();
    }

    // #DefaultCore
    @Override
    public void doGlobalRollback(GlobalSession globalSession, boolean retrying) throws TransactionException {
        for (BranchSession branchSession : globalSession.getReverseSortedBranches()) {
            BranchStatus currentBranchStatus = branchSession.getStatus();
            if (currentBranchStatus == BranchStatus.PhaseOne_Failed) {
                continue;
            }
            try {
                BranchStatus branchStatus = resourceManagerInbound.branchRollback(
                    XID.generateXID(branchSession.getTransactionId()), branchSession.getBranchId(),
                    branchSession.getResourceId(), branchSession.getApplicationData());
                switch (branchStatus) {
                    case PhaseTwo_Rollbacked:
                        globalSession.removeBranch(branchSession);
                        continue;
                    case PhaseTwo_RollbackFailed_Unretryable:
                        GlobalStatus currentStatus = globalSession.getStatus();
                        if (currentStatus.name().startsWith("Timeout")) {
                            globalSession.changeStatus(GlobalStatus.TimeoutRollbackFailed);
                        } else {
                            globalSession.changeStatus(GlobalStatus.RollbackFailed);
                        }
                        globalSession.end();
                        return;
                    default:
                        if (!retrying) {
                            queueToRetryRollback(globalSession);
                        }
                        return;
                }
            } catch (Exception ex) {
                ...
            }
        }
        GlobalStatus currentStatus = globalSession.getStatus();
        if (currentStatus.name().startsWith("Timeout")) {
            globalSession.changeStatus(GlobalStatus.TimeoutRollbacked);
        } else {
            globalSession.changeStatus(GlobalStatus.Rollbacked);
        }
        globalSession.end();
    }

    // #DefaultCoordinator
    @Override
    public BranchStatus branchRollback(String xid, long branchId, String resourceId, String applicationData)
        throws TransactionException {
        try {
            BranchRollbackRequest request = new BranchRollbackRequest();
            request.setXid(xid);
            request.setBranchId(branchId);
            request.setResourceId(resourceId);
            request.setApplicationData(applicationData);
            GlobalSession globalSession = SessionHolder.findGlobalSession(XID.getTransactionId(xid));
            BranchSession branchSession = globalSession.getBranch(branchId);
            BranchRollbackResponse response = (BranchRollbackResponse)messageSender.sendSyncRequest(resourceId, branchSession.getClientId(), request);
            return response.getBranchStatus();
        } catch (IOException e) {
            ...
        }
    }

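That is the TC-side logic:
- 1. Close the session and its related resources
- 2. Set the status of the globalSession to GlobalStatus.Rollbacking
- 3. Iterate over the branchSessions of the globalSession, build a BranchRollbackRequest for each, and send the branch rollback request to the RM
- 4. The TC removes each branch according to the rollback status it returns
- 5. The TC finally updates the global transaction status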
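The logic above resembles the commit path but is simpler, because no asynchronous task is involved: the TC iterates over the branchSessions directly (in reverse order) and sends each one to the RM for handling.
The actual execution of the branch transaction again happens in the RM, so we move on to how the RM handles rollback.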
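The synchronous, reverse-order traversal described above can be sketched as follows. This is a hypothetical reduction: `GlobalRollbackSketch`, `GlobalState` and the `Predicate` standing in for the RM call are assumptions, and the unretryable and timeout statuses of the real code are collapsed into a single "retry later" outcome.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical reduction of the coordinator's synchronous rollback loop. */
class GlobalRollbackSketch {
    enum GlobalState { ROLLBACKED, ROLLBACK_RETRYING }

    /**
     * Rolls branches back in reverse registration order, stopping at the
     * first branch the RM (modelled as a predicate) fails to roll back.
     */
    static GlobalState rollback(List<Long> branches, Predicate<Long> rmRollback) {
        for (int i = branches.size() - 1; i >= 0; i--) {
            if (!rmRollback.test(branches.get(i))) {
                return GlobalState.ROLLBACK_RETRYING;   // remaining branches are kept for a retry
            }
            branches.remove(i);                         // index-based removal of the finished branch
        }
        return GlobalState.ROLLBACKED;
    }
}
```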
--
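4.3. RM handles the rollback sent by the TC
-
The RM still receives messages from the TC in channelRead of AbstractRpcRemoting, which then dispatches them. Because the RPC layer was initialized as an AbstractRpcRemotingClient at startup, execution enters the following logic: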

    // #AbstractRpcRemotingClient
    @Override
    public void dispatch(long msgId, ChannelHandlerContext ctx, Object msg) {
        if (clientMessageListener != null) {
            String remoteAddress = NetUtil.toStringAddress(ctx.channel().remoteAddress());
            clientMessageListener.onMessage(msgId, remoteAddress, msg, this);
        }
    }

Next, the message listener logic:

    // #RmMessageListener
    @Override
    public void onMessage(long msgId, String serverAddress, Object msg, ClientMessageSender sender) {
        if (LOGGER.isInfoEnabled()) {
            LOGGER.info("onMessage:" + msg);
        }
        if (msg instanceof BranchCommitRequest) {
            handleBranchCommit(msgId, serverAddress, (BranchCommitRequest)msg, sender);
        } else if (msg instanceof BranchRollbackRequest) {
            handleBranchRollback(msgId, serverAddress, (BranchRollbackRequest)msg, sender);
        }
    }

Eventually we reach the handler: RMHandlerAT

    // #RMHandlerAT
    @Override
    protected void doBranchRollback(BranchRollbackRequest request, BranchRollbackResponse response)
        throws TransactionException {
        String xid = request.getXid();
        long branchId = request.getBranchId();
        String resourceId = request.getResourceId();
        String applicationData = request.getApplicationData();
        LOGGER.info("AT Branch rolling back: " + xid + " " + branchId + " " + resourceId);
        BranchStatus status = dataSourceManager.branchRollback(xid, branchId, resourceId, applicationData);
        response.setBranchStatus(status);
        LOGGER.info("AT Branch rollback result: " + status);
    }

    // #DataSourceManager
    @Override
    public BranchStatus branchRollback(String xid, long branchId, String resourceId, String applicationData)
        throws TransactionException {
        DataSourceProxy dataSourceProxy = get(resourceId);
        if (dataSourceProxy == null) {
            throw new ShouldNeverHappenException();
        }
        try {
            /**
             * This is the key to rollback.
             *
             * The most important piece of the whole rollback is UndoLogManager: the database is rolled back
             * here using the information recorded in the undo log. Fescar's approach is to parse the three
             * statement types INSERT, UPDATE and DELETE and generate the reverse SQL used for rollback.
             * For the implementation, see MySQLUndoDeleteExecutor, MySQLUndoInsertExecutor and
             * MySQLUndoUpdateExecutor in the undo package of the fescar-rm-distribution project.
             * Finally Fescar-Server assembles the rollback result into a GlobalRollbackResponse and returns it
             * to the TM caller, which completes the rollback logic.
             */
            UndoLogManager.undo(dataSourceProxy, xid, branchId);
        } catch (TransactionException te) {
            if (te.getCode() == TransactionExceptionCode.BranchRollbackFailed_Unretriable) {
                return BranchStatus.PhaseTwo_RollbackFailed_Unretryable;
            } else {
                return BranchStatus.PhaseTwo_RollbackFailed_Retryable;
            }
        }
        return BranchStatus.PhaseTwo_Rollbacked;
    }

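This is the key to rollback.
UndoLogManager is the core of the whole rollback operation. Using the undo log records generated earlier during execute, it handles the three statement types INSERT, UPDATE and DELETE and generates the reverse SQL that restores the data. Note that, as the code above shows, this step runs on the RM side inside DataSourceManager.
For the implementation, see MySQLUndoDeleteExecutor, MySQLUndoInsertExecutor and MySQLUndoUpdateExecutor in the undo package of the fescar-rm-distribution project.
Finally Fescar-Server assembles the result of the rollback into a GlobalRollbackResponse and returns it to the TM caller, which completes the rollback logic on the Fescar-Server side. The rollback flow ends here.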
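To make "generating the reverse SQL" concrete, here is a minimal sketch of how a compensating statement could be derived from the type of the original statement. It is an illustration under stated assumptions, not the logic of the MySQLUndo*Executor classes: `UndoSqlSketch`, `SqlType` and the parameters are hypothetical, a single-column primary key is assumed, and the values from the before image would be bound to the `?` placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of deriving compensating SQL from the original statement type. */
class UndoSqlSketch {
    enum SqlType { INSERT, UPDATE, DELETE }

    /**
     * @param type    type of the ORIGINAL statement that is being undone
     * @param table   table name
     * @param pk      primary key column (assumed to be a single column)
     * @param columns columns captured in the before image, including the primary key
     */
    static String undoSql(SqlType type, String table, String pk, List<String> columns) {
        switch (type) {
            case INSERT:
                // Undo an insert by deleting the inserted row.
                return "DELETE FROM " + table + " WHERE " + pk + " = ?";
            case DELETE: {
                // Undo a delete by re-inserting the before image.
                String cols = String.join(", ", columns);
                String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
                return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
            }
            case UPDATE: {
                // Undo an update by writing the before image back over the row.
                List<String> sets = new ArrayList<>();
                for (String column : columns) {
                    if (!column.equals(pk)) {
                        sets.add(column + " = ?");
                    }
                }
                return "UPDATE " + table + " SET " + String.join(", ", sets) + " WHERE " + pk + " = ?";
            }
            default:
                throw new IllegalArgumentException("unsupported statement type: " + type);
        }
    }
}
```

For example, undoing an `UPDATE` on a table `account` with columns `id` and `money` yields `UPDATE account SET money = ? WHERE id = ?`.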
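So, if the decision is a global commit, the branch transactions have already committed locally and no synchronous coordination is needed (only asynchronous cleanup of the undo logs), so Phase 2 completes very quickly; see the diagram on the official site. If the decision is a rollback, compensating SQL is generated from the undo logs and the data is restored to its state before execution, as shown below:
That completes the analysis of transaction commit & rollback.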
--
V. To be continued...
The follow-up analysis will focus on fescar's locks.