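When using the third-party library NMSSH, the app often crashes when a shell connection drops and startShell is called again.
The crash usually occurs at:
rc = libssh2_channel_read(self.channel, buffer, (ssize_t)sizeof(buffer));
erc = libssh2_channel_read_stderr(self.channel, buffer, (ssize_t)sizeof(buffer));
// or
while ((rc = libssh2_channel_shell(self.channel)) == LIBSSH2_ERROR_EAGAIN) {
    waitsocket(CFSocketGetNative([self.session socket]), [self.session rawSession]);
}
Typical scenario: force-kill the app while a shell command is running, relaunch it immediately, and reconnect the shell. The crash is very likely to occur.
The sections below analyze the causes and describe optimizations on both the NMSSH side and the business-code side.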
Root Cause Analysis
1. Race condition: the dispatch_source handler conflicts with the close path
After startShell creates the dispatch_source, the event handler keeps reading self.channel on a background thread:
dispatch_source_set_event_handler(source, ^{
    while (self.channel != NULL) {
        rc = libssh2_channel_read(self.channel, buffer, ...);
        erc = libssh2_channel_read_stderr(self.channel, buffer, ...);
        // ...
    }
});
Meanwhile, closeShell or closeChannel may free the underlying LIBSSH2_CHANNEL on another thread:
- (void)closeChannel {
    libssh2_channel_free(self.channel); // frees the memory
    [self setChannel:NULL];             // clears the pointer only afterwards
}
Problem: after libssh2_channel_free has run and before self.channel is cleared, the handler may still be using the freed pointer, which causes EXC_BAD_ACCESS.
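The interleaving described above can be replayed deterministically. The following Swift sketch is illustrative only: FakeChannel and replayUnsafeInterleaving are hypothetical names, and an isFreed flag stands in for memory that libssh2_channel_free would already have released.

```swift
import Foundation

// Hypothetical stand-in for LIBSSH2_CHANNEL. `isFreed` marks memory that
// libssh2_channel_free would already have released.
final class FakeChannel {
    var isFreed = false
}

// Replays one bad interleaving step by step on a single thread.
// Returns true if the "handler" ends up touching a freed channel.
func replayUnsafeInterleaving() -> Bool {
    var shared: FakeChannel? = FakeChannel()   // plays the role of self.channel

    // Step 1 (closing thread): the channel is freed first.
    shared?.isFreed = true

    // Step 2 (handler thread): self.channel is still non-NULL, so the loop
    // reads through it. With a real C pointer this is the use-after-free.
    let touchedFreedChannel = shared?.isFreed ?? false

    // Step 3 (closing thread): only now is the pointer cleared.
    shared = nil

    return touchedFreedChannel
}
```

In the real code the three steps run on different threads, so the bad ordering only happens some of the time, which matches the intermittent nature of the crash.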
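2. No protection for multithreaded access to the channel pointer
- startShell is called on the main thread or a business queue
- closeShell is called on the UI thread or a different queue
- The dispatch_source handler runs on shellEventQueue
- writeData, requestSizeWidth, and similar methods are called on different threads
All of these paths access self.channel directly, with no synchronization.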
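3. Business-layer calls are not serialized
SSHClient.closeShell() calls channel.closeShell() directly, without ensuring that it runs on a serial queue:
func closeShell() {
    channel.closeShell() // may be called from any thread
}
The UI thread and background queues can operate on the NMSSHChannel at the same time, which amplifies the race.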
Fix
Phase 1: Add thread-safety mechanisms
1.1 Introduce a lock and a dedicated queue
Add the following to NMSSHChannel and its initializer:
@property (nonatomic, strong) NSRecursiveLock *channelLock;
@property (nonatomic, strong) dispatch_queue_t shellEventQueue;
- (instancetype)initWithSession:(NMSSHSession *)session {
    // ...
    _channelLock = [[NSRecursiveLock alloc] init];
    _channelLock.name = @"com.nmssh.channel.lock";
    _shellEventQueue = dispatch_queue_create("com.nmssh.channel.shell", DISPATCH_QUEUE_SERIAL);
    // ...
}
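One plausible reason for choosing NSRecursiveLock rather than NSLock is that a method holding the lock may call another method that takes the same lock on the same thread. This is an assumption; the code above does not state the reason. The Swift sketch below shows that re-entrancy with a hypothetical ChannelState class whose Int handle stands in for the channel pointer.

```swift
import Foundation

final class ChannelState {
    private let lock = NSRecursiveLock()
    private var handle: Int? = nil   // hypothetical stand-in for the channel pointer

    func setHandle(_ newValue: Int?) {
        lock.lock(); defer { lock.unlock() }
        handle = newValue
    }

    // Returns false if a handle is already open.
    func open() -> Bool {
        lock.lock(); defer { lock.unlock() }
        guard handle == nil else { return false }
        // Re-enters the lock on the same thread. With a plain NSLock this
        // nested lock() would block forever.
        setHandle(42)
        return handle == 42
    }
}
```

The trade-off is that a recursive lock makes accidental nesting harder to notice, so it is still worth keeping locked regions short.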
1.2 Clear the pointer before freeing
Change closeChannel so that it saves the pointer and clears it first, then frees it:
- (void)closeChannel {
    LIBSSH2_CHANNEL *channel = NULL;
    [self.channelLock lock];
    @try {
        channel = self.channel;
        if (!channel) {
            return;
        }
        [self setChannel:NULL]; // clear first
        [self setType:NMSSHChannelTypeClosed];
    }
    @finally {
        [self.channelLock unlock];
    }
    // Free outside the lock to avoid deadlock
    if (channel) {
        libssh2_channel_close(channel);
        libssh2_channel_wait_closed(channel);
        libssh2_channel_free(channel);
    }
}
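The same "detach under the lock, free outside it" idea can be sketched in Swift with a raw pointer. HandleOwner is a hypothetical type, and a single allocated Int32 stands in for the LIBSSH2_CHANNEL. The sketch assumes that every reader dereferences the pointer only while holding the lock, which is what makes freeing outside the lock safe.

```swift
import Foundation

final class HandleOwner {
    private let lock = NSLock()
    private var handle: UnsafeMutablePointer<Int32>?

    init() {
        let pointer = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
        pointer.initialize(to: 7)
        handle = pointer
    }

    deinit { close() }

    // Dereferences the pointer only while holding the lock.
    func read() -> Int32? {
        lock.lock(); defer { lock.unlock() }
        return handle?.pointee
    }

    // Detaches the pointer under the lock, then frees it outside the lock.
    // Calling close() more than once is harmless.
    func close() {
        lock.lock()
        let detached = handle
        handle = nil
        lock.unlock()

        guard let detached = detached else { return }
        detached.deinitialize(count: 1)
        detached.deallocate()
    }
}
```

Once close() has cleared the stored pointer, no other thread can obtain it, so only the closing thread ever touches the memory being freed.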
1.3 Protect every channel access
Every operation that touches self.channel takes the lock:
- (BOOL)openChannel:(NSError *__autoreleasing *)error {
    [self.channelLock lock];
    @try {
        // all channel operations
        LIBSSH2_CHANNEL *channel = libssh2_channel_open_session(...);
        [self setChannel:channel];
        // ...
    }
    @finally {
        [self.channelLock unlock];
    }
}
Phase 2: Optimize the event-handling logic
2.1 Protect the shell event handler with the lock
Change the event handler in startShell so that it fetches the channel pointer and reads from it while holding the lock:
dispatch_source_set_event_handler(source, ^{
    __strong typeof(weakSelf) strongSelf = weakSelf;
    if (!strongSelf) return;
    while (YES) {
        LIBSSH2_CHANNEL *channel = NULL;
        NSData *stdoutData = nil;
        NSData *stderrData = nil;
        BOOL shouldClose = NO;
        BOOL hasData = NO;
        [strongSelf.channelLock lock];
        channel = strongSelf.channel; // fetch the pointer under the lock
        if (!channel) {
            [strongSelf.channelLock unlock];
            return;
        }
        // Make the libssh2 calls under the lock
        ssize_t rc = libssh2_channel_read(channel, buffer, sizeof(buffer));
        ssize_t erc = libssh2_channel_read_stderr(channel, errorBuffer, sizeof(errorBuffer));
        // Process the data...
        if (rc > 0) {
            stdoutData = [[NSData alloc] initWithBytes:buffer length:rc];
            hasData = YES;
        }
        if (erc > 0) {
            stderrData = [[NSData alloc] initWithBytes:errorBuffer length:erc];
            hasData = YES;
        }
        if (libssh2_channel_eof(channel) == 1) {
            shouldClose = YES;
        }
        [strongSelf.channelLock unlock]; // unlock before calling the delegate
        // Call the delegate outside the lock to avoid deadlock
        if (stdoutData && strongSelf.delegate) {
            [strongSelf.delegate channel:strongSelf didReadData:...];
        }
        if (stderrData && strongSelf.delegate) {
            [strongSelf.delegate channel:strongSelf didReadError:...];
        }
        if (shouldClose) {
            [strongSelf closeShell];
            return;
        }
        if (!hasData) {
            break;
        }
    }
});
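The reason for releasing the lock before the delegate callback is that the callback may call back into the channel, for example to close it. A minimal Swift sketch of that shape follows; Reader, drainOnce, and demoCallbackCanClose are hypothetical names, and an optional byte array stands in for the channel and its pending output.

```swift
import Foundation

final class Reader {
    private let lock = NSLock()
    // nil plays the role of a NULL channel; the bytes play the role of pending output.
    private var pending: [UInt8]? = [1, 2, 3]
    var onData: (([UInt8]) -> Void)?

    func close() {
        lock.lock()
        pending = nil
        lock.unlock()
    }

    // Returns true if data was delivered, false if closed or nothing was pending.
    func drainOnce() -> Bool {
        lock.lock()
        guard let bytes = pending else {
            lock.unlock()
            return false
        }
        pending = []            // "read" everything while holding the lock
        lock.unlock()           // release before calling out

        guard !bytes.isEmpty else { return false }
        onData?(bytes)          // the callback may call close() without deadlocking
        return true
    }
}

// Usage: the callback closes the reader it was called from.
func demoCallbackCanClose() -> [UInt8] {
    let reader = Reader()
    var received: [UInt8] = []
    reader.onData = { [weak reader] bytes in
        received = bytes
        reader?.close()
    }
    _ = reader.drainOnce()
    return received
}
```

This sketch deliberately uses a non-recursive NSLock: if the callback ran while the lock was held, the nested close() would block forever.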
2.2 Protect the libssh2_channel_shell call
Take the lock when startShell calls libssh2_channel_shell:
while (YES) {
    [self.channelLock lock];
    LIBSSH2_CHANNEL *channel = self.channel;
    if (!channel) {
        [self.channelLock unlock];
        rc = LIBSSH2_ERROR_CHANNEL_FAILURE;
        break;
    }
    rc = libssh2_channel_shell(channel); // called under the lock
    [self.channelLock unlock];
    if (rc != LIBSSH2_ERROR_EAGAIN) {
        break;
    }
    waitsocket(...);
}
2.3 Optimize the closeShell flow
Cancel the source first, then send EOF, and finally close the channel:
- (void)closeShell {
    dispatch_source_t sourceToCancel = nil;
    BOOL wasShell = NO;
    [self.channelLock lock];
    @try {
        sourceToCancel = self.source;
        if (sourceToCancel) {
            [self setSource:nil]; // clear first so the handler stops running
        }
        wasShell = (self.type == NMSSHChannelTypeShell);
    }
    @finally {
        [self.channelLock unlock];
    }
    if (sourceToCancel) {
        dispatch_source_cancel(sourceToCancel); // cancel the event source
    }
    if (wasShell) {
        libssh2_session_set_blocking(self.session.rawSession, 1);
        [self sendEOF];
    }
    [self closeChannel];
}
Phase 3: Serialize the business layer
3.1 Add a queue marker to SSHClient
Add a queue-specific key to SSHClient so that it can detect whether it is already running on the target queue:
class SSHClient: NSObject {
    let queue = DispatchQueue(label: "ssh.operations", qos: .userInitiated)
    let queueSpecificKey = DispatchSpecificKey<Void>()
    init(...) {
        // ...
        queue.setSpecific(key: queueSpecificKey, value: ())
    }
}
3.2 Serialize the closeShell method
Change closeShell so that it always runs on the serial queue:
func closeShell(completion: (() -> Void)? = nil) {
    let executeClosure = { [weak self] in
        guard let self = self else { return }
        self.channel.closeShell()
        print("SSH: shell closed")
        if let completion = completion {
            DispatchQueue.main.async {
                completion()
            }
        }
    }
    // If already on the queue, run directly; otherwise dispatch asynchronously
    if DispatchQueue.getSpecific(key: queueSpecificKey) != nil {
        executeClosure()
    } else {
        queue.async(execute: executeClosure)
    }
}
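The queue-specific key matters most for synchronous dispatch, where calling queue.sync from the queue itself deadlocks. The Swift sketch below shows that variant in isolation. SerialExecutor, its queue label, and the Bool-valued key are hypothetical and are not part of SSHClient.

```swift
import Dispatch

final class SerialExecutor {
    private let queue = DispatchQueue(label: "demo.ssh.operations")
    private let key = DispatchSpecificKey<Bool>()

    init() {
        // Tag the queue so that code can later ask "am I running on it?".
        queue.setSpecific(key: key, value: true)
    }

    var isOnQueue: Bool {
        return DispatchQueue.getSpecific(key: key) == true
    }

    // Runs `work` on the serial queue and returns its result.
    // If the caller is already on the queue, runs it inline instead of
    // calling queue.sync, which would deadlock.
    func sync<T>(_ work: () -> T) -> T {
        if isOnQueue {
            return work()
        }
        return queue.sync(execute: work)
    }
}
```

With this helper, nested calls such as executor.sync { executor.sync { ... } } complete normally, because the inner call detects the tag and runs inline.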
3.3 Apply the same protection to the other methods
writeData, requestSizeWidth, and similar methods also access the channel only under the lock:
- (BOOL)writeData:(NSData *)data error:(NSError *__autoreleasing *)error timeout:(NSNumber *)timeout {
    [self.channelLock lock];
    @try {
        if (self.type != NMSSHChannelTypeShell) {
            return NO;
        }
        LIBSSH2_CHANNEL *channel = self.channel;
        if (!channel) {
            return NO;
        }
        // Perform the write under the lock
        ssize_t rc = 0;
        while ((rc = libssh2_channel_write(channel, [data bytes], [data length])) == LIBSSH2_ERROR_EAGAIN) {
            // Release the lock while waiting on the socket, then re-check the channel
            [self.channelLock unlock];
            waitsocket(...);
            [self.channelLock lock];
            channel = self.channel;
            if (!channel) {
                return NO;
            }
        }
        // ...
    }
    @finally {
        [self.channelLock unlock];
    }
}
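The key detail in the loop above is that the channel is re-read after the lock is re-acquired, because another thread may have closed it during the wait. A minimal Swift sketch of that retry shape follows; Writer, busyAttempts, and the wait closure are hypothetical stand-ins for the channel, the EAGAIN results, and waitsocket.

```swift
import Foundation

final class Writer {
    private let lock = NSLock()
    private var handle: Int? = 1      // nil plays the role of a closed (NULL) channel
    private var busyAttempts: Int     // how many times the fake write says "try again"

    init(busyAttempts: Int) {
        self.busyAttempts = busyAttempts
    }

    func close() {
        lock.lock()
        handle = nil
        lock.unlock()
    }

    // Returns true once the fake write succeeds, false if the handle was
    // closed while this call was waiting.
    func write(waiting wait: () -> Void) -> Bool {
        lock.lock()
        defer { lock.unlock() }       // the lock is held at every exit point
        while true {
            // Re-validate on every pass: the handle may be gone after a wait.
            guard handle != nil else { return false }
            if busyAttempts == 0 { return true }
            busyAttempts -= 1
            lock.unlock()             // do not hold the lock while waiting
            wait()
            lock.lock()
        }
    }
}
```

Dropping the lock during the wait keeps close() from being blocked behind a slow socket, at the cost of having to re-check the handle on every iteration.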
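Results
After the fix:
- Race eliminated: every access to the LIBSSH2_CHANNEL through self.channel happens under the lock, so the handler no longer conflicts with the close path
- Memory safety: the pointer is cleared before the memory is freed, which avoids access to a freed object
- Thread safety: the business layer operates on a single serial queue, which avoids concurrent access from multiple threads
- Improved stability: in testing, reconnecting immediately after force-killing the app no longer crashes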
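Key Points
- Clear before freeing: closes the window in which a freed pointer can still be used
- Lock granularity: fetch the pointer and call the libssh2 API under the lock, then unlock before calling the delegate
- Queue serialization: the business layer operates on a single serial queue, which avoids cross-thread races
- Event source management: on close, cancel the dispatch_source first, then clean up resources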
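Summary
Introducing an NSRecursiveLock, a dedicated serial queue, the clear-before-free strategy, and serialization in the business layer resolved the NMSSH channel crash in multithreaded use. The approach ensures thread safety and memory safety while preserving performance.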