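I. Java Crash Handling
1. The Thread class declares a nested interface: UncaughtExceptionHandler.
Its Javadoc explains that when a thread is about to terminate because of an uncaught exception, the JVM queries the thread for its UncaughtExceptionHandler via getUncaughtExceptionHandler and invokes that handler's uncaughtException method. If no UncaughtExceptionHandler was set explicitly on the thread, the thread's ThreadGroup acts as the handler.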
/**
* Interface for handlers invoked when a <tt>Thread</tt> abruptly
* terminates due to an uncaught exception.
* <p>When a thread is about to terminate due to an uncaught exception
* the Java Virtual Machine will query the thread for its
* <tt>UncaughtExceptionHandler</tt> using
* {@link #getUncaughtExceptionHandler} and will invoke the handler's
* <tt>uncaughtException</tt> method, passing the thread and the
* exception as arguments.
* If a thread has not had its <tt>UncaughtExceptionHandler</tt>
* explicitly set, then its <tt>ThreadGroup</tt> object acts as its
* <tt>UncaughtExceptionHandler</tt>. If the <tt>ThreadGroup</tt> object
* has no
* special requirements for dealing with the exception, it can forward
* the invocation to the {@linkplain #getDefaultUncaughtExceptionHandler
* default uncaught exception handler}.
*/
@FunctionalInterface
public interface UncaughtExceptionHandler {
/**
* Method invoked when the given thread terminates due to the
* given uncaught exception.
* <p>Any exception thrown by this method will be ignored by the
* Java Virtual Machine.
* @param t the thread
* @param e the exception
*/
void uncaughtException(Thread t, Throwable e);
}
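The contract described in the Javadoc above can be tried out on a plain JVM. The following is a minimal sketch, not Android code: the class name PerThreadHandlerDemo and the method runAndCapture are made up for illustration. It installs a per-thread handler and shows that the handler receives both the dying thread and the throwable.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; not part of the JDK or Android.
class PerThreadHandlerDemo {
    /** Runs a thread that throws and returns what its handler observed. */
    static String runAndCapture() {
        AtomicReference<String> seen = new AtomicReference<>("handler not called");
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker-1");
        // The handler is passed the dying thread and the throwable that killed it.
        worker.setUncaughtExceptionHandler((t, e) ->
                seen.set(t.getName() + ": " + e.getMessage()));
        worker.start();
        try {
            worker.join(); // the handler runs on the worker before the thread terminates
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture());
    }
}
```

Because the handler runs on the crashing thread itself, join() returning means the handler has already finished.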
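Looking at ThreadGroup.uncaughtException: it first forwards the call to its parent group, and the root group then looks up the default handler via Thread.getDefaultUncaughtExceptionHandler(). If no default handler is installed, it only prints the stack trace to System.err and does not exit the process. So some other code must have installed an UncaughtExceptionHandler.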
/**
* Called by the Java Virtual Machine when a thread in this
* thread group stops because of an uncaught exception, and the thread
* does not have a specific {@link Thread.UncaughtExceptionHandler}
* installed.
* <p>
* The <code>uncaughtException</code> method of
* <code>ThreadGroup</code> does the following:
* <ul>
* <li>If this thread group has a parent thread group, the
* <code>uncaughtException</code> method of that parent is called
* with the same two arguments.
* <li>Otherwise, this method checks to see if there is a
* {@linkplain Thread#getDefaultUncaughtExceptionHandler default
* uncaught exception handler} installed, and if so, its
* <code>uncaughtException</code> method is called with the same
* two arguments.
* <li>Otherwise, this method determines if the <code>Throwable</code>
* argument is an instance of {@link ThreadDeath}. If so, nothing
* special is done. Otherwise, a message containing the
* thread's name, as returned from the thread's {@link
* Thread#getName getName} method, and a stack backtrace,
* using the <code>Throwable</code>'s {@link
* Throwable#printStackTrace printStackTrace} method, is
* printed to the {@linkplain System#err standard error stream}.
* </ul>
* <p>
* Applications can override this method in subclasses of
* <code>ThreadGroup</code> to provide alternative handling of
* uncaught exceptions.
*
* @param t the thread that is about to exit.
* @param e the uncaught exception.
* @since JDK1.0
*/
public void uncaughtException(Thread t, Throwable e) {
if (parent != null) {
parent.uncaughtException(t, e);
} else {
Thread.UncaughtExceptionHandler ueh =
Thread.getDefaultUncaughtExceptionHandler();
if (ueh != null) {
ueh.uncaughtException(t, e);
} else if (!(e instanceof ThreadDeath)) {
System.err.print("Exception in thread \""
+ t.getName() + "\" ");
e.printStackTrace(System.err);
}
}
}
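The lookup order implemented above (a handler set on the thread first, otherwise the ThreadGroup, which ends at the default handler) can be observed with a small experiment. This is a sketch for a plain JVM under the assumption that threads live in an ordinary ThreadGroup; HandlerLookupDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the JDK or Android.
class HandlerLookupDemo {
    /** Crashes two threads in turn and returns which handler saw each crash. */
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // Process-wide fallback, reached through ThreadGroup.uncaughtException.
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> log.add("default:" + t.getName()));
        try {
            Thread plain = new Thread(() -> { throw new RuntimeException("a"); }, "plain");
            Thread custom = new Thread(() -> { throw new RuntimeException("b"); }, "custom");
            // A handler set on the thread itself wins over the ThreadGroup path.
            custom.setUncaughtExceptionHandler((t, e) -> log.add("own:" + t.getName()));
            startAndJoin(plain);
            startAndJoin(custom);
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous); // restore global state
        }
        return new ArrayList<>(log);
    }

    private static void startAndJoin(Thread thread) {
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The thread named "plain" has no handler of its own, so its crash reaches the default handler; the thread named "custom" never gets that far.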
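2. When is the Thread's UncaughtExceptionHandler set?
As covered in the AMS Activity launch flow, app startup goes through a series of steps, one of which is RuntimeInit.commonInit().
In RuntimeInit.commonInit(), the call Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler)) installs the default exception handler.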
protected static final void commonInit() {
if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");
/*
* set handlers; these apply to all threads in the VM. Apps can replace
* the default handler, but not the pre handler.
*/
LoggingHandler loggingHandler = new LoggingHandler();
RuntimeHooks.setUncaughtExceptionPreHandler(loggingHandler);
Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
/*
* Install a time zone supplier that uses the Android persistent time zone system property.
*/
RuntimeHooks.setTimeZoneIdSupplier(() -> SystemProperties.get("persist.sys.timezone"));
/*
* Sets handler for java.util.logging to use Android log facilities.
* The odd "new instance-and-then-throw-away" is a mirror of how
* the "java.util.logging.config.class" system property works. We
* can't use the system property here since the logger has almost
* certainly already been initialized.
*/
LogManager.getLogManager().reset();
new AndroidConfig();
/*
* Sets the default HTTP User-Agent used by HttpURLConnection.
*/
String userAgent = getDefaultUserAgent();
System.setProperty("http.agent", userAgent);
/*
* Wire socket tagging to traffic stats.
*/
NetworkManagementSocketTagger.install();
initialized = true;
}
3. The source of the crash: KillApplicationHandler
The source shows that KillApplicationHandler deliberately kills the process in its finally block.
private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
public void uncaughtException(Thread t, Throwable e) {
try {
ensureLogging(t, e);
if (mCrashing) return;
mCrashing = true;
if (ActivityThread.currentActivityThread() != null) {
ActivityThread.currentActivityThread().stopProfiling();
}
ActivityManager.getService().handleApplicationCrash(
mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
} catch (Throwable t2) {
...
} finally {
// Try everything to make sure this process goes away.
Process.killProcess(Process.myPid());
System.exit(10);
}
}
}
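The important detail in the handler above is structural: termination sits in finally, so it runs whether reporting succeeds, throws, or is skipped by the early return. The sketch below imitates that shape on a plain JVM. KillOnCrashHandler, reporter and terminator are hypothetical stand-ins (for the AMS call and for killProcess/System.exit respectively), chosen so the behavior can be observed without actually ending the process.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of the pattern; the real class is RuntimeInit.KillApplicationHandler.
class KillOnCrashHandler implements Thread.UncaughtExceptionHandler {
    private final Runnable reporter;   // stands in for the handleApplicationCrash call
    private final Runnable terminator; // stands in for Process.killProcess + System.exit
    private boolean crashing;

    KillOnCrashHandler(Runnable reporter, Runnable terminator) {
        this.reporter = reporter;
        this.terminator = terminator;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            if (crashing) return; // crash while already crashing: go straight to finally
            crashing = true;
            reporter.run();
        } catch (Throwable reportingFailure) {
            // Swallowed on purpose: a reporting failure must not block termination.
        } finally {
            terminator.run(); // runs on every path, including the early return above
        }
    }

    /** Simulates a crash whose reporting step itself fails. */
    static List<String> simulate() {
        List<String> events = new ArrayList<>();
        KillOnCrashHandler handler = new KillOnCrashHandler(
                () -> { events.add("report"); throw new IllegalStateException("report failed"); },
                () -> events.add("terminate"));
        handler.uncaughtException(Thread.currentThread(), new RuntimeException("crash"));
        return events;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

Even though the reporter throws, the terminator still runs, which mirrors why an app cannot survive once the system handler has been reached.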
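4. Other work done in KillApplicationHandler
Inside uncaughtException, the crash is handed to AMS.handleApplicationCrash() for further processing. addErrorToDropBox() writes a record to the system's DropBox log store; it can record Java crashes, native crashes, ANRs, and so on. The log directory is /data/system/dropbox.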
public void handleApplicationCrash(IBinder app,
ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
ProcessRecord r = findAppProcess(app, "Crash");
final String processName = app == null ? "system_server"
: (r == null ? "unknown" : r.processName);
handleApplicationCrashInner("crash", r, processName, crashInfo);
}
void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
ApplicationErrorReport.CrashInfo crashInfo) {
...
addErrorToDropBox(
eventType, r, processName, null, null, null, null, null, null, crashInfo,
new Float(loadingProgress), incrementalMetrics, null);
mAppErrors.crashApplication(r, crashInfo);
}
5. Call flow of Android's Java crash handling
Uncaught exception -> the JVM invokes ->
KillApplicationHandler.uncaughtException {
try {
ActivityManager.getService().handleApplicationCrash(); // hand the crash to AMS
} finally { // exit the app process
Process.killProcess(Process.myPid());
System.exit(10);
}
}
-> AMS.handleApplicationCrash
-> AMS.handleApplicationCrashInner {
addErrorToDropBox(); // the system records the crash log
mAppErrors.crashApplication();
}
-> AppErrors.crashApplication
-> AppErrors.crashApplicationInner {
// handle the crash
if (!makeAppCrashingLocked()){
return;
}
// show the crash dialog
final Message msg = Message.obtain();
msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;
mService.mUiHandler.sendMessage(msg);
// handle the dialog result: restart, quit, etc.
int res = result.get(); // blocks until the user responds
switch (res) {}
}
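II. Native Crash Handling
1. Listening in the Java layer
From "Binder (5): Service Registration Flow - Sending the Registration Request" we know:
After the phone boots, the system_server process is started and SystemServer's main method is called. main starts AMS via startBootstrapServices, and startOtherServices later calls AMS's systemReady. In the systemReady callback, mActivityManagerService.startObservingNativeCrashes() registers the native crash listener.
The run method of NativeCrashListener opens a listening socket.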
public void startObservingNativeCrashes() {
final NativeCrashListener ncl = new NativeCrashListener(this);
ncl.start();
}
final class NativeCrashListener extends Thread {
public void run() {
final byte[] ackSignal = new byte[1];
...
try {
FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(DEBUGGERD_SOCKET_PATH);
Os.bind(serverFd, sockAddr);
Os.listen(serverFd, 1);
Os.chmod(DEBUGGERD_SOCKET_PATH, 0777);
while (true) {
FileDescriptor peerFd = null;
try {
peerFd = Os.accept(serverFd, null /* peerAddress */);
if (peerFd != null) {
consumeNativeCrashData(peerFd);
}
} catch (Exception e) {
...
} finally {
...
}
}
} catch (Exception e) {
...
}
}
}
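2. Reporting from native code
A native program is dynamically linked and needs a linker to run. linker is Android's dynamic linker; see linker_main.cpp. Through the call chain __linker_init -> __linker_init_post_relocation -> debuggerd_init, execution reaches the debuggerd_init function in debuggerd_handler.cpp.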
/* This is the entry point for the linker, called from begin.S. This
* method is responsible for fixing the linker's own relocations, and
* then calling __linker_init_post_relocation().
*/
extern "C" ElfW(Addr) __linker_init(void* raw_args) {
...
ElfW(Addr) start_address = __linker_init_post_relocation(args);
return start_address;
}
static ElfW(Addr) __linker_init_post_relocation(KernelArgumentBlock& args) {
#ifdef __ANDROID__
debuggerd_callbacks_t callbacks = {
.get_abort_message = []() {
return g_abort_message;
},
.post_dump = &notify_gdb_of_libraries,
};
debuggerd_init(&callbacks);
#endif
}
debuggerd_init registers debuggerd_signal_handler as the handler for crash signals.
void debuggerd_init(debuggerd_callbacks_t* callbacks) {
...
struct sigaction action;
memset(&action, 0, sizeof(action));
sigfillset(&action.sa_mask);
action.sa_sigaction = debuggerd_signal_handler;
action.sa_flags = SA_RESTART | SA_SIGINFO;
// Use the alternate signal stack if available so we can catch stack overflows.
action.sa_flags |= SA_ONSTACK;
debuggerd_register_handlers(&action);
}
// /system/core/debuggerd/include/debuggerd/handler.h
static void __attribute__((__unused__)) debuggerd_register_handlers(struct sigaction* action) {
sigaction(SIGABRT, action, nullptr);
sigaction(SIGBUS, action, nullptr);
sigaction(SIGFPE, action, nullptr);
sigaction(SIGILL, action, nullptr);
sigaction(SIGSEGV, action, nullptr);
#if defined(SIGSTKFLT)
sigaction(SIGSTKFLT, action, nullptr);
#endif
sigaction(SIGSYS, action, nullptr);
sigaction(SIGTRAP, action, nullptr);
sigaction(DEBUGGER_SIGNAL, action, nullptr);
}
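debuggerd_signal_handler clones a child thread that launches crash_dump to record the crash log. Once that thread has finished, resend_signal re-sends the signal, which kills the current process.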
static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void* context) {
...
// clone a child thread that launches crash_dump
pid_t child_pid =
clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
&thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);
if (child_pid == -1) {
fatal_errno("failed to spawn debuggerd dispatch thread");
}
// wait for the child thread to start
futex_wait(&thread_info.pseudothread_tid, -1);
// wait for the child thread to finish
futex_wait(&thread_info.pseudothread_tid, child_pid);
...
if (info->si_signo == DEBUGGER_SIGNAL) {
...
} else {
// resend the signal
resend_signal(info);
}
}
static void resend_signal(siginfo_t* info) {
// Signals can either be fatal or nonfatal.
// For fatal signals, crash_dump will send us the signal we crashed with
// before resuming us, so that processes using waitpid on us will see that we
// exited with the correct exit status (e.g. so that sh will report
// "Segmentation fault" instead of "Killed"). For this to work, we need
// to deregister our signal handler for that signal before continuing.
if (info->si_signo != DEBUGGER_SIGNAL) {
signal(info->si_signo, SIG_DFL); // restore the default disposition, which kills the current process
int rc = syscall(SYS_rt_tgsigqueueinfo, __getpid(), __gettid(), info->si_signo, info);
if (rc != 0) {
fatal_errno("failed to resend signal during crash");
}
}
}
In crash_dump's main function, a child process is forked to communicate with tombstoned and record the crash log; it also notifies AMS of the native crash.
// /system/core/debuggerd/crash_dump.cpp
int main(int argc, char** argv) {
...
// fork a child process
pid_t forkpid = fork();
if (forkpid == -1) {
PLOG(FATAL) << "fork failed";
} else if (forkpid == 0) {
fork_exit_read.reset();
} else {
// wait for the child process to finish
fork_exit_write.reset();
char buf;
TEMP_FAILURE_RETRY(read(fork_exit_read.get(), &buf, sizeof(buf)));
_exit(0);
}
...
// connect to tombstoned and obtain the output fd for the log
{
ATRACE_NAME("tombstoned_connect");
LOG(INFO) << "obtaining output fd from tombstoned, type: " << dump_type;
g_tombstoned_connected =
tombstoned_connect(g_target_thread, &g_tombstoned_socket, &g_output_fd, dump_type);
}
if (g_tombstoned_connected) {
if (TEMP_FAILURE_RETRY(dup2(g_output_fd.get(), STDOUT_FILENO)) == -1) {
PLOG(ERROR) << "failed to dup2 output fd (" << g_output_fd.get() << ") to STDOUT_FILENO";
}
} else {
unique_fd devnull(TEMP_FAILURE_RETRY(open("/dev/null", O_RDWR)));
TEMP_FAILURE_RETRY(dup2(devnull.get(), STDOUT_FILENO));
g_output_fd = std::move(devnull);
}
...
// notify AMS
if (fatal_signal) {
// Don't try to notify ActivityManager if it just crashed, or we might hang until timeout.
if (thread_info[target_process].thread_name != "system_server") {
activity_manager_notify(target_process, signo, amfd_data);
}
}
...
// tell tombstoned that the dump is complete
if (g_tombstoned_connected && !tombstoned_notify_completion(g_tombstoned_socket.get())) {
LOG(ERROR) << "failed to notify tombstoned of completion";
}
return 0;
}
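III. Crash Optimization (Java layer)
1. Record log information:
Record device information, memory information, the crash log, screenshots, and so on.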
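As one way to picture such a record, here is a minimal sketch that turns a Throwable plus a little environment data into a text report. It uses only JDK calls so it runs anywhere; CrashReportFormatter and the field names are hypothetical. On Android one would presumably add device fields such as android.os.Build.MODEL, which this sketch leaves out.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper; the report layout is an assumption, not a standard format.
class CrashReportFormatter {
    /** Builds a plain-text crash report: a few key=value lines, then the stack trace. */
    static String format(String threadName, Throwable error) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        out.println("thread=" + threadName);
        out.println("java.version=" + System.getProperty("java.version"));
        out.println("freeMemoryBytes=" + Runtime.getRuntime().freeMemory());
        out.println("maxMemoryBytes=" + Runtime.getRuntime().maxMemory());
        // printStackTrace also prints the "Caused by:" chain, if any.
        error.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(Thread.currentThread().getName(),
                new IllegalStateException("boom")));
    }
}
```

Building the whole report in memory first keeps the handler simple; writing it to disk or uploading it can then happen in one step.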
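2. Make crashes friendlier:
By default a crash makes the app exit abruptly. A custom handler can take over and restart the app's UI, reducing the cases where the app simply disappears.
Note that when restarting the app, the old process must be terminated to avoid other problems.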
Intent intent = new Intent(BaseApplication.this, MainActivity.class);
intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK
| Intent.FLAG_ACTIVITY_CLEAR_TASK |
Intent.FLAG_ACTIVITY_RESET_TASK_IF_NEEDED);
if (intent.getComponent() != null) {
// simulate a launch from the Launcher
intent.setAction(Intent.ACTION_MAIN);
intent.addCategory(Intent.CATEGORY_LAUNCHER);
}
BaseApplication.this.startActivity(intent);
android.os.Process.killProcess(android.os.Process.myPid());
System.exit(10);
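The restart code above is meant to run from a custom handler. A common way to install one, sketched here for a plain JVM, is to remember the previously installed default handler and delegate to it after the app-specific work. ChainedCrashHandler, install, uninstall and demo are hypothetical names, and the "system" handler in the demo is only a stand-in for Android's KillApplicationHandler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler; names are illustrative only.
class ChainedCrashHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous;
    private final List<String> records;

    private ChainedCrashHandler(Thread.UncaughtExceptionHandler previous, List<String> records) {
        this.previous = previous;
        this.records = records;
    }

    /** Installs the handler as the process default, remembering the one it replaces. */
    static ChainedCrashHandler install(List<String> records) {
        ChainedCrashHandler handler =
                new ChainedCrashHandler(Thread.getDefaultUncaughtExceptionHandler(), records);
        Thread.setDefaultUncaughtExceptionHandler(handler);
        return handler;
    }

    void uninstall() {
        Thread.setDefaultUncaughtExceptionHandler(previous);
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        records.add("custom:" + e.getMessage()); // app-specific work goes first
        if (previous != null) {
            previous.uncaughtException(t, e);    // then the handler that was there before
        }
    }

    /** Crashes one thread and returns the order in which the handlers ran. */
    static List<String> demo() {
        List<String> records = new ArrayList<>();
        Thread.UncaughtExceptionHandler original = Thread.getDefaultUncaughtExceptionHandler();
        // Stand-in for the system handler that would normally kill the process.
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> records.add("system:" + e.getMessage()));
        ChainedCrashHandler handler = install(records);
        Thread worker = new Thread(() -> { throw new RuntimeException("oops"); });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } finally {
            handler.uninstall();
            Thread.setDefaultUncaughtExceptionHandler(original); // restore global state
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Delegating to the previous handler keeps the system's own crash reporting intact when the custom handler decides not to restart the app.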
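3. No crash:
Restart the looper on the main thread during crash handling so that the app does not die.
How it works: an uncaught exception propagates up the call stack. The main thread runs a Looper loop, so the exception makes that loop exit, and the JVM eventually calls uncaughtException(). Calling Looper.loop() again on the main thread at that point restarts the loop, and the app can go on handling events.
Note: if the crash happens while an Activity is being displayed, the screen goes black. One workaround is to hook and replace ActivityThread.mH.mCallback, wrap the Activity lifecycle calls in try/catch, and close the Activity that was about to be shown if an exception occurs.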
public class CrashHandler implements Thread.UncaughtExceptionHandler {
@Override
public void uncaughtException(@NonNull Thread thread, @NonNull Throwable ex) {
handleExceptionReocrd(ex); // record the log automatically
try { // let the user-supplied listener record the log
if (listener != null) listener.recordException(ex);
} catch (Throwable e) {
e.printStackTrace();
}
try { // ask the listener whether to restart the app; restarting requires killing the process
if (listener != null && listener.restartApp()) return;
} catch (Exception e) {
Log.d(TAG, "uncaughtException->handleByUser:" + Log.getStackTraceString(e));
}
// not restarted; check whether safe mode is enabled
if (safeModelEnable) {
enterSafeModel(thread);
} else if (mDefaultHandler != null) {
// hand over to the system handler
Log.d(TAG, "uncaughtException: delegating to the system handler");
mDefaultHandler.uncaughtException(thread, ex);
} else {
// no system handler available; exit the process directly
Log.w(TAG, "uncaughtException: exiting the process");
android.os.Process.killProcess(android.os.Process.myPid());
System.exit(10);
}
}
public void enterSafeModel(Thread thread) {
Log.w(CrashHandler.TAG, "setSafe--- thread-----" + thread.getName());
if (thread == Looper.getMainLooper().getThread()) {
while (true) { // keep re-entering the loop
try {
Log.e(TAG, "safeMode: abnormal exit detected, restarting looper");
Looper.loop();
} catch (Throwable e) {
Log.e(TAG, "safeMode: abnormal exit detected: " + Log.getStackTraceString(e));
}
}
}
}
}
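The re-entry trick in enterSafeModel can be illustrated without Android. The sketch below uses a tiny, hypothetical MiniLooper in place of android.os.Looper: a message that throws aborts the loop, and the caller simply enters the loop again so the remaining messages are still processed. One deliberate simplification: the real Looper.loop() blocks waiting for new messages, while this one returns when its queue is empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, heavily simplified stand-in for android.os.Looper.
class MiniLooper {
    private final Queue<Runnable> queue = new ArrayDeque<>();

    void post(Runnable message) {
        queue.add(message);
    }

    /** Runs queued messages in order; an exception from a message aborts the loop. */
    void loop() {
        Runnable next;
        while ((next = queue.poll()) != null) {
            next.run();
        }
    }
}

// Hypothetical demo of the "re-enter the loop after a crash" idea.
class SafeModeDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        MiniLooper looper = new MiniLooper();
        looper.post(() -> log.add("msg1"));
        looper.post(() -> { throw new IllegalStateException("bad message"); });
        looper.post(() -> log.add("msg3"));
        while (true) {
            try {
                looper.loop();
                break; // queue drained without further failures
            } catch (RuntimeException e) {
                // The failing message is already dequeued, so re-entering makes progress.
                log.add("recovered: " + e.getMessage());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The message after the failing one still runs, which is the effect safe mode aims for; whatever state the failing message left half-updated is not repaired, so this approach trades a visible crash for possible inconsistency.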