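Preface
These notes record how SystemServer starts and how it is restarted after a crash.
Flow
SystemServer is forked from the Zygote process. The fork is triggered in the main method of ZygoteInit.java: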
if (startSystemServer) {
    Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);
    // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
    // child (system_server) process.
    if (r != null) {
        r.run();
        return;
    }
}
Next, let's look at the forkSystemServer method:
/* Hardcoded command line to start the system server */
String args[] = { // 1
    "--setuid=1000",
    "--setgid=1000",
    "--setgroups=1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1018,1021,1023,"
            + "1024,1032,1065,3001,3002,3003,3006,3007,3009,3010",
    "--capabilities=" + capabilities + "," + capabilities,
    "--nice-name=system_server",
    "--runtime-args",
    "--target-sdk-version=" + VMRuntime.SDK_VERSION_CUR_DEVELOPMENT,
    "com.android.server.SystemServer",
};
ZygoteArguments parsedArgs = null;
int pid;
try {
    ...
    /* Request to fork the system server process */
    pid = Zygote.forkSystemServer( // 2
            parsedArgs.mUid, parsedArgs.mGid,
            parsedArgs.mGids,
            parsedArgs.mRuntimeFlags,
            null,
            parsedArgs.mPermittedCapabilities,
            parsedArgs.mEffectiveCapabilities);
} catch (IllegalArgumentException ex) {
    throw new RuntimeException(ex);
}
/* For child process */
if (pid == 0) {
    if (hasSecondZygote(abiList)) {
        waitForSecondaryZygote(socketName);
    }
    zygoteServer.closeServerSocket(); // 3
    return handleSystemServerProcess(parsedArgs); // 4
}
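At code 1, the arguments set the uid, gid, and supplementary groups of the system_server process (the IDs are defined in Process.java) as well as the process name "system_server". Code 2 then calls Zygote's seven-parameter forkSystemServer to fork the process. fork returns once in each process, so the same line yields two different pid values: in the parent (Zygote) it returns the pid of the newly created child, and in the child it returns 0. So pid == 0 means the code is now running in the system_server process. The child inherits a copy of everything Zygote had, including the server socket Zygote listens on. SystemServer has no use for that socket, so it closes it at code 3.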
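To make the argument list at code 1 more concrete, here is a minimal sketch of how such "--key=value" options could be turned into fields. SketchZygoteArgs is a hypothetical, heavily simplified stand-in written for this article; it is not the AOSP ZygoteArguments class, and it only models the options that appear above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for ZygoteArguments (not the AOSP class). */
final class SketchZygoteArgs {
    int uid;
    int gid;
    int[] gids = new int[0];
    long permittedCapabilities;
    long effectiveCapabilities;
    String niceName;
    String startClass;
    final List<String> remainingArgs = new ArrayList<>();

    static SketchZygoteArgs parse(String[] args) {
        SketchZygoteArgs out = new SketchZygoteArgs();
        int i = 0;
        for (; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("--setuid=")) {
                out.uid = Integer.parseInt(value(arg));
            } else if (arg.startsWith("--setgid=")) {
                out.gid = Integer.parseInt(value(arg));
            } else if (arg.startsWith("--setgroups=")) {
                out.gids = Arrays.stream(value(arg).split(","))
                        .mapToInt(Integer::parseInt)
                        .toArray();
            } else if (arg.startsWith("--capabilities=")) {
                // Format assumed from code 1: "<permitted>,<effective>".
                String[] caps = value(arg).split(",", 2);
                out.permittedCapabilities = Long.decode(caps[0]);
                out.effectiveCapabilities =
                        caps.length > 1 ? Long.decode(caps[1]) : out.permittedCapabilities;
            } else if (arg.startsWith("--nice-name=")) {
                out.niceName = value(arg);
            } else if (arg.startsWith("--")) {
                // Options this sketch does not model (e.g. --runtime-args) are skipped.
            } else {
                break; // The first non-option argument is the class to start.
            }
        }
        if (i < args.length) {
            out.startClass = args[i++];
        }
        for (; i < args.length; i++) {
            out.remainingArgs.add(args[i]);
        }
        return out;
    }

    private static String value(String arg) {
        return arg.substring(arg.indexOf('=') + 1);
    }

    public static void main(String[] unused) {
        SketchZygoteArgs parsed = parse(new String[] {
            "--setuid=1000", "--setgid=1000", "--setgroups=1001,1002",
            "--capabilities=5,5", "--nice-name=system_server",
            "--runtime-args", "com.android.server.SystemServer",
        });
        System.out.println(parsed.niceName + " uid=" + parsed.uid
                + " startClass=" + parsed.startClass);
    }
}
```

The real parser handles many more options and validates them; the sketch is only meant to show where values such as mUid, mGid and mGids at code 2 come from.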
Next we look at the logic behind code 2 and code 4, starting with the source of code 2:
public static int forkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
        int[][] rlimits, long permittedCapabilities, long effectiveCapabilities) {
    ...
    int pid = nativeForkSystemServer(
            uid, gid, gids, runtimeFlags, rlimits,
            permittedCapabilities, effectiveCapabilities);
    ...
}
forkSystemServer in turn calls nativeForkSystemServer which, as the name suggests, is a native method:
private static native int nativeForkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
        int[][] rlimits, long permittedCapabilities, long effectiveCapabilities);
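Now for the corresponding JNI function. nativeForkSystemServer is declared in Zygote.java, which lives under frameworks/base/core/java/com/android/internal/os/, so its native counterpart lives under frameworks/base/core/jni/. The JNI source file is named after the package plus the class, i.e. com_android_internal_os_Zygote.cpp, and by the same convention each JNI function is named package + class + method, which gives com_android_internal_os_Zygote_nativeForkSystemServer here. This is a naming convention rather than a hard requirement, because the framework registers these functions through an explicit method table. The excerpt below is the sibling function nativeForkAndSpecialize from the same file, which is used for ordinary app processes. nativeForkSystemServer reaches ForkCommon in the same way, except that it passes true for is_system_server: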
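The naming convention is mechanical enough to write down as code. The helper below is purely illustrative (JniNameSketch and its methods are names made up for this article, not AOSP APIs), and it ignores the escaping rules that real JNI name mangling applies to characters such as underscores.

```java
/** Illustrative helper for the framework's JNI naming convention; not part of AOSP. */
final class JniNameSketch {
    /** "com.android.internal.os.Zygote" becomes "com_android_internal_os_Zygote". */
    static String classPrefix(String fullyQualifiedClass) {
        return fullyQualifiedClass.replace('.', '_');
    }

    /** Conventional name of the .cpp file that holds the class's native methods. */
    static String sourceFile(String fullyQualifiedClass) {
        return classPrefix(fullyQualifiedClass) + ".cpp";
    }

    /** Conventional name of the C++ function behind one native method. */
    static String functionName(String fullyQualifiedClass, String method) {
        return classPrefix(fullyQualifiedClass) + "_" + method;
    }

    public static void main(String[] args) {
        String zygote = "com.android.internal.os.Zygote";
        System.out.println(sourceFile(zygote));
        System.out.println(functionName(zygote, "nativeForkSystemServer"));
    }
}
```

When searching the source tree, this is usually all that is needed: derive the file name from the class, then search that file for the method name.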
static jint com_android_internal_os_Zygote_nativeForkAndSpecialize(
        JNIEnv* env, jclass, jint uid, jint gid, jintArray gids,
        jint runtime_flags, jobjectArray rlimits,
        jint mount_external, jstring se_info, jstring nice_name,
        jintArray managed_fds_to_close, jintArray managed_fds_to_ignore, jboolean is_child_zygote,
        jstring instruction_set, jstring app_data_dir) {
    ...
    pid_t pid = ForkCommon(env, false, fds_to_close, fds_to_ignore);
    ...
}
nativeForkAndSpecialize calls ForkCommon, which is implemented as follows:
// Utility routine to fork a process from the zygote.
static pid_t ForkCommon(JNIEnv* env, bool is_system_server,
                        const std::vector<int>& fds_to_close,
                        const std::vector<int>& fds_to_ignore) {
    SetSignalHandlers();
    ...
    pid_t pid = fork();
    ...
}
ForkCommon calls two important functions. One is fork, which creates a new child process (on the system server path, the forked process is SystemServer). The other is SetSignalHandlers:
static void SetSignalHandlers() {
    struct sigaction sig_chld = {};
    sig_chld.sa_handler = SigChldHandler;
    if (sigaction(SIGCHLD, &sig_chld, nullptr) < 0) {
        ALOGW("Error setting SIGCHLD handler: %s", strerror(errno));
    }
    struct sigaction sig_hup = {};
    sig_hup.sa_handler = SIG_IGN;
    if (sigaction(SIGHUP, &sig_hup, nullptr) < 0) {
        ALOGW("Error setting SIGHUP handler: %s", strerror(errno));
    }
}
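SetSignalHandlers does not call SigChldHandler directly. It installs it as the handler for SIGCHLD. (SIGCHLD is a Linux signal: when a process terminates or stops, the kernel sends SIGCHLD to its parent. By default the signal is ignored. A parent that wants to be told about these state changes in its children must catch it, and the handler usually calls one of the wait functions to obtain the child's pid and termination status.) Here is the handler: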
// This signal handler is for zygote mode, since the zygote must reap its children
static void SigChldHandler(int /*signal_number*/) {
    ...
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        ...
        // If the just-crashed process is the system_server, bring down zygote
        // so that it is restarted by init and system server will be restarted
        // from there.
        if (pid == gSystemServerPid) {
            async_safe_format_log(ANDROID_LOG_ERROR, LOG_TAG,
                                  "Exit zygote because system server (pid %d) has terminated", pid);
            kill(getpid(), SIGKILL);
        }
    }
    ...
}
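As the description of SIGCHLD suggests, SigChldHandler loops over waitpid to collect the pid and termination status of every child that has exited. This is not an infinite loop: because of WNOHANG, waitpid returns immediately, and the loop ends as soon as no more exited children are waiting to be reaped. If the pid just reaped is that of SystemServer (gSystemServerPid), Zygote obtains its own pid with getpid and kills itself with SIGKILL. The point is that the two live and die together: once Zygote dies, its parent, the init process, notices and restarts Zygote, and the new Zygote brings SystemServer up again.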
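The restart chain can be easier to follow as a toy model. The sketch below is not Android code and involves no real processes or signals: SupervisionSketch, its fields, and the pid numbers are all invented for illustration. It only mirrors the decision made in SigChldHandler (ordinary child: just reap it; system_server: Zygote takes itself down) and the assumption that init then starts a fresh Zygote.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy, in-memory model of the init -> zygote -> system_server restart chain. */
final class SupervisionSketch {
    final List<String> log = new ArrayList<>();
    private int nextPid = 100; // Made-up pid counter for the model.
    int zygotePid = -1;
    int systemServerPid = -1;

    /** Models init starting zygote, and zygote forking system_server. */
    void startZygote() {
        zygotePid = nextPid++;
        log.add("init: started zygote pid=" + zygotePid);
        systemServerPid = nextPid++; // Plays the role of gSystemServerPid.
        log.add("zygote: forked system_server pid=" + systemServerPid);
    }

    /** Models SigChldHandler: zygote learns that one of its children exited. */
    void onZygoteChildExited(int pid) {
        if (pid != systemServerPid) {
            log.add("zygote: reaped ordinary child pid=" + pid);
            return;
        }
        log.add("zygote: system_server pid=" + pid + " terminated, killing self");
        int deadZygote = zygotePid;
        zygotePid = -1;
        systemServerPid = -1;
        onInitChildExited(deadZygote);
    }

    /** Models init noticing that zygote died and restarting it. */
    private void onInitChildExited(int pid) {
        log.add("init: zygote pid=" + pid + " exited, restarting it");
        startZygote();
    }

    public static void main(String[] args) {
        SupervisionSketch model = new SupervisionSketch();
        model.startZygote();
        model.onZygoteChildExited(model.systemServerPid);
        model.log.forEach(System.out::println);
    }
}
```

The design point the model tries to capture is that Zygote never restarts SystemServer itself. It delegates the restart to init by dying, so both processes come back from a clean state.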
That covers code 2, forkSystemServer. Now let's look at code 4, handleSystemServerProcess:
/**
 * Finish remaining work for the newly forked system server process.
 */
private static Runnable handleSystemServerProcess(ZygoteArguments parsedArgs) {
    ...
    /*
     * Pass the remaining arguments to SystemServer.
     */
    return ZygoteInit.zygoteInit(parsedArgs.mTargetSdkVersion,
            parsedArgs.mRemainingArgs, cl);
}
handleSystemServerProcess calls ZygoteInit.zygoteInit. As the comments say, handleSystemServerProcess finishes the remaining work for the newly forked process, and zygoteInit is how the mRemainingArgs field of ZygoteArguments gets passed on to SystemServer. Here is zygoteInit:
public static final Runnable zygoteInit(int targetSdkVersion, String[] argv,
        ClassLoader classLoader) {
    if (RuntimeInit.DEBUG) {
        Slog.d(RuntimeInit.TAG, "RuntimeInit: Starting application from zygote");
    }
    Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ZygoteInit");
    RuntimeInit.redirectLogStreams();
    RuntimeInit.commonInit();
    ZygoteInit.nativeZygoteInit(); // 5
    return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader); // 6
}
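zygoteInit takes three parameters: targetSdkVersion, the remaining arguments, and a ClassLoader (if you are curious where it comes from, go back to the previous method). It ends by calling code 5 and code 6. Code 5 is a native method:
private static final native void nativeZygoteInit();
It is implemented in AndroidRuntime.cpp: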
static void com_android_internal_os_ZygoteInit_nativeZygoteInit(JNIEnv* env, jobject clazz)
{
    gCurRuntime->onZygoteInit();
}
This calls onZygoteInit on the current AndroidRuntime instance:
virtual void onZygoteInit()
{
    sp<ProcessState> proc = ProcessState::self();
    ALOGV("App process: starting thread pool.\n");
    proc->startThreadPool();
}
This override is defined in app_main.cpp. proc is a ProcessState object, and startThreadPool starts the thread pool used for Binder inter-process communication. I won't go into the details here.
The part to focus on is code 6:
protected static Runnable applicationInit(int targetSdkVersion, String[] argv,
        ClassLoader classLoader) {
    ...
    // Remaining arguments are passed to the start class's static main
    return findStaticMain(args.startClass, args.startArgs, classLoader);
}
applicationInit calls findStaticMain. As the comment says, the remaining arguments are handed to the static main method of the start class, which here is SystemServer.
protected static Runnable findStaticMain(String className, String[] argv,
        ClassLoader classLoader) {
    Class<?> cl;
    try {
        cl = Class.forName(className, true, classLoader);
    } catch (ClassNotFoundException ex) {
        throw new RuntimeException(
                "Missing class when invoking static main " + className,
                ex);
    }
    Method m;
    try {
        m = cl.getMethod("main", new Class[] { String[].class }); // 7
    } catch (NoSuchMethodException ex) {
        throw new RuntimeException(
                "Missing static main on " + className, ex);
    } catch (SecurityException ex) {
        throw new RuntimeException(
                "Problem getting static main on " + className, ex);
    }
    int modifiers = m.getModifiers();
    if (! (Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) { // 8
        throw new RuntimeException(
                "Main method is not public and static on " + className);
    }
    /*
     * This throw gets caught in ZygoteInit.main(), which responds
     * by invoking the exception's run() method. This arrangement
     * clears up all the stack frames that were required in setting
     * up the process.
     */
    return new MethodAndArgsCaller(m, argv); // 9
}
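Code 7 uses reflection to look up the main method of the SystemServer class, code 8 checks that main is public and static, and code 9 returns a MethodAndArgsCaller, a Runnable that stores the method and its arguments and exposes a run method: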
static class MethodAndArgsCaller implements Runnable {
    /** method to call */
    private final Method mMethod;
    /** argument array */
    private final String[] mArgs;
    public MethodAndArgsCaller(Method method, String[] args) {
        mMethod = method;
        mArgs = args;
    }
    public void run() {
        try {
            mMethod.invoke(null, new Object[] { mArgs }); // 10
        } catch (IllegalAccessException ex) {
            throw new RuntimeException(ex);
        } catch (InvocationTargetException ex) {
            Throwable cause = ex.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            } else if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw new RuntimeException(ex);
        }
    }
}
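This object is received in the main method of ZygoteInit.java, which then calls its run method. Executing code 10 is what actually invokes SystemServer's main method. Why return an object instead of calling main directly? As the comment in findStaticMain says, to clear the stack: by the time run is called, the frames used while setting up the process have already returned, so they do not appear beneath SystemServer.main, even though a great deal of work was done before main was reached. (The comment still says "throw", apparently left over from older versions that delivered this object as an exception caught in ZygoteInit.main. The code shown here returns it instead.) Here is the main method of ZygoteInit.java again: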
if (startSystemServer) {
    Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);
    // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
    // child (system_server) process.
    if (r != null) {
        r.run();
        return;
    }
}
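The "look up main by reflection, hand back a Runnable, run it from the top-level caller" pattern can be tried outside Android. The following is a minimal sketch under stated assumptions: StaticMainSketch and FakeServer are invented for this article, FakeServer merely stands in for com.android.server.SystemServer, and the error handling is condensed compared with the AOSP code above.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class StaticMainSketch {
    /** Hypothetical stand-in for com.android.server.SystemServer. */
    public static final class FakeServer {
        static String[] received;

        public static void main(String[] args) {
            received = args;
        }
    }

    /** Resolves className.main(String[]) now, but defers the call to the returned Runnable. */
    static Runnable findStaticMain(String className, String[] argv, ClassLoader loader) {
        final Method m;
        try {
            Class<?> cl = Class.forName(className, true, loader);
            m = cl.getMethod("main", String[].class);
        } catch (ClassNotFoundException | NoSuchMethodException ex) {
            throw new RuntimeException("Cannot resolve static main on " + className, ex);
        }
        int modifiers = m.getModifiers();
        if (!(Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
            throw new RuntimeException("Main method is not public and static on " + className);
        }
        return () -> {
            try {
                // The cast makes the String[] a single argument rather than varargs.
                m.invoke(null, (Object) argv);
            } catch (IllegalAccessException | InvocationTargetException ex) {
                throw new RuntimeException(ex);
            }
        };
    }

    public static void main(String[] args) {
        Runnable r = findStaticMain(FakeServer.class.getName(),
                new String[] {"demo-arg"}, StaticMainSketch.class.getClassLoader());
        // findStaticMain has returned, so its frame is no longer on the stack
        // when the target main method finally runs.
        r.run();
        System.out.println("main received " + FakeServer.received.length + " argument(s)");
    }
}
```

Nothing runs inside findStaticMain itself. The target main is only invoked when the caller decides to call run, which is the same division of labour as between findStaticMain and ZygoteInit.main.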
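At this point SystemServer has started. The whole flow does five things: fork the SystemServer process, close the Zygote server socket inherited by SystemServer, start the Binder thread pool, invoke the main method of the SystemServer class, and arrange for SystemServer to be restarted if it dies.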