Introduction

This article describes the IpReachabilityMonitor mechanism as implemented in Android 7.0.

IpReachabilityMonitor overview

* Monitors on-link IP reachability and notifies callers whenever any on-link addresses of interest appear to have become unresponsive.

// In short: it monitors the reachability of on-link IP addresses and notifies the caller as soon as any watched address stops responding.

This code does not concern itself with "why" a neighbour might have become unreachable. Instead, it primarily reacts to the kernel's notion of IP reachability for each of the neighbours we know to be critically important to normal network connectivity. As such, it is often "just the messenger": the neighbours about which it warns are already deemed by the kernel to have become unreachable.

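// The IpReachabilityMonitor code does not care why a neighbour became unreachable. It mainly reflects the kernel's view of IP reachability for each neighbour that is critical to normal connectivity. It is therefore usually just the messenger: by the time it raises a warning, the kernel has already deemed that neighbour unreachable.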

How IpReachabilityMonitor works

How it works:

1. The "on-link neighbours of interest" found in a given LinkProperties instance are added to a "watch list" via #updateLinkProperties(). This usually means all default gateways and any on-link DNS servers.

// 1. The on-link neighbours of interest are looked up in the given LinkProperties instance and added to a "watch list" by updateLinkProperties(). In practice this means all default gateways and all on-link DNS servers.

2. We listen continuously for netlink neighbour messages (RTM_NEWNEIGH, RTM_DELNEIGH), watching only for neighbours in the watch list.

// 2. The monitor listens continuously for netlink neighbour messages (RTM_NEWNEIGH, RTM_DELNEIGH) and only acts on neighbours that are in the watch list.

- A neighbour going into NUD_REACHABLE, NUD_STALE, NUD_DELAY, and even NUD_PROBE is perfectly normal; we merely record the new state.

// A neighbour entering NUD_REACHABLE, NUD_STALE, NUD_DELAY, or NUD_PROBE is entirely normal; only the latest state is recorded.

- A neighbour's entry may be deleted (RTM_DELNEIGH), for example due to garbage collection. This is not necessarily of immediate concern; we record the neighbour as moving to NUD_NONE.

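// A neighbour entry may be deleted (RTM_DELNEIGH), for example by garbage collection. This does not need immediate attention; the neighbour is simply recorded as having moved to NUD_NONE.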

- A neighbour transitioning to NUD_FAILED (for any reason) is critically important and is handled as described below in #4.

// A neighbour moving to NUD_FAILED, for whatever reason, is critical; how it is handled is described in #4 below.

3. All on-link neighbours in the watch list can be forcibly "probed" by calling #probeAll(). This should be called whenever it is important to verify that critical neighbours on the link are still reachable, e.g. when roaming between BSSIDs.

// 3. Calling probeAll() forces a probe of every on-link neighbour in the watch list. It should be called whenever it matters that the critical neighbours on the link are still reachable, for example when roaming between BSSIDs.

- The kernel will send unicast ARP requests for IPv4 neighbours and unicast NS packets for IPv6 neighbours. The expected replies will likely be unicast.

// The kernel sends unicast ARP requests to IPv4 neighbours and unicast NS packets to IPv6 neighbours. The expected replies will most likely be unicast as well.

- The forced probing is done holding a wakelock. The kernel may, however, initiate probing of a neighbor on its own, i.e. whenever a neighbour has expired from NUD_DELAY.

// Forced probing is performed while holding a wakelock. The kernel may also start probing a neighbour on its own, namely when that neighbour's NUD_DELAY state expires.

- The kernel sends:

&1 (/proc/sys/net/ipv{4,6}/neigh//ucast_solicit) number of probes (usually 3) every &2 (/proc/sys/net/ipv{4,6}/neigh//retrans_time_ms) number of milliseconds (usually 1000ms).

This normally results in 3 unicast packets, 1 per second.

// The kernel sends &1 unicast probes, one every &2 milliseconds. With the usual values that is 3 unicast packets at a rate of one per second.

- If no response is received to any of the probe packets, the kernel marks the neighbour as being in state NUD_FAILED, and the listening process in #2 will learn of it.

// If none of the probes is answered, the kernel marks the neighbour as NUD_FAILED, and the listener described in #2 will learn of it.

4. We call the supplied Callback#notifyLost() function if the loss of a neighbour in NUD_FAILED would cause IPv4 or IPv6 configuration to become incomplete (a loss of provisioning).

// 4. If losing the neighbours that are in NUD_FAILED would leave the IPv4 or IPv6 configuration incomplete (a loss of provisioning), the supplied Callback#notifyLost() function is called.

- For example, losing all our IPv4 on-link DNS servers (or losing our only IPv6 default gateway) constitutes a loss of IPv4 (IPv6) provisioning; Callback#notifyLost() would be called.

// For example, losing all IPv4 on-link DNS servers (or losing the only IPv6 default gateway) counts as a loss of IPv4 (IPv6) provisioning, and Callback#notifyLost() is called.

- Since it can be non-trivial to reacquire certain IP provisioning state it may be best for the link to disconnect completely and reconnect afresh.

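// Because reacquiring certain IP provisioning state can be non-trivial, it may be best to disconnect the link completely and reconnect from scratch.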
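
The state handling in #2 and the escalation in #4 can be sketched in plain Java. This is only an illustrative model, not the framework code: the class NudWatchListSketch and its methods are made up for this example and IP addresses are plain strings. Only the NUD_* and RTM_* values are taken from the Linux headers.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of the watch-list rules; not the framework class. */
final class NudWatchListSketch {
    // Values from linux/neighbour.h and linux/rtnetlink.h.
    static final short NUD_NONE = 0x00;
    static final short NUD_REACHABLE = 0x02;
    static final short NUD_STALE = 0x04;
    static final short NUD_DELAY = 0x08;
    static final short NUD_PROBE = 0x10;
    static final short NUD_FAILED = 0x20;
    static final short RTM_NEWNEIGH = 28;
    static final short RTM_DELNEIGH = 29;

    // Watched address -> last recorded NUD state.
    private final Map<String, Short> watchList = new HashMap<>();

    /** Starts watching an address; its state is unknown (NUD_NONE) at first. */
    void watch(String ip) {
        watchList.put(ip, NUD_NONE);
    }

    /** Returns the recorded state, or -1 if the address is not watched. */
    short stateOf(String ip) {
        Short state = watchList.get(ip);
        return (state == null) ? (short) -1 : state.shortValue();
    }

    /**
     * Records a neighbour event and returns true when it must be escalated,
     * i.e. when a watched neighbour has moved to NUD_FAILED.
     */
    boolean onNeighborEvent(String ip, short msgType, short nudState) {
        if (!watchList.containsKey(ip)) {
            return false; // not a neighbour of interest
        }
        // A deleted entry is recorded as NUD_NONE; anything else is recorded as-is.
        short recorded = (msgType == RTM_DELNEIGH) ? NUD_NONE : nudState;
        watchList.put(ip, recorded);
        return nudState == NUD_FAILED;
    }
}
```

Only the NUD_FAILED case returns true; every other transition, including a deletion, just updates the recorded state.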

IpReachabilityMonitor implementation

1. The on-link neighbours of interest are looked up in the given LinkProperties instance and added to a "watch list" by updateLinkProperties(). In practice this means all default gateways and all on-link DNS servers.

public void updateLinkProperties(LinkProperties lp) {
    mLinkProperties = new LinkProperties(lp);
    Map<InetAddress, Short> newIpWatchList = new HashMap<>();
    final List<RouteInfo> routes = mLinkProperties.getRoutes();

    // Add the on-link default gateways to the IpWatchList.
    for (RouteInfo route : routes) {
        if (route.hasGateway()) {
            InetAddress gw = route.getGateway();
            if (isOnLink(routes, gw)) {
                newIpWatchList.put(gw, getNeighborStateLocked(gw));
            }
        }
    }

    // Add the on-link DNS servers to the IpWatchList.
    for (InetAddress nameserver : lp.getDnsServers()) {
        if (isOnLink(routes, nameserver)) {
            newIpWatchList.put(nameserver, getNeighborStateLocked(nameserver));
        }
    }

    mIpWatchList = newIpWatchList;
    mIpWatchListVersion++;
}
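
The excerpt relies on isOnLink() without showing it. A minimal sketch of such a check follows, assuming that "on-link" means the address is covered by a directly connected route, that is, a route without a gateway. The Route class here is a hypothetical stand-in for RouteInfo and addresses are passed as string literals; none of this is the framework implementation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

final class OnLinkSketch {
    /** Hypothetical stand-in for RouteInfo: destination prefix plus optional gateway. */
    static final class Route {
        final byte[] prefix;
        final int prefixLength;
        final String gateway; // null for a directly connected route

        Route(String prefix, int prefixLength, String gateway) {
            this.prefix = parse(prefix);
            this.prefixLength = prefixLength;
            this.gateway = gateway;
        }

        boolean hasGateway() {
            return gateway != null;
        }

        /** True if the first prefixLength bits of addr equal those of the prefix. */
        boolean matches(byte[] addr) {
            if (addr.length != prefix.length) {
                return false; // IPv4 route vs IPv6 address, or the reverse
            }
            for (int bit = 0; bit < prefixLength; bit++) {
                int mask = 0x80 >> (bit % 8);
                if ((addr[bit / 8] & mask) != (prefix[bit / 8] & mask)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Parses an IP literal into its raw bytes (4 for IPv4, 16 for IPv6). */
    static byte[] parse(String literal) {
        try {
            return InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    /** An address is treated as on-link if a gateway-less route covers it. */
    static boolean isOnLink(List<Route> routes, String ip) {
        byte[] addr = parse(ip);
        for (Route route : routes) {
            if (!route.hasGateway() && route.matches(addr)) {
                return true;
            }
        }
        return false;
    }
}
```

With a connected route 192.168.1.0/24 and a default route via 192.168.1.1, the gateway 192.168.1.1 is on-link and would be watched, while a public DNS server reached through the gateway is not.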

2. The monitor listens continuously for netlink neighbour messages (RTM_NEWNEIGH, RTM_DELNEIGH) and only acts on neighbours that are in the watch list.

public IpReachabilityMonitor(Context context, String ifName, Callback callback,
        AvoidBadWifiTracker tracker) throws IllegalArgumentException {
    ...

    // NetlinkSocketObserver sets up the socket connection, keeps receiving netlink
    // messages from the kernel, parses them, and decides whether the IpWatchList
    // needs to be updated. See the NetlinkSocketObserver section below for details.
    mNetlinkSocketObserver = new NetlinkSocketObserver();
    mObserverThread = new Thread(mNetlinkSocketObserver);
    mObserverThread.start();

    ...
}

3. Calling probeAll() forces a probe of every on-link neighbour in the watch list. It should be called whenever it matters that the critical neighbours on the link are still reachable, for example when roaming between BSSIDs.

// Take the key set of the IpWatchList, then call probeNeighbor(int ifIndex, InetAddress ip) on each address to verify that the neighbour is still reachable.

public void probeAll() {
    // Copy the keys of the IpWatchList into ipProbeList.
    Set<InetAddress> ipProbeList = new HashSet<>();
    ipProbeList.addAll(mIpWatchList.keySet());

    for (InetAddress target : ipProbeList) {
        final int returnValue = probeNeighbor(mInterfaceIndex, target);

        /// M: Add UDP packet for ARP retry
        // The following policy was added by MTK: if probeNeighbor fails,
        // use UDP to trigger the ARP request once more.
        if (returnValue != 0) {
            // Use UDP broadcast to trigger ARP procedure.
            sendUdpBroadcast(target);
        }
    }
}

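private static int probeNeighbor(int ifIndex, InetAddress ip):

Makes the kernel perform neighbour reachability detection (IPv4 ARP or IPv6 ND) for the given IP address on the interface with the given index.

Returns 0 if the reachability request was passed to the kernel successfully; otherwise returns a non-zero error code.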

private static int probeNeighbor(int ifIndex, InetAddress ip) {
    ...

    // Pack the IP address and the interface index ifIndex into a PROBE message.
    final byte[] msg = RtNetlinkNeighborMessage.newNewNeighborMessage(
            1, ip, StructNdMsg.NUD_PROBE, ifIndex, null);

    // Create a NetlinkSocket, connect it to the kernel, and send the message;
    // then receive the reply and parse it with NetlinkMessage.
    try (NetlinkSocket nlSocket = new NetlinkSocket(OsConstants.NETLINK_ROUTE)) {
        nlSocket.connectToKernel();
        nlSocket.sendMessage(msg, 0, msg.length, IO_TIMEOUT);
        final ByteBuffer bytes = nlSocket.recvMessage(IO_TIMEOUT);
        // recvMessage() guaranteed to not return null if it did not throw.
        final NetlinkMessage response = NetlinkMessage.parse(bytes);
        ... // the return value then depends on the response
    }
}
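
To make the PROBE message less abstract, here is a rough sketch of what such a request looks like on the wire, based on the Linux netlink structures (nlmsghdr, ndmsg, and one NDA_DST attribute). It is not the code of RtNetlinkNeighborMessage: the class ProbeMessageSketch and the method newProbeMessage are invented for this example, the flag combination is an assumption, and the link-layer address argument (null in the call above) is left out.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ProbeMessageSketch {
    // Values from linux/netlink.h, linux/rtnetlink.h, and linux/neighbour.h.
    static final short RTM_NEWNEIGH = 28;
    static final short NLM_F_REQUEST = 0x001;
    static final short NLM_F_ACK = 0x004;
    static final short NLM_F_REPLACE = 0x100;
    static final short NUD_PROBE = 0x10;
    static final short NDA_DST = 1;
    static final byte AF_INET = 2;
    static final byte AF_INET6 = 10;

    /** Builds an RTM_NEWNEIGH request asking for the neighbour to be put into NUD_PROBE. */
    static byte[] newProbeMessage(int seqNo, String ipLiteral, int ifIndex) {
        final byte[] addr;
        try {
            addr = InetAddress.getByName(ipLiteral).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(ipLiteral, e);
        }

        // 16-byte nlmsghdr + 12-byte ndmsg + 4-byte rtattr header + address.
        // The address is 4 or 16 bytes, so no extra alignment padding is needed.
        final int length = 16 + 12 + 4 + addr.length;
        ByteBuffer buf = ByteBuffer.allocate(length).order(ByteOrder.nativeOrder());

        // struct nlmsghdr
        buf.putInt(length);                 // nlmsg_len
        buf.putShort(RTM_NEWNEIGH);         // nlmsg_type
        buf.putShort((short) (NLM_F_REQUEST | NLM_F_ACK | NLM_F_REPLACE)); // nlmsg_flags (assumed)
        buf.putInt(seqNo);                  // nlmsg_seq
        buf.putInt(0);                      // nlmsg_pid, left as 0 in this sketch

        // struct ndmsg
        buf.put(addr.length == 4 ? AF_INET : AF_INET6); // ndm_family
        buf.put((byte) 0);                  // ndm_pad1
        buf.putShort((short) 0);            // ndm_pad2
        buf.putInt(ifIndex);                // ndm_ifindex
        buf.putShort(NUD_PROBE);            // ndm_state
        buf.put((byte) 0);                  // ndm_flags
        buf.put((byte) 0);                  // ndm_type

        // struct rtattr carrying the destination address
        buf.putShort((short) (4 + addr.length)); // rta_len
        buf.putShort(NDA_DST);                   // rta_type
        buf.put(addr);
        return buf.array();
    }
}
```

Under these assumptions an IPv4 probe request is 36 bytes long and an IPv6 one is 48 bytes; the ACK flag is what allows the caller to read back a reply and turn it into a return code.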

4. If losing the neighbours that are in NUD_FAILED would leave the IPv4 or IPv6 configuration incomplete (a loss of provisioning), the supplied Callback#notifyLost() function is called.

private void handleNeighborLost(String msg) {
    InetAddress ip = null;
    LinkProperties whatIfLp = new LinkProperties(mLinkProperties);

    for (Map.Entry<InetAddress, Short> entry : mIpWatchList.entrySet()) {
        // Walk all neighbours in the IpWatchList, looking for those whose state is NUD_FAILED.
        if (entry.getValue() != StructNdMsg.NUD_FAILED) {
            continue;
        }

        // Found a neighbour in NUD_FAILED; take its IP address.
        // Check the routes in mLinkProperties for a gateway equal to that address
        // and remove every such route from whatIfLp.
        ip = entry.getKey();
        for (RouteInfo route : mLinkProperties.getRoutes()) {
            if (ip.equals(route.getGateway())) {
                whatIfLp.removeRoute(route);
            }
        }

        // If bad-link avoidance is enabled, or the address is not an IPv6 address,
        // also remove the matching DNS server from whatIfLp.
        if (avoidingBadLinks() || !(ip instanceof Inet6Address)) {
            // We should do this unconditionally, but alas we cannot: b/31827713.
            whatIfLp.removeDnsServer(ip);
        }
    }

    // Use LinkProperties.compareProvisioning() to compare the original mLinkProperties with whatIfLp.
    final ProvisioningChange delta = LinkProperties.compareProvisioning(mLinkProperties, whatIfLp);

    // If delta is ProvisioningChange.LOST_PROVISIONING, call notifyLost() to handle the loss of
    // provisioning. The callback is implemented in IpManager; see the IpManager analysis.
    if (delta == ProvisioningChange.LOST_PROVISIONING) {
        final String logMsg = "FAILURE: LOST_PROVISIONING, " + msg;
        mCallback.notifyLost(ip, logMsg);
    }
}
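
The "what if" idea in handleNeighborLost() is: copy the current configuration, remove everything that depends on a failed neighbour, then check whether the copy is still provisioned. The sketch below shows that idea with a deliberately simplified model. Config is a hypothetical stand-in for LinkProperties covering one address family, and the rule "provisioned means at least one gateway and at least one DNS server" is an assumption made for this example; the real compareProvisioning() considers more than that.

```java
import java.util.HashSet;
import java.util.Set;

final class WhatIfSketch {
    /** Hypothetical, much simplified stand-in for LinkProperties (one address family). */
    static final class Config {
        final Set<String> gateways = new HashSet<>();
        final Set<String> dnsServers = new HashSet<>();

        Config copy() {
            Config c = new Config();
            c.gateways.addAll(gateways);
            c.dnsServers.addAll(dnsServers);
            return c;
        }

        // Assumption for this sketch: provisioned = at least one gateway and one DNS server.
        boolean isProvisioned() {
            return !gateways.isEmpty() && !dnsServers.isEmpty();
        }
    }

    /** True if removing the failed neighbours turns a provisioned config into an unprovisioned one. */
    static boolean losesProvisioning(Config current, Set<String> failedNeighbours) {
        Config whatIf = current.copy();            // never modify the live configuration
        whatIf.gateways.removeAll(failedNeighbours);
        whatIf.dnsServers.removeAll(failedNeighbours);
        return current.isProvisioned() && !whatIf.isProvisioned();
    }
}
```

Working on a copy is the point of the design: the live configuration stays untouched, and the callback fires only when the hypothetical result is worse than the current state.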

Android - NetlinkSocketObserver

NetlinkSocketObserver overview

The IpReachabilityMonitor class contains an inner class, NetlinkSocketObserver.

The IpReachabilityMonitor constructor creates a NetlinkSocketObserver object, wraps it in a Thread object, and calls that Thread's start() method to start the thread.

mNetlinkSocketObserver = new NetlinkSocketObserver();

mObserverThread = new Thread(mNetlinkSocketObserver);

mObserverThread.start();

What NetlinkSocketObserver does

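The NetlinkSocketObserver class mainly interacts with the Android netlink classes.

Android netlink: frameworks/base/services/net/java/android/net/netlink/...

Its main job is to create a NETLINK_ROUTE socket and bind it to the corresponding NetlinkSocketAddress. It then loops indefinitely, receiving kernel replies and parsing them.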

NetlinkSocketObserver implementation

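// The class implements the Runnable interface and overrides run(), which becomes the body of the observer thread. As described above, run() creates the NETLINK_ROUTE socket, binds it to the corresponding NetlinkSocketAddress, and then loops, receiving kernel replies and parsing them.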

Key call chain:

run()

-->parseNetlinkMessageBuffer(byteBuffer, whenMs)

-->evaluateRtNetlinkNeighborMessage((RtNetlinkNeighborMessage) nlMsg, whenMs);

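private final class NetlinkSocketObserver implements Runnable
{
    // Creates the NETLINK_ROUTE socket, binds it to the corresponding NetlinkSocketAddress,
    // then loops, receiving kernel replies and parsing them. Error handling is omitted here.
    public void run() {
        // 1. Create the NETLINK_ROUTE socket and bind it to the corresponding NetlinkSocketAddress.
        setupNetlinkSocket();

        // Steps 2 and 3 repeat for as long as the monitor is running.
        while (stillRunning()) {
            // 2. Receive a reply from the kernel.
            ByteBuffer byteBuffer = recvKernelReply();
            final long whenMs = SystemClock.elapsedRealtime();

            // 3. Parse the received kernel reply.
            parseNetlinkMessageBuffer(byteBuffer, whenMs);
        }
    }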

    // Parses the ByteBuffer into a NetlinkMessage, checks the message type, and passes neighbour
    // messages on for evaluation, which may update the IpWatchList entry for the matching IP address.
    // Simplified: the full source repeats these steps for every message in the buffer.
    private void parseNetlinkMessageBuffer(ByteBuffer byteBuffer, long whenMs) {
        // 1. Parse the ByteBuffer into a NetlinkMessage.
        final NetlinkMessage nlMsg = NetlinkMessage.parse(byteBuffer);

        // 2. Check the type of the parsed message. Only an RtNetlinkNeighborMessage is evaluated;
        //    a NetlinkErrorMessage or any other message type is ignored.
        if (nlMsg instanceof NetlinkErrorMessage
                || !(nlMsg instanceof RtNetlinkNeighborMessage)) {
            return;
        }

        // 3. Evaluate the RtNetlinkNeighborMessage.
        evaluateRtNetlinkNeighborMessage((RtNetlinkNeighborMessage) nlMsg, whenMs);
    }

    // Updates the nudState recorded in the IpWatchList for the matching InetAddress,
    // based on the msgType and nudState taken from the RtNetlinkNeighborMessage.
    private void evaluateRtNetlinkNeighborMessage(
            RtNetlinkNeighborMessage neighMsg, long whenMs) {
        final StructNdMsg ndMsg = neighMsg.getNdHeader();

        // 1. Take the destination IP address from neighMsg; if it is not in the IpWatchList, return.
        final InetAddress destination = neighMsg.getDestination();
        if (!mIpWatchList.containsKey(destination)) {
            return;
        }

        // 2. Take msgType and nudState from neighMsg and update the IpWatchList accordingly.
        final short msgType = neighMsg.getHeader().nlmsg_type;
        final short nudState = ndMsg.ndm_state;
        final short value =
                (msgType == NetlinkConstants.RTM_DELNEIGH) ? StructNdMsg.NUD_NONE : nudState;
        mIpWatchList.put(destination, value);

        // 3. If nudState is NUD_FAILED, call handleNeighborLost().
        if (nudState == StructNdMsg.NUD_FAILED) {
            // eventMsg is a log string describing this event; its construction is omitted here.
            handleNeighborLost(eventMsg); // see part 4 of the IpReachabilityMonitor implementation above
        }
    }
}
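
One detail worth illustrating is that a single buffer received from a netlink socket can carry several messages back to back, each starting with a 16-byte nlmsghdr whose first field is the total message length. The following is a minimal sketch of walking such a buffer using only java.nio. It is not the implementation of NetlinkMessage.parse(); the class NetlinkBufferSketch and the method messageTypes are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

final class NetlinkBufferSketch {
    static final int NLMSGHDR_SIZE = 16; // sizeof(struct nlmsghdr)

    /** Returns the nlmsg_type of every well-formed message found in the buffer. */
    static List<Integer> messageTypes(ByteBuffer buf) {
        buf.order(ByteOrder.nativeOrder()); // netlink uses host byte order
        List<Integer> types = new ArrayList<>();
        while (buf.remaining() >= NLMSGHDR_SIZE) {
            int start = buf.position();
            int length = buf.getInt(start);           // nlmsg_len, header included
            if (length < NLMSGHDR_SIZE || length > buf.remaining()) {
                break; // truncated or corrupt message: stop parsing
            }
            types.add(buf.getShort(start + 4) & 0xffff); // nlmsg_type
            // Messages are aligned to 4 bytes; skip to the start of the next one.
            int aligned = (length + 3) & ~3;
            buf.position(Math.min(start + aligned, buf.limit()));
        }
        return types;
    }
}
```

A real observer would hand each message to a type-specific parser at this point; the sketch only collects the message types to keep the loop structure visible.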

Android-IpReachabilityMonitor