Unraveling okhttp3 (Part 1)

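I finally have some free time lately, so I can look at how the great third-party frameworks I use every day are actually implemented and learn from them. Take okhttp: okhttp + retrofit has become the standard networking stack for Android development. I've written some wrappers around it in our company project. Nothing has gone wrong so far, but I wouldn't dare say the wrappers are good. I've never seriously looked at how this wheel was built, let alone how it runs, and that leaves me short on confidence. So while I have the time, I'm recharging.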



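Taking Wheels Apart series: Taking Apart OkHttp
I also swiped a diagram from this fellow (yes, I'm lazy) to kick off my own unraveling.
Below is the flow chart of okhttp's full network request process. The way I read it, the diagram splits down the middle into two halves: the left side wraps the client, and the right side wraps the HTTP protocol. Put simply, the left is a browser-like client, and the right is an object-oriented model of HTTP: request, response, URL, GET, POST, request headers, response headers, blah blah blah.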



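There are far too many write-ups online that walk through okhttp's request flow and source code. I read them until I was sick of them and felt I had it all down. But reading is only reading, and something always seemed to be missing. Over time at work I figured out what: all those analyses are someone else's understanding poured into you, and how much you actually absorb is anyone's guess. On top of that, many of them copy each other and end up saying the same thing. Read enough of them and you think you've got it, yet when a real problem comes up you still can't solve it quickly. (Ugh, too much rambling.)
So, all things considered, I decided to read the source myself, working through other people's hard-won wisdom bit by bit and shoring up my fundamentals. The HTTP protocol is basic, must-know material for anyone making network requests, yet in practice plenty of people are still hazy about it. So that's the first thing to chew on.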

Let's chew through it in order! Step one: URL

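The first item at the top right of the diagram is URL. In network programming, URL is the most frequently used term of all. So what exactly is it? We also need to understand another term, URI. Here is how one book puts it:

Compared with URI (Uniform Resource Identifier), we are more familiar with URL (Uniform Resource Locator). A URL is the web page address you type in when visiting a web page with a web browser. For example, http://baidu.com/ in the figure below is a URL.
URI is short for Uniform Resource Identifier. RFC2396 defines each of these 3 [...] http: or ftp:) is also easier.
In summary, a URI is an identifier that locates a resource and is expressed by some protocol scheme. [...] When the HTTP protocol is used, the protocol scheme is http. Others include ftp, mailto, telnet, file, and so on. There are about 30 standard URI protocol schemes. They are managed by IANA (Internet Assigned Numbers Authority), which belongs to ICANN (Internet Corporation for Assigned Names and Numbers), a non-profit corporation that manages Internet resources internationally.
IANA - Uniform Resource Identifier (URI) SCHEMES (Uniform Resource [...]
A URI identifies an Internet resource with a string, while a URL indicates where the resource is (its location on the Internet). So URL is a subset of URI.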
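
To make the idea of a protocol scheme a bit more concrete, here is a minimal sketch that uses only the JDK's java.net.URI to read the scheme from a few identifiers. The class name SchemeDemo and the sample addresses other than http://baidu.com/ are made up for illustration.

```java
import java.net.URI;

final class SchemeDemo {
    /** Returns the scheme of the given URI string, or null if it has none. */
    static String schemeOf(String s) {
        return URI.create(s).getScheme();
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("http://baidu.com/"));          // http
        System.out.println(schemeOf("ftp://ftp.example.com/a.txt")); // ftp
        System.out.println(schemeOf("mailto:someone@example.com"));  // mailto
    }
}
```

All three strings are URIs. Only the first one is something HttpUrl would accept, since HttpUrl is limited to http and https.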



A uniform resource locator (URL) with a scheme of either http or https. Use this class to compose and decompose Internet addresses. For example, this code will compose and print a URL for Google search:

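   HttpUrl url = new HttpUrl.Builder()
       .scheme("https")
       .host("www.google.com")
       .addPathSegment("search")
       .addQueryParameter("q", "polar bears")
       .build();
   System.out.println(url);

which prints:

   https://www.google.com/search?q=polar%20bears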


As another example, this code prints the human-readable query parameters of a Twitter search:

   HttpUrl url = HttpUrl.parse("https://twitter.com/search?q=cute%20%23puppies&f=images");
   for (int i = 0, size = url.querySize(); i < size; i++) {
     System.out.println(url.queryParameterName(i) + ": " + url.queryParameterValue(i));
   }

which prints:

   q: cute #puppies
   f: images
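
For comparison, here is a rough sketch of what the same decomposition looks like with only JDK classes, which have no per-parameter accessors. The class name QueryDemo and its helper methods are hypothetical, and the splitting logic is a simplification, not what OkHttp does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

final class QueryDemo {
    /** Splits the raw query by hand and returns decoded "name: value" lines. */
    static List<String> decodeQuery(String url) {
        List<String> result = new ArrayList<>();
        String rawQuery = URI.create(url).getRawQuery();
        if (rawQuery == null) return result; // no query at all
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq == -1 ? pair : pair.substring(0, eq);
            String value = eq == -1 ? "" : pair.substring(eq + 1);
            result.add(decode(name) + ": " + decode(value));
        }
        return result;
    }

    private static String decode(String s) {
        try {
            // URLDecoder follows form encoding, so it also turns '+' into a space.
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String url = "https://twitter.com/search?q=cute%20%23puppies&f=images";
        for (String line : decodeQuery(url)) {
            System.out.println(line);
        }
    }
}
```

Every caller has to get the splitting and decoding right by hand, which is the chore HttpUrl's queryParameterName and queryParameterValue take away.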

In addition to composing URLs from their component parts and decomposing URLs into their component parts, this class implements relative URL resolution: what address you'd reach by clicking a relative link on a specified page. For example:

   HttpUrl base = HttpUrl.parse("https://www.youtube.com/user/WatchTheDaily/videos");
   HttpUrl link = base.resolve("../../watch?v=cbP2N1BQdYc");
   System.out.println(link);

which prints:

   https://www.youtube.com/watch?v=cbP2N1BQdYc
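
Relative resolution is not unique to HttpUrl. As a small sketch using only the JDK, java.net.URI.resolve applies the same idea to the YouTube addresses from the example. The class name ResolveDemo is made up for illustration.

```java
import java.net.URI;

final class ResolveDemo {
    /** Resolves a relative link against a base address, as a browser would for a clicked link. */
    static String resolve(String base, String link) {
        return URI.create(base).resolve(link).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve(
            "https://www.youtube.com/user/WatchTheDaily/videos",
            "../../watch?v=cbP2N1BQdYc"));
    }
}
```

Each ".." removes one directory level from the base path, so the link lands on /watch at the root of the same host.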


What's in a URL?
A URL has several components.

Scheme
Sometimes referred to as protocol, a URL's scheme describes what mechanism should be used to retrieve the resource. Although URLs have many schemes (mailto, file, ftp), this class only supports http and https. Use java.net.URI for URLs with arbitrary schemes.

Username and Password
Username and password are either present, or the empty string "" if absent. This class offers no mechanism to differentiate empty from absent. Neither of these components are popular in practice. Typically HTTP applications use other mechanisms for user identification and authentication.

Host
The host identifies the webserver that serves the URL's resource. It is either a hostname like square.com or localhost, an IPv4 address like 192.168.0.1, or an IPv6 address like ::1.

Usually a webserver is reachable with multiple identifiers: its IP addresses, registered domain names, and even localhost when connecting from the server itself. Each of a webserver's names is a distinct URL and they are not interchangeable. For example, even if http://square.github.io/dagger and http://google.github.io/dagger are served by the same IP address, the two URLs identify different resources.

Port
The port used to connect to the webserver. By default this is 80 for HTTP and 443 for HTTPS. This class never returns -1 for the port: if no port is explicitly specified in the URL then the scheme's default is used.
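
The "never returns -1" rule is easy to picture with a small sketch. java.net.URI does return -1 when no port is written, so the hypothetical helper effectivePort below fills in the scheme default the way the paragraph describes. It is an illustration with JDK classes, not OkHttp's own code.

```java
import java.net.URI;

final class PortDemo {
    /** Returns the explicit port, or the scheme default (80 for http, 443 for https). */
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) return uri.getPort();
        String scheme = uri.getScheme();
        if ("http".equalsIgnoreCase(scheme)) return 80;
        if ("https".equalsIgnoreCase(scheme)) return 443;
        return -1; // unknown scheme: no default to fall back on
    }

    public static void main(String[] args) {
        System.out.println(URI.create("http://example.com/").getPort());           // -1
        System.out.println(effectivePort(URI.create("http://example.com/")));      // 80
        System.out.println(effectivePort(URI.create("https://example.com/")));     // 443
        System.out.println(effectivePort(URI.create("https://example.com:8443/"))); // 8443
    }
}
```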

Path
The path identifies a specific resource on the host. Paths have a hierarchical structure like "/square/okhttp/issues/1486" and decompose into a list of segments like ["square", "okhttp", "issues", "1486"].

This class offers methods to compose and decompose paths by segment. It composes each path from a list of segments by alternating between "/" and the encoded segment. For example the segments ["a", "b"] build "/a/b" and the segments ["a", "b", ""] build "/a/b/".

If a path's last segment is the empty string then the path ends with "/". This class always builds non-empty paths: if the path is omitted it defaults to "/". The default path's segment list is a single empty string: [""].
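
The segment rules above can be sketched in a few lines of plain Java. The class PathSegmentsDemo and its split and join helpers are hypothetical and skip percent encoding entirely. They only show how a trailing "/" and the default path map to an empty last segment.

```java
import java.util.Arrays;
import java.util.List;

final class PathSegmentsDemo {
    /** Splits an absolute path into segments; a trailing "/" yields an empty last segment. */
    static List<String> split(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must start with '/'");
        }
        // The -1 limit keeps trailing empty strings instead of dropping them.
        return Arrays.asList(path.substring(1).split("/", -1));
    }

    /** Builds a path by alternating "/" and each segment. */
    static String join(List<String> segments) {
        return "/" + String.join("/", segments);
    }

    public static void main(String[] args) {
        System.out.println(split("/square/okhttp/issues/1486")); // [square, okhttp, issues, 1486]
        System.out.println(split("/a/b/"));                      // [a, b, ]
        System.out.println(split("/"));                          // []  (one empty segment)
        System.out.println(join(Arrays.asList("a", "b", "")));   // /a/b/
    }
}
```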

Query
The query is optional: it can be null, empty, or non-empty. For many HTTP URLs the query string is subdivided into a collection of name-value parameters. This class offers methods to set the query as the single string, or as individual name-value parameters. With name-value parameters the values are optional and names may be repeated.

Fragment
The fragment is optional: it can be null, empty, or non-empty. Unlike host, port, path, and query the fragment is not sent to the webserver: it's private to the client.
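
As a rough illustration of "not sent to the webserver", the sketch below builds the part of a URL that would appear on an HTTP request line. The helper name requestTarget is an assumption for this example. The point is only that the fragment never makes it into the result.

```java
import java.net.URI;

final class RequestTargetDemo {
    /** Returns path plus optional query, which is what goes on the request line; the fragment is left out. */
    static String requestTarget(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) path = "/";
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/a?b=c#section-2");
        System.out.println(uri.getFragment());   // section-2 (known only to the client)
        System.out.println(requestTarget(uri));  // /a?b=c
    }
}
```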

Encoding
Each component must be encoded before it is embedded in the complete URL. As we saw above, the string cute #puppies is encoded as cute%20%23puppies when used as a query parameter value.

Percent encoding
Percent encoding replaces a character (like 🍩) with its UTF-8 hex bytes (like %F0%9F%8D%A9). This approach works for whitespace characters, control characters, non-ASCII characters, and characters that already have another meaning in a particular context.
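
To see where %F0%9F%8D%A9 comes from, here is a minimal sketch that percent-encodes every UTF-8 byte of a string. The class and method names are made up, and unlike a real URL encoder it encodes everything, including characters that would normally be left alone.

```java
import java.nio.charset.StandardCharsets;

final class PercentBytesDemo {
    /** Writes each UTF-8 byte of the input as %XX. */
    static String percentEncodeAll(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            out.append('%').append(String.format("%02X", b & 0xff));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\uD83C\uDF69" is the doughnut emoji (U+1F369) written as a surrogate pair.
        System.out.println(percentEncodeAll("\uD83C\uDF69")); // %F0%9F%8D%A9
        System.out.println(percentEncodeAll(" "));            // %20
        System.out.println(percentEncodeAll("#"));            // %23
    }
}
```

One character can turn into several %XX groups because it occupies several bytes in UTF-8.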

Percent encoding is used in every URL component except for the hostname. But the set of characters that need to be encoded is different for each component. For example, the path component must escape all of its ? characters, otherwise it could be interpreted as the start of the URL's query. But within the query and fragment components, the ? character doesn't delimit anything and doesn't need to be escaped.

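   HttpUrl url = HttpUrl.parse("http://who-let-the-dogs.out").newBuilder()
       .addPathSegment("_Who?_")
       .query("_Who?_")
       .fragment("_Who?_")
       .build();
   System.out.println(url);

This prints:

   http://who-let-the-dogs.out/_Who%3F_?_Who?_#_Who?_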


When parsing URLs that lack percent encoding where it is required, this class will percent encode the offending characters.

IDNA Mapping and Punycode encoding
Hostnames have different requirements and use a different encoding scheme. It consists of IDNA mapping and Punycode encoding.

In order to avoid confusion and discourage phishing attacks, IDNA Mapping transforms names to avoid confusing characters. This includes basic case folding: transforming shouting SQUARE.COM into cool and casual square.com. It also handles more exotic characters. For example, the Unicode trademark sign (™) could be confused for the letters "TM" in http://ho™mail.com. To mitigate this, the single character (™) maps to the string (tm). There is similar policy for all of the 1.1 million Unicode code points. Note that some code points such as "🍩" are not mapped and cannot be used in a hostname.

Punycode converts a Unicode string to an ASCII string to make international domain names work everywhere. For example, "σ" encodes as "xn--4xa". The encoded string is not human readable, but can be used with classes like InetAddress to establish connections.
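
The JDK ships its own Punycode support in java.net.IDN, which is enough to reproduce the "σ" example. This is only a sketch: java.net.IDN follows the older IDNA 2003 rules, and I'm not assuming OkHttp uses it internally. The class name PunycodeDemo is made up.

```java
import java.net.IDN;

final class PunycodeDemo {
    /** Converts a Unicode hostname to its ASCII (Punycode) form. */
    static String toAscii(String host) {
        return IDN.toASCII(host);
    }

    /** Converts an ASCII (Punycode) hostname back to Unicode. */
    static String toUnicode(String host) {
        return IDN.toUnicode(host);
    }

    public static void main(String[] args) {
        String sigma = "\u03C3"; // the Greek letter σ
        System.out.println(toAscii(sigma));                     // xn--4xa
        System.out.println(toUnicode("xn--4xa").equals(sigma)); // true
    }
}
```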

Why another URL model?
Java includes both java.net.URL and java.net.URI. We offer a new URL model to address problems that the others don't.

Different URLs should be different
Although they have different content, java.net.URL considers the following two URLs equal, and the equals() method between them returns true:

   http://square.github.io/
   http://google.github.io/

This is because those two hosts share the same IP address. This is an old, bad design decision that makes java.net.URL unusable for many things. It shouldn't be used as a Map key or in a Set. Doing so is both inefficient because equality may require a DNS lookup, and incorrect because unequal URLs may be equal because of how they are hosted.

Equal URLs should be equal
These two URLs are semantically identical, but java.net.URI disagrees:

   http://host:80/
   http://host

Both the unnecessary port specification (:80) and the absent trailing slash (/) cause URI to bucket the two URLs separately. This harms URI's usefulness in collections. Any application that stores information-per-URL will need to either canonicalize manually, or suffer unnecessary redundancy for such URLs.
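
Here is a small sketch of that complaint, using two sample addresses that differ only by an explicit :80 and a trailing slash. The canonical helper is a hypothetical, deliberately tiny stand-in for the "canonicalize manually" chore. It handles only these two differences and nothing else.

```java
import java.net.URI;

final class UriEqualityDemo {
    /** Compares the two addresses exactly as java.net.URI sees them. */
    static boolean sameAsWritten(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    /** Hypothetical manual canonicalization: drop http's default port and use "/" for an empty path. */
    static String canonical(String s) {
        URI u = URI.create(s);
        int port = ("http".equals(u.getScheme()) && u.getPort() == 80) ? -1 : u.getPort();
        String rawPath = u.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        return u.getScheme() + "://" + u.getHost() + (port == -1 ? "" : ":" + port) + path;
    }

    public static void main(String[] args) {
        System.out.println(sameAsWritten("http://host:80/", "http://host"));               // false
        System.out.println(canonical("http://host:80/").equals(canonical("http://host"))); // true
    }
}
```

With java.net.URI, a HashMap keyed by these values would hold two entries for what is really one address.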
Because they don't attempt canonical form, these classes are surprisingly difficult to use securely. Suppose you're building a webservice that checks that incoming paths are prefixed "/static/images/" before serving the corresponding assets from the filesystem.


   String attack = "http://example.com/static/images/../../../../../etc/passwd";
   System.out.println(new URL(attack).getPath());
   System.out.println(new URI(attack).getPath());

By not canonicalizing the input paths, they are complicit in directory traversal attacks. Code that checks only the path prefix may suffer!
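
The sketch below shows why a prefix-only check is fooled, plus one hedged way to tighten it with JDK classes. The names naiveCheck and normalizedCheck are made up, and the second check is an illustration only, not a complete defense for a real file server.

```java
import java.net.URI;
import java.util.Arrays;

final class TraversalDemo {
    static final String PREFIX = "/static/images/";

    /** Checks the prefix on the path exactly as written: fooled by "..". */
    static boolean naiveCheck(String url) {
        return URI.create(url).getPath().startsWith(PREFIX);
    }

    /** Normalizes first, then also rejects any ".." segment that survives (for example a percent-encoded one). */
    static boolean normalizedCheck(String url) {
        String path = URI.create(url).normalize().getPath();
        boolean hasDotDot = Arrays.asList(path.split("/")).contains("..");
        return path.startsWith(PREFIX) && !hasDotDot;
    }

    public static void main(String[] args) {
        String attack = "http://example.com/static/images/../../../../../etc/passwd";
        System.out.println(naiveCheck(attack));      // true: the attack slips through
        System.out.println(normalizedCheck(attack)); // false: rejected
        System.out.println(normalizedCheck("http://example.com/static/images/cat.png")); // true
    }
}
```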


If it works on the web, it should work in your application
The java.net.URI class is strict around what URLs it accepts. It rejects URLs like "http://example.com/abc|def" because the '|' character is unsupported. This class is more forgiving: it will automatically percent-encode the '|', yielding "http://example.com/abc%7Cdef". This kind of behavior is consistent with web browsers. HttpUrl prefers consistency with major web browsers over consistency with obsolete specifications.
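
A quick sketch of the strictness described above: java.net.URI throws on the '|' address, while a forgiving parser would encode it first. The helper parseLeniently is hypothetical and handles only '|'. It imitates the idea, not HttpUrl's actual implementation.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LenientParseDemo {
    /** Returns true if java.net.URI accepts the string as-is. */
    static boolean strictAccepts(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Hypothetical forgiving parse: percent-encode '|' (and only '|') before parsing. */
    static URI parseLeniently(String url) {
        return URI.create(url.replace("|", "%7C"));
    }

    public static void main(String[] args) {
        String url = "http://example.com/abc|def";
        System.out.println(strictAccepts(url));  // false
        System.out.println(parseLeniently(url)); // http://example.com/abc%7Cdef
    }
}
```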

Paths and Queries should decompose
Neither of the built-in URL models offers direct access to path segments or query parameters. Manually using StringBuilder to assemble these components is cumbersome: do '+' characters get silently replaced with spaces? If a query parameter contains a '&', does that get escaped? By offering methods to read and write individual query parameters directly, application developers are saved from the hassles of encoding and decoding.
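
Those two questions about '+' and '&' are exactly what you have to remember when encoding by hand. As a small sketch, this is what the JDK's form encoder URLEncoder does with them. The class name FormEncodeDemo and the sample value are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FormEncodeDemo {
    /** Encodes one query value with the JDK's form encoding rules. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Space becomes '+', a literal '+' becomes %2B, and '&' becomes %26.
        System.out.println("q=" + encode("a&b c+d")); // q=a%26b+c%2Bd
    }
}
```

Forget to encode the value and the '&' silently starts a new parameter, which is the kind of bug a per-parameter API prevents.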

Plus a modern API
The URL (JDK1.0) and URI (Java 1.4) classes predate builders and instead use telescoping constructors. For example, there's no API to compose a URI with a custom port without also providing a query and fragment.
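
Here is what that telescoping constructor looks like in practice. To set nothing but a custom port on a java.net.URI, the seven-argument constructor still wants userInfo, query, and fragment, so they are passed as null. The class and method names are made up for this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class TelescopingDemo {
    /** Builds an http URI with a custom port using the 7-argument constructor. */
    static String withPort(String host, int port, String path) {
        try {
            // Argument order: scheme, userInfo, host, port, path, query, fragment.
            return new URI("http", null, host, port, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withPort("example.com", 8080, "/a")); // http://example.com:8080/a
    }
}
```

With a builder, the same thing reads as a chain of named calls, and the components you don't care about are simply not mentioned.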

Instances of HttpUrl are well-formed and always have a scheme, host, and path. With java.net.URL it's possible to create an awkward URL like http:/ with scheme and path but no hostname. Building APIs that consume such malformed values is difficult!

This class has a modern API. It avoids punitive checked exceptions: parse() returns null if the input is an invalid URL. You can even be explicit about whether each component has been encoded already.






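  • The HttpUrl class builds URLs through chained builder calls, which keeps the resulting URL string safe and well-formed.
  • Internally it models the URL components used in HTTP: Scheme, Username and Password, Host, Port, Path, Query, and Fragment.
  • To keep URL strings well-formed, it also provides utilities for percent encoding, IDNA mapping, and Punycode encoding. (Speaking of which, when I used Picasso to display images, requests for paths containing Chinese characters failed. Since it uses okhttp underneath, that shouldn't have been a problem. Something to dig into later.)
  • The URL classes in Java's own java.net package have a number of problems (see above), and HttpUrl sets out to fix them.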


HttpUrl(Builder builder) {
    this.scheme = builder.scheme;
    this.username = percentDecode(builder.encodedUsername, false);
    this.password = percentDecode(builder.encodedPassword, false);
    this.host = builder.host;
    this.port = builder.effectivePort();
    this.pathSegments = percentDecode(builder.encodedPathSegments, false);
    this.queryNamesAndValues = builder.encodedQueryNamesAndValues != null
        ? percentDecode(builder.encodedQueryNamesAndValues, true)
        : null;
    this.fragment = builder.encodedFragment != null
        ? percentDecode(builder.encodedFragment, false)
        : null;
    this.url = builder.toString();
}

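As for the HttpUrl class, there's no need to read everything for now. Skim the constructor and the features it offers, enough that you'll remember it when a project calls for it. From the constructor we can see that the builder supplies the required data such as scheme and host, along with wrapped-up values such as queryNamesAndValues.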


parse(java.lang.String url)
get(java.net.URI uri)
get(java.net.URL url)
getChecked(java.lang.String url)



In addition, toString() returns the canonical URL string for the current object.



    public Builder username(String username) {
      if (username == null) throw new NullPointerException("username == null");
      this.encodedUsername = canonicalize(username, USERNAME_ENCODE_SET, false, false, false, true);
      return this;
    }

    public Builder encodedUsername(String encodedUsername) {
      if (encodedUsername == null) throw new NullPointerException("encodedUsername == null");
      this.encodedUsername = canonicalize(
          encodedUsername, USERNAME_ENCODE_SET, true, false, false, true);
      return this;
    }

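Next, canonicalize() is called. To canonicalize means to bring something into standard form. Here it encodes a string so that it conforms to URL rules. Let's see what the method does.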

  /**
   * Returns a substring of {@code input} on the range {@code [pos..limit)} with the following
   * transformations:
   * <ul>
   *   <li>Tabs, newlines, form feeds and carriage returns are skipped.
   *   <li>In queries, ' ' is encoded to '+' and '+' is encoded to "%2B".
   *   <li>Characters in {@code encodeSet} are percent-encoded.
   *   <li>Control characters and non-ASCII characters are percent-encoded.
   *   <li>All other characters are copied without transformation.
   * </ul>
   *
   * @param alreadyEncoded true to leave '%' as-is; false to convert it to '%25'.
   * @param strict true to encode '%' if it is not the prefix of a valid percent encoding.
   * @param plusIsSpace true to encode '+' as "%2B" if it is not already encoded.
   * @param asciiOnly true to encode all non-ASCII codepoints.
   * @param charset which charset to use, null equals UTF-8.
   */
static String canonicalize(String input, int pos, int limit, String encodeSet,
      boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
      Charset charset) {
    int codePoint;
    for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
      codePoint = input.codePointAt(i);
      if (codePoint < 0x20
          || codePoint == 0x7f
          || codePoint >= 0x80 && asciiOnly
          || encodeSet.indexOf(codePoint) != -1
          || codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit))
          || codePoint == '+' && plusIsSpace) {
        // Slow path: the character at i requires encoding!
        Buffer out = new Buffer();
        out.writeUtf8(input, pos, i);
        canonicalize(out, input, i, limit, encodeSet, alreadyEncoded, strict, plusIsSpace,
            asciiOnly, charset);
        return out.readUtf8();
      }
    }

    // Fast path: no characters in [pos..limit) required encoding.
    return input.substring(pos, limit);
  }

  static void canonicalize(Buffer out, String input, int pos, int limit, String encodeSet,
      boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
      Charset charset) {
    Buffer encodedCharBuffer = null; // Lazily allocated.
    int codePoint;
    for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
      codePoint = input.codePointAt(i);
      if (alreadyEncoded
          && (codePoint == '\t' || codePoint == '\n' || codePoint == '\f' || codePoint == '\r')) {
        // Skip this character.
      } else if (codePoint == '+' && plusIsSpace) {
        // Encode '+' as '%2B' since we permit ' ' to be encoded as either '+' or '%20'.
        out.writeUtf8(alreadyEncoded ? "+" : "%2B");
      } else if (codePoint < 0x20
          || codePoint == 0x7f
          || codePoint >= 0x80 && asciiOnly
          || encodeSet.indexOf(codePoint) != -1
          || codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit))) {
        // Percent encode this character.
        if (encodedCharBuffer == null) {
          encodedCharBuffer = new Buffer();
        }

        if (charset == null || charset.equals(Util.UTF_8)) {
          encodedCharBuffer.writeUtf8CodePoint(codePoint);
        } else {
          encodedCharBuffer.writeString(input, i, i + Character.charCount(codePoint), charset);
        }

        while (!encodedCharBuffer.exhausted()) {
          int b = encodedCharBuffer.readByte() & 0xff;
          out.writeByte('%');
          out.writeByte(HEX_DIGITS[(b >> 4) & 0xf]);
          out.writeByte(HEX_DIGITS[b & 0xf]);
        }
      } else {
        // This character doesn't need encoding. Just copy it over.
        out.writeUtf8CodePoint(codePoint);
      }
    }
  }

  static String canonicalize(String input, String encodeSet, boolean alreadyEncoded, boolean strict,
      boolean plusIsSpace, boolean asciiOnly, Charset charset) {
    return canonicalize(
        input, 0, input.length(), encodeSet, alreadyEncoded, strict, plusIsSpace, asciiOnly,
        charset);
  }

  static String canonicalize(String input, String encodeSet, boolean alreadyEncoded, boolean strict,
      boolean plusIsSpace, boolean asciiOnly) {
    return canonicalize(
        input, 0, input.length(), encodeSet, alreadyEncoded, strict, plusIsSpace, asciiOnly, null);
  }


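  • Tabs, newlines, form feeds, and carriage returns are skipped rather than encoded.
  • In the query, a space ' ' is encoded as '+' and a plus sign '+' is encoded as %2B.
  • Non-ASCII characters can optionally be encoded too, so that only ASCII remains.
  • All other characters, which need no encoding, are copied over unchanged.
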
static String canonicalize(String input, int pos, int limit, String encodeSet,
      boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
      Charset charset) {
    int codePoint;
    // This loop examines the input one code point at a time, from pos to limit.
    for (int i = pos; i < limit; i += Character.charCount(codePoint)) {

      codePoint = input.codePointAt(i);
      if (codePoint < 0x20 // Below 0x20 (the space character): invisible control characters such as tab and newline, which have no place in a URL.
          || codePoint == 0x7f // The DEL control character.
          || codePoint >= 0x80 && asciiOnly // Outside the ASCII range while asciiOnly is set, so it must be encoded.
          || encodeSet.indexOf(codePoint) != -1 // Listed in encodeSet, so it must be encoded.
          || codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit)) // A percent sign: decided by the alreadyEncoded and strict rules.
          || codePoint == '+' && plusIsSpace) { // A plus sign: decided by the plusIsSpace rule.
        // Slow path: the character at i requires encoding!
        // An optimization worth learning from: i is the first position that needs encoding, so
        // everything before it is written to the buffer as-is and encoding starts at i. That
        // avoids going back to the start and re-examining the characters before i.
        Buffer out = new Buffer();
        out.writeUtf8(input, pos, i);
        canonicalize(out, input, i, limit, encodeSet, alreadyEncoded, strict, plusIsSpace,
            asciiOnly, charset); // Walked through below.
        return out.readUtf8();
      }
    }

    // Fast path: no characters in [pos..limit) required encoding.
    return input.substring(pos, limit);
  }


static void canonicalize(Buffer out, String input, int pos, int limit, String encodeSet,
      boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
      Charset charset) {
    Buffer encodedCharBuffer = null; // Lazily allocated: memory is requested only once it is needed.
    int codePoint;
    for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
      codePoint = input.codePointAt(i);
      if (alreadyEncoded
          && (codePoint == '\t' || codePoint == '\n' || codePoint == '\f' || codePoint == '\r')) {
        // Skip this character. In already-encoded input, tabs, newlines, form feeds, and
        // carriage returns are dropped.
      } else if (codePoint == '+' && plusIsSpace) {
        // Encode '+' as '%2B' since we permit ' ' to be encoded as either '+' or '%20'.
        // If the input is already encoded, the '+' may stand for a space, so it is written
        // through unchanged instead of being converted.
        out.writeUtf8(alreadyEncoded ? "+" : "%2B");
      } else if (codePoint < 0x20
          || codePoint == 0x7f
          || codePoint >= 0x80 && asciiOnly
          || encodeSet.indexOf(codePoint) != -1
          || codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit))) {
        // Same test as above: non-ASCII characters and any other characters that need encoding
        // are percent-encoded here.
        // Percent encode this character.
        if (encodedCharBuffer == null) {
          encodedCharBuffer = new Buffer();
        }

        if (charset == null || charset.equals(Util.UTF_8)) {
          encodedCharBuffer.writeUtf8CodePoint(codePoint);
        } else {
          encodedCharBuffer.writeString(input, i, i + Character.charCount(codePoint), charset);
        }

        while (!encodedCharBuffer.exhausted()) {
          int b = encodedCharBuffer.readByte() & 0xff;
          out.writeByte('%');
          out.writeByte(HEX_DIGITS[(b >> 4) & 0xf]);
          out.writeByte(HEX_DIGITS[b & 0xf]);
        }
      } else {
        // This character doesn't need encoding. Just copy it over.
        out.writeUtf8CodePoint(codePoint);
      }
    }
  }
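
To try the fast-path/slow-path idea without pulling in okio, here is a stripped-down sketch that swaps Buffer for StringBuilder. It is my own simplification, not OkHttp's code: it always percent-encodes '%', always treats non-ASCII as needing encoding, always uses UTF-8, and ignores the alreadyEncoded, strict, and plusIsSpace switches. The class name CanonicalizeSketch and its helpers are made up.

```java
import java.nio.charset.StandardCharsets;

final class CanonicalizeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Simplified rule: control chars, DEL, non-ASCII, '%', and anything in encodeSet get encoded. */
    static boolean needsEncoding(int cp, String encodeSet) {
        return cp < 0x20 || cp == 0x7f || cp >= 0x80 || cp == '%' || encodeSet.indexOf(cp) != -1;
    }

    static String canonicalize(String input, String encodeSet) {
        int cp;
        for (int i = 0; i < input.length(); i += Character.charCount(cp)) {
            cp = input.codePointAt(i);
            if (needsEncoding(cp, encodeSet)) {
                // Slow path: copy the clean prefix once, then encode from i onward.
                StringBuilder out = new StringBuilder(input.length() + 16);
                out.append(input, 0, i);
                encodeFrom(out, input, i, encodeSet);
                return out.toString();
            }
        }
        // Fast path: nothing needed encoding, so no new buffer is built.
        return input;
    }

    private static void encodeFrom(StringBuilder out, String input, int pos, String encodeSet) {
        int cp;
        for (int i = pos; i < input.length(); i += Character.charCount(cp)) {
            cp = input.codePointAt(i);
            if (needsEncoding(cp, encodeSet)) {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%').append(HEX[(b >> 4) & 0xf]).append(HEX[b & 0xf]);
                }
            } else {
                out.appendCodePoint(cp);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("cute #puppies", " #")); // cute%20%23puppies
        System.out.println(canonicalize("okhttp", " #"));        // okhttp (fast path)
    }
}
```

Most strings that reach this code are already clean, so the fast path returns without allocating a buffer, and the slow path starts only at the first offending character.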

Besides these, there are a few more methods that perform percent encoding directly, as shown in the figure:


For example, checking whether the scheme is http or https, and so on.


Next: Unraveling okhttp3 (Part 2) https://www.jianshu.com/p/77f71946ef44

