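I finally have some free time to look at how the excellent third-party frameworks I use every day are actually implemented. Take OkHttp: OkHttp + Retrofit has become the standard networking stack for Android development. I wrote some wrappers around it for my company's project. Nothing has gone wrong so far, but I wouldn't dare claim my wrappers are good. I have never seriously read how this wheel was built, let alone how it runs, so I lack confidence. While I have the time, I'm recharging.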
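What others have shared online
Before reading the source myself I searched through many posts and read them one by one, and I learned a lot. Here I'd like to recommend one Android engineer's blog, which lays out its reasoning clearly.
拆轮子系列:拆 OkHttp (Taking Apart Wheels series: Taking Apart OkHttp)
I also borrowed a diagram from that author (yes, I'm lazy) as the starting point for my own unraveling.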
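Below is the flow chart of a complete OkHttp network request. The way I read it, the diagram splits down the middle into two halves: the left side wraps the client, and the right side wraps the HTTP protocol. Put plainly, the left side is a browser-like client, and the right side is an object-oriented model of HTTP: request, response, URL, GET, POST, request headers, response headers, and so on.
(Wouldn't you agree?)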
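Unraveling OkHttp's encapsulation of the HTTP protocol
Plenty of people online have analyzed OkHttp's request flow and source code. I read so many of these that I got sick of them and felt I had it all figured out, yet something always seemed to be missing. Over time at work I realized what it was: those analyses are conclusions other people hand to you, and how much you actually absorb is uncertain. Many of them also copy one another and all read the same. After reading enough of them you feel you understand, but when a real problem comes up you still can't solve it quickly. (Too much rambling, I know.)
So I decided to read the source myself, bit by bit, and shore up my fundamentals. The HTTP protocol is basic, must-know knowledge for network requests, yet in day-to-day work many people only half understand it. So that's what I'll chew on first.
Let's go in order. Step one: URL
The first item at the top right of the diagram is URL, probably the most frequently used term in network programming. So what exactly is it? We also need to understand another term, URI. One book puts it this way: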
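Compared with URI (Uniform Resource Identifier), we are more familiar with URL (Uniform Resource Locator). A URL is the web page address you enter when visiting a web page with a browser. For example, http://baidu.com/ is a URL.
URI is short for Uniform Resource Identifier. RFC 2396 defines the three words as follows.
Uniform
Specifying a uniform format makes it easy to handle many different types of resources without having to infer from context how a resource is accessed. It also makes it easier to add new schemes (such as http: or ftp:).
Resource
A resource is defined as "anything that can be identified". Documents, images, and services (for example, today's weather forecast) are resources, and so is anything else that can be distinguished from other things. A resource may be a single item or a collection of items.
Identifier
An object that refers to something identifiable.
In summary, a URI is an identifier that locates a resource and is expressed under some scheme. The scheme is the name of the protocol type used to access the resource.
With the HTTP protocol the scheme is http. Others include ftp, mailto, telnet, and file. There are around 30 standard URI schemes, managed and published by IANA (Internet Assigned Numbers Authority), which belongs to ICANN (Internet Corporation for Assigned Names and Numbers), the non-profit corporation that manages Internet resources.
IANA - Uniform Resource Identifier (URI) SCHEMES
http://www.iana.org/assignments/uri-schemes
A URI identifies an Internet resource with a string, while a URL gives the resource's location (where it sits on the Internet). So URLs are a subset of URIs.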
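To make the idea of a scheme concrete, here is a minimal sketch that uses the JDK's own java.net.URI (not OkHttp) to pull a URI apart. The class name UriParts and the helper schemeOf are made up for illustration, and the mailto address is a placeholder.

```java
import java.net.URI;

public class UriParts {
    /** Returns the scheme of a URI string, e.g. "http" or "mailto". */
    public static String schemeOf(String uri) {
        return URI.create(uri).getScheme();
    }

    public static void main(String[] args) {
        // A hierarchical URI that is also a URL: it says where the resource lives.
        URI web = URI.create("http://www.iana.org/assignments/uri-schemes");
        System.out.println(web.getScheme()); // http
        System.out.println(web.getHost());   // www.iana.org
        System.out.println(web.getPath());   // /assignments/uri-schemes

        // A URI under a different scheme: it identifies a mailbox, not a web page.
        System.out.println(schemeOf("mailto:someone@example.com")); // mailto
    }
}
```

Both strings are valid URIs, but only the first names a network location in the way a web URL does.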
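So the URL plays a very important role in the whole system. In OkHttp we can find the class that implements it: okhttp3.HttpUrl
I won't start by pasting code. I'll paste the class Javadoc first: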
A uniform resource locator (URL) with a scheme of either http or https. Use this class to compose and decompose Internet addresses. For example, this code will compose and print a URL for Google search:
HttpUrl url = new HttpUrl.Builder()
.scheme("https")
.host("www.google.com")
.addPathSegment("search")
.addQueryParameter("q", "polar bears")
.build();
System.out.println(url);
which prints:
https://www.google.com/search?q=polar%20bears
As another example, this code prints the human-readable query parameters of a Twitter search:
HttpUrl url = HttpUrl.parse("https://twitter.com/search?q=cute%20%23puppies&f=images");
for (int i = 0, size = url.querySize(); i < size; i++) {
System.out.println(url.queryParameterName(i) + ": " + url.queryParameterValue(i));
}
which prints:
q: cute #puppies
f: images
In addition to composing URLs from their component parts and decomposing URLs into their component parts, this class implements relative URL resolution: what address you'd reach by clicking a relative link on a specified page. For example:
HttpUrl base = HttpUrl.parse("https://www.youtube.com/user/WatchTheDaily/videos");
HttpUrl link = base.resolve("../../watch?v=cbP2N1BQdYc");
System.out.println(link);
which prints:
https://www.youtube.com/watch?v=cbP2N1BQdYc
What's in a URL?
A URL has several components.
Scheme
Sometimes referred to as protocol, A URL's scheme describes what mechanism should be used to retrieve the resource. Although URLs have many schemes (mailto, file, ftp), this class only supports http and https. Use java.net.URI for URLs with arbitrary schemes.
Username and Password
Username and password are either present, or the empty string "" if absent. This class offers no mechanism to differentiate empty from absent. Neither of these components are popular in practice. Typically HTTP applications use other mechanisms for user identification and authentication.
Host
The host identifies the webserver that serves the URL's resource. It is either a hostname like square.com or localhost, an IPv4 address like 192.168.0.1, or an IPv6 address like ::1.
Usually a webserver is reachable with multiple identifiers: its IP addresses, registered domain names, and even localhost when connecting from the server itself. Each of a webserver's names is a distinct URL and they are not interchangeable. For example, even if http://square.github.io/dagger and http://google.github.io/dagger are served by the same IP address, the two URLs identify different resources.
Port
The port used to connect to the webserver. By default this is 80 for HTTP and 443 for HTTPS. This class never returns -1 for the port: if no port is explicitly specified in the URL then the scheme's default is used.
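The "never returns -1" rule is easiest to appreciate next to java.net.URI, which does report -1 when no port is written. The sketch below is JDK-only; defaultPort and effectivePort are hypothetical helpers written for this note to mirror the documented behavior, not OkHttp's code.

```java
import java.net.URI;

public class EffectivePort {
    /** Hypothetical helper: default port for a scheme, or -1 if the scheme is not http/https. */
    public static int defaultPort(String scheme) {
        if ("http".equals(scheme)) return 80;
        if ("https".equals(scheme)) return 443;
        return -1;
    }

    /** Hypothetical helper: the explicit port if one was written, else the scheme default. */
    public static int effectivePort(URI uri) {
        int explicit = uri.getPort(); // java.net.URI reports -1 when no port is present
        return explicit != -1 ? explicit : defaultPort(uri.getScheme());
    }

    public static void main(String[] args) {
        System.out.println(URI.create("https://example.com/").getPort());          // -1
        System.out.println(effectivePort(URI.create("https://example.com/")));     // 443
        System.out.println(effectivePort(URI.create("http://example.com:8080/"))); // 8080
    }
}
```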
Path
The path identifies a specific resource on the host. Paths have a hierarchical structure like "/square/okhttp/issues/1486" and decompose into a list of segments like ["square", "okhttp", "issues", "1486"].
This class offers methods to compose and decompose paths by segment. It composes each path from a list of segments by alternating between "/" and the encoded segment. For example the segments ["a", "b"] build "/a/b" and the segments ["a", "b", ""] build "/a/b/".
If a path's last segment is the empty string then the path ends with "/". This class always builds non-empty paths: if the path is omitted it defaults to "/". The default path's segment list is a single empty string: [""].
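The segment rules are easy to check by hand. Here is a small JDK-only sketch of "alternate between "/" and the segment"; buildPath is a name I made up, and it assumes the segments are already encoded.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PathSegments {
    /** Builds a path by writing "/" before every segment (segments assumed already encoded). */
    public static String buildPath(List<String> segments) {
        StringBuilder out = new StringBuilder();
        for (String segment : segments) {
            out.append('/').append(segment);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildPath(Arrays.asList("a", "b")));      // /a/b
        System.out.println(buildPath(Arrays.asList("a", "b", "")));  // /a/b/
        System.out.println(buildPath(Collections.singletonList(""))); // /
    }
}
```

The trailing empty segment is what produces the trailing slash, which is why the default path "/" corresponds to the single-element list [""].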
Query
The query is optional: it can be null, empty, or non-empty. For many HTTP URLs the query string is subdivided into a collection of name-value parameters. This class offers methods to set the query as the single string, or as individual name-value parameters. With name-value parameters the values are optional and names may be repeated.
Fragment
The fragment is optional: it can be null, empty, or non-empty. Unlike host, port, path, and query the fragment is not sent to the webserver: it's private to the client.
Encoding
Each component must be encoded before it is embedded in the complete URL. As we saw above, the string cute #puppies is encoded as cute%20%23puppies when used as a query parameter value.
Percent encoding
Percent encoding replaces a character (like 🍩) with its UTF-8 hex bytes (like %F0%9F%8D%A9). This approach works for whitespace characters, control characters, non-ASCII characters, and characters that already have another meaning in a particular context.
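To see where %F0%9F%8D%A9 comes from, the following JDK-only sketch percent-encodes every UTF-8 byte of its input. It deliberately has no list of "safe" characters, so it is an illustration of the byte-to-hex step only; percentEncodeAll is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodeDemo {
    private static final char[] HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    /** Percent-encodes every UTF-8 byte of the input, with no exemptions for safe characters. */
    public static String percentEncodeAll(String input) {
        StringBuilder out = new StringBuilder();
        for (byte raw : input.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xff; // treat the byte as unsigned
            out.append('%').append(HEX_DIGITS[b >> 4]).append(HEX_DIGITS[b & 0xf]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F369 (doughnut) written as a surrogate pair so the source file stays ASCII.
        System.out.println(percentEncodeAll("\uD83C\uDF69")); // %F0%9F%8D%A9
        System.out.println(percentEncodeAll(" #"));           // %20%23
    }
}
```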
Percent encoding is used in every URL component except for the hostname. But the set of characters that need to be encoded is different for each component. For example, the path component must escape all of its ? characters, otherwise it could be interpreted as the start of the URL's query. But within the query and fragment components, the ? character doesn't delimit anything and doesn't need to be escaped.
HttpUrl url = HttpUrl.parse("http://who-let-the-dogs.out").newBuilder()
.addPathSegment("_Who?_")
.query("_Who?_")
.fragment("_Who?_")
.build();
System.out.println(url);
This prints:
http://who-let-the-dogs.out/_Who%3F_?_Who?_#_Who?_
When parsing URLs that lack percent encoding where it is required, this class will percent encode the offending characters.
IDNA Mapping and Punycode encoding
Hostnames have different requirements and use a different encoding scheme. It consists of IDNA mapping and Punycode encoding.
In order to avoid confusion and discourage phishing attacks, IDNA Mapping transforms names to avoid confusing characters. This includes basic case folding: transforming shouting SQUARE.COM into cool and casual square.com. It also handles more exotic characters. For example, the Unicode trademark sign (™) could be confused for the letters "TM" in http://ho™mail.com. To mitigate this, the single character (™) maps to the string (tm). There is similar policy for all of the 1.1 million Unicode code points. Note that some code points such as "🍩" are not mapped and cannot be used in a hostname.
Punycode converts a Unicode string to an ASCII string to make international domain names work everywhere. For example, "σ" encodes as "xn--4xa". The encoded string is not human readable, but can be used with classes like InetAddress to establish connections.
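The JDK ships a Punycode converter in java.net.IDN, so the "σ" example can be tried without OkHttp. This is only a sketch of the Punycode step; I am not claiming HttpUrl uses this class internally, and the wrapper names are my own.

```java
import java.net.IDN;

public class PunycodeDemo {
    /** Converts a Unicode hostname label to its ASCII-compatible form. */
    public static String toAscii(String host) {
        return IDN.toASCII(host);
    }

    /** Converts an ASCII-compatible hostname label back to Unicode. */
    public static String toUnicode(String host) {
        return IDN.toUnicode(host);
    }

    public static void main(String[] args) {
        String ascii = toAscii("\u03C3"); // Greek small letter sigma
        System.out.println(ascii);        // xn--4xa
        System.out.println(toUnicode(ascii).equals("\u03C3")); // true
    }
}
```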
Why another URL model?
Java includes both java.net.URL and java.net.URI. We offer a new URL model to address problems that the others don't.
Different URLs should be different
Although they have different content, java.net.URL considers the following two URLs equal, and the equals() method between them returns true:
http://square.github.io/
http://google.github.io/
This is because those two hosts share the same IP address. This is an old, bad design decision that makes java.net.URL unusable for many things. It shouldn't be used as a Map key or in a Set. Doing so is both inefficient because equality may require a DNS lookup, and incorrect because unequal URLs may be equal because of how they are hosted.
Equal URLs should be equal
These two URLs are semantically identical, but java.net.URI disagrees:
http://host:80/
http://host
Both the unnecessary port specification (:80) and the absent trailing slash (/) cause URI to bucket the two URLs separately. This harms URI's usefulness in collections. Any application that stores information-per-URL will need to either canonicalize manually, or suffer unnecessary redundancy for such URLs.
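The claim about java.net.URI is easy to reproduce with the JDK alone. In this sketch sameUri is a hypothetical helper that simply delegates to URI.equals.

```java
import java.net.URI;

public class UriEquality {
    /** Returns whether java.net.URI considers the two strings equal. */
    public static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Same resource for any HTTP client, but URI compares port and path literally.
        System.out.println(sameUri("http://host:80/", "http://host")); // false
    }
}
```

Any Map or Set keyed by such URI objects would therefore hold two entries for what is really one address.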
Because they don't attempt canonical form, these classes are surprisingly difficult to use securely. Suppose you're building a webservice that checks that incoming paths are prefixed "/static/images/" before serving the corresponding assets from the filesystem.
String attack = "http://example.com/static/images/../../../../../etc/passwd";
System.out.println(new URL(attack).getPath());
System.out.println(new URI(attack).getPath());
System.out.println(HttpUrl.parse(attack).encodedPath());
By canonicalizing the input paths, they are complicit in directory traversal attacks. Code that checks only the path prefix may suffer!
/static/images/../../../../../etc/passwd
/static/images/../../../../../etc/passwd
/etc/passwd
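To see why the third line of output differs, here is a simplified, JDK-only sketch of dot-segment removal. It is my own illustration of the idea, not OkHttp's implementation: it ignores empty segments and trailing slashes, and it drops any ".." that would climb above the root.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DotSegments {
    /** Simplified dot-segment removal for an absolute path; ".." at the root is dropped. */
    public static String resolve(String path) {
        Deque<String> stack = new ArrayDeque<>();
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) continue;
            if (segment.equals("..")) {
                stack.pollLast(); // returns null (no error) when already at the root
            } else {
                stack.addLast(segment);
            }
        }
        return "/" + String.join("/", stack);
    }

    public static void main(String[] args) {
        String raw = "/static/images/../../../../../etc/passwd";
        System.out.println(raw.startsWith("/static/images/"));          // true: the naive check passes
        System.out.println(resolve(raw));                               // /etc/passwd
        System.out.println(resolve(raw).startsWith("/static/images/")); // false
    }
}
```

A prefix check only means something when it runs on the resolved path, which is the form HttpUrl hands back.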
If it works on the web, it should work in your application
The java.net.URI class is strict around what URLs it accepts. It rejects URLs like "http://example.com/abc|def" because the '|' character is unsupported. This class is more forgiving: it will automatically percent-encode the '|', yielding "http://example.com/abc%7Cdef". This kind behavior is consistent with web browsers. HttpUrl prefers consistency with major web browsers over consistency with obsolete specifications.
Paths and Queries should decompose
Neither of the built-in URL models offer direct access to path segments or query parameters. Manually using StringBuilder to assemble these components is cumbersome: do '+' characters get silently replaced with spaces? If a query parameter contains a '&', does that get escaped? By offering methods to read and write individual query parameters directly, application developers are saved from the hassles of encoding and decoding.
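The '+' and '&' questions in that paragraph can be answered for the JDK's form encoder with a few lines. This sketch uses java.net.URLEncoder, which is a different API from HttpUrl and follows HTML form rules; encodeParam is a hypothetical wrapper.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncoding {
    /** Form-encodes one query parameter value as UTF-8. */
    public static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeParam("polar bears")); // polar+bears
        System.out.println(encodeParam("a+b"));         // a%2Bb
        System.out.println(encodeParam("R&D"));         // R%26D
    }
}
```

Gluing these pieces together with StringBuilder still leaves it to the caller to remember which parts were already encoded, which is the hassle the Javadoc is describing.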
Plus a modern API
The URL (JDK1.0) and URI (Java 1.4) classes predate builders and instead use telescoping constructors. For example, there's no API to compose a URI with a custom port without also providing a query and fragment.
Instances of HttpUrl are well-formed and always have a scheme, host, and path. With java.net.URL it's possible to create an awkward URL like http:/ with scheme and path but no hostname. Building APIs that consume such malformed values is difficult!
This class has a modern API. It avoids punitive checked exceptions: parse() returns null if the input is an invalid URL. You can even be explicit about whether each component has been encoded already.
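That's a lot of Javadoc! (I'm not padding the post; I pasted it precisely because it is so thorough.) Having read it, there is much I no longer need to explain, because it says things very clearly.
To summarize the Javadoc:
- HttpUrl uses a chained Builder to construct the URL, which keeps the resulting URL string safe and well-formed.
- Internally it models the components of an HTTP URL: Scheme, Username and Password, Host, Port, Path, Query, and Fragment.
- To keep the URL string well-formed it also provides percent encoding, IDNA mapping, and Punycode encoding. (Come to think of it, I once had an image request fail in Picasso when the path contained Chinese characters. With OkHttp underneath that shouldn't have been a problem; something to investigate later.)
- The URL classes in Java's own java.net package have a number of problems (listed above), which HttpUrl sets out to solve.
Constructor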
HttpUrl(Builder builder) {
this.scheme = builder.scheme;
this.username = percentDecode(builder.encodedUsername, false);
this.password = percentDecode(builder.encodedPassword, false);
this.host = builder.host;
this.port = builder.effectivePort();
this.pathSegments = percentDecode(builder.encodedPathSegments, false);
this.queryNamesAndValues = builder.encodedQueryNamesAndValues != null
? percentDecode(builder.encodedQueryNamesAndValues, true)
: null;
this.fragment = builder.encodedFragment != null
? percentDecode(builder.encodedFragment, false)
: null;
this.url = builder.toString();
}
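For now there's no need to read the rest of HttpUrl. A look at its constructor and the features it offers is enough to remember it when a project calls for it. The constructor shows that the Builder supplies the required data such as scheme and host, along with wrapped values such as queryNamesAndValues. Note that the username, password, path segments, query, and fragment are stored in percent-decoded form.
A few methods on this class strike me as particularly useful: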
parse(java.lang.String url)
get(java.net.URI uri)
get(java.net.URL url)
getChecked(java.lang.String url)
These return a validated, canonically encoded HttpUrl object
isHttps()
Returns whether this is an https URL
newBuilder()
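Returns a new Builder initialized from this URL
In addition, toString() returns the canonical URL string for the current object
How HttpUrl ensures a URL is valid
From the above we know that OkHttp's URL model has many advantages over Java's own URL. How does it achieve them?
Let's start with the Builder:
Every Builder method that accepts input null-checks its argument. For example: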
public Builder username(String username) {
if (username == null) throw new NullPointerException("username == null");
this.encodedUsername = canonicalize(username, USERNAME_ENCODE_SET, false, false, false, true);
return this;
}
public Builder encodedUsername(String encodedUsername) {
if (encodedUsername == null) throw new NullPointerException("encodedUsername == null");
this.encodedUsername = canonicalize(
encodedUsername, USERNAME_ENCODE_SET, true, false, false, true);
return this;
}
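Next, canonicalize() is called. To canonicalize means to bring something into a standard form; here it encodes the string so that it conforms to URL rules. Let's see what the method does.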
/**
* Returns a substring of {@code input} on the range {@code [pos..limit)} with the following
* transformations:
* <ul>
* <li>Tabs, newlines, form feeds and carriage returns are skipped.
* <li>In queries, ' ' is encoded to '+' and '+' is encoded to "%2B".
* <li>Characters in {@code encodeSet} are percent-encoded.
* <li>Control characters and non-ASCII characters are percent-encoded.
* <li>All other characters are copied without transformation.
* </ul>
*
* @param alreadyEncoded true to leave '%' as-is; false to convert it to '%25'.
* @param strict true to encode '%' if it is not the prefix of a valid percent encoding.
* @param plusIsSpace true to encode '+' as "%2B" if it is not already encoded.
* @param asciiOnly true to encode all non-ASCII codepoints.
* @param charset which charset to use, null equals UTF-8.
*/
static String canonicalize(String input, int pos, int limit, String encodeSet,
boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
Charset charset) {
int codePoint;
for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
codePoint = input.codePointAt(i);
if (codePoint < 0x20
|| codePoint == 0x7f
|| codePoint >= 0x80 && asciiOnly
|| encodeSet.indexOf(codePoint) != -1
|| codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit))
|| codePoint == '+' && plusIsSpace) {
// Slow path: the character at i requires encoding!
Buffer out = new Buffer();
out.writeUtf8(input, pos, i);
canonicalize(out, input, i, limit, encodeSet, alreadyEncoded, strict, plusIsSpace,
asciiOnly, charset);
return out.readUtf8();
}
}
// Fast path: no characters in [pos..limit) required encoding.
return input.substring(pos, limit);
}
static void canonicalize(Buffer out, String input, int pos, int limit, String encodeSet,
boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
Charset charset) {
Buffer encodedCharBuffer = null; // Lazily allocated.
int codePoint;
for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
codePoint = input.codePointAt(i);
if (alreadyEncoded
&& (codePoint == '\t' || codePoint == '\n' || codePoint == '\f' || codePoint == '\r')) {
// Skip this character.
} else if (codePoint == '+' && plusIsSpace) {
// Encode '+' as '%2B' since we permit ' ' to be encoded as either '+' or '%20'.
out.writeUtf8(alreadyEncoded ? "+" : "%2B");
} else if (codePoint < 0x20
|| codePoint == 0x7f
|| codePoint >= 0x80 && asciiOnly
|| encodeSet.indexOf(codePoint) != -1
|| codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit))) {
// Percent encode this character.
if (encodedCharBuffer == null) {
encodedCharBuffer = new Buffer();
}
if (charset == null || charset.equals(Util.UTF_8)) {
encodedCharBuffer.writeUtf8CodePoint(codePoint);
} else {
encodedCharBuffer.writeString(input, i, i + Character.charCount(codePoint), charset);
}
while (!encodedCharBuffer.exhausted()) {
int b = encodedCharBuffer.readByte() & 0xff;
out.writeByte('%');
out.writeByte(HEX_DIGITS[(b >> 4) & 0xf]);
out.writeByte(HEX_DIGITS[b & 0xf]);
}
} else {
// This character doesn't need encoding. Just copy it over.
out.writeUtf8CodePoint(codePoint);
}
}
}
static String canonicalize(String input, String encodeSet, boolean alreadyEncoded, boolean strict,
boolean plusIsSpace, boolean asciiOnly, Charset charset) {
return canonicalize(
input, 0, input.length(), encodeSet, alreadyEncoded, strict, plusIsSpace, asciiOnly,
charset);
}
static String canonicalize(String input, String encodeSet, boolean alreadyEncoded, boolean strict,
boolean plusIsSpace, boolean asciiOnly) {
return canonicalize(
input, 0, input.length(), encodeSet, alreadyEncoded, strict, plusIsSpace, asciiOnly, null);
}
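Again reading the Javadoc first: this method URL-encodes the input string and returns the result. The rules involved:
- Tabs, newlines, form feeds and carriage returns are skipped, that is, dropped from the output.
- In queries, a space ' ' is encoded as '+', and a plus sign '+' is encoded as %2B.
- Characters in encodeSet are percent-encoded, as are control characters and non-ASCII characters (asciiOnly controls whether non-ASCII code points are encoded).
- All other characters are copied as-is without encoding.
Let's look at the first method: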
static String canonicalize(String input, int pos, int limit, String encodeSet,
boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
Charset charset) {
int codePoint;
//Scan the input from pos to limit, one code point at a time
for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
codePoint = input.codePointAt(i);
if (codePoint < 0x20//below 0x20 (space is 0x20 itself): control characters such as tab and newline, which are invisible and not valid in a URL as-is
|| codePoint == 0x7f//the DEL control character
|| codePoint >= 0x80 && asciiOnly//0x80 and above is outside the ASCII range; encoded when asciiOnly is set
|| encodeSet.indexOf(codePoint) != -1//listed in encodeSet, so it must be encoded
|| codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit)) //a percent sign: decided by the alreadyEncoded and strict flags
|| codePoint == '+' && plusIsSpace) {//a plus sign: decided by the plusIsSpace flag
// Slow path: the character at i requires encoding!
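// An optimization worth learning from: once the first character that needs encoding is found at index i, copy everything before i into the buffer as-is and start encoding from i, so the characters before i are not processed a second time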
Buffer out = new Buffer();
out.writeUtf8(input, pos, i);
canonicalize(out, input, i, limit, encodeSet, alreadyEncoded, strict, plusIsSpace,
asciiOnly, charset);//analyzed below
return out.readUtf8();
}
}
// Fast path: no characters in [pos..limit) required encoding.
//The scan found nothing that needs encoding, so return the substring directly
return input.substring(pos, limit);
}
The scan above finds where encoding has to start; what follows is the encoding itself.
static void canonicalize(Buffer out, String input, int pos, int limit, String encodeSet,
boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
Charset charset) {
Buffer encodedCharBuffer = null; // Lazily allocated: created only when first needed, to avoid an unnecessary allocation
int codePoint;
for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
codePoint = input.codePointAt(i);
if (alreadyEncoded
&& (codePoint == '\t' || codePoint == '\n' || codePoint == '\f' || codePoint == '\r')) {
// Skip this character: tabs, newlines, form feeds and carriage returns in already-encoded input are dropped from the output
} else if (codePoint == '+' && plusIsSpace) {
// Encode '+' as '%2B' since we permit ' ' to be encoded as either '+' or '%20'.
//Writes '+' as %2B. If the input is already encoded, a '+' may stand for an encoded space, so it is written through unchanged
out.writeUtf8(alreadyEncoded ? "+" : "%2B");
} else if (codePoint < 0x20
|| codePoint == 0x7f
|| codePoint >= 0x80 && asciiOnly
|| encodeSet.indexOf(codePoint) != -1
|| codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit))) {
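//Same checks as in the method above: non-ASCII characters (when asciiOnly is set) and the other characters that must be encoded are percent-encoded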
// Percent encode this character.
if (encodedCharBuffer == null) {
encodedCharBuffer = new Buffer();
}
if (charset == null || charset.equals(Util.UTF_8)) {
encodedCharBuffer.writeUtf8CodePoint(codePoint);
} else {
encodedCharBuffer.writeString(input, i, i + Character.charCount(codePoint), charset);
}
//Percent-encode: write each byte as '%' followed by two hex digits
while (!encodedCharBuffer.exhausted()) {
int b = encodedCharBuffer.readByte() & 0xff;
out.writeByte('%');
out.writeByte(HEX_DIGITS[(b >> 4) & 0xf]);
out.writeByte(HEX_DIGITS[b & 0xf]);
}
} else {
// This character doesn't need encoding. Just copy it over.
out.writeUtf8CodePoint(codePoint);
}
}
}
This is roughly the core of the whole percent-encoding algorithm.
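To check my understanding of the fast path and slow path, I find it helpful to rewrite the idea with plain JDK types. The sketch below is a simplified stand-in, not OkHttp's code: it uses StringBuilder instead of okio's Buffer, always encodes non-ASCII code points, always uses UTF-8, and leaves out the '%' and '+' special cases. The class and method names are my own.

```java
import java.nio.charset.StandardCharsets;

public class SimpleCanonicalize {
    private static final char[] HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    /** Simplified rule: control characters, DEL, non-ASCII, and anything in encodeSet. */
    static boolean needsEncoding(int codePoint, String encodeSet) {
        return codePoint < 0x20 || codePoint == 0x7f || codePoint >= 0x80
            || encodeSet.indexOf(codePoint) != -1;
    }

    public static String canonicalize(String input, String encodeSet) {
        int codePoint;
        for (int i = 0; i < input.length(); i += Character.charCount(codePoint)) {
            codePoint = input.codePointAt(i);
            if (needsEncoding(codePoint, encodeSet)) {
                // Slow path: keep the clean prefix as-is and encode from index i onward.
                StringBuilder out = new StringBuilder(input.substring(0, i));
                encodeFrom(out, input, i, encodeSet);
                return out.toString();
            }
        }
        // Fast path: nothing needed encoding, so no new string is built.
        return input;
    }

    private static void encodeFrom(StringBuilder out, String input, int pos, String encodeSet) {
        int codePoint;
        for (int i = pos; i < input.length(); i += Character.charCount(codePoint)) {
            codePoint = input.codePointAt(i);
            if (needsEncoding(codePoint, encodeSet)) {
                byte[] utf8 = new String(Character.toChars(codePoint))
                    .getBytes(StandardCharsets.UTF_8);
                for (byte raw : utf8) {
                    int b = raw & 0xff;
                    out.append('%').append(HEX_DIGITS[b >> 4]).append(HEX_DIGITS[b & 0xf]);
                }
            } else {
                out.appendCodePoint(codePoint);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("_Who?_", "?"));         // _Who%3F_
        System.out.println(canonicalize("cute #puppies", " #")); // cute%20%23puppies
        System.out.println(canonicalize("plain", "?#"));         // plain
    }
}
```

The two-method split is the point of the design: inputs that are already clean, which is the common case, cost one scan and no allocation.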
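Beyond this, the class also provides several helper methods that percent-encode directly.
There are also some validity checks,
such as checking that the scheme is http or https.
Finally, build() produces a well-formed HttpUrl object.
That's about it. These are my study notes; corrections and suggestions are welcome.
Next up is an analysis of method.
Next post: 抽丝剥茧 okhttp3 (二) (Unraveling okhttp3, Part 2) https://www.jianshu.com/p/77f71946ef44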