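HTTP is the standard protocol for communication between web browsers and web servers. It specifies how a client
and a server establish a connection, how the client requests data from the server, how the server responds to
that request, and finally how the connection is closed. HTTP connections use TCP/IP for data transfer. Each
request from a client to a server goes through the following four steps: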
Making the connection
The client opens a TCP connection to the server, on port 80 by default. A different port can be specified in the URL.
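The port rule above can be sketched in a few lines of Java. This is only an illustration: the class and method names (HttpPort, effectivePort) and the port 8080 in the second URL are made up for the example, and it assumes plain http URLs.

```java
import java.net.URI;

final class HttpPort {
    // Default port for plain HTTP, as stated in the text above.
    static final int DEFAULT_HTTP_PORT = 80;

    // Returns the port a client would connect to for the given http URL.
    static int effectivePort(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort(); // -1 when the URL does not name a port
        return port == -1 ? DEFAULT_HTTP_PORT : port;
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http://www.cafeaulait.org/index.html"));      // 80
        System.out.println(effectivePort("http://www.cafeaulait.org:8080/index.html")); // 8080
    }
}
```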
Making a request
The client sends the server a message requesting the page at a particular URL. The request typically looks like this:
GET /index.html HTTP/1.0
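GET specifies the operation requested; here the operation asks the server to return a resource. /index.html is a
relative URL that identifies the resource requested from the server. The resource is assumed to reside on the
machine that receives the request, so there is no need to prefix it with http://www.thismachine.com/. HTTP/1.0 is
the version of the protocol that the client understands. The request is terminated with two carriage
return/linefeed pairs (\r\n\r\n in Java parlance), regardless of how lines are terminated on the client or
server platform.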
Although the GET line is all that is required, a client request can also include other information, which takes the following form:
Keyword: Value
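The most common keyword is Accept, which tells the server what kinds of data the client can handle. For example,
the following line says that the client can handle four MIME media types, corresponding to HTML documents,
plain text, and GIF and JPEG images: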
Accept: text/html, text/plain, image/gif, image/jpeg
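User-Agent is another common keyword. It tells the server which browser is sending the request, so that the
server can send files optimized for that particular browser. The following line says that the request comes
from version 2.4 of the Lynx browser: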
User-Agent: Lynx/2.4 libwww/2.1.4
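All but the oldest first-generation browsers also include a Host field that specifies the server's name. It lets
web servers distinguish between differently named hosts served from the same IP address. For example: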
Host: www.cafeaulait.org
Finally, the request is terminated with a blank line, that is, two carriage return/linefeed pairs, \r\n\r\n. A complete request might look like this:
GET /index.html HTTP/1.0
Accept: text/html, text/plain, image/gif, image/jpeg
User-Agent: Lynx/2.4 libwww/2.1.4
Host: www.cafeaulait.org
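As a rough sketch, a request like the one above could be assembled in Java as follows. The class and method names (RequestBuilder, buildGet) are invented for this illustration; the point is only that every line, including the final blank one, ends with \r\n whatever the platform's own line separator is.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RequestBuilder {
    // HTTP lines always end with carriage return + linefeed.
    static final String CRLF = "\r\n";

    // Builds an HTTP/1.0 GET request for the given path and header fields.
    static String buildGet(String path, Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.0").append(CRLF);
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append(CRLF);
        }
        sb.append(CRLF); // the blank line that terminates the request
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the headers in insertion order.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "text/html, text/plain, image/gif, image/jpeg");
        headers.put("User-Agent", "Lynx/2.4 libwww/2.1.4");
        headers.put("Host", "www.cafeaulait.org");
        System.out.print(buildGet("/index.html", headers));
    }
}
```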
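Besides GET, there are several other request types. HEAD retrieves only the header for a file, not the actual
data. This is commonly used to check a file's modification date to determine whether a copy in the local cache
is still valid. POST sends form data to the server, PUT uploads a resource to the server, and DELETE removes a
resource from the server.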
The response
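The server sends a response to the client. The response begins with a response code, followed by a header full
of metadata, a blank line, and the requested document or an error message. Assuming the requested document is
found, the response might look like this: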
HTTP/1.1 200 OK
Date: Mon, 15 Sep 2003 21:06:50 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Last-Modified: Tue, 15 Apr 2003 17:28:57 GMT
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Content-length: 107

The rest of the document goes here
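The first line shows the protocol the server is using (HTTP/1.1), followed by a response code. 200 OK is the most
common response code; it indicates that the request succeeded. Table 3-1 is a complete list of the HTTP 1.0
response codes; HTTP 1.1 adds many more. The other header lines give the date of the request in the server's
time frame, the server software (Apache 2.0.40), the date the document was last modified, a promise that the
server will close the connection when it has finished sending, the MIME content type, and the length of the
document delivered (not counting this header), in this case 107 bytes.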
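To make the layout of a response concrete, here is a minimal Java sketch that splits a raw response into status line, header fields, and body. The ResponseHead class is hypothetical, not part of any library, and it assumes the whole response is already in memory as a string. Header names are stored in lowercase because HTTP treats them as case-insensitive.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ResponseHead {
    final String version;   // e.g. "HTTP/1.1"
    final int statusCode;   // e.g. 200
    final String reason;    // e.g. "OK"
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;      // everything after the blank line

    ResponseHead(String raw) {
        // The blank line (\r\n\r\n) separates the header from the document.
        int split = raw.indexOf("\r\n\r\n");
        if (split < 0) throw new IllegalArgumentException("no blank line after header");
        String[] lines = raw.substring(0, split).split("\r\n");

        // Status line: protocol version, response code, reason phrase.
        String[] status = lines[0].split(" ", 3);
        if (status.length < 2) throw new IllegalArgumentException("malformed status line");
        version = status[0];
        statusCode = Integer.parseInt(status[1]);
        reason = status.length > 2 ? status[2] : "";

        // Remaining lines have the form "Keyword: Value"; split at the first colon.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
        body = raw.substring(split + 4);
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\n"
                + "Connection: close\r\n"
                + "Content-Type: text/html; charset=ISO-8859-1\r\n"
                + "Content-length: 107\r\n"
                + "\r\n"
                + "The rest of the document goes here";
        ResponseHead r = new ResponseHead(raw);
        System.out.println(r.statusCode + " " + r.reason);
        System.out.println(r.headers.get("content-length"));
    }
}
```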
Closing the connection
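Either the client or the server, or both, close the connection. Thus each request uses a separate network
connection. If the client reconnects, the server retains no memory of the previous connection or its results.
A protocol that retains no information about past requests is called stateless; in contrast, a stateful
protocol such as FTP can process many requests before the connection is closed. The lack of state is both a
strength and a weakness of HTTP.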
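The four steps can be walked through end to end with plain sockets. The sketch below is a toy under stated assumptions: OneShotHttp, the canned "hello" reply, and the in-process server on a loopback port are all invented here so the example runs without a real web server. The client side (fetch) is the part that mirrors the steps described above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class OneShotHttp {
    // Hypothetical canned reply used by the toy server below.
    static final String CANNED =
            "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-length: 5\r\n\r\nhello";

    // Toy server: accepts one connection, reads the request up to the blank line, replies, closes.
    static void serveOnce(ServerSocket server) {
        try (Socket s = server.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                // request line and header fields are ignored in this sketch
            }
            OutputStream out = s.getOutputStream();
            out.write(CANNED.getBytes(StandardCharsets.ISO_8859_1));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client: one connection per request, as in HTTP 1.0.
    static String fetch(String host, int port, String path) {
        try (Socket socket = new Socket(host, port)) {               // 1. make the connection
            String request = "GET " + path + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.ISO_8859_1)); // 2. make the request
            out.flush();
            InputStream in = socket.getInputStream();                 // 3. read the response
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {                      // until the server closes
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {                                     // 4. try-with-resources closes
            throw new UncheckedIOException(e);
        }
    }

    // Starts the toy server on a free loopback port and fetches one page from it.
    static String demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread t = new Thread(() -> serveOnce(server));
            t.start();
            String response = fetch(loopback.getHostAddress(), server.getLocalPort(), "/index.html");
            t.join();
            return response;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the server closes the connection after a single reply, a second call to fetch would need a brand-new connection, and the server would know nothing about the first one. That is the statelessness described above.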
Table 3-1. HTTP 1.0 response codes
Response code
Meaning
2xx Successful
Response codes between 200 and 299 indicate that the request was received, understood, and accepted.
200 OK
This is the most common response code. If the request used GET or POST, the requested data is contained in the response along with the usual headers. If the request used HEAD, only the header information is included.
201 Created
The server has created a data file at a URL specified in the body of the response. The web browser should now attempt to load that URL. This is sent only in response to POST requests.
202 Accepted
This rather uncommon response indicates that a request (generally from POST) is being processed, but the processing is not yet complete so no response can be returned. The server should return an HTML page that explains the situation to the user, provides an estimate of when the request is likely to be completed, and, ideally, has a link to a status monitor of some kind.
204 No Content
The server has successfully processed the request but has no information to send back to the client. This is usually the result of a poorly written form-processing program that accepts data but does not return a response to the user indicating that it has finished.
3xx Redirection
Response codes from 300 to 399 indicate that the web browser needs to go to a different page.
300 Multiple Choices
The page requested is available from one or more locations. The body of the response includes a list of locations from which the user or web browser can pick the most appropriate one. If the server prefers one of these locations, the URL of this choice is included in a Location header, which web browsers can use to load the preferred page.
301 Moved Permanently
The page has moved to a new URL. The web browser should automatically load the page at this URL and update any bookmarks that point to the old URL.
302 Moved Temporarily
This unusual response code indicates that a page is temporarily at a new URL but that the document's location will change again in the foreseeable future, so bookmarks should not be updated.
304 Not Modified
The client has performed a GET request but used the If-Modified-Since header to indicate that it wants the document only if it has been recently updated. This status code is returned because the document has not been updated. The web browser will now load the page from a cache.
4xx Client Error
Response codes from 400 to 499 indicate that the client has erred in some fashion, although the error may as easily be the result of an unreliable network connection as of a buggy or nonconforming web browser. The browser should stop sending data to the server as soon as it receives a 4xx response. Unless it is responding to a HEAD request, the server should explain the error status in the body of its response.
400 Bad Request
The client request to the server used improper syntax. This is rather unusual, although it is likely to happen if you're writing and debugging a client.
401 Unauthorized
Authorization, generally username and password controlled, is required to access this page. Either the username and password have not yet been presented or the username and password are invalid.
403 Forbidden
The server understood the request but is deliberately refusing to process it. Authorization will not help. One reason this occurs is that the client asks for a directory listing but the server is not configured to provide it, as shown in Figure 3-1.
404 Not Found
This most common error response indicates that the server cannot find the requested page. It may indicate a bad link, a page that has moved with no forwarding address, a mistyped URL, or something similar.
5xx Server Error
Response codes from 500 to 599 indicate that something has gone wrong with the server, and the server cannot fix the problem.
500 Internal Server Error
An unexpected condition occurred that the server does not know how to handle.
501 Not Implemented
The server does not have the feature that is needed to fulfill this request. A server that cannot handle POST requests might send this response to a client that tried to POST form data to it.
502 Bad Gateway
This response is applicable only to servers that act as proxies or gateways. It indicates that the proxy received an invalid response from a server it was connecting to in an effort to fulfill the request.
503 Service Unavailable
The server is temporarily unable to handle the request, perhaps as a result of overloading or maintenance.
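HTTP 1.1 more than doubles the number of responses. However, response codes from 200 to 299 always indicate
success, 300 to 399 redirection, 400 to 499 a client error, and 500 to 599 a server error.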
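Because the ranges are fixed, a client can sort any response code into its class without knowing the individual code. A small sketch follows; the StatusClass name and the "Other" fallback for codes outside the four ranges are choices made for this example.

```java
final class StatusClass {
    // Maps a response code to the class names used in Table 3-1.
    static String classify(int code) {
        if (code >= 200 && code <= 299) return "Successful";
        if (code >= 300 && code <= 399) return "Redirection";
        if (code >= 400 && code <= 499) return "Client Error";
        if (code >= 500 && code <= 599) return "Server Error";
        return "Other"; // outside the four ranges discussed here
    }

    public static void main(String[] args) {
        System.out.println(classify(200)); // Successful
        System.out.println(classify(304)); // Redirection
        System.out.println(classify(404)); // Client Error
        System.out.println(classify(503)); // Server Error
    }
}
```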
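HTTP 1.0 is described in RFC 1945. It is not an official Internet standard, because it was originally developed
by browser and server vendors outside the IETF. HTTP 1.1 is a proposed standard developed by the W3C and the HTTP
working group of the IETF. It provides more flexible and more powerful communication between client and server,
and it is also more extensible. It is described in RFC 2616 (since obsoleted by later HTTP RFCs).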
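HTTP 1.0 is the basic version of the protocol, and all current web servers and browsers support it. HTTP 1.1
adds numerous features to HTTP 1.0 but does not significantly change the underlying design or architecture.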
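The primary improvement in HTTP 1.1 is connection reuse. HTTP 1.0 opens a new connection for every request. In
practice, opening and closing connections during a web session can take more time than transferring the data,
especially when the session involves many small documents. HTTP 1.1 allows a browser to send many different
requests over a single connection; the connection remains open until it is explicitly closed. Requests and
responses are asynchronous: the browser does not need to wait for the response to the first request before
sending a second or third. However, the pattern is still one client request followed by one server response,
and each individual request and response has the same form as before.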
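When a connection stays open, the client can no longer treat end-of-stream as the end of a document; it needs another way to tell where one response stops and the next begins. One common way is the Content-length header. The sketch below assumes every response carries that header (chunked responses are not handled) and simulates the connection with an in-memory stream. PersistentReader and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PersistentReader {
    // Reads one line terminated by \r\n; returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') break;
            if (b != '\r') buf.write(b);
        }
        if (b == -1 && buf.size() == 0) return null;
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Reads one response and returns its body; returns null when the stream is exhausted.
    static String readBody(InputStream in) throws IOException {
        String statusLine = readLine(in);
        if (statusLine == null) return null;
        int length = 0;
        String line;
        // Header fields run until the blank line.
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            if (line.toLowerCase(Locale.ROOT).startsWith("content-length:")) {
                length = Integer.parseInt(line.substring("content-length:".length()).trim());
            }
        }
        // Read exactly 'length' bytes, so the next response stays in the stream.
        byte[] body = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(body, off, length - off);
            if (n == -1) throw new IOException("stream ended before the body was complete");
            off += n;
        }
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // Reads every response found in the given bytes, in order.
    static List<String> readAll(byte[] wire) {
        List<String> bodies = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(wire)) {
            String body;
            while ((body = readBody(in)) != null) {
                bodies.add(body);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bodies;
    }

    public static void main(String[] args) {
        String wire = "HTTP/1.1 200 OK\r\nContent-length: 5\r\n\r\nfirst"
                + "HTTP/1.1 200 OK\r\nContent-length: 6\r\n\r\nsecond";
        System.out.println(readAll(wire.getBytes(StandardCharsets.ISO_8859_1))); // [first, second]
    }
}
```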
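HTTP 1.1 has many other improvements as well. Requests include a Host header field so that one web server can
serve different sites at different URLs. Servers and browsers can exchange compressed files and particular byte
ranges of a document, both of which reduce network traffic. HTTP 1.1 is designed to work better with proxy
servers. HTTP 1.1 is a superset of HTTP 1.0, so HTTP 1.1 web servers have no trouble interacting with browsers
that support only HTTP 1.0, and vice versa.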