Java Basics, Traditional Crawler: A Simple GET Request

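Overview:

This article builds a small crawler program with the interfaces and classes of Apache HttpClient, such as those in the org.apache.http.client.methods package.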

1. Basic concept of a crawler:

A crawler is a program that automatically fetches the content of specific web pages.
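
As a rough illustration of what fetching "specific content" can mean once a page has been downloaded, the sketch below pulls the title text out of an HTML string. The class name TitleExtractor, the regular expression, and the sample HTML are assumptions made for this illustration and do not come from the original example; a real crawler would more likely use a proper HTML parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {

    // Matches the text between <title> and </title>, ignoring case and line breaks.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed page title, or an empty Optional when no title element is found. */
    public static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical page source standing in for a downloaded response body.
        String html = "<html><head><title> Example Page </title></head><body>hello</body></html>";
        System.out.println(extractTitle(html).orElse("(no title)"));
    }
}
```

The extraction step is kept separate from the download step, so it can be tried on any string without touching the network.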

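2. Classes and interfaces used

CloseableHttpResponse: an interface that extends both HttpResponse and Closeable. HttpResponse represents the HTTP response message that comes back after a request has been sent; Closeable lets the response release its resources.
CloseableHttpClient: an abstract class that implements the HttpClient and Closeable interfaces. HttpClient defines the most basic contract for executing HTTP requests; Closeable lets the client release its resources.
HttpGet: a class that extends HttpRequestBase and represents a GET request.
HttpEntity: an interface representing an entity (the message body) that can be sent or received with an HTTP message.
EntityUtils: a final class that provides static helper methods for working with HttpEntity objects.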
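
Because both CloseableHttpResponse and CloseableHttpClient are Closeable, they can be managed by a try-with-resources statement. The sketch below shows that mechanism using only the JDK. FakeResource and CloseOrderDemo are hypothetical names invented for this illustration and are not HttpClient classes.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {

    // Records events so the closing order can be inspected afterwards.
    static final List<String> LOG = new ArrayList<>();

    /** Hypothetical stand-in for a closeable client or response. */
    static class FakeResource implements Closeable {
        private final String name;

        FakeResource(String name) {
            this.name = name;
            LOG.add("open " + name);
        }

        @Override
        public void close() {
            LOG.add("close " + name);
        }
    }

    public static List<String> run() {
        LOG.clear();
        // Resources are closed automatically, in reverse order of declaration.
        try (FakeResource client = new FakeResource("client");
             FakeResource response = new FakeResource("response")) {
            LOG.add("use " + response.name);
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running it logs the two open events, the use of the response, and then the close events, with the response closed before the client. The same pattern applies when the two resources are a real client and response.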

3. A small example

GET request:

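import java.io.IOException;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Test {

    public static void main(String[] args) throws IOException {
        HttpGet httpGet = new HttpGet("http://www.baidu.com");

        // try-with-resources closes the response first, then the client
        try (CloseableHttpClient httpclient = HttpClients.createDefault();
             CloseableHttpResponse response = httpclient.execute(httpGet)) {

            // Print the response headers
            for (Header h : response.getAllHeaders()) {
                System.out.println(h.getName() + ":" + h.getValue());
            }

            System.out.println("-----------------------------------------");

            // Print the response body
            HttpEntity entity = response.getEntity();
            System.out.println(EntityUtils.toString(entity));
        }
    }
}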

Result:

[Image: 结果.png]

IDE: Eclipse
JAR download: http://hc.apache.org/downloads.cgi
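
One detail worth knowing when printing the response body: to the best of my knowledge, in HttpClient 4.x the single-argument EntityUtils.toString(entity) decodes with the charset named in the response's Content-Type and falls back to ISO-8859-1 when none is given, which can garble Chinese pages. The sketch below imitates that selection logic with the JDK only. CharsetSketch, charsetOf, and decode are hypothetical names for this illustration, not HttpClient APIs, and the parsing is deliberately simplified.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetSketch {

    /**
     * Picks the charset named in a Content-Type value such as "text/html; charset=utf-8",
     * or returns the fallback when none is named or the name is not supported.
     */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    /** Decodes raw body bytes, assuming ISO-8859-1 when the header names no charset. */
    public static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is a two-character Chinese greeting, encoded here as UTF-8 bytes.
        byte[] body = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "text/html; charset=utf-8")); // decoded as UTF-8
        System.out.println(decode(body, "text/html"));                // fallback charset, garbled
    }
}
```

With HttpClient itself, the two-argument overload EntityUtils.toString(entity, "UTF-8") supplies a default charset for responses that do not declare one.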

Last edited: 2017.11.27 03:43:37
