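Fetching a page with a custom encoding
link: the URL of the page to fetch.
encode: the character encoding used to decode the response, for example utf-8, gbk, or iso8859-1.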
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public static String getHtml(String link, String encode) throws IOException {
    URL url = new URL(link);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    StringBuilder html = new StringBuilder();
    // try-with-resources closes the reader and the streams it wraps, even if reading fails
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), encode))) {
        String line;
        while ((line = br.readLine()) != null) {
            html.append(line);
            // readLine() strips the line terminator, so add a newline back
            html.append("\n");
        }
    }
    return html.toString();
}
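The encode argument has to match what the server actually sends. As a minimal sketch, assuming the server declares its charset in the Content-Type response header, the value could be derived instead of guessed. CharsetSniffer and charsetFromContentType are hypothetical names introduced here for illustration; they are not part of the original code.

```java
import java.nio.charset.Charset;

public class CharsetSniffer {
    // Hypothetical helper: pulls the charset parameter out of a Content-Type
    // header value such as "text/html; charset=ISO-8859-1". Returns the
    // fallback when the header is missing, has no charset parameter, or names
    // a charset this JVM does not support.
    public static String charsetFromContentType(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    if (Charset.isSupported(name)) {
                        return name;
                    }
                } catch (IllegalArgumentException e) {
                    // Illegal or empty charset name: fall through to the fallback
                }
            }
        }
        return fallback;
    }
}
```

With an open HttpURLConnection, conn.getContentType() returns the header value that this helper expects, so the result can be passed as the encode argument.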
Fetching a gzip-encoded page
The parameters are the same as above.
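import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public static String getHtmlByGzip(String link, String encode) throws IOException {
    URL url = new URL(link);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // Request a gzip body; GZIPInputStream throws if the response is not gzip-compressed
    conn.setRequestProperty("Accept-Encoding", "gzip");
    StringBuilder html = new StringBuilder();
    // Decode through a Reader so a multi-byte character split across two reads stays intact
    try (Reader reader = new InputStreamReader(
            new GZIPInputStream(conn.getInputStream()), encode)) {
        char[] buf = new char[1024];
        int len;
        while ((len = reader.read(buf)) != -1) { // read() returns -1 at end of stream, not 0
            html.append(buf, 0, len);
        }
    }
    return html.toString();
}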
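The two methods differ only in whether the response stream is wrapped in a GZIPInputStream. As a rough sketch, that choice could be made from the Content-Encoding response header, so that one method handles both cases. BodyDecoder, decodeBody, and gzip are hypothetical names introduced for this example, and the in-memory round trip in main stands in for a real HTTP response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BodyDecoder {
    // Hypothetical helper: wraps the raw stream in GZIPInputStream only when
    // the Content-Encoding header value is "gzip", then decodes with encode.
    public static String decodeBody(InputStream raw, String contentEncoding, String encode)
            throws IOException {
        // equalsIgnoreCase(null) is false, so a missing header means "not compressed"
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw) : raw;
        StringBuilder html = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, encode)) {
            char[] buf = new char[1024];
            int len;
            while ((len = reader.read(buf)) != -1) {
                html.append(buf, 0, len);
            }
        }
        return html.toString();
    }

    // Test helper: gzip-compresses text in memory to simulate a compressed body.
    public static byte[] gzip(String text, String encode) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(encode));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<p>hello</p>", "utf-8");
        System.out.println(decodeBody(new ByteArrayInputStream(compressed), "gzip", "utf-8"));
    }
}
```

With a real connection the call would look like decodeBody(conn.getInputStream(), conn.getContentEncoding(), encode), where getContentEncoding() returns null when the server sent no Content-Encoding header.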