Headless Chrome初探（一）

原文地址：https://linux.cn/article-8850-1.html
在 Chrome 59　中开始搭载 Headless Chrome 。这是一种在无需显示的环境下运行 Chrome 浏览器的方式。
从本质上来说，就是不用 chrome 浏览器来运行 Chrome 的功能！它将 Chromium 和 Blink 渲染引擎提供的所有现代 Web 平台的功能都带入了命令行。Headless 浏览器对于自动化测试和不需要可视化 UI 界面的服务器环境是一个很好的工具。例如，你可能需要对真实的网页运行一些测试，创建一个 PDF，或者只是检查浏览器如何呈现 URL。

一、Hello World

Mac 和 Linux 上的 Chrome 59以上版本可用
Win的Chrome60以上版本可用
在Chrome安装路径下执行命令，Win中运行，命令行中需要添加 --disable-gpu

chrome \
  --headless \                   # 在headless模式运行Chrome
  --disable-gpu \                # 在Win上运行时需要
  --remote-debugging-port=9222 \
  https://www.mafengwo.cn   # 打开URL. 默认为about:blank

二、常用的命令

1、打印DOM:将 document.body.innerHTML 打印出来：
命令：--dump -dom
示例：chrome --headless --disable-gpu --dump-dom https://www.mafengwo.cn

2、将页面转换为PDF
命令：--pirint-to-pdf
示例：chrome --headless --disable-gpu --print-to-pdf https://www.mafengwo.cn
日志参考下图：出现Written to file output.pdf.说明打印成功

$ chrome --headless --disable-gpu --print-to-pdf https://www.mafengwo.cn
[0822/192626.743981:INFO:headless_shell.cc(572)] Written to file output.pdf.

3、截屏：将文件生成png格式图片
命令：--screenshot
示例：

chrome --headless --disable-gpu --screenshot https://www.mafengwo.cn
#  标准屏幕大小
chrome --headless --disable-gpu --screenshot --window-size=1280,1696 https://www.mafengwo.cn
#  Nexus 5x
chrome --headless --disable-gpu --screenshot --window-size=412,732 https://www.mafengwo.cn

给整个页面截屏参考：使用 headless Chrome 作为自动截屏工具
4、REPL 模式 (read-eval-print loop)

$ chrome --headless --disable-gpu --repl https://www.mafengwo.cn
[0822/194611.702393:INFO:headless_shell.cc(408)] Type a Javascript expression to evaluate or "quit" to exit.
>>> location.href
{"result":{"type":"string","value":"https://www.mafengwo.cn/"}}
>>> quit
[0822/194842.071093:ERROR:browser_process_sub_thread.cc(203)] Waited 14 ms for network service

三、在没有浏览器界面的时候调试Chrome

当使用 --remote-debugging-port=9222运行 Chrome 时，会启动一个支持 DevTools 协议的实例。该协议用于与 Chrome 进行通信，并且驱动 Headless Chrome 浏览器实例。它也是一个类似 Sublime、VS Code 和 Node 的工具，可用于应用程序的远程调试
执行命令后，浏览器输入：http://localhost:9222，会出现如下图的页面，之后就可以使用DevTools 来检查、调试和调整页面。
如果你以编程模式使用 Headless Chrome，这个页面也是一个功能强大的调试工具，用于查看所有通过网络与浏览器交互的原始 DevTools 协议命令。

image.png

四、使用编程模式（Node）

4.1、Puppeteer

Puppeteer 是一个由 Chrome 团队开发的 Node 库，用于自动化开发，只适用于最新版Chrome。
除此之外，Puppeteer 还可用于轻松截取屏幕截图，创建 PDF，页面间导航以及获取有关这些页面的信息。
可以快速地自动化进行浏览器测试。它隐藏了 DevTools 协议的复杂性，并可以处理诸如启动 Chrome 调试实例等繁冗的任务。
查看 Puppeteer 的文档，了解完整 API 的更多信息。

安装： npm i --save puppeteer

示例1：打印用户代理信息（创建js文件，node执行）

const puppeteer = require('puppeteer');
(async() => {
  const browser = await puppeteer.launch();
  console.log(await browser.version());
  browser.close();
})();

输出信息如下：HeadlessChrome/77.0.3844.0

示例2：获取页面的屏幕截图（A4纸打印，名字叫page.pdf）

const puppeteer = require('puppeteer');
(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.mafengwo.cn', {waitUntil: 'networkidle2'});
  await page.pdf({path: 'page.pdf', format: 'A4'});
  browser.close();
})();open

4.2、CRI库

chrome-remote-interface 是一个比 Puppeteer API 更低层次的库。如果想要更接近原始信息和更直接地使用 DevTools 协议的话，推荐使用它。
chrome-remote-interface 不会为你启动 Chrome，所以你要自己启动它。

在前面的 CLI 章节中，我们使用 --headless --remote-debugging-port=9222 手动启动了 Chrome。但是，要想做到完全自动化测试，你可能希望从你的应用程序中启动 Chrome。

4.2.1 从应用程序中启动Chrome

方法1：使用 child_process
弊端：如果你想要在多个平台上运行可移植的解决方案，事情会变得很棘手。请注意 Chrome 的硬编码路径：

const execFile = require('child_process').execFile;
function launchHeadlessChrome(url, callback) {
  // Assuming MacOSx.
  const CHROME = '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome';
  execFile(CHROME, ['--headless', '--disable-gpu', '--remote-debugging-port=9222', url], callback);
}
launchHeadlessChrome('https://www.chromestatus.com', (err, stdout, stderr) => {
  ...
});

方法2：使用 ChromeLauncher

Lighthouse 是一个令人称奇的网络应用的质量测试工具。Lighthouse 内部开发了一个强大的用于启动 Chrome 的模块，现在已经被提取出来单独使用。

优点：
1、chrome-launcher NPM 模块可以找到 Chrome 的安装位置，设置调试实例
2、启动浏览器和在程序运行完之后将其杀死
3、它最好的一点是可以跨平台工作

默认情况下，chrome-launcher 会尝试启动 Chrome Canary（如果已经安装），但是你也可以更改它，手动选择使用的 Chrome 版本。

安装chrome-launcher： npm i --save chrome-launcher

示例：使用使用 chrome-launcher 启动 Headless Chrome

const chromeLauncher = require('chrome-launcher'); 
// 可选: 设置launcher的日志记录级别以查看其输出 
// 安装：: npm i --save lighthouse-logger 
// const log = require('lighthouse-logger'); 
// log.setLevel('info');

/**
 * 启动Chrome的调试实例
 * @param {boolean=} headless True (default) 启动headless模式的Chrome.
 *     False 启动Chrome的完成版本.
 * @return {Promise<ChromeLauncher>} */ function launchChrome(headless=true) { return chromeLauncher.launch({ 
// port: 9222, // Uncomment to force a specific port of your choice.
 chromeFlags: [ '--window-size=412,732', '--disable-gpu',
      headless ? '--headless' : '' ]
  });
}

launchChrome().then(chrome => {
  console.log(`Chrome debuggable on port: ${chrome.port}`);
  ... // chrome.kill();
});

运行这个脚本没有做太多的事情，如下日志显示chrome已经通过53562端口启动，之后浏览器输入 localhost:53562,它加载了页面 about:blank。

因为是headless模式，没有浏览器界面。
要控制浏览器，我们需要DevTools协议！

sunshaokangdeMacBook-Pro:Google Chrome.app sunshaokang$ node chromLauncher.js
Chrome debuggable on port: 53562

image.png

4.3、使用chrome-remote-interface检索页面有关信息

DevTools 协议可以做很多事情，需要先花点时间浏览 DevTools 协议查看器。然后，转到 chrome-remote-interface 的 API 文档，看看它是如何包装原始协议的。

安装：npm i --save chrome-remote-interfacef
示例：打印用户代理

const chromeLauncher = require('chrome-launcher');
function launchChrome(headless=true) {
  return chromeLauncher.launch({
    // port: 9222, // Uncomment to force a specific port of your choice.
    chromeFlags: [
      '--window-size=412,732',
      '--disable-gpu',
      headless ? '--headless' : ''
    ]
  });
}

const CDP = require('chrome-remote-interface');
launchChrome().then(async chrome => {
  const version = await CDP.Version({port: chrome.port});
  console.log(version['User-Agent']);
});

打印结果如下

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/76.0.3809.100 Safari/537.36

示例：检查网站是否有 Web 应用程序清单：

const chromeLauncher = require('chrome-launcher');
function launchChrome(headless=true) {
  return chromeLauncher.launch({
    // port: 9222, // Uncomment to force a specific port of your choice.
    chromeFlags: [
      '--window-size=412,732',
      '--disable-gpu',
      headless ? '--headless' : ''
    ]
  });
}

const CDP = require('chrome-remote-interface');
...
(async function() {
const chrome = await launchChrome();
const protocol = await CDP({port: chrome.port});
// Extract the DevTools protocol domains we need and enable them.
// See API docs: https://chromedevtools.github.io/devtools-protocol/
const {Page} = protocol;
await Page.enable();
Page.navigate({url: 'https://www.chromestatus.com/'});
// Wait for window.onload before doing stuff.
Page.loadEventFired(async () => {
  const manifest = await Page.getAppManifest();
  if (manifest.url) {
    console.log('Manifest: ' + manifest.url);
    console.log(manifest.data);
  } else {
    console.log('Site has no app manifest');
  }
  protocol.close();
  chrome.kill(); // Kill Chrome.
});
})();

打印结果如下

Manifest: https://www.chromestatus.com/static/manifest.json

{
  "name": "Chrome Platform Status",
  "short_name": "Chrome status",
  "start_url": "/features",
  "display": "standalone",
  "theme_color": "#366597",
  "background_color": "#366597",
  "icons": [{
    "src": "./img/crstatus_72.png",
    "sizes": "72x72",
    "type": "image/png"
  }, {
    "src": "./img/crstatus_96.png",
    "sizes": "96x96",
    "type": "image/png"
  }, {
    "src": "./img/crstatus_128.png",
    "sizes": "128x128",
    "type": "image/png"
  }, {
    "src": "./img/crstatus_144.png",
    "sizes": "144x144",
    "type": "image/png"
  }, {
    "src": "./img/crstatus_192.png",
    "sizes": "192x192",
    "type": "image/png"
  }, {
    "src": "./img/crstatus_512.png",
    "sizes": "512x512",
    "type": "image/png"
  }],
  "gcm_sender_id": "103953800507"
}

示例：使用DOM API获取页面的<title>：

const chromeLauncher = require('chrome-launcher');
function launchChrome(headless=true) {
  return chromeLauncher.launch({
    // port: 9222, // Uncomment to force a specific port of your choice.
    chromeFlags: [
      '--window-size=412,732',
      '--disable-gpu',
      headless ? '--headless' : ''
    ]
  });
}

const CDP = require('chrome-remote-interface');
...
(async function() {
const chrome = await launchChrome();
const protocol = await CDP({port: chrome.port});
// Extract the DevTools protocol domains we need and enable them.
// See API docs: https://chromedevtools.github.io/devtools-protocol/
const {Page, Runtime} = protocol;
await Promise.all([Page.enable(), Runtime.enable()]);
Page.navigate({url: 'https://www.mafengwo.cn/'});
// Wait for window.onload before doing stuff.
Page.loadEventFired(async () => {
  const js = "document.querySelector('title').textContent";
  // Evaluate the JS expression in the page.
  const result = await Runtime.evaluate({expression: js});
  console.log('Title of page: ' + result.result.value);
  protocol.close();
  chrome.kill(); // Kill Chrome.
});
})();

打印：

Title of page: 旅游攻略,自由行,自助游攻略,旅游社交分享网站 - 马蜂窝

五、使用 Selenium、WebDriver 和 ChromeDriver

现在，Selenium 开启了 Chrome 的完整实例。换句话说，这是一个自动化的解决方案，但不是完全无需显示的。但是，Selenium 只需要进行小小的配置即可运行 Headless Chrome。如果你想要关于如何自己设置的完整说明，我建议你阅读“使用 Headless Chrome 来运行 Selenium”，不过你可以从下面的一些示例开始。

5.1、使用 ChromeDriver

ChromeDriver 2.32使用了Chrome61，并且在headless Chrome运行的更好。
安装：npm i --save-dev selenium-webdriver chromedriver
示例：

const fs = require('fs');
const webdriver = require('selenium-webdriver');
const chromedriver = require('chromedriver');
const PATH_TO_CANARY = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary';
const PORT = 9515
const chromeCapabilities = webdriver.Capabilities.chrome();
// Screenshots require Chrome 60\. Force Canary.
chromeCapabilities.set('chromeOptions', {
        binary: PATH_TO_CANARY,
        args: ['--headless']});

const driver = new webdriver.Builder()
  .forBrowser('chrome')
  .withCapabilities(chromeCapabilities)
  .build(); // Navigate to google.com, enter a search.
driver.get('https://www.google.com/');
driver.findElement({name: 'q'}).sendKeys('webdriver');
driver.findElement({name: 'btnG'}).click();
driver.wait(webdriver.until.titleIs('webdriver - Google Search'), 1000); // Take screenshot of results page. Save to disk.
driver.takeScreenshot().then(base64png => {
  fs.writeFileSync('screenshot.png', new Buffer(base64png, 'base64'));
});

driver.quit();

5.1使用 WebDriverIO

WebDriverIO 是一个在 Selenium WebDrive 上构建的更高层次的 API。
安装：npm i --save-dev webdriverio chromedriver
示例：过滤 chromestatus.com 上的 CSS 功能：

const webdriverio = require('webdriverio');
const chromedriver = require('chromedriver');
// This should be the path to your Canary installation.
// I'm assuming Mac for the example.
const PATH_TO_CANARY = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary';
const PORT = 9515;
chromedriver.start([
  '--url-base=wd/hub',
  `--port=${PORT}`,
  '--verbose'
]);
(async () => {
const opts = {
  port: PORT,
  desiredCapabilities: {
    browserName: 'chrome',
    chromeOptions: {
      binary: PATH_TO_CANARY // Screenshots require Chrome 60\. Force Canary.
      args: ['--headless']
    }
  }
};
const browser = webdriverio.remote(opts).init();
await browser.url('https://www.chromestatus.com/features');
const title = await browser.getTitle();
console.log(`Title: ${title}`);
await browser.waitForText('.num-features', 3000);
let numFeatures = await browser.getText('.num-features');
console.log(`Chrome has ${numFeatures} total features`);
await browser.setValue('input[type="search"]', 'CSS');
console.log('Filtering features...');
await browser.pause(1000);
numFeatures = await browser.getText('.num-features');
console.log(`Chrome has ${numFeatures} CSS features`);
const buffer = await browser.saveScreenshot('screenshot.png');
console.log('Saved screenshot...');
chromedriver.stop();
browser.end();
})();

六、更多资源

文档

DevTools Protocol Viewer - API 参考文档

工具

chrome-remote-interface - 基于 DevTools 协议的 node 模块
Lighthouse - 测试 Web 应用程序质量的自动化工具；大量使用了协议
chrome-launcher - 用于启动 Chrome 的 node 模块，可以自动化

样例

"The Headless Web" - Paul Kinlan 发布的使用了 Headless 和 api.ai 的精彩博客

七、常见问题

我需要 --disable-gpu 标志吗？

目前是需要的。--disable-gpu 标志在处理一些 bug 时是需要的。在未来版本的 Chrome 中就不需要了。查看 https://crbug.com/546953#c152 和 https://crbug.com/695212 获取更多信息。

所以我仍然需要 Xvfb 吗？

不。Headless Chrome 不使用窗口，所以不需要像 Xvfb 这样的显示服务器。没有它你也可以愉快地运行你的自动化测试。

什么是 Xvfb？Xvfb 是一个用于类 Unix 系统的运行于内存之内的显示服务器，可以让你运行图形应用程序（如 Chrome），而无需附加的物理显示器。许多人使用 Xvfb 运行早期版本的 Chrome 进行 “headless” 测试。

如何创建一个运行 Headless Chrome 的 Docker 容器？

查看 lighthouse-ci。它有一个使用 Ubuntu 作为基础镜像的 Dockerfile 示例，并且在 App Engine Flexible 容器中安装和运行了 Lighthouse。

我可以把它和 Selenium / WebDriver / ChromeDriver 一起使用吗？

是的。查看 Using Selenium, WebDrive, or ChromeDriver。

它和 PhantomJS 有什么关系？

Headless Chrome 和 PhantomJS 是类似的工具。它们都可以用来在无需显示的环境中进行自动化测试。两者的主要不同在于 Phantom 使用了一个较老版本的 WebKit 作为它的渲染引擎，而 Headless Chrome 使用了最新版本的 Blink。

目前，Phantom 提供了比 DevTools protocol 更高层次的 API。

Headless Chrome初探（一）