Lingering Pain: Node Internals

How Modules Work

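Before digging into Node internals, let's start with a very simple question: what is the difference between module.exports and exports? It is practically a must-ask interview question, and if you can't answer it, the interview is pretty much over. For fellow code monkeys who haven't run into it yet, here is the explanation once more.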

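1. exports is a reference to module.exports: effectively exports = module.exports = {}, so both variables hold the address of the same object.
2. Only the object that module.exports points to is actually exported. If exports is reassigned to point at a different object, that object is not exported, which is why you should never assign to exports directly (exports = xxx).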

Digging one level deeper: why is module.exports the thing that gets exported? Some of you may already know.

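1. When Node runs a js file, it actually wraps the file's contents in a function and then executes that function.
2. That function takes five parameters: exports, require, module, __filename and __dirname, which is why we can use them directly in any js file. When another module imports this one, it receives the value of module's exports property, not the exports reference.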

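Digging further still: how does Node wrap a file in a function? How does it execute that function? And how are exports / module.exports and require implemented? This article answers those three questions.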

Premises: Node ships with many built-in modules, and its traditional module system follows the CommonJS specification:

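1. In CommonJS, one file is one module.
2. In CommonJS, exports exposes a module's data and require imports it.
3. Among Node's modules, fs can read a file as binary data (a Buffer) or as a string. (Obviously neither can be executed as is.)
4. Among Node's modules, vm can compile a string into code and run it inside a V8 context. (Note that the Node documentation states explicitly that vm is not a security mechanism, so it should not be treated as a secure sandbox.)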

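Conclusion: require imports module data, and a file is a module, so require must read files. At its core, require takes the path argument, uses fs to read the corresponding file as a string, concatenates the string form of a function header before it and a closing brace after it, then uses vm to compile that function and call it.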
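The conclusion above can be sketched as a toy loader. `miniRequire` is hypothetical and assumes absolute (or cwd-relative) paths to `.js` files only; it skips path resolution, built-in modules and error handling, but follows the same read, wrap, compile, call sequence.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const vm = require('vm');

// The same two strings Node uses to wrap a module's source
const WRAPPER = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});',
];
const cache = {};

// Hypothetical minimal require: no resolution, no JSON, no built-ins.
// Relative paths resolve against process.cwd(), unlike Node.
function miniRequire(filename) {
  filename = path.resolve(filename);
  if (cache[filename]) return cache[filename].exports; // cached: reuse exports

  const mod = { id: filename, exports: {}, loaded: false };
  cache[filename] = mod; // cache before running, as Node does

  const source = fs.readFileSync(filename, 'utf8');      // 1. read as a string
  const wrapped = WRAPPER[0] + source + WRAPPER[1];       // 2. wrap in a function
  const fn = vm.runInThisContext(wrapped, { filename });  // 3. compile: returns the function
  // 4. call it with this = exports and the five wrapper arguments
  fn.call(mod.exports, mod.exports, miniRequire, mod, filename, path.dirname(filename));
  mod.loaded = true;
  return mod.exports;
}

// Usage: write a throwaway module to the temp directory and load it
const file = path.join(os.tmpdir(), `mini-require-demo-${process.pid}.js`);
fs.writeFileSync(file, "exports.greet = (n) => 'hi ' + n;\nexports.dir = __dirname;");
const demo = miniRequire(file);
console.log(demo.greet('node')); // hi node
```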

Before using vm, one more piece of background: the difference between runInThisContext and runInNewContext.

Import the vm module and execute a string with its runInThisContext method; the string must be valid code.
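For example (the expression itself is arbitrary), runInThisContext compiles the string and returns the value of its last expression:

```javascript
const vm = require('vm');

// Compile and run the string in the current V8 context; the completion
// value of the script is returned.
const result = vm.runInThisContext('[1, 2, 3].map((n) => n * 2).join(",")');
console.log(result); // 2,4,6
```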

That runs fine. So can we inject variables into it, in particular variables from the file that runs the VM code?

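It throws, saying str is not defined: variables outside the VM code (here, a local variable of the hosting file) are not accessible. But Node does have global variables, and the VM code and the file running it share the same Node environment, so can a global variable be injected?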
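Both experiments, the local variable that the VM code cannot see and the global that it can, can be reproduced like this. The names `str`, `greeting` and `tryLocal` are only for illustration.

```javascript
const vm = require('vm');

function tryLocal() {
  // A local variable of the calling code (intentionally unused here);
  // the compiled string cannot see it.
  const str = 'local value';
  try {
    return vm.runInThisContext('str');
  } catch (err) {
    return err.name;
  }
}

// A property of the global object is visible, because runInThisContext
// runs in the same context (same global object) as the calling file.
global.greeting = 'from global';
const fromGlobal = vm.runInThisContext('greeting');

console.log(tryLocal()); // ReferenceError
console.log(fromGlobal); // from global
```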

A global variable does work. So how do we separate the VM code from the environment of the file that runs it? That is what runInNewContext is for.

This time it throws again: runInNewContext cannot access variables on global.
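A matching sketch for runInNewContext. Globals of the calling file are invisible to it, but values passed in through the context object are visible. require is missing as well, since it only exists as a parameter of the module wrapper, not as a global. (The error is created in the other context, so the sketch compares `err.name` rather than using `instanceof ReferenceError`.)

```javascript
const vm = require('vm');

global.greeting = 'from global';

// A fresh context has its own global object, so the caller's globals are missing.
let isolated;
try {
  isolated = vm.runInNewContext('greeting');
} catch (err) {
  // err comes from the other context, so compare by name, not instanceof
  isolated = err.name;
}

// Values can still be passed in explicitly through the context object.
const viaSandbox = vm.runInNewContext('greeting.toUpperCase()', { greeting: 'from sandbox' });

// require is a wrapper parameter, not a global, so it is absent here too.
const requireType = vm.runInNewContext('typeof require');

console.log(isolated, viaSandbox, requireType); // ReferenceError FROM SANDBOX undefined
```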

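Back to the main topic: how does Node implement require? First it obviously needs a require function. Ignoring the try/finally inside it, the function body is a single statement, where self is the module that calls require; in other words, it calls that module's own require method.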

// Original
function require(path) {
  try {
    exports.requireDepth += 1;
    return self.require(path);
  } finally {
    exports.requireDepth -= 1;
  }
}
// With the try/finally removed
function require(path) {
  return self.require(path);
}

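The file that defines the module's own require method imports the assert library (I covered assertions in my earlier post "Lost City: Unit Test"). Here it asserts that path is present and is a string, then calls the static Module._load method. Note that _load is not where a new module is actually read and executed: it serves modules that are already cached and built-in modules, and hands everything else off to the real loading code.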

Module.prototype.require = function (path) {
  assert(path, "missing path");
  assert(typeof path === "string", "path must be a string");
  return Module._load(path, this, /* isMain */ false);
};

Let's walk through _load step by step; its full code follows the list.

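1. Module._resolveFilename turns a relative request into an absolute path. An absolute request is kept as the base path (Node still tries extensions and directory index files on it), and the name of a built-in module is returned as is. The resulting filename also serves as the module's unique identifier.
2. It first looks in Module's static _cache property. If the module is already cached, it returns the cached module's exports immediately.
3. NativeModule.nonInternalExists checks whether it is a built-in module; if so, it calls NativeModule.require and returns that module's exports.
4. If neither applies, it is a custom module that has not been loaded yet. A new Module object is created and put into the cache under filename, so later requires of the same module are served from the cache. Finally tryModuleLoad loads the module, and its module.exports is returned.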

Module._load = function (request, parent, isMain) {
    if (parent) {
        debug('Module._load REQUEST %s parent: %s', request, parent.id);
    }

    var filename = Module._resolveFilename(request, parent, isMain);

    var cachedModule = Module._cache[filename];
    if (cachedModule) {
        return cachedModule.exports;
    }

    if (NativeModule.nonInternalExists(filename)) {
        debug('load native module %s', request);
        return NativeModule.require(filename);
    }

    var module = new Module(filename, parent);

    if (isMain) {
        process.mainModule = module;
        module.id = '.';
    }

    Module._cache[filename] = module;

    tryModuleLoad(module, filename);

    return module.exports;
};
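The caching step can be observed through the public `require.cache` object, which exposes the same cache keyed by resolved filename. A sketch, assuming a writable temporary directory (the file name is made up):

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// A throwaway module that exports a fresh object each time it is evaluated
const file = path.join(os.tmpdir(), `cache-demo-${process.pid}.js`);
fs.writeFileSync(file, 'module.exports = { createdAt: Date.now() };');

const first = require(file);
const second = require(file);       // served from the cache: same object
const key = require.resolve(file);  // the resolved filename is the cache key

delete require.cache[key];          // evict it...
const third = require(file);        // ...and the file is evaluated again

console.log(first === second, first === third); // true false
```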

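There isn't much to note about the Module object: id is initialized to the absolute path, exports to an empty object, parent and children record who required whom, and a loaded property starts out as false. loaded is set to true at the end of load (called through tryModuleLoad) once the module has loaded successfully.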

function Module(id, parent) {
    this.id = id;
    this.exports = {};
    this.parent = parent;
    if (parent && parent.children) {
        parent.children.push(this);
    }

    this.filename = null;
    this.loaded = false;
    this.children = [];
}
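The loaded flag can be observed from inside a module: while the module body runs, `module.loaded` is still false, and after require returns, the cached record shows true. A sketch using a throwaway file in the temp directory:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// The module reports its own loaded flag and id while its body is running
const file = path.join(os.tmpdir(), `loaded-demo-${process.pid}.js`);
fs.writeFileSync(file, 'module.exports = { loadedDuring: module.loaded, id: module.id };');

const info = require(file);
const record = require.cache[require.resolve(file)];

console.log(info.loadedDuring); // false: loading has not finished yet
console.log(record.loaded);     // true: set once loading completed
```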

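tryModuleLoad only tries to load the module: if loading throws, it deletes the module from Module's cache. And this is where the real loading function, the load method, finally surfaces.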

function tryModuleLoad(module, filename) {
    var threw = true;
    try {
        module.load(filename);
        threw = false;
    } finally {
        if (threw) {
            delete Module._cache[filename];
        }
    }
}
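The effect of the finally block is visible through `require.cache`: a module that throws while loading does not stay in the cache. A sketch with a deliberately failing throwaway module:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// A module whose body throws during loading
const file = path.join(os.tmpdir(), `throw-demo-${process.pid}.js`);
fs.writeFileSync(file, "throw new Error('boom');");

let message;
try {
  require(file);
} catch (err) {
  message = err.message;
}

// The half-initialized module was removed from the cache, so a later
// require() will try the file again instead of returning stale exports.
const stillCached = require.resolve(file) in require.cache;
console.log(message, stillCached); // boom false
```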

Again, here is a step-by-step walkthrough of load, followed by its full code.

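1. It first asserts that the module has not been loaded yet, then stores the absolute path in the module's filename property (for a non-main module, filename has the same value as id). It also sets this.paths, the list of node_modules directories to search starting from this file's directory.
2. The core of it is getting the extension with path.extname; if there is no extension, or Module._extensions has no handler for it, the extension defaults to .js.
3. Module._extensions is a static object whose keys are extensions and whose values are the functions that load modules with that extension. load simply looks up the function for the extension and calls it; once that returns, it sets the module's loaded flag to true.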

Module.prototype.load = function (filename) {
    debug('load %j for module %j', filename, this.id);

    assert(!this.loaded);
    this.filename = filename;
    this.paths = Module._nodeModulePaths(path.dirname(filename));

    var extension = path.extname(filename) || '.js';
    if (!Module._extensions[extension]) extension = '.js';
    Module._extensions[extension](this, filename);
    this.loaded = true;
};
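The extension lookup in load boils down to a table lookup with a '.js' fallback. Below is a hypothetical handler table (the handlers only report which loader they would be) that reproduces that fallback logic:

```javascript
const path = require('path');

// Hypothetical table mirroring Module._extensions: extension -> loader
const extensions = {
  '.js': (filename) => `js loader for ${filename}`,
  '.json': (filename) => `json loader for ${filename}`,
  '.node': (filename) => `native addon loader for ${filename}`,
};

function pickLoader(filename) {
  // No extension, or an unknown one, falls back to '.js' as load() does
  let ext = path.extname(filename) || '.js';
  if (!extensions[ext]) ext = '.js';
  return extensions[ext](filename);
}

console.log(pickLoader('/app/config.json')); // json loader for /app/config.json
console.log(pickLoader('/app/bin/cli'));     // js loader for /app/bin/cli
console.log(pickLoader('/app/data.txt'));    // js loader for /app/data.txt
```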

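_extensions has three extensions with corresponding handlers: .json, .js and .node. We'll focus on the first two; .node files are compiled C/C++ addons, which are beyond the scope of this article. Both the .json and .js handlers call internalModule.stripBOM, which strips a leading byte order mark (BOM, U+FEFF) from the file contents; there is no need to dig into it.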

First, .json files. Loading them is very simple: fs reads the file as a string, JSON.parse turns it into an object, and that object is assigned to module.exports.

Module._extensions['.json'] = function (module, filename) {
    var content = fs.readFileSync(filename, 'utf8');
    try {
        module.exports = JSON.parse(internalModule.stripBOM(content));
    } catch (err) {
        err.message = filename + ':' + err.message;
        throw err;
    }
};
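A standalone sketch of the '.json' handler. `stripBOM` here is a hypothetical stand-in for internalModule.stripBOM that simply drops a leading U+FEFF, and `loadJson` takes a plain `{ exports }` object instead of a real Module:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for internalModule.stripBOM
function stripBOM(content) {
  return content.charCodeAt(0) === 0xfeff ? content.slice(1) : content;
}

// Sketch of the '.json' handler: read, strip BOM, parse, assign to exports,
// and prefix parse errors with the filename.
function loadJson(mod, filename) {
  const content = fs.readFileSync(filename, 'utf8');
  try {
    mod.exports = JSON.parse(stripBOM(content));
  } catch (err) {
    err.message = filename + ':' + err.message;
    throw err;
  }
}

const file = path.join(os.tmpdir(), `json-demo-${process.pid}.json`);
fs.writeFileSync(file, '\uFEFF{"name": "demo", "version": 1}'); // note the BOM
const mod = { exports: {} };
loadJson(mod, file);
console.log(mod.exports); // { name: 'demo', version: 1 }
```

Without the BOM stripping, the parse would fail here, because JSON.parse does not accept U+FEFF as whitespace.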

Next, .js files. They are also read as a string with fs, but note that the result of _compile is not assigned to anything.

Module._extensions['.js'] = function (module, filename) {
    var content = fs.readFileSync(filename, 'utf8');
    module._compile(internalModule.stripBOM(content), filename);
};

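_compile compiles the js source that was read. The function looks complicated, but only a few parts matter. The first block merely strips a leading #! (shebang) line. After that, look at var wrapper = Module.wrap(content): the string read by fs is passed to wrap.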

Module.prototype._compile = function (content, filename) {
    var contLen = content.length;
    if (contLen >= 2) {
        if (content.charCodeAt(0) === 35 /*#*/ &&
            content.charCodeAt(1) === 33 /*!*/ ) {
            if (contLen === 2) {
                content = '';
            } else {
                var i = 2;
                for (; i < contLen; ++i) {
                    var code = content.charCodeAt(i);
                    if (code === 10 /*\n*/ || code === 13 /*\r*/ )
                        break;
                }
                if (i === contLen)
                    content = '';
                else {
                    content = content.slice(i);
                }
            }
        }
    }

    var wrapper = Module.wrap(content);

    var compiledWrapper = vm.runInThisContext(wrapper, {
        filename: filename,
        lineOffset: 0,
        displayErrors: true
    });

    var inspectorWrapper = null;
    if (process._debugWaitConnect && process._eval == null) {
        if (!resolvedArgv) {
            if (process.argv[1]) {
                resolvedArgv = Module._resolveFilename(process.argv[1], null, false);
            } else {
                resolvedArgv = 'repl';
            }
        }

        if (filename === resolvedArgv) {
            delete process._debugWaitConnect;
            inspectorWrapper = getInspectorCallWrapper();
            if (!inspectorWrapper) {
                const Debug = vm.runInDebugContext('Debug');
                Debug.setBreakPoint(compiledWrapper, 0, 0);
            }
        }
    }

    var dirname = path.dirname(filename);

    var require = internalModule.makeRequireFunction(this);

    var depth = internalModule.requireDepth;
    if (depth === 0) stat.cache = new Map();
    var result;

    if (inspectorWrapper) {
        result = inspectorWrapper(compiledWrapper, this.exports, this.exports, require, this, filename, dirname);
    } else {
        result = compiledWrapper.call(this.exports, this.exports, require, this, filename, dirname);
    }
    if (depth === 0) stat.cache = null;
    return result;
};
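The shebang block at the top of _compile can be isolated into a small function. `stripShebang` is a hypothetical name; the logic follows the loop above, cutting everything before the first line break so the remaining code keeps its original line numbers:

```javascript
// Sketch of _compile's shebang handling: if the source starts with "#!",
// drop everything up to (not including) the first \n or \r.
function stripShebang(content) {
  if (content.length < 2 ||
      content.charCodeAt(0) !== 35 /* # */ ||
      content.charCodeAt(1) !== 33 /* ! */) {
    return content;
  }
  let i = 2;
  while (i < content.length) {
    const code = content.charCodeAt(i);
    if (code === 10 /* \n */ || code === 13 /* \r */) break;
    i++;
  }
  return content.slice(i); // '' if the whole file is a single shebang line
}

console.log(JSON.stringify(stripShebang('#!/usr/bin/env node\nconsole.log(1);'))); // "\nconsole.log(1);"
```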

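wrap takes the source string as its script parameter and returns it surrounded by the string form of a function, so the file's contents end up inside a function body. Back in _compile, vm.runInThisContext is called right away. Note that what this evaluates to is a function definition, which is stored in compiledWrapper; the function is not called at this point.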

NativeModule.wrap = function (script) {
    return NativeModule.wrapper[0] + script + NativeModule.wrapper[1];
};
NativeModule.wrapper = [
    '(function (exports, require, module, __filename, __dirname) { ',
    '\n});'
];
Module.wrapper = NativeModule.wrapper;
Module.wrap = NativeModule.wrap;
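The key point, that evaluating the wrapped string only defines a function, can be checked directly. The sketch copies the two wrapper strings shown above into a local `wrap` helper (hypothetical) and inspects what runInThisContext returns:

```javascript
const vm = require('vm');

// Local copy of the wrapper strings shown above
const wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});',
];
const wrap = (script) => wrapper[0] + script + wrapper[1];

const source = 'module.exports = __filename.length;';
const compiled = vm.runInThisContext(wrap(source), { filename: 'demo.js' });

// Evaluating the wrapped string only defines the function; its body has not run.
console.log(typeof compiled, compiled.length); // function 5
```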

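Skipping the irrelevant parts in between, the only thing left to look at is result = compiledWrapper.call(this.exports, this.exports, require, this, filename, dirname) (in a normal run, without a debugger waiting on the entry file, every module takes the else branch).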

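1. call runs the function stored in compiledWrapper.
2. It sets the function's this to the exports of the object created by new Module in _load. That is why this at the top level of a file refers to the exported object (as long as module.exports has not been reassigned).
3. The exports property, the require function, the module object itself, the module's filename (for a non-main module, the same value as its id), and the directory containing the module (dirname) are passed in as the function's arguments.
4. filename is the parameter _compile received, and dirname comes from the var dirname = path.dirname(filename) statement in _compile.
5. As for require, _compile executes var require = internalModule.makeRequireFunction(this), which creates the module's own require function, i.e. the require function we started with.
6. Note that although result is returned, _extensions['.js'] does not assign it to anything. Instead, because call passes in this.exports and the module object itself, the module's code writes its exported data directly onto the exports of the object created by new Module (or replaces module.exports outright).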
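The binding in points 2 and 6 can be reproduced by calling a wrapped function the same way _compile does. The module body, the fake `mod` object and the '/tmp/demo.js' path are illustrative only:

```javascript
const vm = require('vm');

// A tiny module body, wrapped the same way Node wraps files
const wrapped =
  '(function (exports, require, module, __filename, __dirname) { ' +
  'exports.sameAsThis = (this === exports);\n' +
  'module.exports = { replaced: true };\n' +
  'module.exports.thisIsStale = (this !== module.exports);' +
  '\n});';

const fn = vm.runInThisContext(wrapped);
const mod = { exports: {} };
const original = mod.exports;

// Same call shape as _compile: `this` and the first argument are both mod.exports
fn.call(mod.exports, mod.exports, require, mod, '/tmp/demo.js', '/tmp');

console.log(original.sameAsThis); // true
console.log(mod.exports);         // { replaced: true, thisIsStale: true }
```

The second line shows the caveat from point 2: once module.exports is replaced, top-level this still points to the original exports object, and only the new module.exports reaches the caller.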

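Finally, one more time: _load ends with return module.exports, so the address stored in module.exports is passed back up through each layer as the result of require, i.e. back to the module that called require. That is how the calling module gets the data the imported module exported.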