Domain Generalization (DG) is a special case of Domain Adaptation (DA). Building on DA, DG aims for a model that stays robust on any unseen target domain.

Domain Adaptation

Moment Matching for Multi-Source Domain Adaptation

Setting: multiple source domains, one target domain.
code and data

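The method has three parts:

Feature Extractor: shares weights across domains and maps data from the different source domains into a common feature space.
Moment Matching Component: tries to pull the feature distributions of the different source domains together.
Classifier: combines the outputs of the per-source-domain classifiers with a weighted sum.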

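The moment distance between the source domains $\mathcal{D}_{S}$ and the target domain $\mathcal{D}_{T}$ is defined as the source-to-target distances plus the pairwise distances among the source domains, where $\mathbb{E}(\mathbf{X}^{k})$ denotes the $k$-th order moment of the features: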

$M D^{2}(\mathcal{D}_{S}, \mathcal{D}_{T})=\sum_{k=1}^{2}\left(\frac{1}{N} \sum_{i=1}^{N}\left\|\mathbb{E}(\mathbf{X}_{i}^{k})-\mathbb{E}(\mathbf{X}_{T}^{k})\right\|_2+\binom{N}{2}^{-1} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N}\left\|\mathbb{E}(\mathbf{X}_{i}^{k})-\mathbb{E}(\mathbf{X}_{j}^{k})\right\|_2\right)$

The objective is the sum of the classification losses over the $N$ source domains plus the moment distance $M D^{2}(\mathcal{D}_{S}, \mathcal{D}_{T})$ defined above:
$\min _{G, \mathcal{C}} \sum_{i=1}^{N} \mathcal{L}_{\mathcal{D}_{i}}+\lambda \min _{G} M D^{2}\left(\mathcal{D}_{S}, \mathcal{D}_{T}\right)$
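
The moment distance and objective above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: features are assumed to be lists of equal-length float vectors already produced by the feature extractor, and the names `moment`, `moment_distance`, and `total_objective` are hypothetical.

```python
import math
from itertools import combinations


def moment(features, k):
    """Element-wise k-th moment E[x^k] over a batch of feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(row[d] ** k for row in features) / n for d in range(dim)]


def l2(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def moment_distance(sources, target, orders=(1, 2)):
    """MD^2 between a list of source feature batches and one target batch."""
    n = len(sources)
    total = 0.0
    for k in orders:
        src_moments = [moment(s, k) for s in sources]
        tgt_moment = moment(target, k)
        # Source-target term, averaged over the N source domains.
        total += sum(l2(m, tgt_moment) for m in src_moments) / n
        # Source-source term, averaged over the C(N, 2) domain pairs.
        pairs = list(combinations(src_moments, 2))
        if pairs:
            total += sum(l2(a, b) for a, b in pairs) / len(pairs)
    return total


def total_objective(cls_losses, md2, lam):
    """Sum of per-source classification losses plus lam * MD^2."""
    return sum(cls_losses) + lam * md2
```

In a real training loop the feature batches would be tensors and the distance would be differentiated with respect to the feature extractor $G$; the pure-Python version only shows how the two terms are assembled.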

The source-target (S-T) alignment term is fairly effective.

Semi-Supervised Learning by Augmented Distribution Alignment

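This is an ICCV 2019 paper.
In practice, the amounts of labeled and unlabeled data often differ by orders of magnitude, and the empirical distribution of the labeled data can deviate substantially from that of the unlabeled data (which can be regarded as the true distribution). The paper uses adversarial learning to align the distributions of labeled and unlabeled data, and additionally borrows the mixup idea to exploit unlabeled data and mitigate the shortage of labeled data.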

The loss is given below. It has two parts: the first term is the classification loss and the second is the adversarial loss.
$\min _{f, g, h} \sum_{\tilde{\mathbf{x}}} \ell(f(g(\tilde{\mathbf{x}})), \tilde{y})+\gamma \ell(h(g(\tilde{\mathbf{x}})), \tilde{z})$

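Specifically, the adversarial loss is given below, where $h$ is a discriminator that predicts whether a sample comes from the labeled set $\mathcal{D}_l$ or the unlabeled set $\mathcal{D}_u$. Through adversarial learning, the features $g$ of labeled and unlabeled samples are pulled into the same space.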
$\min _{g} d_{\mathcal{H}}\left(\mathcal{D}_{l}, \mathcal{D}_{u}\right)=\max _{g} \min _{h \in \mathcal{H}}\left[\operatorname{err}\left(h, g, \mathcal{D}_{l}\right)+\operatorname{err}\left(h, g, \mathcal{D}_{u}\right)\right]$

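In addition, to address the small number of labeled samples, the paper applies mixup between labeled and unlabeled samples, as shown in the following formulas. The class label $\hat{y}^u$ of an unlabeled sample is a pseudo-label produced by the classifier. The discriminator label $\tilde{z}$ is mixed in the same way.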
$\begin{aligned} \tilde{\mathbf{x}} &=\lambda \mathbf{x}^{l}+(1-\lambda) \mathbf{x}^{u} \\ \tilde{y} &=\lambda y^{l}+(1-\lambda) \hat{y}^{u} \\ \tilde{z} &=\lambda \cdot 0+(1-\lambda) \cdot 1 \end{aligned}$
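
The three interpolation formulas can be written as a short Python sketch. It assumes inputs and (soft) labels are plain lists of floats, and that $\lambda$ is drawn from a Beta distribution as in the original mixup recipe; the notes above do not state the sampling scheme, so that part is an assumption. The function names are hypothetical.

```python
import random


def mixup_pair(x_l, y_l, x_u, y_u_hat, lam):
    """Interpolate one labeled and one unlabeled sample with weight lam."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_l, x_u)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_l, y_u_hat)]
    # Discriminator target: labeled samples are 0, unlabeled samples are 1.
    z = lam * 0.0 + (1 - lam) * 1.0
    return x, y, z


def sample_lambda(alpha=1.0, rng=random):
    """Assumption: lam ~ Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)
```

For example, `mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.25, 0.75], 0.75)` mixes a one-hot labeled target with a soft pseudo-label and yields a discriminator target of `0.25`, reflecting that the mixed sample is mostly "labeled".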

The figure below shows the results on the SVHN dataset and a comparison with state-of-the-art (SOTA) methods.

Domain Generalization

Episodic Training for Domain Generalization

Vanilla Aggregation Method: data from all source domains $\mathcal{D}_i \sim \mathcal{D}$ is fed directly into a single feature extractor $\theta$ and classifier $\psi$ for training.
$\underset{\theta, \psi}{\operatorname{argmin}}\ \mathbb{E}_{\mathcal{D}_{i} \sim \mathcal{D}}\left[\mathbb{E}_{\left(\mathbf{x}_{i}, y_{i}\right) \sim \mathcal{D}_{i}}\left[\ell\left(y_{i}, \psi\left(\theta\left(\mathbf{x}_{i}\right)\right)\right)\right]\right]$
Domain Specific Models: each domain's data is used to train its own model independently.
$\underset{\left[\theta_{1}, \ldots, \theta_{n}\right],\left[\psi_{1}, \ldots, \psi_{n}\right]}{\operatorname{argmin}}\ \mathbb{E}_{\mathcal{D}_{i} \sim \mathcal{D}}\left[\mathbb{E}_{\left(\mathbf{x}_{i}, y_{i}\right) \sim \mathcal{D}_{i}}\left[\ell\left(y_{i}, \psi_{i}\left(\theta_{i}\left(\mathbf{x}_{i}\right)\right)\right)\right]\right]$
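Episodic Training of Feature Extractor: all domains share one feature extractor $\theta$, but data from domain $i$ is passed through the classifier $\overline{\psi}_{j}$ of a different domain $j \neq i$, whose parameters are held fixed (marked by the overline). This strengthens the feature extractor.
$\underset{\theta}{\operatorname{argmin}}\ \mathbb{E}_{i, j \sim[1, n], i \neq j}\left[\mathbb{E}_{\left(\mathbf{x}_{i}, y_{i}\right) \sim \mathcal{D}_{i}}\left[\ell\left(y_{i}, \overline{\psi}_{j}\left(\theta\left(\mathbf{x}_{i}\right)\right)\right)\right]\right]$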
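Episodic Training of Classifier: the mirror case. The shared classifier $\psi$ is trained on features from the fixed feature extractor $\overline{\theta}_{j}$ of a different domain $j \neq i$. This strengthens the classifier.
$\underset{\psi}{\operatorname{argmin}}\ \mathbb{E}_{i, j \sim[1, n], i \neq j}\left[\mathbb{E}_{\left(\mathbf{x}_{i}, y_{i}\right) \sim \mathcal{D}_{i}}\left[\ell\left(y_{i}, \psi\left(\overline{\theta}_{j}\left(\mathbf{x}_{i}\right)\right)\right)\right]\right]$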
Episodic Training by Random Classifier: a more extreme way to strengthen the feature extractor, which is trained against a randomly initialized classifier $\overline{\psi}_{r}$ whose parameters are never updated.

$\underset{\theta}{\operatorname{argmin}}\ \mathbb{E}_{\mathcal{D}_{i} \sim \mathcal{D}}\left[\mathbb{E}_{\left(\mathbf{x}_{i}, y_{i}\right) \sim \mathcal{D}_{i}}\left[\ell\left(y_{i}, \overline{\psi}_{r}\left(\theta\left(\mathbf{x}_{i}\right)\right)\right)\right]\right]$
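
To make the "shared module trained through a frozen partner" idea concrete, here is a deliberately tiny Python sketch of episodic training of the feature extractor. It is a toy under strong assumptions, not the paper's setup: the feature extractor and each domain classifier are single scalars, the loss is squared error rather than a classification loss, and all function names are hypothetical.

```python
import random


def sample_episode(n_domains, rng=random):
    """Sample two distinct domain indices (i, j), i != j."""
    i, j = rng.sample(range(n_domains), 2)
    return i, j


def episodic_feature_step(theta, frozen_psi, x, y, lr):
    """One SGD step on theta through a frozen classifier psi_j.

    Toy model: prediction = frozen_psi * (theta * x), squared-error loss.
    """
    pred = frozen_psi * theta * x
    grad_theta = 2.0 * (pred - y) * frozen_psi * x
    # Only theta is updated; frozen_psi is deliberately left untouched.
    return theta - lr * grad_theta


def train_feature_extractor(theta, domain_psis, data, steps, lr, seed=0):
    """Repeatedly route domain-i data through domain-j's frozen classifier."""
    rng = random.Random(seed)
    for _ in range(steps):
        i, j = sample_episode(len(domain_psis), rng)
        x, y = rng.choice(data[i])
        theta = episodic_feature_step(theta, domain_psis[j], x, y, lr)
    return theta
```

The random-classifier variant corresponds to replacing `domain_psis[j]` with one randomly initialized, never-updated value; the classifier variant swaps the roles, updating a shared classifier while the feature extractor taken from domain $j$ stays fixed.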