A Brief Look at Distributionally Robust Stochastic Optimization: The Wasserstein Metric

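The previous post, "A Brief Look at Distributionally Robust Stochastic Optimization", introduced metric-based ambiguity sets as one of the more popular ways of constructing ambiguity sets in distributionally robust optimization today. Of particular interest are ambiguity sets built from the Wasserstein distance, which measures how far apart two probability distributions are. This post therefore focuses on the Wasserstein distance.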

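The Wasserstein distance originates from the optimal transport problem, a special class of optimization problems that seeks the optimal transport map between probability measures and reveals the intrinsic structure of the space of probability distributions. An intuitive picture: suppose there are two construction sites $X$ and $Y$, with $m$ piles of soil on site $X$ and $n$ holes on site $Y$. All $m$ piles on $X$ must be moved into the $n$ holes on $Y$, and we look for the transport plan that minimizes the work done. The work done under the optimal plan is what engineering applications commonly call the earth mover's distance, also known as the Wasserstein distance. The definition of the Wasserstein distance is given next: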

Definition (Wasserstein distance). Fix $p \geq 1$ and let $M(\Xi)$ denote the space of all probability distributions $Q$ supported on $\Xi$ with $E^{Q}[\vert\vert\xi \vert\vert^{p}] = \int_{\Xi} \vert\vert \xi \vert\vert^{p} Q(d\xi) <\infty$. The Wasserstein metric $d_{W}:M(\Xi) \times M(\Xi)\rightarrow \mathbb{R}_{+}$ is defined via

$d_{W}(Q_{1},Q_{2}) := \left( \inf_{\Pi} \int_{\Xi \times \Xi} \vert\vert \xi_{1} -\xi_{2} \vert\vert^{p} \, \Pi(d\xi_{1},d\xi_{2}) \right)^{\frac{1}{p}}$

for all distributions $Q_{1},Q_{2} \in M(\Xi)$, where the infimum is taken over all joint distributions $\Pi$ of $\xi_{1}$ and $\xi_{2}$ with marginals $Q_{1}$ and $Q_{2}$, respectively.

Note: when $p=1$, the Wasserstein metric is also known as the Kantorovich metric.
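
As a quick sanity check of the definition (a worked example added here for illustration, not taken from the references), let $Q_{1}=\delta_{a}$ and $Q_{2}=\delta_{b}$ be point masses at $a,b \in \Xi$. The only joint distribution with these marginals is the point mass $\Pi=\delta_{(a,b)}$, so for every $p \geq 1$

$d_{W}(\delta_{a},\delta_{b}) = \left( \vert\vert a-b \vert\vert^{p} \right)^{\frac{1}{p}} = \vert\vert a-b \vert\vert .$

The distance between two point masses is therefore just the distance between their locations, which matches the picture of moving a single pile of soil into a single hole.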

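By definition, the $p$-th power of the Wasserstein distance is the minimal transport cost of the optimal transport problem with cost function $c(x,y)=\vert\vert x-y \vert\vert^{p}$. Moreover, [2] proves that the Wasserstein distance satisfies the three defining properties of a metric: symmetry, positive definiteness, and the triangle inequality.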
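
To make the transport-cost reading concrete, the following Python sketch computes the distance between two empirical distributions by brute force. It assumes that both distributions put equal mass $1/N$ on the same number $N$ of points, in which case an optimal transport plan can be taken to be a one-to-one assignment (a permutation). The function name `wasserstein_bruteforce` and the coordinates are made up for illustration, and enumerating all permutations is only practical for very small $N$.

```python
from itertools import permutations
import math


def wasserstein_bruteforce(xs, ys, p=1):
    """p-Wasserstein distance between two equal-weight empirical distributions.

    Assumes len(xs) == len(ys) == N and that every point carries mass 1/N,
    so it suffices to search over one-to-one assignments (permutations).
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("this sketch needs the same number of points on both sides")
    best_cost = math.inf
    for perm in permutations(range(n)):
        # Transport cost of this plan: average of ||x_i - y_perm(i)||^p.
        cost = sum(math.dist(xs[i], ys[perm[i]]) ** p for i in range(n)) / n
        best_cost = min(best_cost, cost)
    # The minimal cost is the p-th power of the distance, so take the p-th root.
    return best_cost ** (1.0 / p)


# Three "piles" and three "holes" in the plane (made-up coordinates).
piles = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
holes = [(2.0, 1.0), (0.0, 1.0), (1.0, 1.0)]

w1 = wasserstein_bruteforce(piles, holes, p=1)
w2 = wasserstein_bruteforce(piles, holes, p=2)
print(w1, w2)
```

In this toy layout every pile sits directly below a hole at distance 1, so the cheapest plan moves each pile straight up. For larger problems one would solve the underlying linear program instead of enumerating permutations.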

The dual representation of the Wasserstein metric is given next:

(Dual representation of the Wasserstein metric) For $p=1$ and any distributions $Q_{1},Q_{2} \in M(\Xi)$ we have

$d_{W}(Q_{1},Q_{2}) = \sup_{f \in L} \left\{ \int_{\Xi} f(\xi)\,Q_{1}(d\xi) - \int_{\Xi} f(\xi)\,Q_{2}(d\xi) \right\}$,

where $L$ denotes the space of all Lipschitz functions with $\vert f(\xi)-f(\xi') \vert \leq \vert\vert \xi-\xi' \vert\vert$ for all $\xi,\xi' \in \Xi$, that is, with Lipschitz constant at most 1. The dual representation implies that two distributions $Q_{1}$ and $Q_{2}$ are close to each other with respect to the Wasserstein metric if and only if all functions with uniformly bounded slopes have similar integrals under $Q_{1}$ and $Q_{2}$.
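
As a rough numerical illustration of the dual representation for $p=1$, the Python sketch below compares the primal value with the dual objective for a few test functions whose Lipschitz constant is at most 1. It assumes one-dimensional, equal-weight empirical distributions with the same number of points, for which the primal value can be obtained by matching sorted samples. The sample values, the helper names, and the choice of test functions are assumptions made for illustration only.

```python
import math


def wasserstein1_sorted(xs, ys):
    """1-Wasserstein distance between equal-weight 1-D samples of equal size."""
    if len(xs) != len(ys):
        raise ValueError("this sketch needs the same number of points on both sides")
    # In one dimension, matching sorted samples is an optimal transport plan.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def dual_objective(f, xs, ys):
    """Integral of f under the first empirical distribution minus the second."""
    return sum(map(f, xs)) / len(xs) - sum(map(f, ys)) / len(ys)


q1 = [1.0, 2.0, 4.0]   # support points of Q1 (made-up data)
q2 = [0.0, 1.5, 2.5]   # support points of Q2 (made-up data)

primal = wasserstein1_sorted(q1, q2)

# A few functions with Lipschitz constant at most 1.
test_functions = {
    "identity": lambda x: x,
    "negated identity": lambda x: -x,
    "absolute value": abs,
    "sine": math.sin,
    "distance to 2": lambda x: abs(x - 2.0),
}
dual_values = {name: dual_objective(f, q1, q2) for name, f in test_functions.items()}

print("primal value:", primal)
for name, value in dual_values.items():
    print(f"{name}: {value:.4f}")
```

No test function exceeds the primal value, and in this particular data set the identity function attains it, because each sorted point of $Q_{1}$ lies to the right of its match in $Q_{2}$.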

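Given a data set of $N$ historical samples $\widehat{\Xi}_{N}:=\{\hat{\xi}_{i} \}_{i \leq N} \subseteq \Xi$, the empirical distribution is $\hat{P}_{N}:=\frac{1}{N}\sum_{i=1}^{N} \delta_{\hat{\xi}_{i}}$, where $\delta_{\hat{\xi}_{i}}$ denotes the Dirac point mass at $\hat{\xi}_{i}$. With the empirical distribution of the historical data in hand, the Wasserstein distance can be used to construct the ambiguity set as follows: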

$B_{\varepsilon}(\hat{P}_{N}):=\{P \in M(\Xi): d_{W}(P,\hat{P}_{N})\leq \varepsilon \}$

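As the expression shows, this ambiguity set is a ball in the space of probability distributions, centered at $\hat{P}_{N}$ with radius $\varepsilon$. For a suitably chosen radius, the unknown true distribution of the random variable $\xi$ lies inside this Wasserstein ball with high confidence.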
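
The membership condition of the ball can be sketched in Python as follows. This is a minimal illustration under strong assumptions: $p=1$, scalar data, and a candidate distribution that is itself an equal-weight distribution on $N$ points (the actual ambiguity set also contains continuous and unequal-weight distributions, which this sketch does not handle). The data, the radius, and the names `wasserstein1_sorted` and `in_wasserstein_ball` are hypothetical.

```python
def wasserstein1_sorted(xs, ys):
    """1-Wasserstein distance between equal-weight 1-D samples of equal size."""
    if len(xs) != len(ys):
        raise ValueError("this sketch needs the same number of points on both sides")
    # In one dimension, matching sorted samples is an optimal transport plan.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def in_wasserstein_ball(candidate, samples, radius):
    """Check whether the candidate lies within `radius` of the empirical distribution."""
    return wasserstein1_sorted(candidate, samples) <= radius


samples = [0.9, 1.1, 2.0, 3.2]   # historical observations (made-up data)
epsilon = 0.25                   # ball radius (made-up value)

shifted_slightly = [s + 0.2 for s in samples]   # every atom moved by 0.2
shifted_far = [s + 1.0 for s in samples]        # every atom moved by 1.0

print(in_wasserstein_ball(samples, samples, epsilon))            # the center itself
print(in_wasserstein_ball(shifted_slightly, samples, epsilon))   # distance about 0.2
print(in_wasserstein_ball(shifted_far, samples, epsilon))        # distance about 1.0
```

Shifting every atom by a constant $c$ moves the distribution a distance $\vert c \vert$ away from the center, so the first two candidates fall inside the ball of radius 0.25 and the third does not.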

References

[1] Esfahani P.M., Kuhn D. Data-driven Distributionally Robust Optimization Using the Wasserstein Metric: Performance Guarantees and Tractable Reformulations[J]. Mathematical Programming, 2018, 171(1/2):115-166.

[2] Santambrogio F. Optimal Transport for Applied Mathematicians[M]. Birkhäuser, Cham, 2015.

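[3] Ma Litao, Bian Wei. Optimal transport theory and its applications in image processing[J]. Operations Research Transactions, 2019, 23(3) (in Chinese).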
