[PySAL] 空间自相关

原文

1. 介绍

空间自相关描述了空间上一系列点的属性分布模式。正相关反映了属性值在空间上的相似性，负相关则反映了差异性。这种自相关性显然不是随机产生的。

空间自相关有两种分析方式——全局自相关和局部自相关。顾名思义，全局自相关研究整个地图上属性分布模式的聚类情况。局部自相关则将关注点放在了整体的内部，去识别聚类或是热点，它可能和整体聚类模式相符，也可能反映了和整体模式相反的异质性。

2. 全局自相关

PySAL有5种全局自相关检验：Gamma值、Join Count、Moran’s I、Geary’s C、和Getis and Ord’s G。

2.1 Gamma值

Gamma

2.2 Join Count

2.3 Moran's I

对n个空间单位中，属性y的的I值计算公式如下：

Moran's I

其中w_ij是空间权重，z_i = y_i-\bar{y}，S_0=ΣΣw_ij。

下面以圣路易斯周围78个郡的蓄意谋杀率的案例研究为例：
先读取数据：

>>> import pysal
>>> import numpy as np
>>> f = pysal.open(pysal.examples.get_path("stl_hom.txt"))
>>> y = np.array(f.by_col['HR8893'])

下面创建权重矩阵：

>>> w = pysal.open(pysal.examples.get_path("stl.gal")).read()

获取Moran's I的实例：

>>> mi = pysal.Moran(y, w, two_tailed=False)
>>> "%.3f"%mi.I
'0.244'
>>> mi.EI
-0.012987012987012988
>>> "%.5f"%mi.p_norm
'0.00014'

从结果可以看出，在假定谋杀率服从正态分布的情况下，观察值远高于期望值。

下面我们来看看mi对象的参数和属性：

class Moran(builtins.object)
 |  Moran's I Global Autocorrelation Statistic
 |  
 |  Parameters
 |  ----------
 |  
 |  y               : array
 |                    variable measured across n spatial units
 |  w               : W
 |                    spatial weights instance
 |  transformation  : string
 |                    weights transformation,  default is row-standardized "r".
 |                    Other options include "B": binary,  "D":
 |                    doubly-standardized,  "U": untransformed
 |                    (general weights), "V": variance-stabilizing.
 |  permutations    : int
 |                    number of random permutations for calculation of
 |                    pseudo-p_values
 |  two_tailed      : boolean
 |                    If True (default) analytical p-values for Moran are two
 |                    tailed, otherwise if False, they are one-tailed.
 |  
 |  Attributes
 |  ----------
 |  y            : array
 |                 original variable
 |  w            : W
 |                 original w object
 |  permutations : int
 |                 number of permutations
 |  I            : float
 |                 value of Moran's I
 |  EI           : float
 |                 expected value under normality assumption
 |  VI_norm      : float
 |                 variance of I under normality assumption
 |  seI_norm     : float
 |                 standard deviation of I under normality assumption
 |  z_norm       : float
 |                 z-value of I under normality assumption
 |  p_norm       : float
 |                 p-value of I under normality assumption
 |  VI_rand      : float
 |                 variance of I under randomization assumption
 |  seI_rand     : float
 |                 standard deviation of I under randomization assumption
 |  z_rand       : float
 |                 z-value of I under randomization assumption
 |  p_rand       : float
 |                 p-value of I under randomization assumption
 |  two_tailed   : boolean
 |                 If True p_norm and p_rand are two-tailed, otherwise they
 |                 are one-tailed.
 |  sim          : array
 |                 (if permutations>0)
 |                 vector of I values for permuted samples
 |  p_sim        : array
 |                 (if permutations>0)
 |                 p-value based on permutations (one-tailed)
 |                 null: spatial randomness
 |                 alternative: the observed I is extreme if
 |                 it is either extremely greater or extremely lower
 |                 than the values obtained based on permutations
 |  EI_sim       : float
 |                 (if permutations>0)
 |                 average value of I from permutations
 |  VI_sim       : float
 |                 (if permutations>0)
 |                 variance of I from permutations
 |  seI_sim      : float
 |                 (if permutations>0)
 |                 standard deviation of I under permutations.
 |  z_sim        : float
 |                 (if permutations>0)
 |                 standardized I based on permutations
 |  p_z_sim      : float
 |                 (if permutations>0)
 |                 p-value based on standard normal approximation from
 |                 permutations

哦，原来我们的推断不一定要基于正态假设，也可以基于随机置换的一组值。

>>> np.random.seed(10)
>>> mir = pysal.Moran(y, w, permutations = 9999)
>>> // 得到伪p值：
>>> print(mir.p_sim)
0.0022

关于置换检验
这里观察到21个置换值和实际的I值一样极端，因此p值就是(21+1)/(9999+1)=0.0022。或者，我们也可使用Z变换，将置换检验中得到的I值和原来的I值比较：

>>> print(mir.EI_sim)
-0.0118217511619
>>> print(mir.z_sim)
4.55451777821
>>> print(mir.p_z_sim)
2.62529422013e-06

如果我们的研究变量y是基于不同数量总体的比率，那么y的I值就需要调整，以解释总体间的差异。
为了进行这项调整，我们需要创建一个Moran_Rate的实例，而不是Moran的实例。比方说，我们打算估计死于婴儿瘁死综合症的新生儿比率。下面先读取数据，再创建W实例和Moran_Rate实例。

>>> f = pysal.open(pysal.examples.get_path("sids2.dbf"))
>>> b = np.array(f.by_col('BIR79'))
>>> e = np.array(f.by_col('SID79'))

>>> w = pysal.open(pysal.examples.get_path("sids2.gal")).read()

>>> mi = pysal.esda.moran.Moran_Rate(e, b, w, two_tailed=False)
>>> "%6.4f" % mi.I
'0.1662'
>>> "%6.4f" % mi.EI
'-0.0101'
>>> "%6.4f" % mi.p_norm
'0.0042'

显然，在调整后，I值远高于期望值。

2.4 Geary's C

2.5 Getis and Ord's G

3. 局部自相关

要定量地度量局部自相关性，PySAL提供了Moran's I和Getis and Ord's G的空间关联局部指标(LISA)。

3.1 Local Moran's I

先给出局部自相关I值的公式：

Local Moran's I

可以计算出n个局部空间自相关I值，每个空间单元1个。依然拿圣路易斯为例，LISA统计量可以这样获得：

>>> f = pysal.open(pysal.examples.get_path("stl_hom.txt"))
>>> y = np.array(f.by_col['HR8893'])
>>> w = pysal.open(pysal.examples.get_path("stl.gal")).read()
>>> np.random.seed(12345)
>>> lm = pysal.Moran_Local(y,w)
>>> lm.n
78
>>> len(lm.Is)
78

78个LISA存储在向量lm.ls中。对这些值的推断由条件随机获得（除了i以外的n-1个空间单元被用来生成i的LISA统计量），从而得到每个LISA的伪p值。

>>> lm.p_sim
array([ 0.176,  0.073,  0.405,  0.267,  0.332,  0.057,  0.296,  0.242,
        0.055,  0.062,  0.273,  0.488,  0.44 ,  0.354,  0.415,  0.478,
        0.473,  0.374,  0.415,  0.21 ,  0.161,  0.025,  0.338,  0.375,
        0.285,  0.374,  0.208,  0.3  ,  0.373,  0.411,  0.478,  0.414,
        0.009,  0.429,  0.269,  0.015,  0.005,  0.002,  0.077,  0.001,
        0.088,  0.459,  0.435,  0.365,  0.231,  0.017,  0.033,  0.04 ,
        0.068,  0.101,  0.284,  0.309,  0.113,  0.457,  0.045,  0.269,
        0.118,  0.346,  0.328,  0.379,  0.342,  0.39 ,  0.376,  0.467,
        0.357,  0.241,  0.26 ,  0.401,  0.185,  0.172,  0.248,  0.4  ,
        0.482,  0.159,  0.373,  0.455,  0.083,  0.128])

numpy的索引可以帮助我们得到显著的p值：

>>> sig = lm.p_sim<0.05
>>> lm.p_sim[sig]
array([ 0.025,  0.009,  0.015,  0.005,  0.002,  0.001,  0.017,  0.033,
        0.04 ,  0.045])

也可以获得这些显著值在莫兰指数散点图中的象限：
（1：HH高聚类，2：LH高包围低，3：LL低值聚类，4：HL低包围高）

>>> lm.q[sig]
array([4, 1, 3, 1, 3, 1, 1, 3, 3, 3])

和全局莫兰指数一样，如果是比率，也要进行调整，同样以新生儿数据为例：

>>> f = pysal.open(pysal.examples.get_path("sids2.dbf"))
>>> b = np.array(f.by_col('BIR79'))
>>> e = np.array(f.by_col('SID79'))
>>> w = pysal.open(pysal.examples.get_path("sids2.gal")).read()

>>> np.random.seed(12345)
>>> lm = pysal.esda.moran.Moran_Local_Rate(e, b, w)
>>> lm.Is[:10]
array([-0.13452366, -1.21133985,  0.05019761,  0.06127125, -0.12627466,
        0.23497679,  0.26345855, -0.00951288, -0.01517879, -0.34513514])

获取显著的p值：

>>> sig = lm.p_sim<0.05
>>> lm.p_sim[sig]
array([ 0.021,  0.04 ,  0.047,  0.015,  0.001,  0.017,  0.032,  0.031,
        0.019,  0.014,  0.004,  0.048,  0.003])