一些实用的脚本

此处保存一些我写过/用过/见过的以后可能会用到的脚本合集。

之前想在神经网络里实现Early Stopping，花了一段时间install pytorchtools因为可以import里面的EarlyStopping，但是好像新版本没有了，所以只能手动贴源码：

class EarlyStopping:
    """Early stops the training if validation loss doesn't improve after a given patience."""
    def __init__(self, patience=7, verbose=False, delta=0):
        """
        Args:
            patience (int): How long to wait after last time validation loss improved.
                            上次验证集损失值改善后等待几个epoch
                            Default: 7
            verbose (bool): If True, prints a message for each validation loss improvement.
                            如果是True，为每个验证集损失值改善打印一条信息
                            Default: False
            delta (float): Minimum change in the monitored quantity to qualify as an improvement.
                            监测数量的最小变化，以符合改进的要求
                            Default: 0
        """
        self.patience = patience
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False
        self.val_loss_min = np.Inf
        self.delta = delta

    def __call__(self, val_loss, model):

        score = -val_loss

        if self.best_score is None:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
        elif score < self.best_score + self.delta:
            self.counter += 1
            print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
            self.counter = 0

    def save_checkpoint(self, val_loss, model):
        '''
        Saves model when validation loss decrease.
        验证损失减少时保存模型。
        '''
        if self.verbose:
            print(f'Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}).  Saving model ...')
        # torch.save(model.state_dict(), 'checkpoint.pt')     # 这里会存储迄今最优模型的参数
        torch.save(model, 'finish_model.pkl')                 # 这里会存储迄今最优的模型
        self.val_loss_min = val_loss

early_stopping = EarlyStopping(patience=20, verbose=True)
for e in range(1, epochs+1):
    # train
    ...
    # validation
    ...
    early_stopping(val_avr, model)
    if early_stopping.early_stop:
        print('Early stopping')
        break
model.load_state_dict(torch.load('finish_model.pkl'))

在IRIS等网站下载了数据后，得到的是服务器上的文件列表链接，没有批量下载的途径。

所以先保存了网页为.html文件，其中包含每个文件的下载链接；接下来以"为分隔符提取出网址。

cat *.html | awk -F'"' '{print $8}' > list

将不要的几行删掉。

for i in $(ls list); do (wget -c --continue -t 0 ${i});done
或
wget -i list

或直接下载总网址上所有可下载的链接：

wget -r --level=0 -E --ignore-length -x -k -p -erobots=off -np -N http://www.remote.com/remote/presentation/dir

备注1：下载的地址默认在当前目录，所以下载大量文件时最好先提前新建一个文件夹，进入文件夹再下载。
备注2：以上三种方式我没有具体去评估比较，如果文件列表中有子文件夹，可能会有所区别。但第三种经检验可以全部下载下来。

上一点下载下来之后发现有些链接存储的不是文件，而是有文本的网页，这样用wget下载下来只会是html文件；我想要把文本存储为文件下下来。

for i in $(cat ../list); do (fn=`echo ${i} | awk -F'/' 'print $11}'`; curl ${i} >${fn}); done

curl是专门用来下载服务器文件的命令，链接无论存储的是文件还是文本网页都可以下载存为文本文件。
备注：list里存要下的链接，必须是下载文件的具体链接；如果是总文件夹链接，只会下载总网页文本信息。

辐射花样和震源球的计算和绘制
辐射花样：已知断层参数（strike, dip, rake）

注意：下半球面离源角为0-90°，最后写入dat中要转成纬度（离源角-90°）。dat文件为[方位角（0-360°），纬度（-90°-0°），辐射花样值]
再通过GMT中伍尔夫投影，即可得到震源球。

#!/bin/bash

R=0/360/-90/0
J=S0/-90/15c
B=a30f10
radpat_file=Radpat_P
gmt set MAP_FRAME_TYPE=plain
gmt set FORMAT_GEO_MAP=+D

gmt begin my_radpat_P png
gmt xyz2grd ${radpat_file}.dat -G${radpat_file}.nc -I1/1 -R$R
gmt grd2cpt ${radpat_file}.nc -Cpolar -E100 -H > ${radpat_file}.cpt
#gmt plot -R$R -J$J -T
gmt grdimage ${radpat_file}.nc -R$R -J$J -C${radpat_file}.cpt -B$B -BN+t"Radiation Pattern for P Wave"
gmt grdcontour ${radpat_file}.nc -R$R -J$J -L-0.001/0.001 -C1 -W2p
#gmt plot -R$R -J$J -T
gmt end show

rm gmt.conf gmt.history ${radpat_file}.cpt ${radpat_file}.nc

FFT（频谱）和PSD（功率谱密度）：
频谱：常用于能量信号（无穷长时间内能量有限）和周期信号，单位是被测物理量的单位。
功率谱：常用于随机信号（和周期信号同属功率信号，无穷长时间内功率有限，功率=能量/时间），单位是被测物理量单位^2/Hz，反映某一频率的能量密度。

https://zhuanlan.zhihu.com/p/417454806 （原理参考）
https://zhuanlan.zhihu.com/p/334501240 （脚本）

分布与边缘分布一并绘制：

https://zhuanlan.zhihu.com/p/75276939

样图

傅里叶变换与时频谱分析
Fourier Transform:

from scipy.fftpack import fft,ifft
#dt = tr.stats.npts/tr.stats.sampling_rate # 数据时间长度
df = tr.stats.sampling_rate
def FFT (Fs,data):
    data = data - np.mean(data)
    L = len(data)                          # 信号长度
    N = int(np.power(2,np.ceil(np.log2(L))))    # 下一个最近二次幂
    FFT_y = (abs(fft(data,N)))/L*2                  # N点FFT，但除以实际信号长度 L
    Fre = np.arange(int(N/2))*Fs/N          # 频率坐标
    FFT_y = FFT_y[range(int(N/2))]          # 取一半
    return Fre, FFT_y
#Fre, FFT_y1 = FFT(df, count)
Fre, FFT_y1 = FFT(df, tr.data)
plt.figure(figsize=(8,2),dpi=100)
plt.plot(Fre,FFT_y1)
#plt.plot(1/Fre/3600/24,FFT_y1)
plt.grid()
#plt.xlabel('Period (day)')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Power')
plt.xscale('log')
#plt.xlim(0,0.00002)
plt.show()

Spectrogram:

from scipy.signal import stft
df = tr.stats.sampling_rate
f, t, zxx = stft(tr.data, fs=df, nperseg=512, noverlap=512*0.9, nfft=512)
plt.figure(figsize=(10,10))
plt.imshow(np.abs(zxx), aspect='auto',
           extent = (min(t), max(t), max(f), min(f))
          )
plt.gca().invert_yaxis()
#cb = plt.colorbar()
#cb.ax.tick_params(labelsize = 20)
plt.xticks(size = 20)
plt.yticks(size = 20)
plt.title('STFT Magnitude', size = 30)
plt.xlabel('Time [sec]', size = 30)
plt.ylabel('Frequency [Hz]', size = 30)

在python里使用shell命令：

import subprocess
s = "taup time -mod prem -h 12.67 -deg 1.131 -ph P,p -time"
proc = subprocess.Popen(s, stdout=subprocess.PIPE, shell=True)
res = proc.communicate()[0]
res.decode('gbk')

self-attention里加position embedding的方法：
（1）绝对位置

def position_matrix(start=0, stop=2, T=2, n=4):
    """
    Create a positional encoding matrix.
    Args:
    start: lowest value in the positional encoding vector
    stop: stop is the highest value in the positional encoding vector
    T: number of values in one cycle of the lowest bit in the PE vector
    n: length of the positional encoding vector
    Return: a matrix of shape (2^(n-1)*T, n)
    """

    step = (stop - start) / T
    one_cycle = np.arange(start, stop, step)
    pos_matrix = np.zeros((np.power(2, (n-1))*T, n))
    for j in range(n):
        block = np.repeat(one_cycle, np.power(2, j))
        pos_matrix[:,n-j-1] = np.tile(block, np.power(2, (n-1-j)))
    return pos_matrix

forward中：
ppmm = self.position_mat[:x.size(1)].repeat(x.size(0), 1, 1)
        ppmm = ppmm.cuda()
        ppmm = ppmm.to(torch.float32)
        # print(ppmm)
        #x = x + ppmm

主函数中：
num = 90; n = 12  # num———self-attention的seq_len; n———self-attention的input_dim
    T = max(2, int(num/2**(n-1))+1)
    pm = torch.from_numpy(position_matrix(0,2,T,n)) # (*,n)
    pm.to(device)
    model=Self_attention(*,*,pm)

（2）正弦余弦

class PositionalEncoding(nn.Module):
    """Positional encoding."""
    def __init__(self, num_hiddens, dropout, max_len=1000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(dropout)
        # Create a long enough `P`
        self.P = torch.zeros((1, max_len, num_hiddens))
        X = torch.arange(max_len, dtype=torch.float32).reshape(
            -1, 1) / torch.pow(10000, torch.arange(
            0, num_hiddens, 2, dtype=torch.float32) / num_hiddens)
        self.P[:, :, 0::2] = torch.sin(X)
        self.P[:, :, 1::2] = torch.cos(X)

    def forward(self, X):
        X = X + self.P[:, :X.shape[1], :].to(X.device)
        return self.dropout(X)

主函数中：
encoding_dim, num_steps = 32, 90
pos_encoding = PositionalEncoding(encoding_dim, 0)
pos_encoding.eval()
X = pos_encoding(torch.zeros((1, num_steps, encoding_dim)))
P = pos_encoding.P[:, :X.shape[1], :]

绘制子图时加colorbar不占据子图空间（即单独给colorbar加一个轴）

fig, ax = plt.subplots(2, 1, figsize=(3,3), dpi=300,
                       gridspec_kw={'height_ratios': [1, 5]},
                       sharex=True)
ax1 = ax[0]
ax2 = ax[1]
ax1.plot(np.arange(len(tr.data))/df, tr.data, 'k', lw=0.4)
ax1.set_xlim(0, len(tr.data)/df)
ax1.set_ylim(-0.08,0.08)

sc = ax2.pcolormesh(t1, f1[1:], amp_zxx[1:], cmap='OrRd', vmin=0, vmax=0.1)
#plt.pcolormesh(new_t, f_log, amp_zxx)
#plt.yscale('log')
#plt.gca().invert_yaxis()
cax = fig.add_axes([ax2.get_position().x1+0.02,ax2.get_position().y0,0.03,ax2.get_position().height])
plt.colorbar(sc, cax=cax)
#cax.tick_params(labelsize = 10)
plt.xticks(size = 10)
plt.yticks(size = 10)
#plt.title('STFT Magnitude', size = 30)
ax2.set_xlabel('Time [sec]', size = 10)
ax2.set_ylabel('Frequency [Hz]', size = 10)
plt.subplots_adjust(hspace=0)
plt.show()

结果示例：

plot 3d elevation for a region by GMT
Reference: GMT Chinese community

https://docs.gmt-china.org/latest/examples/ex029/#gmtplot-116c7b68ca8c491adade37c19a9a9b62

I plot the topography and add the locations of the array on it:

R=76.1167/76.6667/-69.6167/-69.35/-100/1000

gmt begin elevation_3d pdf
#gmt grdpaste REMA1.nc REMA2.nc -GREMA.nc
gmt grdtrack -R76.1167/76.6667/-69.6167/-69.35 -GREMA.nc staloc.txt > staloc.xyz # generate the corresponding elevations of the stations by track locations on topography file REMA.nc
gmt makecpt -Cgray -T0/900
gmt grdview REMA.nc -R$R -JM10c -JZ3c -N-100+ggray -C -Qi300 -I -Ba -BwsENZ -p30/25
cat staloc.xyz | awk '{print $1,$2,$5}' | gmt plot3d -R$R -JM10c -JZ3c -St0.05c -W1p,white -Gwhite -p
gmt colorbar -C -Ba -DJTC+o0/1c -p
gmt basemap -TdjRB+w3c+l+o0.5c/0.5c -p30/25/700 # plot the North arrow
gmt end show

rm staloc.xyz

pytorch: If I want to manually overwrite the parameters of several layers of a model:

old_model = torch.load('OLD.pt')
new_model = MODEL()
new = new_model.state_dict()
for param_tensor in new:
    new[param_tensor] = old_model.state_dict()[param_tensor]
    new_model.load_state_dict(new) # necessary! important!

一些实用的脚本

此处保存一些我写过/用过/见过的以后可能会用到的脚本合集。

推荐阅读更多精彩内容