Preface
It has been a while since my last update. I kept meaning to post something but never felt I had anything presentable, so here is what I have been working on lately: using NOAA's long-term remote sensing data to plot how temperature has changed from 1981 to 2023 in a few sea areas along the Chinese coast.
Data Acquisition
I simply used wget here; I could not find a way to bulk-download this many files in one go with other download tools. It does the job, and twenty-odd GB is not that much anyway.
wget -r -np -nH -R index.html https://www.ncei.noaa.gov/thredds-ocean/catalog/ghrsst/L4/GLOB/REMSS/mw_ir_OI/catalog.html
Data Processing
This time I used Python. I ran into a bit of trouble at the start, so I would recommend creating a dedicated conda environment just for working with nc (NetCDF) data.
1. Importing the packages
I ended up importing quite a few packages. Having gotten used to plotting in R, I went straight for plotnine, which provides a ggplot-like grammar and is a real blessing for anyone still clumsy with matplotlib and seaborn.
import xarray as xr
import numpy as np
import pandas as pd
from plotnine import *
import geopandas as gpd
from pyproj import CRS
from scipy.interpolate import griddata
import os
import ctd
from pathlib import Path
from wodpy import wod
from os.path import relpath
import netCDF4
from datetime import datetime, timedelta
import warnings
from concurrent.futures import ThreadPoolExecutor
import psutil
from scipy.stats import pearsonr
import patchworklib as pw
warnings.filterwarnings('ignore')
Next, I will write a loop that reads each day's file in turn and pulls out the data for the target sea area.
2. Listing the data folders and setting the latitude/longitude bounds
sst_files = os.listdir("sst/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr")
chl_files = os.listdir("chla/pub/socd1/mecb/coastwatch/viirs/science/L3/global/chlora/dineof")
lat_min, lat_max = 22, 22.6      # southern and northern bounds
lon_min, lon_max = 114.5, 115.2  # western and eastern bounds
3. Drawing the base map
My target area is fairly small, so I drew the base map myself to keep things crisp. If your region is large, you can simply leave the empty cells as NA and the coastline will show up on its own. The map data can be obtained from domestic (Chinese) websites, and domestic sources are preferable, since maps from foreign sites are said to occasionally be "missing a piece here and there".
map_data = gpd.read_file("./2021-4/shi/CN-shi-A.shp")                  # city-level boundary shapefile
daya_data = map_data[map_data["CityNameC"].isin(["惠州市", "深圳市"])]   # keep Huizhou and Shenzhen, which surround Daya Bay
daya_data_1 = daya_data.to_crs(CRS("EPSG:4326"))                       # reproject to WGS84 lon/lat
4. Picking a few stations
To make the later data handling easier I picked a few stations. The dataset's resolution is only 0.25°, and my target area is small, so there were not many grid points to choose from anyway.
Stations = {
'Station': ["A", "B", "C", "D", "E", "F"],
'lat': [22.125, 22.125, 22.125, 22.375, 22.375,22.375],
'lon': [114.625, 114.875, 115.125, 114.625, 114.875, 115.125]
}
Stations = pd.DataFrame(Stations)
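Since the merge in step 7 relies on exact lat/lon matches, it is worth checking that these coordinates sit on actual grid-cell centers. A minimal sketch, using any one downloaded daily file (the file name here is just a placeholder, not from the original post):
# placeholder path: substitute any single daily file that was downloaded
sample = xr.open_dataset("oisst-sample-day.nc", engine="h5netcdf")
for _, row in Stations.iterrows():
    # nearest grid cell to each chosen station
    cell = sample.sel(lat=row["lat"], lon=row["lon"], method="nearest")
    print(row["Station"], float(cell["lat"]), float(cell["lon"]))
If the printed coordinates differ from the station values, the exact-match merge later on would silently drop those rows.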
5. Let's plot a station map
p1 = (ggplot() +
      theme_bw() +
      geom_map(data=daya_data_1, fill="#8a7967") +
      geom_point(Stations, aes(x="lon", y="lat"), size=7, color="#ff4f81") +
      coord_fixed() +
      geom_label(Stations, aes(x="lon", y="lat-0.05", label="Station"), size=10) +
      xlim(lon_min, lon_max) +
      ylim(lat_min, lat_max) +
      labs(x="Longitude", y="Latitude") +
      theme(axis_text=element_text(size=10, family="Arial"),
            axis_title=element_text(size=12, family="Arial", face="bold"),
            strip_text=element_text(size=10, family="Arial", face="bold"),
            legend_text=element_text(size=10, family="Arial"),
            legend_title=element_text(size=12, family="Arial", face="bold"),
            legend_position="bottom"))
p1
6. Extracting the data for the target sea area
sst_files = [item for item in sst_files if "html" not in item]
i = 0
columns = ['lat', 'lon', 'sst', 'date']
sst_data = pd.DataFrame(columns=columns)
for sst_dir in sst_files:
    sst_dir = "sst/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/" + sst_dir
    files = os.listdir(sst_dir)
    files = [item for item in files if "html" not in item]
    for sst_file in files:
        i += 1
        file_path = sst_dir + "/" + sst_file
        dataset = xr.open_dataset(file_path, engine="h5netcdf")
        # cut out the target lat/lon box
        subset = dataset.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
        df = subset.to_dataframe().reset_index()
        data_part = df[["lat", "lon", "sst"]]
        data_part = data_part.dropna()
        # the date (YYYYMMDD) is the second dot-separated field of the file name
        date = sst_file.split(".")[1]
        data_part["date"] = date
        sst_data = pd.concat([sst_data, data_part], ignore_index=True)
        # progress report roughly once per year of files
        if i % 365 == 0:
            print(sst_dir, sst_file, i)
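Reading thousands of daily files one after another is slow. Since ThreadPoolExecutor is already imported, one option (not in the original workflow, just a sketch; the helper name read_one_day is mine) is to open the files concurrently and concatenate the pieces at the end. Whether this actually speeds things up depends on disk I/O and on how h5netcdf handles its internal locking.
def read_one_day(file_path):
    # open a single daily file, cut out the target box, return a small DataFrame
    ds = xr.open_dataset(file_path, engine="h5netcdf")
    sub = ds.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    part = sub.to_dataframe().reset_index()[["lat", "lon", "sst"]].dropna()
    part["date"] = Path(file_path).name.split(".")[1]
    ds.close()
    return part

base = "sst/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/"
all_paths = [base + d + "/" + f
             for d in sst_files
             for f in os.listdir(base + d) if "html" not in f]
with ThreadPoolExecutor(max_workers=4) as pool:
    sst_data = pd.concat(pool.map(read_one_day, all_paths), ignore_index=True)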
7. A bit of data wrangling
Some of the most recent files have "_preliminary" in the file name; it needs to be stripped to recover the date string:
sst_data["date"] = sst_data["date"].str.replace("_preliminary", "")
Match the stations to the grid points:
sst_data = pd.merge(sst_data, Stations, on = ["lat", "lon"])
Take a quick look at the data:
sst_data.head()
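The plot below uses a frame named data with Year and Month columns, a step the code above does not show. Presumably it is derived from the YYYYMMDD date string, roughly like this (a sketch; the column names are assumed to match the plot code):
# assumed bridging step: parse the YYYYMMDD string into Year / Month columns
data = sst_data.copy()
data["date"] = pd.to_datetime(data["date"], format="%Y%m%d")
data["Year"] = data["date"].dt.year
data["Month"] = data["date"].dt.month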
Plot:
(ggplot(data[data["Year"] < 2024],
        aes(x="Year", y="sst", color="sst"))
 + theme_bw()
 + geom_smooth(method="lm")
 + geom_vline(xintercept=1994)
 + geom_point(size=2, shape=".")
 + facet_wrap("~Month", scales="free_y")
 + theme(axis_text=element_text(size=6),
         strip_text=element_text(size=8),
         figure_size=(8, 5)))