Time Zone Handling
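Many time series users choose to work in Coordinated Universal Time (UTC), the successor to Greenwich Mean Time and the current international standard. A time zone is usually expressed as an offset from UTC.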
In Python, time zone information typically comes from the third-party library pytz. pandas wraps pytz's functionality.
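To make the idea of "an offset from UTC" concrete, here is a minimal sketch that uses only pandas. The dates are arbitrary illustrations, not taken from the examples in this post. It shows that one zone name can map to different offsets depending on daylight saving time, while a zone without DST keeps a fixed offset.

```python
from datetime import timedelta

import pandas as pd

# The same zone name carries different UTC offsets in winter and summer
# because New York observes daylight saving time.
winter = pd.Timestamp("2020-01-15 12:00", tz="America/New_York")
summer = pd.Timestamp("2020-07-15 12:00", tz="America/New_York")
winter_offset = winter.utcoffset()   # offset from UTC as a timedelta
summer_offset = summer.utcoffset()

# Shanghai does not observe DST, so its offset is the same all year.
shanghai_offset = pd.Timestamp("2020-07-15 12:00", tz="Asia/Shanghai").utcoffset()

print(winter_offset == timedelta(hours=-5))
print(summer_offset == timedelta(hours=-4))
print(shanghai_offset == timedelta(hours=8))
```

This is why a zone name such as 'America/New_York' is generally preferable to a hard-coded numeric offset.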
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Get time zone names
import pytz
pytz.common_timezones[-5:]
['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']
# Get a time zone object with pytz.timezone
tz = pytz.timezone('America/New_York')
tz
<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>
Generating date ranges with and without a time zone
rng = pd.date_range('5/10/2020 11:30'
,periods=6
,freq='D')
ts = pd.Series(np.random.randn(len(rng))
,index=rng)
ts
2020-05-10 11:30:00 1.072220
2020-05-11 11:30:00 2.088327
2020-05-12 11:30:00 -0.795575
2020-05-13 11:30:00 1.230427
2020-05-14 11:30:00 -0.012184
2020-05-15 11:30:00 -0.786641
Freq: D, dtype: float64
ts.index
DatetimeIndex(['2020-05-10 11:30:00', '2020-05-11 11:30:00',
'2020-05-12 11:30:00', '2020-05-13 11:30:00',
'2020-05-14 11:30:00', '2020-05-15 11:30:00'],
dtype='datetime64[ns]', freq='D')
print(ts.index.tz) # the tz attribute is None for a naive index
None
# Generate a date range with a time zone attached
pd.date_range('5/10/2020',periods=10,freq='D',tz='UTC')
DatetimeIndex(['2020-05-10 00:00:00+00:00', '2020-05-11 00:00:00+00:00',
'2020-05-12 00:00:00+00:00', '2020-05-13 00:00:00+00:00',
'2020-05-14 00:00:00+00:00', '2020-05-15 00:00:00+00:00',
'2020-05-16 00:00:00+00:00', '2020-05-17 00:00:00+00:00',
'2020-05-18 00:00:00+00:00', '2020-05-19 00:00:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')
Converting from naive to localized: tz_localize
ts
2020-05-10 11:30:00 1.072220
2020-05-11 11:30:00 2.088327
2020-05-12 11:30:00 -0.795575
2020-05-13 11:30:00 1.230427
2020-05-14 11:30:00 -0.012184
2020-05-15 11:30:00 -0.786641
Freq: D, dtype: float64
ts_utc=ts.tz_localize('UTC')
ts_utc
2020-05-10 11:30:00+00:00 1.072220
2020-05-11 11:30:00+00:00 2.088327
2020-05-12 11:30:00+00:00 -0.795575
2020-05-13 11:30:00+00:00 1.230427
2020-05-14 11:30:00+00:00 -0.012184
2020-05-15 11:30:00+00:00 -0.786641
Freq: D, dtype: float64
ts_utc.index
DatetimeIndex(['2020-05-10 11:30:00+00:00', '2020-05-11 11:30:00+00:00',
'2020-05-12 11:30:00+00:00', '2020-05-13 11:30:00+00:00',
'2020-05-14 11:30:00+00:00', '2020-05-15 11:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')
Converting to another time zone: tz_convert
ts_utc.tz_convert("America/New_York") # convert to New York time
2020-05-10 07:30:00-04:00 1.072220
2020-05-11 07:30:00-04:00 2.088327
2020-05-12 07:30:00-04:00 -0.795575
2020-05-13 07:30:00-04:00 1.230427
2020-05-14 07:30:00-04:00 -0.012184
2020-05-15 07:30:00-04:00 -0.786641
Freq: D, dtype: float64
ts_utc.tz_convert("Asia/Shanghai") # convert to Shanghai time
2020-05-10 19:30:00+08:00 1.072220
2020-05-11 19:30:00+08:00 2.088327
2020-05-12 19:30:00+08:00 -0.795575
2020-05-13 19:30:00+08:00 1.230427
2020-05-14 19:30:00+08:00 -0.012184
2020-05-15 19:30:00+08:00 -0.786641
Freq: D, dtype: float64
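The difference between the two methods is easy to mix up, so here is a small sketch contrasting them. It assumes current pandas behavior, where calling the wrong method for the index state raises a TypeError. The series values are placeholders chosen for illustration.

```python
import pandas as pd

naive = pd.Series(
    [1.0, 2.0],
    index=pd.date_range("2020-05-10 11:30", periods=2, freq="D"),
)

# tz_localize attaches a zone and keeps the wall-clock reading (11:30 stays 11:30).
aware = naive.tz_localize("UTC")

# tz_convert keeps the instant and changes the wall-clock reading (11:30 UTC -> 19:30 +08:00).
converted = aware.tz_convert("Asia/Shanghai")

errors = []
try:
    naive.tz_convert("Asia/Shanghai")   # naive data has no zone to convert from
except TypeError as exc:
    errors.append(str(exc))
try:
    aware.tz_localize("Asia/Shanghai")  # already aware; use tz_convert instead
except TypeError as exc:
    errors.append(str(exc))

print(aware.index[0], converted.index[0])
print(errors)
```

In short: localize once to declare which zone naive data was recorded in, then convert as often as needed.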
Instance methods
tz_localize and tz_convert are also instance methods of DatetimeIndex
ts.index.tz_localize('Asia/Shanghai')
DatetimeIndex(['2020-05-10 11:30:00+08:00', '2020-05-11 11:30:00+08:00',
'2020-05-12 11:30:00+08:00', '2020-05-13 11:30:00+08:00',
'2020-05-14 11:30:00+08:00', '2020-05-15 11:30:00+08:00'],
dtype='datetime64[ns, Asia/Shanghai]', freq='D')
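Operations on time zone-aware Timestamp objects
Like time series and date ranges, an individual Timestamp object can be localized from naive to time zone-aware, and then converted between zones.
Converting a Timestamp object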
stamp = pd.Timestamp('2020-05-10 23:49')
stamp
Timestamp('2020-05-10 23:49:00')
stamp_utc = stamp.tz_localize('utc') # localize
stamp_utc
Timestamp('2020-05-10 23:49:00+0000', tz='UTC')
stamp_utc.tz_convert("Asia/Shanghai") # convert to another time zone
Timestamp('2020-05-11 07:49:00+0800', tz='Asia/Shanghai')
Passing the time zone at creation
stamp_shanghai = pd.Timestamp("2020-05-10 23:58"
,tz="Asia/Shanghai") # pass the time zone directly
stamp_shanghai
Timestamp('2020-05-10 23:58:00+0800', tz='Asia/Shanghai')
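Invariance of the timestamp value
A time zone-aware Timestamp object internally stores a UTC timestamp value, expressed as nanoseconds since the Unix epoch (1970-01-01 00:00 UTC). This value does not change when the time zone is converted.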
stamp_shanghai.value
1589126280000000000
# Same result as above
stamp_shanghai.tz_convert("America/New_York").value
1589126280000000000
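As a quick check of where that number comes from, the sketch below recomputes it as the elapsed time since the Unix epoch and confirms that two zone representations of the same instant compare equal. The variable names are only for illustration.

```python
import pandas as pd

shanghai = pd.Timestamp("2020-05-10 23:58", tz="Asia/Shanghai")
new_york = shanghai.tz_convert("America/New_York")

# Both objects describe the same instant, so they compare equal
# and share the same underlying nanosecond value.
same_instant = shanghai == new_york
same_value = shanghai.value == new_york.value

# Recompute the value as nanoseconds elapsed since the Unix epoch in UTC.
epoch = pd.Timestamp("1970-01-01", tz="UTC")
ns_since_epoch = (shanghai - epoch).value

print(same_instant, same_value, ns_since_epoch)
```

Only the displayed wall-clock time and offset differ between the two objects; the stored instant is identical.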
Time arithmetic with DateOffset
from pandas.tseries.offsets import Hour
data = pd.Timestamp("2020-05-10 01:30" # create a Timestamp object
,tz="Asia/Shanghai")
data
Timestamp('2020-05-10 01:30:00+0800', tz='Asia/Shanghai')
data + Hour(2) # add 2 hours
# equivalent: data + 2 * Hour()
Timestamp('2020-05-10 03:30:00+0800', tz='Asia/Shanghai')
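Shanghai has no daylight saving time, so the example above cannot show how offsets interact with DST. As a supplementary sketch (the New York date is my own choice, not from the original example), adding one hour just before the 2020 spring-forward transition skips the nonexistent 02:00 hour, because Hour adds elapsed time rather than wall-clock time.

```python
from datetime import timedelta

import pandas as pd
from pandas.tseries.offsets import Hour

# 30 minutes before US clocks jumped from 02:00 to 03:00 on 2020-03-08.
before = pd.Timestamp("2020-03-08 01:30", tz="America/New_York")

# One elapsed hour later the wall clock reads 03:30 and the offset has changed.
after = before + Hour()

print(before, before.utcoffset() == timedelta(hours=-5))
print(after, after.utcoffset() == timedelta(hours=-4))
```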
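Operations between different time zones
If two time series with different time zones are combined, the result is in UTC. Because the timestamps are stored in UTC internally, the operation is straightforward and requires no conversion.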
rng = pd.date_range("2020-05-10 23:43"
,periods=10
,freq="B")
rng
DatetimeIndex(['2020-05-11 23:43:00', '2020-05-12 23:43:00',
'2020-05-13 23:43:00', '2020-05-14 23:43:00',
'2020-05-15 23:43:00', '2020-05-18 23:43:00',
'2020-05-19 23:43:00', '2020-05-20 23:43:00',
'2020-05-21 23:43:00', '2020-05-22 23:43:00'],
dtype='datetime64[ns]', freq='B')
ts = pd.Series(np.random.randn(len(rng)) # random values
,index=rng) # row index
ts
2020-05-11 23:43:00 0.258933
2020-05-12 23:43:00 0.416673
2020-05-13 23:43:00 0.089695
2020-05-14 23:43:00 -0.347376
2020-05-15 23:43:00 -0.304925
2020-05-18 23:43:00 -0.891367
2020-05-19 23:43:00 -0.960866
2020-05-20 23:43:00 -0.420829
2020-05-21 23:43:00 0.591673
2020-05-22 23:43:00 1.431417
Freq: B, dtype: float64
ts1 = ts[:7].tz_localize('Asia/Shanghai')
ts2 = ts1[2:].tz_convert('Europe/Moscow')
ts1
2020-05-11 23:43:00+08:00 0.258933
2020-05-12 23:43:00+08:00 0.416673
2020-05-13 23:43:00+08:00 0.089695
2020-05-14 23:43:00+08:00 -0.347376
2020-05-15 23:43:00+08:00 -0.304925
2020-05-18 23:43:00+08:00 -0.891367
2020-05-19 23:43:00+08:00 -0.960866
Freq: B, dtype: float64
ts2
2020-05-13 18:43:00+03:00 0.089695
2020-05-14 18:43:00+03:00 -0.347376
2020-05-15 18:43:00+03:00 -0.304925
2020-05-18 18:43:00+03:00 -0.891367
2020-05-19 18:43:00+03:00 -0.960866
Freq: B, dtype: float64
result = ts1 + ts2
result
2020-05-11 15:43:00+00:00 NaN
2020-05-12 15:43:00+00:00 NaN
2020-05-13 15:43:00+00:00 0.179391
2020-05-14 15:43:00+00:00 -0.694751
2020-05-15 15:43:00+00:00 -0.609850
2020-05-18 15:43:00+00:00 -1.782735
2020-05-19 15:43:00+00:00 -1.921732
Freq: B, dtype: float64
result.index
DatetimeIndex(['2020-05-11 15:43:00+00:00', '2020-05-12 15:43:00+00:00',
'2020-05-13 15:43:00+00:00', '2020-05-14 15:43:00+00:00',
'2020-05-15 15:43:00+00:00', '2020-05-18 15:43:00+00:00',
'2020-05-19 15:43:00+00:00'],
dtype='datetime64[ns, UTC]', freq='B')
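Because the example above uses random values, the sums are hard to verify by eye. The following sketch repeats the same steps with fixed placeholder values (chosen for illustration only) so the alignment is visible: rows are matched on the underlying instant, the result index is in UTC, and instants present in only one series yield NaN.

```python
import pandas as pd

idx = pd.date_range("2020-05-11 23:43", periods=4, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

a = s.tz_localize("Asia/Shanghai")      # four rows, +08:00
b = a[2:].tz_convert("Europe/Moscow")   # last two rows, shown as +03:00

# Alignment happens on the instant, not on the displayed wall-clock time.
total = a + b

print(total)
print(total.index.tz)
```

The first two rows are NaN because b has no matching instants, and the last two rows are 3.0 + 3.0 and 4.0 + 4.0.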