R for data science
16. Dates and Times
16.2 Creating date/times
You should always use the simplest possible data type that works for your needs. That means if you can use a date instead of a date-time, you should.
today() # the current date
now() # the current date-time
There are three ways to create a date/time:
- From a string.
- From individual date-time components.
- From an existing date/time object.
16.2.1 From strings
ymd("2017-01-31") # year, month, day
#> [1] "2017-01-31"
mdy("January 31st, 2017") # month, day, year
#> [1] "2017-01-31"
dmy("31-Jan-2017") # day, month, year
#> [1] "2017-01-31"
ymd(20170131) # unquoted numbers are also accepted
#> [1] "2017-01-31"
ymd_hms("2017-01-31 20:11:59")
#> [1] "2017-01-31 20:11:59 UTC"
mdy_hm("01/31/2017 08:01")
#> [1] "2017-01-31 08:01:00 UTC"
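The parsers above can be sketched a little further. The example below is a minimal illustration, assuming only that lubridate is installed: it shows that supplying tz to a date parser yields a date-time, and that input which cannot be parsed becomes NA. The variable names are made up for the example.

```r
library(lubridate)

# A date-only parser returns a Date object.
d <- ymd("2017-01-31")
class(d)

# Supplying a time zone makes the same parser return a date-time (POSIXct).
dt <- ymd("2017-01-31", tz = "UTC")
class(dt)

# Input that matches no year-month-day layout parses to NA (with a warning,
# silenced here so the example runs quietly).
bad <- suppressWarnings(ymd("not a date"))
is.na(bad)
```

Checking for NA after parsing is a cheap way to catch strings that were written in a different order than the parser expects.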
16.2.2 From individual components
Use make_date() for dates, or make_datetime() for date-times:
flights %>%
  select(year, month, day, hour, minute) %>%
  mutate(departure = make_datetime(year, month, day, hour, minute))
#> # A tibble: 336,776 x 6
#>    year month   day  hour minute departure
#>   <int> <int> <int> <dbl>  <dbl> <dttm>
#> 1  2013     1     1     5     15 2013-01-01 05:15:00
#> 2  2013     1     1     5     29 2013-01-01 05:29:00
#> 3  2013     1     1     5     40 2013-01-01 05:40:00
#> 4  2013     1     1     5     45 2013-01-01 05:45:00
#> 5  2013     1     1     6      0 2013-01-01 06:00:00
#> 6  2013     1     1     5     58 2013-01-01 05:58:00
#> # … with 3.368e+05 more rows
When you use date-times in a numeric context (like in a histogram), 1 means 1 second, so a binwidth of 86400 means one day. For dates, 1 means 1 day.
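The difference in units can be checked directly. This is a small sketch, assuming only lubridate; the variable names are illustrative.

```r
library(lubridate)

# Two date-times exactly one day apart (make_datetime() defaults to UTC).
midnight   <- make_datetime(2013, 1, 1)
next_night <- make_datetime(2013, 1, 2)
as.numeric(next_night) - as.numeric(midnight)   # 86400 (seconds)

# The same two days as plain dates.
day_one <- make_date(2013, 1, 1)
day_two <- make_date(2013, 1, 2)
as.numeric(day_two) - as.numeric(day_one)       # 1 (day)
```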
16.2.3 From other types
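as_datetime() converts a date to a date-time, and as_date() converts a date-time to a date:
as_datetime(today())
#> [1] "2019-01-08 UTC"
as_date(now())
#> [1] "2019-01-08"
Date/times are sometimes supplied as numeric offsets from the "Unix Epoch", 1970-01-01 00:00:00 UTC. Unix time counts the seconds elapsed since the epoch and ignores leap seconds. Many Unix systems have stored the timestamp as a 32-bit signed integer, which overflows in 2038 (the Year 2038 problem, or Y2038). If the offset is in seconds, use as_datetime(); if it's in days, use as_date().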
as_datetime(60 * 60 * 10)
#> [1] "1970-01-01 10:00:00 UTC"
as_date(365 * 10 + 2)
#> [1] "1980-01-01"
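The conversions above also work in reverse, which is a quick way to check an offset. The sketch below assumes only lubridate; the last call shows the largest timestamp a signed 32-bit counter can hold.

```r
library(lubridate)

# A date-time is stored as seconds since the epoch.
ten_am <- ymd_hms("1970-01-01 10:00:00")
as.numeric(ten_am)        # 36000, i.e. 60 * 60 * 10

# A date is stored as days since the epoch.
jan_1980 <- ymd("1980-01-01")
as.numeric(jan_1980)      # 3652, i.e. 365 * 10 + 2 leap days

# The last second representable in a signed 32-bit timestamp.
limit <- as_datetime(2^31 - 1)
limit                     # "2038-01-19 03:14:07 UTC"
```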
16.3 Date-time components
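- year(), month(), mday() (day of the month), yday() (day of the year), wday() (day of the week), hour(), minute(), and second() each extract one component of a date/time.
For month() and wday(), set label = TRUE to return the abbreviated name of the month or day of the week; add abbr = FALSE to return the full name.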
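A short sketch of the accessors on a single fixed date-time, assuming only lubridate. The labelled results depend on the session locale, so no expected output is shown for them.

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")

year(datetime)    # 2016
month(datetime)   # 7
mday(datetime)    # 8
yday(datetime)    # 190
wday(datetime)    # 6 (a Friday, with the default week starting on Sunday)

# Names instead of numbers; the text returned follows the current locale.
month(datetime, label = TRUE)
wday(datetime, label = TRUE, abbr = FALSE)
```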
16.3.2 Rounding
- floor_date() # round down to the start of the unit
- round_date() # round to the nearest unit
- ceiling_date() # round up to the start of the next unit
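The three rounding functions can be compared on one value. This is a minimal sketch assuming only lubridate; each function takes the date-time and the unit to round to.

```r
library(lubridate)

x <- ymd_hms("2016-07-08 12:34:56")

# Rounding to the month.
floor_date(x, "month")     # start of July
round_date(x, "month")     # the 8th is nearer to 1 July than to 1 August
ceiling_date(x, "month")   # start of August

# Rounding to the hour.
floor_date(x, "hour")      # 12:00:00
round_date(x, "hour")      # 13:00:00, because 34:56 is past the half hour
```

Rounding every value to a common unit such as "week" or "hour" is a convenient way to group date-times before counting them.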
16.3.3 Setting components
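- Use an accessor function on the left-hand side of an assignment to change one component of a date/time, for example year(datetime) <- 2020.
- Use update() to change several components at once: update(datetime, year = 2020, month = 2, mday = 2, hour = 2).
- If values are too big, they roll over into the next unit.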
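The points above can be sketched as follows, assuming only lubridate. The roll-over cases use February 2015, which has 28 days.

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")

# Change one component in place with the assignment form of an accessor.
changed <- datetime
year(changed) <- 2020
changed                      # "2020-07-08 12:34:56 UTC"

# Change several components at once; update() returns a new value.
update(datetime, year = 2020, month = 2, mday = 2, hour = 2)
# "2020-02-02 02:34:56 UTC"

# Values that are too big roll over.
rolled_day  <- update(ymd("2015-02-01"), mday = 30)    # "2015-03-02"
rolled_hour <- update(ymd("2015-02-01"), hour = 400)   # "2015-02-17 16:00:00 UTC"
```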
16.4 Time spans
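- durations, which represent an exact number of seconds.
- periods, which represent human units like weeks and months.
- intervals, which represent a starting and an ending point.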
16.4.1 Durations
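- Subtracting one date from another gives a difftime object.
- as.duration() converts a difftime to a duration. Durations always record the time span in seconds. Larger units are created by converting minutes, hours, days, weeks, and years to seconds at the standard rate (60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, 7 days in a week, 365 days in a year; recent lubridate releases use 365.25 days for dyears()).
- Durations can also be built with constructor functions of the form dxxxs(): dseconds(), dminutes(), dhours(), ddays(), dweeks(), and dyears().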
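A brief sketch of durations in use, assuming only lubridate. dyears() is left out of the example because its length in seconds differs between lubridate versions.

```r
library(lubridate)

# Subtracting two dates gives a difftime; as.duration() re-expresses it in seconds.
gap <- ymd("2017-01-31") - ymd("2017-01-01")
gap_duration <- as.duration(gap)
as.numeric(gap_duration)          # 2592000, i.e. 30 days * 86400 seconds

# Constructors convert their unit to seconds at a fixed rate.
as.numeric(dminutes(10))          # 600
as.numeric(dweeks(1))             # 604800

# Durations can be multiplied and added like ordinary numbers.
one_day <- 2 * dhours(12)
as.numeric(one_day)               # 86400
```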
16.4.2 Periods
days(1) # 1 day
seconds(15) # 15 seconds
minutes(10) # 10 minutes
hours(12) # 12 hours
months(1) # 1 month
weeks(1) # 1 week
years(1) # 1 year
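The difference between a period and a duration shows up around a daylight saving change. The sketch below assumes lubridate and a system time zone database that includes "America/New_York"; US clocks moved forward on 2016-03-13.

```r
library(lubridate)

# 13:00 on the day before the clocks spring forward.
one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")

# Duration: exactly 86400 seconds later, which reads as 14:00 on the clock.
plus_duration <- one_pm + ddays(1)

# Period: the same clock time on the next calendar day, 13:00.
plus_period <- one_pm + days(1)

hour(plus_duration)   # 14
hour(plus_period)     # 13

# Periods also follow the calendar across a leap year.
ymd("2016-01-01") + years(1)   # "2017-01-01"
```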
16.4.3 Intervals
next_year <- today() + years(1)
today() %--% next_year
The %--% operator joins two time points and returns an interval.
- If you only care about physical time, use a duration;
- if you need to add human times, use a period;
- if you need to figure out how long a span is in human units, use an interval.
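The last rule can be illustrated with fixed dates. This is a minimal sketch assuming only lubridate: dividing an interval by a duration gives its exact length, so a leap year and an ordinary year come out differently.

```r
library(lubridate)

# Intervals with fixed start and end points.
leap_year   <- ymd("2016-01-01") %--% ymd("2017-01-01")
normal_year <- ymd("2017-01-01") %--% ymd("2018-01-01")

# How many days does each span contain?
leap_days   <- leap_year / ddays(1)     # 366
normal_days <- normal_year / ddays(1)   # 365
```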
Figure 16.1: Rules for arithmetic between the different date/time classes.
16.5 Time zones
- Sys.timezone() reports the time zone R thinks the system is in.
- Unless you specify otherwise, lubridate uses UTC. UTC (Coordinated Universal Time) is the standard time zone used by the scientific community and is roughly equivalent to its predecessor GMT (Greenwich Mean Time). It is a time standard based on the length of the atomic second, kept as close as possible to Universal Time.
- UTC does not have DST, which makes it a convenient representation for computation. DST stands for Daylight Saving Time (called summer time in China), a system that shifts local clock time in order to save energy.
- Operations that combine date-times, like c(), will often drop the time zone.
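A small sketch of these points, assuming lubridate and a time zone database that includes the two zone names used. The time zone is only an attribute that controls printing, so the same instant written in two zones compares equal.

```r
library(lubridate)

# Without a tz argument the parser uses UTC.
x_default <- ymd_hms("2015-06-01 12:00:00")
tz(x_default)   # "UTC"

# The same instant expressed in two different zones.
x1 <- ymd_hms("2015-06-01 12:00:00", tz = "America/New_York")
x2 <- ymd_hms("2015-06-01 18:00:00", tz = "Europe/Copenhagen")
x1 == x2        # TRUE

# Combining date-times may change or drop the zone attribute; what is
# returned here depends on the R version, so inspect it rather than assume.
combined <- c(x1, x2)
tz(combined)
```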
16.3.4 Exercises
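The answers in this section use flights_dt, which these notes never define. In R for Data Science it is built from nycflights13::flights by turning the HHMM integer columns (dep_time, sched_dep_time, and so on) into date-times with a small helper. The sketch below shows that helper on a hypothetical two-row data frame named toy, using only lubridate and base R, rather than the full dplyr pipeline.

```r
library(lubridate)

# Convert a calendar date plus an HHMM integer (e.g. 517 for 05:17)
# into a date-time: integer division gives the hour, the remainder the minute.
make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}

# Hypothetical stand-in for a few rows of the flights table.
toy <- data.frame(
  year = 2013, month = 1, day = 1,
  dep_time = c(517, 1340),
  sched_dep_time = c(515, 1345)
)

toy$dep_time <- make_datetime_100(toy$year, toy$month, toy$day, toy$dep_time)
toy$sched_dep_time <- make_datetime_100(toy$year, toy$month, toy$day, toy$sched_dep_time)
toy
```

Applying the same helper to every time column of flights, after dropping rows with a missing dep_time or arr_time, gives a table with the shape the exercise code expects.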
- How does the distribution of flight times within a day change over the course of the year?
flights_dt %>%
  mutate(dep_time = update(dep_time, year = 2020, month = 2, mday = 2)) %>%
  ggplot(aes(dep_time)) +
  geom_freqpoly(binwidth = 3600)
- Compare dep_time, sched_dep_time and dep_delay. Are they consistent? Explain your findings.
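flights_dt %>%
  # dep_delay is recorded in minutes, so express the difference in minutes too
  mutate(delay = as.numeric(difftime(dep_time, sched_dep_time, units = "mins")))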
- Compare air_time with the duration between the departure and arrival. Explain your findings. (Hint: consider the location of the airport.)
flights %>%
  select(air_time, distance)
- How does the average delay time change over the course of a day? Should you use dep_time or sched_dep_time? Why?
# Average delay by actual departure time of day
flights_dt %>%
  mutate(dep_time = update(dep_time, year = 2013, month = 1, mday = 1)) %>%
  group_by(dep_time) %>%
  summarise(mean_delay = mean(dep_delay))

# Average delay by scheduled departure time of day
flights_dt %>%
  mutate(sched_dep_time = update(sched_dep_time, year = 2013, month = 1, mday = 1)) %>%
  group_by(sched_dep_time) %>%
  summarise(mean_delay = mean(dep_delay))
- On what day of the week should you leave if you want to minimise the chance of a delay?
flights_dt %>%
  mutate(weekday = wday(sched_dep_time)) %>%
  group_by(weekday) %>%
  summarise(mean_delay = mean(dep_delay))
- What makes the distribution of diamonds$carat and flights$sched_dep_time similar?
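Both are shaped by human judgement: people favour round numbers, so carat values cluster at "nice" fractions and scheduled departures cluster at "nice" minutes such as 0 and 30.
sched_dep <- flights_dt %>%
  mutate(minute = minute(sched_dep_time)) %>%
  group_by(minute) %>%
  summarise(
    avg_delay = mean(arr_delay, na.rm = TRUE),
    n = n())
ggplot(sched_dep, aes(minute, n)) +
  geom_line()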
- Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early. Hint: create a binary variable that tells you whether or not a flight was delayed.
flights_dt %>%
  mutate(minute = minute(sched_dep_time), early = dep_delay < 0) %>%
  group_by(minute) %>%
  # prop_early is the share of flights that left before their scheduled time
  summarise(prop_early = mean(early), n = n())