HMM R包学习之 mHMMbayes

参考官方教程:https://cran.r-project.org/web/packages/mHMMbayes/vignettes/tutorial-mhmm.html

mHMMbayes包特点

1、Multiple individuals simultaneously

  • Using a multilevel framework, we allow for heterogeneity in the model parameters (transition probability matrix and conditional distribution), while estimating one overall HMM.
  • 即可同时对多组的观测数据进行预测。不仅会预测每组的参数,而且也能得出总体的HMM参数

2、Multivariate data with a categorical distribution

  • 即允许有多类隐状态预测值,进行综合分析;
  • 例如天气hidden stata对应 天气潮湿情况、雨伞销量,骑车人数等多类观测结果。
  • 但是至少对于mHMMbayes包而言,这些观测结果必须为分类资料

3、Bayesian estimation to parameter

  • Parameters are estimated using Bayesian estimation utilizing the forward-backward recursion within a hybrid Metropolis within Gibbs sampler
  • In the package mHMMbayes, authors chose to use Bayesian estimation because of its flexibility, which we will require in the multilevel framework of the model.

4、Viterbi algorithm

  • The package also includes a function to simulate data and a function to obtain the most likely hidden state sequence for each individual using the Viterbi algorithm.

R包分析实战

0、了解示例数据

#install.packages("mHMMbayes")
library(mHMMbayes)
data(nonverbal)
head(nonverbal)
table(nonverbal[,1])

如下图,示例数据是15分钟内,10对医患非言语的交流情况观测。一秒钟记录一次,共900次观测,共有四类观测。

  • p_verbalizing: verbalizing behavior of the patient, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
  • p_looking: looking behavior of the patient, consisting of 1 = not looking at therapist, 2 = looking at therapist.
  • t_verbalizing: verbalizing behavior of the therapist, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
  • t_looking: looking behavior of the therapist, consisting of 1 = not looking at patient, 2 = looking at patient. The top 6 rows of the dataset are provided below.
data

2、初始值设置

  • 虽然设置的初始值在后续会使用Bayesian estimation调整优化;
  • 最好还是根据对数据分布的理解以及预期结果,设置合适的初始值;
  • 因为Using sensible starting values increases convergence speed, and often prevents a problem called ‘label switching’.

Label Switching:one should check if the algorithm reaches the same solution when a set of different (but often conceptually similar) starting values are used

#the number of states used
m <- 2
#the number of dependent variables in the dataset used to infer the hidden states
n_dep <- 4
#the number of categorical outcomes for each of the dependent variables
q_emiss <- c(3, 2, 3, 2)
#the transition probability matrix
start_TM <- diag(.8, m)
start_TM[lower.tri(start_TM) | upper.tri(start_TM)] <- .2
#the emission distribution(s)
start_EM <- list(matrix(c(0.05, 0.90, 0.05, 
                          0.90, 0.05, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[1]), # vocalizing patient
                 matrix(c(0.1, 0.9, 
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[2]), # looking patient
                 matrix(c(0.90, 0.05, 0.05, 
                          0.05, 0.90, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[3]), # vocalizing therapist
                 matrix(c(0.1, 0.9, 
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[4])) # looking therapist
  • 如上start_EM为一个list,体现了Multivariate的特点

3、Fitting the model

# Run a model without covariate(s) and default priors:
set.seed(14532)
out_2st <- mHMM(s_data = nonverbal, 
                gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss), 
                start_val = c(list(start_TM), start_EM),
                mcmc = list(J = 1000, burn_in = 200))
#01:10:10 耗时比较久
  • The arguments needed for the MCMC algorithm are given in mcmc: J specifies the number of iterations used by the hybrid metropolis within Gibbs algorithm and burn_in specifies the number of iterations to discard when obtaining the model parameter summary statistics.
out_2st
out_2st
#总体参数
summary(out_2st)
  • 可以根据下图的结果,视图解释两个stata的意义


    summary(out_2st)
#想看看对每个subject的预测参数情况的话
obtain_gamma(out_2st, level = "subject")
obtain_gamma(out_2st, level = "subject")

4、Visualization

可分别对TM、EM矩阵进行可视化

  • EM
library(RColorBrewer)
Voc_col <- c(brewer.pal(3,"PuBuGn")[c(1,3,2)])
Voc_lab <- c("Not Speaking", "Speaking", "Back channeling")

plot(out_2st, component = "emiss", dep = 1, col = Voc_col, 
     parameter = "emiss", dep_lab = c("Patient vocalizing"), cat_lab = Voc_lab)

如下图,对第一类观测结果p_vocalization的结果可视化。虚线代表10个subject;实线代表总体值


TM
  • TM
# Transition probabilities at the group level and for subject number 1, respectively:
gamma_pop <- obtain_gamma(out_2st)
gamma_subj <- obtain_gamma(out_2st, level = "subject")
plot(gamma_pop, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
plot(gamma_subj, subj_nr = 1, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
TM group level

TM subject1

5、Viterbi algorithm decoding stata seq

  • 根据模型结果,并提供观测结果,利用维特比算法,推测每次观测最有可能的隐状态序列
state_seq <- vit_mHMM(out_2st, s_data = nonverbal)
head(state_seq)
head(nonverbal)
result
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • 渐变的面目拼图要我怎么拼? 我是疲乏了还是投降了? 不是不允许自己坠落, 我没有滴水不进的保护膜。 就是害怕变得面...
    闷热当乘凉阅读 4,350评论 0 13
  • 感觉自己有点神经衰弱,总是觉得手机响了;屋外有人走过;每次妈妈不声不响的进房间突然跟我说话,我都会被吓得半死!一整...
    章鱼的拥抱阅读 2,219评论 4 5
  • 夜莺2517阅读 127,761评论 1 9
  • 版本:ios 1.2.1 亮点: 1.app角标可以实时更新天气温度或选择空气质量,建议处女座就不要选了,不然老想...
    我就是沉沉阅读 6,967评论 1 6