参考官方教程：https://cran.r-project.org/web/packages/mHMMbayes/vignettes/tutorial-mhmm.html

mHMMbayes包特点

1、Multiple individuals simultaneously

Using a multilevel framework, we allow for heterogeneity in the model parameters (transition probability matrix and conditional distribution), while estimating one overall HMM.
即可同时对多组的观测数据进行预测。不仅会预测每组的参数，而且也能得出总体的HMM参数

2、Multivariate data with a categorical distribution

即允许有多类隐状态预测值，进行综合分析；
例如天气hidden stata对应天气潮湿情况、雨伞销量，骑车人数等多类观测结果。
但是至少对于mHMMbayes包而言，这些观测结果必须为分类资料

3、Bayesian estimation to parameter

Parameters are estimated using Bayesian estimation utilizing the forward-backward recursion within a hybrid Metropolis within Gibbs sampler
In the package mHMMbayes, authors chose to use Bayesian estimation because of its flexibility, which we will require in the multilevel framework of the model.

4、Viterbi algorithm

The package also includes a function to simulate data and a function to obtain the most likely hidden state sequence for each individual using the Viterbi algorithm.

R包分析实战

0、了解示例数据

#install.packages("mHMMbayes")
library(mHMMbayes)
data(nonverbal)
head(nonverbal)
table(nonverbal[,1])

如下图，示例数据是15分钟内，10对医患非言语的交流情况观测。一秒钟记录一次，共900次观测，共有四类观测。

p_verbalizing: verbalizing behavior of the patient, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
p_looking: looking behavior of the patient, consisting of 1 = not looking at therapist, 2 = looking at therapist.
t_verbalizing: verbalizing behavior of the therapist, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
t_looking: looking behavior of the therapist, consisting of 1 = not looking at patient, 2 = looking at patient. The top 6 rows of the dataset are provided below.

data

2、初始值设置

虽然设置的初始值在后续会使用Bayesian estimation调整优化；
最好还是根据对数据分布的理解以及预期结果，设置合适的初始值；
因为Using sensible starting values increases convergence speed, and often prevents a problem called ‘label switching’.

Label Switching：one should check if the algorithm reaches the same solution when a set of different (but often conceptually similar) starting values are used

#the number of states used
m <- 2
#the number of dependent variables in the dataset used to infer the hidden states
n_dep <- 4
#the number of categorical outcomes for each of the dependent variables
q_emiss <- c(3, 2, 3, 2)
#the transition probability matrix
start_TM <- diag(.8, m)
start_TM[lower.tri(start_TM) | upper.tri(start_TM)] <- .2
#the emission distribution(s)
start_EM <- list(matrix(c(0.05, 0.90, 0.05, 
                          0.90, 0.05, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[1]), # vocalizing patient
                 matrix(c(0.1, 0.9, 
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[2]), # looking patient
                 matrix(c(0.90, 0.05, 0.05, 
                          0.05, 0.90, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[3]), # vocalizing therapist
                 matrix(c(0.1, 0.9, 
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[4])) # looking therapist

如上start_EM为一个list，体现了Multivariate的特点

3、Fitting the model

# Run a model without covariate(s) and default priors:
set.seed(14532)
out_2st <- mHMM(s_data = nonverbal, 
                gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss), 
                start_val = c(list(start_TM), start_EM),
                mcmc = list(J = 1000, burn_in = 200))
#01:10:10 耗时比较久

The arguments needed for the MCMC algorithm are given in mcmc: J specifies the number of iterations used by the hybrid metropolis within Gibbs algorithm and burn_in specifies the number of iterations to discard when obtaining the model parameter summary statistics.

out_2st

out_2st

#总体参数
summary(out_2st)

可以根据下图的结果，视图解释两个stata的意义

summary(out_2st)

#想看看对每个subject的预测参数情况的话
obtain_gamma(out_2st, level = "subject")
obtain_gamma(out_2st, level = "subject")

4、Visualization

可分别对TM、EM矩阵进行可视化

library(RColorBrewer)
Voc_col <- c(brewer.pal(3,"PuBuGn")[c(1,3,2)])
Voc_lab <- c("Not Speaking", "Speaking", "Back channeling")

plot(out_2st, component = "emiss", dep = 1, col = Voc_col, 
     parameter = "emiss", dep_lab = c("Patient vocalizing"), cat_lab = Voc_lab)

如下图，对第一类观测结果p_vocalization的结果可视化。虚线代表10个subject；实线代表总体值

# Transition probabilities at the group level and for subject number 1, respectively:
gamma_pop <- obtain_gamma(out_2st)
gamma_subj <- obtain_gamma(out_2st, level = "subject")
plot(gamma_pop, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
plot(gamma_subj, subj_nr = 1, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))

TM group level

TM subject1

5、Viterbi algorithm decoding stata seq

根据模型结果，并提供观测结果，利用维特比算法，推测每次观测最有可能的隐状态序列

state_seq <- vit_mHMM(out_2st, s_data = nonverbal)
head(state_seq)
head(nonverbal)

result

HMM R包学习之 mHMMbayes

HMM R包学习之 mHMMbayes

mHMMbayes包特点

1、Multiple individuals simultaneously

2、Multivariate data with a categorical distribution

3、Bayesian estimation to parameter

4、Viterbi algorithm

R包分析实战

0、了解示例数据

2、初始值设置

3、Fitting the model

4、Visualization

5、Viterbi algorithm decoding stata seq

推荐阅读更多精彩内容