--- title: "6 Continuous-time HMMs" author: "Jan-Ole Fischer" output: rmarkdown::html_vignette # output: pdf_document vignette: > %\VignetteIndexEntry{Continuous-time_HMMs} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: refs.bib link-citations: yes --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", # fig.path = "img/", fig.align = "center", fig.dim = c(8, 6), out.width = "85%" ) options(rmarkdown.html_vignette.check_title = FALSE) ``` > Before diving into this vignette, we recommend reading the vignettes [**Introduction to LaMa**](https://janolefi.github.io/LaMa/articles/Intro_to_LaMa.html) and [**Automatic differentiation via RTMB**](https://janolefi.github.io/LaMa/articles/LaMa_and_RTMB.html). ```{r setup} library(LaMa) ``` In standard HMMs, observations are assumed to arrive at **regular, equidistant time points**, so that the transition probability matrix $\Gamma$ can be given a consistent interpretation with respect to a fixed time unit (e.g. one hour or one day). In many real-world applications this assumption breaks down: animal telemetry fixes arrive at varying intervals, patient measurements are taken at unequal times, or data simply have gaps. Using a regular HMM in such cases leads to a misspecified transition structure, because the same $\Gamma$ is applied regardless of whether the time gap is short or long. The natural remedy is a **continuous-time HMM**, which replaces the discrete-time Markov chain with a **continuous-time Markov chain (CTMC)**. The observation model and likelihood structure remain the same; only the state process changes. One important prerequisite is the **snapshot property**: the observed value at time $t$ may only depend on the state *at that instant*, not on the history of the process or on how much time has elapsed since the previous observation. This is a reasonable assumption for many biological and physical processes. For more details see @glennie2023hidden. ## The generator matrix A CTMC is fully characterised by its **(infinitesimal) generator matrix** $$ Q = \begin{pmatrix} q_{11} & q_{12} & \cdots & q_{1N} \\ q_{21} & q_{22} & \cdots & q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ q_{N1} & q_{N2} & \cdots & q_{NN} \\ \end{pmatrix}, $$ where $q_{ij} \geq 0$ for $i \neq j$ and the diagonal satisfies $q_{ii} = -\sum_{j \neq i} q_{ij}$, so that every row sums to zero. The off-diagonal entry $q_{ij}$ is the **instantaneous rate** at which the process transitions from state $i$ to state $j$. The diagonal entry $-q_{ii} = \sum_{j \neq i} q_{ij}$ is the total leaving rate from state $i$, and the **dwell time** (time spent in state $i$ before leaving) is exponentially distributed with mean $1/(-q_{ii})$. Conditional on leaving state $i$, the probability of moving to state $j \neq i$ is $$ \omega_{ij} = \frac{q_{ij}}{-q_{ii}}. $$ These two properties — the dwell time distribution and the conditional next-state probabilities — together fully describe the dynamics of the chain, and they are also how we simulate it in the examples below. ## From the generator to transition probabilities Given $Q$ and a time interval of length $\Delta t$, the transition probability matrix is obtained via the **matrix exponential**: $$ \Gamma(\Delta t) = \exp(Q \, \Delta t). $$ This follows from the Kolmogorov forward equations; see @dobrow2016introduction for details. The key practical point is that $\Gamma(\Delta t)$ automatically adapts to the length of the interval: short gaps produce a matrix close to the identity (little chance of switching), and long gaps produce a matrix close to the stationary distribution (the process has had time to mix). `LaMa` computes this efficiently for an entire vector of time differences using `tpm_ct()`. ## Likelihood The likelihood of a continuous-time HMM for observations $x_{t_1}, \dots, x_{t_T}$ at irregular times $t_1 < \dots < t_T$ has **exactly the same structure** as the regular HMM likelihood: $$ L(\theta) = \delta^{(1)} \,\Gamma(\Delta t_1)\, P(x_{t_2})\, \Gamma(\Delta t_2)\, P(x_{t_3}) \cdots \Gamma(\Delta t_{T-1})\, P(x_{t_T})\, \mathbf{1}^\top, $$ where $\Delta t_k = t_{k+1} - t_k$ and each $\Gamma(\Delta t_k) = \exp(Q \,\Delta t_k)$ is specific to its interval. The initial distribution $\delta^{(1)}$ is typically set to the stationary distribution of the CTMC, available via `stationary_ct()`. Everything else — the allprobs matrix and the forward algorithm via `forward()` — is identical to the regular HMM case. ## Example 1: two states ### Simulating data We start by setting parameters. The generator below specifies that from state 1 the leaving rate is 0.5 (mean dwell time $1/0.5 = 2$ time units), and from state 2 the leaving rate is 1 (mean dwell time $1/1 = 1$ time unit), so state 1 is the more persistent state. ```{r parameters} # generator matrix Q Q = matrix(c(-0.5, 0.5, 1.0, -1.0), nrow = 2, byrow = TRUE) # state-dependent (normal) distribution parameters mu = c(5, 20) sigma = c(2, 5) ``` We simulate the CTMC by drawing exponentially distributed dwell times, then switching state. Within each sojourn we generate observation times as a Poisson process with rate $\lambda = 1$ — this represents an irregular but otherwise uninformative observation mechanism that is **not part of the model**. ```{r data} set.seed(123) k = 200 # number of state switches to simulate s = rep(NA, k) # latent state sequence trans_times = rep(NA, k) # cumulative times at which state switches occur # initialise: draw first state and its dwell time s[1] = sample(1:2, 1) dwell = rexp(1, -Q[s[1], s[1]]) # dwell time ~ Exp(-q_ii) trans_times[1] = dwell # Poisson arrivals during first sojourn [0, dwell], uniform within new_obs = sort(runif(rpois(1, dwell), 0, dwell)) obs_times = new_obs x = rnorm(length(new_obs), mu[s[1]], sigma[s[1]]) for(t in 2:k){ # for 2 states, the only possible next state is 3 - current s[t] = 3 - s[t-1] # dwell time in new state, and absolute transition time dwell = rexp(1, -Q[s[t], s[t]]) trans_times[t] = trans_times[t-1] + dwell # Poisson arrivals during sojourn [t_start, t_end], uniform within t_start = trans_times[t-1] t_end = trans_times[t] new_obs = sort(runif(rpois(1, dwell), t_start, t_end)) obs_times = c(obs_times, new_obs) x = c(x, rnorm(length(new_obs), mu[s[t]], sigma[s[t]])) } ``` The plot below shows the first 50 observations. The coloured bar at the bottom indicates the true (latent) state during each sojourn, illustrating the irregular observation times within each stay. ```{r vis_ctHMM} color = LaMaColors(2) n = length(obs_times) plot(obs_times[1:50], x[1:50], pch = 16, bty = "n", xlab = "observation times", ylab = "x", ylim = c(-5, 25)) segments(x0 = c(0, trans_times[1:48]), x1 = trans_times[1:49], y0 = rep(-5, 49), y1 = rep(-5, 49), col = color[s[1:49]], lwd = 4) legend("topright", lwd = 2, col = color, legend = c("state 1", "state 2"), box.lwd = 0) ``` ### Writing the negative log-likelihood function The likelihood function follows the same pattern as a regular inhomogeneous HMM. The only addition is the `generator()` and `tpm_ct()` calls: `generator()` maps the unconstrained `log_qs` vector (one entry per off-diagonal element of $Q$, on the log scale to ensure positivity) to a valid generator matrix, and `tpm_ct()` computes $\exp(Q \, \Delta t)$ for every observed time difference at once. ```{r nll} nll = function(par) { getAll(par, dat) mu = exp(log_mu); REPORT(mu) sigma = exp(log_sigma); REPORT(sigma) Q = generator(log_qs); REPORT(Q) # maps unconstrained params -> valid Q Pi = stationary_ct(Q) # stationary distribution of the CTMC Qube = tpm_ct(Q, timediff) # exp(Q * dt) for each time difference allprobs = matrix(1, length(x), N) ind = which(!is.na(x)) for(j in 1:N) allprobs[ind, j] = dnorm(x[ind], mu[j], sigma[j]) -forward(Pi, Qube, allprobs) } ``` ### Fitting the model ```{r model, warning = FALSE} N = 2 timediff = diff(obs_times) # time differences between consecutive observations par = list( log_mu = log(c(5, 15)), # initial state-dependent means (log scale) log_sigma = log(c(3, 5)), # initial state-dependent sds (log scale) log_qs = log(c(1, 0.5)) # initial off-diagonal generator entries (log scale) ) dat = list( x = x, timediff = timediff, N = N ) obj = MakeADFun(nll, par, silent = TRUE) system.time( opt <- nlminb(obj$par, obj$fn, obj$gr) ) mod = report(obj) ``` ### Results ```{r results} mod$mu # true: 5, 20 mod$sigma # true: 2, 5 round(mod$Q, 3) # true: Q as defined above ``` The estimated generator can be interpreted directly: the mean dwell time in each state is $1/(-\hat{q}_{ii})$. ```{r dwell_times} round(1 / diag(-mod$Q), 2) # estimated mean dwell times; true: 2, 1 ``` The transition probability curves $\gamma_{ij}(\Delta t)$ show how quickly the process mixes as the lag grows. At $\Delta t = 0$ the matrix is the identity; as $\Delta t \to \infty$ both off-diagonal entries converge to the stationary probability of the destination state (dashed lines), reflecting full mixing. ```{r tpm_curves} dt_seq = seq(0, 10, length.out = 300) Qube_seq = tpm_ct(mod$Q, dt_seq) Pi_ct = stationary_ct(mod$Q) plot(dt_seq, Qube_seq[1, 2, ], type = "l", lwd = 2, col = color[1], bty = "n", xlab = expression(Delta*t), ylab = "transition probability", ylim = c(0, 1)) lines(dt_seq, Qube_seq[2, 1, ], lwd = 2, col = color[2]) abline(h = Pi_ct[2], lty = 2, col = color[1]) # long-run limit of gamma_12 abline(h = Pi_ct[1], lty = 2, col = color[2]) # long-run limit of gamma_21 legend("right", lwd = 2, col = color, bty = "n", legend = c(expression(gamma[12](Delta*t)), expression(gamma[21](Delta*t)))) ``` We can also decode the most probable hidden state sequence using `viterbi()`. Because `forward()` automatically stores `delta`, `Qube`, and `allprobs` in the model report, we can pass the report object directly. ```{r state_decoding} states = viterbi(mod = mod) plot(obs_times[1:50], x[1:50], pch = 16, bty = "n", col = color[states[1:50]], xlab = "observation times", ylab = "x", ylim = c(-5, 25)) legend("topright", pch = 16, col = color, legend = paste("state", 1:N), box.lwd = 0) ``` ## Example 2: three states ### Simulating data ```{r parameters2} # generator matrix Q Q = matrix(c(-0.5, 0.2, 0.3, 1.0, -2.0, 1.0, 0.4, 0.6, -1.0), nrow = 3, byrow = TRUE) # state-dependent (normal) distribution parameters mu = c(5, 15, 30) sigma = c(2, 3, 5) ``` The three-state simulation differs from Example 1 in one key place: when the chain leaves a state, there are now multiple possible destination states, so we have to draw *which* state to go to. The probabilities are given by $\omega_{ij} = q_{ij}/(-q_{ii})$, i.e. the off-diagonal entries of the current row of $Q$ normalised by the total leaving rate. ```{r data2} set.seed(123) k = 200 s = rep(NA, k) trans_times = rep(NA, k) # initialise s[1] = sample(1:3, 1) dwell = rexp(1, -Q[s[1], s[1]]) trans_times[1] = dwell new_obs = sort(runif(rpois(1, dwell), 0, dwell)) obs_times = new_obs x = rnorm(length(new_obs), mu[s[1]], sigma[s[1]]) for(t in 2:k){ # conditional transition probabilities: omega_ij = q_ij / (-q_ii) omega = Q[s[t-1], -s[t-1]] / -Q[s[t-1], s[t-1]] s[t] = sample((1:3)[-s[t-1]], 1, prob = omega) dwell = rexp(1, -Q[s[t], s[t]]) trans_times[t] = trans_times[t-1] + dwell t_start = trans_times[t-1] t_end = trans_times[t] new_obs = sort(runif(rpois(1, dwell), t_start, t_end)) obs_times = c(obs_times, new_obs) x = c(x, rnorm(length(new_obs), mu[s[t]], sigma[s[t]])) } ``` ### Fitting the model The same `nll` function works for any number of states — only `par`, `dat` and `N` need to change. Note that a 3-state model has $N(N-1) = 6$ off-diagonal entries in $Q$. ```{r model2, warning = FALSE} N = 3 timediff = diff(obs_times) par = list( log_mu = log(c(5, 10, 25)), # initial state-dependent means log_sigma = log(c(2, 2, 6)), # initial state-dependent sds log_qs = rep(0, N*(N-1)) # initial off-diagonal generator entries (exp(0) = 1) ) dat = list( x = x, timediff = timediff, N = N ) obj2 = MakeADFun(nll, par, silent = TRUE) system.time( opt2 <- nlminb(obj2$par, obj2$fn, obj2$gr) ) mod2 = report(obj2) ``` ### Results ```{r results2} mod2$mu # true: 5, 15, 30 mod2$sigma # true: 2, 3, 5 round(mod2$Q, 3) # true: Q as defined above ``` ```{r dwell_times2} round(1 / diag(-mod2$Q), 2) # estimated mean dwell times; true: 2, 0.5, 1 ``` ```{r state_decoding2} color3 = LaMaColors(3) states2 = viterbi(mod = mod2) plot(obs_times[1:50], x[1:50], pch = 16, bty = "n", col = color3[states2[1:50]], xlab = "observation times", ylab = "x", ylim = c(-5, 40)) legend("topright", pch = 16, col = color3, legend = paste("state", 1:N), box.lwd = 0) ``` > Continue reading with [**Markov-modulated Poisson processes**](https://janolefi.github.io/LaMa/articles/MMMPPs.html). ## References