---
title: "6 Continuous-time HMMs"
author: "Jan-Ole Fischer"
output: rmarkdown::html_vignette
# output: pdf_document
vignette: >
  %\VignetteIndexEntry{Continuous-time_HMMs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: refs.bib
link-citations: yes
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  # fig.path = "img/",
  fig.align = "center",
  fig.dim = c(8, 6),
  out.width = "85%"
)

options(rmarkdown.html_vignette.check_title = FALSE)
```

> Before diving into this vignette, we recommend reading the vignettes [**Introduction to LaMa**](https://janolefi.github.io/LaMa/articles/Intro_to_LaMa.html) and [**Automatic differentiation via RTMB**](https://janolefi.github.io/LaMa/articles/LaMa_and_RTMB.html).

```{r setup}
library(LaMa)
```

In standard HMMs, observations are assumed to arrive at **regular, equidistant time points**, so that the transition probability matrix $\Gamma$ can be given a consistent interpretation with respect to a fixed time unit (e.g. one hour or one day). In many real-world applications this assumption breaks down: animal telemetry fixes arrive at varying intervals, patient measurements are taken at unequal times, or data simply have gaps. Using a regular HMM in such cases leads to a misspecified transition structure, because the same $\Gamma$ is applied regardless of whether the time gap is short or long.

The natural remedy is a **continuous-time HMM**, which replaces the discrete-time Markov chain with a **continuous-time Markov chain (CTMC)**. The observation model and likelihood structure remain the same; only the state process changes. One important prerequisite is the **snapshot property**: the observed value at time $t$ may only depend on the state *at that instant*, not on the history of the process or on how much time has elapsed since the previous observation. This is a reasonable assumption for many biological and physical processes. For more details see @glennie2023hidden.

## The generator matrix

A CTMC is fully characterised by its **(infinitesimal) generator matrix**
$$
Q = \begin{pmatrix}
  q_{11} & q_{12} & \cdots & q_{1N} \\
  q_{21} & q_{22} & \cdots & q_{2N} \\
  \vdots & \vdots & \ddots & \vdots \\
  q_{N1} & q_{N2} & \cdots & q_{NN} \\
\end{pmatrix},
$$
where $q_{ij} \geq 0$ for $i \neq j$ and the diagonal satisfies $q_{ii} = -\sum_{j \neq i} q_{ij}$, so that every row sums to zero.

The off-diagonal entry $q_{ij}$ is the **instantaneous rate** at which the process transitions from state $i$ to state $j$. The diagonal entry $-q_{ii} = \sum_{j \neq i} q_{ij}$ is the total leaving rate from state $i$, and the **dwell time** (time spent in state $i$ before leaving) is exponentially distributed with mean $1/(-q_{ii})$. Conditional on leaving state $i$, the probability of moving to state $j \neq i$ is
$$
\omega_{ij} = \frac{q_{ij}}{-q_{ii}}.
$$
These two properties — the dwell time distribution and the conditional next-state probabilities — together fully describe the dynamics of the chain, and they are also how we simulate it in the examples below.

## From the generator to transition probabilities

Given $Q$ and a time interval of length $\Delta t$, the transition probability matrix is obtained via the **matrix exponential**:
$$
\Gamma(\Delta t) = \exp(Q \, \Delta t).
$$
This follows from the Kolmogorov forward equations; see @dobrow2016introduction for details. The key practical point is that $\Gamma(\Delta t)$ automatically adapts to the length of the interval: short gaps produce a matrix close to the identity (little chance of switching), and long gaps produce a matrix close to the stationary distribution (the process has had time to mix). `LaMa` computes this efficiently for an entire vector of time differences using `tpm_ct()`.

## Likelihood

The likelihood of a continuous-time HMM for observations $x_{t_1}, \dots, x_{t_T}$ at irregular times $t_1 < \dots < t_T$ has **exactly the same structure** as the regular HMM likelihood:
$$
L(\theta) = \delta^{(1)} \,\Gamma(\Delta t_1)\, P(x_{t_2})\, \Gamma(\Delta t_2)\, P(x_{t_3}) \cdots \Gamma(\Delta t_{T-1})\, P(x_{t_T})\, \mathbf{1}^\top,
$$
where $\Delta t_k = t_{k+1} - t_k$ and each $\Gamma(\Delta t_k) = \exp(Q \,\Delta t_k)$ is specific to its interval. The initial distribution $\delta^{(1)}$ is typically set to the stationary distribution of the CTMC, available via `stationary_ct()`. Everything else — the allprobs matrix and the forward algorithm via `forward()` — is identical to the regular HMM case.


## Example 1: two states

### Simulating data

We start by setting parameters. The generator below specifies that from state 1 the leaving rate is 0.5 (mean dwell time $1/0.5 = 2$ time units), and from state 2 the leaving rate is 1 (mean dwell time $1/1 = 1$ time unit), so state 1 is the more persistent state.

```{r parameters}
# generator matrix Q
Q = matrix(c(-0.5, 0.5,
              1.0, -1.0), nrow = 2, byrow = TRUE)

# state-dependent (normal) distribution parameters
mu = c(5, 20)
sigma = c(2, 5)
```

We simulate the CTMC by drawing exponentially distributed dwell times, then switching state. Within each sojourn we generate observation times as a Poisson process with rate $\lambda = 1$ — this represents an irregular but otherwise uninformative observation mechanism that is **not part of the model**.

```{r data}
set.seed(123)

k = 200              # number of state switches to simulate
s = rep(NA, k)       # latent state sequence
trans_times = rep(NA, k) # cumulative times at which state switches occur

# initialise: draw first state and its dwell time
s[1] = sample(1:2, 1)
dwell = rexp(1, -Q[s[1], s[1]])  # dwell time ~ Exp(-q_ii)
trans_times[1] = dwell

# Poisson arrivals during first sojourn [0, dwell], uniform within
new_obs = sort(runif(rpois(1, dwell), 0, dwell))
obs_times = new_obs
x = rnorm(length(new_obs), mu[s[1]], sigma[s[1]])

for(t in 2:k){
  # for 2 states, the only possible next state is 3 - current
  s[t] = 3 - s[t-1]

  # dwell time in new state, and absolute transition time
  dwell = rexp(1, -Q[s[t], s[t]])
  trans_times[t] = trans_times[t-1] + dwell

  # Poisson arrivals during sojourn [t_start, t_end], uniform within
  t_start = trans_times[t-1]
  t_end   = trans_times[t]
  new_obs = sort(runif(rpois(1, dwell), t_start, t_end))

  obs_times = c(obs_times, new_obs)
  x = c(x, rnorm(length(new_obs), mu[s[t]], sigma[s[t]]))
}
```

The plot below shows the first 50 observations. The coloured bar at the bottom indicates the true (latent) state during each sojourn, illustrating the irregular observation times within each stay.

```{r vis_ctHMM}
color = LaMaColors(2)

n = length(obs_times)
plot(obs_times[1:50], x[1:50], pch = 16, bty = "n",
     xlab = "observation times", ylab = "x", ylim = c(-5, 25))
segments(x0 = c(0, trans_times[1:48]), x1 = trans_times[1:49],
         y0 = rep(-5, 49), y1 = rep(-5, 49),
         col = color[s[1:49]], lwd = 4)
legend("topright", lwd = 2, col = color,
       legend = c("state 1", "state 2"), box.lwd = 0)
```

### Writing the negative log-likelihood function

The likelihood function follows the same pattern as a regular inhomogeneous HMM. The only addition is the `generator()` and `tpm_ct()` calls: `generator()` maps the unconstrained `log_qs` vector (one entry per off-diagonal element of $Q$, on the log scale to ensure positivity) to a valid generator matrix, and `tpm_ct()` computes $\exp(Q \, \Delta t)$ for every observed time difference at once.

```{r nll}
nll = function(par) {
  getAll(par, dat)
  mu = exp(log_mu); REPORT(mu)
  sigma = exp(log_sigma); REPORT(sigma)
  Q = generator(log_qs); REPORT(Q)  # maps unconstrained params -> valid Q
  Pi = stationary_ct(Q)             # stationary distribution of the CTMC
  Qube = tpm_ct(Q, timediff)     # exp(Q * dt) for each time difference
  allprobs = matrix(1, length(x), N)
  ind = which(!is.na(x))
  for(j in 1:N) allprobs[ind, j] = dnorm(x[ind], mu[j], sigma[j])
  -forward(Pi, Qube, allprobs)
}
```

### Fitting the model

```{r model, warning = FALSE}
N = 2
timediff = diff(obs_times) # time differences between consecutive observations

par = list(
  log_mu = log(c(5, 15)),    # initial state-dependent means (log scale)
  log_sigma = log(c(3, 5)),  # initial state-dependent sds (log scale)
  log_qs = log(c(1, 0.5))    # initial off-diagonal generator entries (log scale)
)

dat = list(
  x = x,
  timediff = timediff,
  N = N
)

obj = MakeADFun(nll, par, silent = TRUE)
system.time(
  opt <- nlminb(obj$par, obj$fn, obj$gr)
)
mod = report(obj)
```

### Results

```{r results}
mod$mu     # true: 5, 20
mod$sigma  # true: 2, 5
round(mod$Q, 3) # true: Q as defined above
```

The estimated generator can be interpreted directly: the mean dwell time in each state is $1/(-\hat{q}_{ii})$.

```{r dwell_times}
round(1 / diag(-mod$Q), 2) # estimated mean dwell times; true: 2, 1
```

The transition probability curves $\gamma_{ij}(\Delta t)$ show how quickly the process mixes as the lag grows. At $\Delta t = 0$ the matrix is the identity; as $\Delta t \to \infty$ both off-diagonal entries converge to the stationary probability of the destination state (dashed lines), reflecting full mixing.

```{r tpm_curves}
dt_seq = seq(0, 10, length.out = 300)
Qube_seq = tpm_ct(mod$Q, dt_seq)
Pi_ct = stationary_ct(mod$Q)

plot(dt_seq, Qube_seq[1, 2, ], type = "l", lwd = 2, col = color[1], bty = "n",
     xlab = expression(Delta*t), ylab = "transition probability", ylim = c(0, 1))
lines(dt_seq, Qube_seq[2, 1, ], lwd = 2, col = color[2])
abline(h = Pi_ct[2], lty = 2, col = color[1])  # long-run limit of gamma_12
abline(h = Pi_ct[1], lty = 2, col = color[2])  # long-run limit of gamma_21
legend("right", lwd = 2, col = color, bty = "n",
       legend = c(expression(gamma[12](Delta*t)), expression(gamma[21](Delta*t))))
```

We can also decode the most probable hidden state sequence using `viterbi()`. Because `forward()` automatically stores `delta`, `Qube`, and `allprobs` in the model report, we can pass the report object directly.

```{r state_decoding}
states = viterbi(mod = mod)

plot(obs_times[1:50], x[1:50], pch = 16, bty = "n",
     col = color[states[1:50]],
     xlab = "observation times", ylab = "x", ylim = c(-5, 25))
legend("topright", pch = 16, col = color,
       legend = paste("state", 1:N), box.lwd = 0)
```


## Example 2: three states

### Simulating data

```{r parameters2}
# generator matrix Q
Q = matrix(c(-0.5, 0.2, 0.3,
              1.0, -2.0, 1.0,
              0.4,  0.6, -1.0), nrow = 3, byrow = TRUE)

# state-dependent (normal) distribution parameters
mu = c(5, 15, 30)
sigma = c(2, 3, 5)
```

The three-state simulation differs from Example 1 in one key place: when the chain leaves a state, there are now multiple possible destination states, so we have to draw *which* state to go to. The probabilities are given by $\omega_{ij} = q_{ij}/(-q_{ii})$, i.e. the off-diagonal entries of the current row of $Q$ normalised by the total leaving rate.

```{r data2}
set.seed(123)

k = 200
s = rep(NA, k)
trans_times = rep(NA, k)

# initialise
s[1] = sample(1:3, 1)
dwell = rexp(1, -Q[s[1], s[1]])
trans_times[1] = dwell

new_obs = sort(runif(rpois(1, dwell), 0, dwell))
obs_times = new_obs
x = rnorm(length(new_obs), mu[s[1]], sigma[s[1]])

for(t in 2:k){
  # conditional transition probabilities: omega_ij = q_ij / (-q_ii)
  omega = Q[s[t-1], -s[t-1]] / -Q[s[t-1], s[t-1]]
  s[t] = sample((1:3)[-s[t-1]], 1, prob = omega)

  dwell = rexp(1, -Q[s[t], s[t]])
  trans_times[t] = trans_times[t-1] + dwell

  t_start = trans_times[t-1]
  t_end   = trans_times[t]
  new_obs = sort(runif(rpois(1, dwell), t_start, t_end))

  obs_times = c(obs_times, new_obs)
  x = c(x, rnorm(length(new_obs), mu[s[t]], sigma[s[t]]))
}
```

### Fitting the model

The same `nll` function works for any number of states — only `par`, `dat` and `N` need to change. Note that a 3-state model has $N(N-1) = 6$ off-diagonal entries in $Q$.

```{r model2, warning = FALSE}
N = 3
timediff = diff(obs_times)

par = list(
  log_mu = log(c(5, 10, 25)),   # initial state-dependent means
  log_sigma = log(c(2, 2, 6)),  # initial state-dependent sds
  log_qs = rep(0, N*(N-1))      # initial off-diagonal generator entries (exp(0) = 1)
)

dat = list(
  x = x,
  timediff = timediff,
  N = N
)

obj2 = MakeADFun(nll, par, silent = TRUE)
system.time(
  opt2 <- nlminb(obj2$par, obj2$fn, obj2$gr)
)
mod2 = report(obj2)
```

### Results

```{r results2}
mod2$mu     # true: 5, 15, 30
mod2$sigma  # true: 2, 3, 5
round(mod2$Q, 3) # true: Q as defined above
```

```{r dwell_times2}
round(1 / diag(-mod2$Q), 2) # estimated mean dwell times; true: 2, 0.5, 1
```

```{r state_decoding2}
color3 = LaMaColors(3)
states2 = viterbi(mod = mod2)

plot(obs_times[1:50], x[1:50], pch = 16, bty = "n",
     col = color3[states2[1:50]],
     xlab = "observation times", ylab = "x", ylim = c(-5, 40))
legend("topright", pch = 16, col = color3,
       legend = paste("state", 1:N), box.lwd = 0)
```

> Continue reading with [**Markov-modulated Poisson processes**](https://janolefi.github.io/LaMa/articles/MMMPPs.html).

## References