MCP HubMCP Hub
Вернуться к навыкам

fit-hidden-markov-model

pjt222
Обновлено 2 days ago
7 просмотров
17
2
17
Посмотреть на GitHub
Метаgeneral

О программе

Этот навык обучает скрытые марковские модели (HMM) на последовательных данных, используя алгоритм Баума-Уэлча для обучения и декодирование Витерби для вывода состояний. Он предназначен для сегментации временных рядов на скрытые режимы, такие как рыночные состояния или фонемы, и для сравнения моделей с разным количеством скрытых состояний. Разработчики могут использовать его для вычисления вероятностей последовательностей и декодирования наиболее вероятного пути скрытых состояний по наблюдениям.

Быстрая установка

Claude Code

Рекомендуется
Основной
npx skills add pjt222/agent-almanac -a claude-code
Команда плагинаАльтернативный
/plugin add https://github.com/pjt222/agent-almanac
Git клонированиеАльтернативный
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/fit-hidden-markov-model

Скопируйте и вставьте эту команду в Claude Code для установки этого навыка

Документация

Fit Hidden Markov Model

Fit a hidden Markov model (HMM) to sequential observation data using the Baum-Welch expectation-maximization algorithm, decode the most likely hidden state sequence via Viterbi, and select the optimal number of hidden states through information criteria.

When to Use

  • You observe a sequence of emissions but the underlying generative states are not directly observable
  • You suspect your data is generated by a system that switches between a finite number of regimes
  • You need to segment a time series into latent phases (e.g., market regimes, speech phonemes, biological sequence annotation)
  • You want to compute the probability of an observed sequence under a generative model
  • You need the most likely sequence of hidden states given observations (decoding)
  • You are comparing models with different numbers of hidden states for the best complexity-fit trade-off

Inputs

Required

InputTypeDescription
observationssequence/matrixObserved data sequence (univariate or multivariate)
n_hidden_statesintegerNumber of hidden states to fit (or a range for model selection)
emission_typestringDistribution family for emissions: "gaussian", "discrete", "poisson", "multinomial"

Optional

InputTypeDefaultDescription
initial_paramsdictrandom/heuristicInitial transition matrix, emission parameters, and start probabilities
n_restartsinteger10Number of random restarts to mitigate local optima
max_iterationsinteger500Maximum EM iterations per restart
convergence_tolfloat1e-6Log-likelihood convergence threshold for EM
state_rangelist of ints[n_hidden_states]Range of state counts for model selection
covariance_typestring"full"For Gaussian emissions: "full", "diagonal", "spherical"
regularizationfloat1e-6Small constant added to diagonal of covariance matrices to prevent singularity

Procedure

Step 1: Define Hidden States and Observation Model

1.1. Specify the number of hidden states K (or a candidate range for model selection in Step 5).

1.2. Choose the emission distribution family based on the data type:

  • Continuous data: Gaussian (univariate or multivariate)
  • Count data: Poisson or negative binomial
  • Categorical data: discrete/multinomial

1.3. Define the model components:

  • Transition matrix A of size K x K: A[i,j] = P(z_t = j | z_{t-1} = i)
  • Emission parameters theta_k for each state k: distribution-specific (e.g., mean and covariance for Gaussian)
  • Initial state distribution pi: pi[k] = P(z_1 = k)

1.4. Verify observation data is properly formatted: no missing values in the sequence, consistent dimensionality, and sufficient length relative to the number of parameters.

Got: A clearly specified HMM architecture with K states, a chosen emission family, and clean observation data of length T >> K^2.

If fail: If data contains missing values, impute or remove affected segments. If T is too small relative to K, reduce K or acquire more data.

Step 2: Initialize Parameters

2.1. Generate initial parameters for each of the n_restarts restarts:

  • Transition matrix: Random stochastic matrix (each row drawn from a Dirichlet distribution) or a slightly perturbed uniform matrix.
  • Emission parameters: Use K-means clustering on the observations to initialize means; compute cluster variances for Gaussian emissions.
  • Initial distribution: Uniform or proportional to cluster sizes from K-means.

2.2. For the first restart, use the K-means-informed initialization (generally the strongest start). For subsequent restarts, use random perturbations.

2.3. Verify all initial parameters are valid:

  • Transition matrix rows sum to 1 with all entries positive.
  • Emission parameters are in the valid domain (e.g., covariance matrices are positive definite).
  • Initial distribution sums to 1.

Got: n_restarts sets of valid initial parameters, with at least one data-driven initialization.

If fail: If K-means fails to converge, use purely random initialization with more restarts. If covariance matrices are singular, add the regularization constant to the diagonal.

Step 3: Run Baum-Welch EM for Parameter Estimation

3.1. E-step (Forward-Backward algorithm):

  • Compute forward probabilities alpha[t,k] = P(o_1,...,o_t, z_t=k | model) using the recursion:
    • alpha[1,k] = pi[k] * b_k(o_1)
    • alpha[t,k] = sum_j(alpha[t-1,j] * A[j,k]) * b_k(o_t)
  • Compute backward probabilities beta[t,k] = P(o_{t+1},...,o_T | z_t=k, model):
    • beta[T,k] = 1
    • beta[t,k] = sum_j(A[k,j] * b_j(o_{t+1}) * beta[t+1,j])
  • Compute state posterior gamma[t,k] = P(z_t=k | O, model):
    • gamma[t,k] = alpha[t,k] * beta[t,k] / P(O | model)
  • Compute transition posterior xi[t,i,j] = P(z_t=i, z_{t+1}=j | O, model).

3.2. M-step (Parameter re-estimation):

  • Update transition matrix: A[i,j] = sum_t(xi[t,i,j]) / sum_t(gamma[t,i])
  • Update emission parameters using weighted sufficient statistics:
    • Gaussian mean: mu_k = sum_t(gamma[t,k] * o_t) / sum_t(gamma[t,k])
    • Gaussian covariance: weighted scatter matrix plus regularization
    • Discrete: b_k(v) = sum_t(gamma[t,k] * I(o_t=v)) / sum_t(gamma[t,k])
  • Update initial distribution: pi[k] = gamma[1,k]

3.3. Compute log-likelihood: log P(O | model) = log sum_k(alpha[T,k]). Use the log-sum-exp trick to prevent underflow.

3.4. Scaling: Use scaled forward-backward variables to prevent numerical underflow for long sequences. Normalize alpha at each time step and accumulate log scaling factors.

3.5. Repeat E-step and M-step until log-likelihood change is below convergence_tol or max_iterations is reached.

3.6. Across all restarts, keep the parameter set with the highest final log-likelihood.

Got: Monotonically non-decreasing log-likelihood across iterations, converging within max_iterations. Final parameters are valid (stochastic matrices, positive-definite covariances).

If fail: If log-likelihood decreases, there is a bug in the E-step or M-step -- verify formulas. If convergence is very slow, try better initialization or increase max_iterations. If covariance becomes singular, increase regularization.

Step 4: Apply Viterbi Decoding for Most Likely State Sequence

4.1. Initialize Viterbi variables:

  • delta[1,k] = log(pi[k]) + log(b_k(o_1))
  • psi[1,k] = 0 (no predecessor)

4.2. Recurse forward for t = 2,...,T:

  • delta[t,k] = max_j(delta[t-1,j] + log(A[j,k])) + log(b_k(o_t))
  • psi[t,k] = argmax_j(delta[t-1,j] + log(A[j,k]))

4.3. Terminate:

  • z*_T = argmax_k(delta[T,k])
  • Best path log-probability: max_k(delta[T,k])

4.4. Backtrace for t = T-1,...,1:

  • z*_t = psi[t+1, z*_{t+1}]

4.5. Output the decoded state sequence z* = (z*_1, ..., z*_T) and its log-probability.

4.6. Compare the Viterbi path probability to the total sequence probability from the forward algorithm to assess how dominant the best path is.

Got: A single most-likely state sequence of length T with each entry in {1,...,K}. The Viterbi log-probability should be less than or equal to the total log-likelihood.

If fail: If the Viterbi path has log-probability of negative infinity, some transition or emission probability is zero where it should not be. Add floor values to prevent log(0).

Step 5: Perform Model Selection (BIC/AIC Across Model Orders)

5.1. For each candidate number of hidden states K in state_range, fit the full HMM (Steps 2-4).

5.2. Compute the number of free parameters p:

  • Transition matrix: K * (K - 1) (each row is a simplex)
  • Emission parameters: depends on family (e.g., Gaussian with full covariance in d dimensions: K * (d + d*(d+1)/2))
  • Initial distribution: K - 1

5.3. Compute information criteria:

  • BIC = -2 * log_likelihood + p * log(T)
  • AIC = -2 * log_likelihood + 2 * p
  • AICc = AIC + 2*p*(p+1) / (T - p - 1) (small-sample correction)

5.4. Select the model with the lowest BIC (preferred for consistency) or AIC (preferred for prediction). Report both.

5.5. Tabulate results: for each K, show log-likelihood, number of parameters, BIC, AIC, and convergence status.

5.6. If the optimal K is at the boundary of state_range, extend the range and re-fit.

Got: A clear minimum in BIC/AIC identifying the optimal number of hidden states. The selected model should have converged and have interpretable state meanings.

If fail: If no clear minimum exists (monotonically decreasing BIC), the model may be misspecified -- consider a different emission family. If all models have poor log-likelihood, the data may not follow an HMM structure.

Step 6: Validate with Held-Out Data and Posterior Decoding

6.1. Split data into training and validation sets (e.g., 80/20 or use multiple sequences if available).

6.2. Fit the model on training data. Compute log-likelihood on held-out data using the forward algorithm (do not re-fit parameters).

6.3. Posterior decoding (alternative to Viterbi):

  • For each time step, assign the state with highest posterior probability: z^_t = argmax_k(gamma[t,k])
  • This maximizes the expected number of correctly decoded states (vs. Viterbi which maximizes the joint path probability).

6.4. Compare Viterbi and posterior decoding:

  • Compute agreement rate between the two decoded sequences.
  • Regions of disagreement indicate ambiguous state assignments.

6.5. Assess state interpretability:

  • Examine emission parameters for each state (means, variances, discrete distributions).
  • Verify states correspond to meaningful regimes in the domain context.
  • Check that state dwell times (implied by diagonal of A) are reasonable.

6.6. Compute held-out log-likelihood per observation and compare across model orders to confirm the training-set model selection.

Got: Held-out log-likelihood is reasonably close to training log-likelihood (no severe overfitting). Viterbi and posterior decoding agree on 90%+ of time steps. States have distinct, interpretable emission distributions.

If fail: If held-out likelihood is much worse than training, the model is overfitting -- reduce K or increase regularization. If states are not interpretable, try different initializations or a different emission family.

Validation

  • Log-likelihood is monotonically non-decreasing across Baum-Welch iterations for each restart
  • The transition matrix is row-stochastic (rows sum to 1, all entries non-negative)
  • Emission parameters are in the valid domain (positive-definite covariances, valid probability distributions)
  • The Viterbi path log-probability does not exceed the total sequence log-probability
  • BIC/AIC curves show a clear minimum at the selected model order
  • Held-out log-likelihood confirms the model generalizes beyond the training set
  • Forward and backward probability computations agree: P(O) = sum_k(alpha[T,k]) = sum_k(pi[k] * b_k(o_1) * beta[1,k])

Pitfalls

  • Local optima in EM: The Baum-Welch algorithm converges to a local maximum, not necessarily the global one. Always use multiple random restarts and pick the best.
  • Numerical underflow: Forward-backward probabilities shrink exponentially with sequence length. Use log-space computation or scaled variables to prevent underflow to zero.
  • Overfitting with too many states: Each additional hidden state adds O(K + d^2) parameters. Use BIC (not just likelihood) for model selection and validate on held-out data.
  • Label switching: Hidden states are identifiable only up to permutation. When comparing models across restarts, match states by emission parameters, not by index.
  • Degenerate states: A state may collapse to explain a single observation (Gaussian with near-zero variance). Regularization on covariance matrices prevents this.
  • Confusing Viterbi and posterior decoding: Viterbi gives the single best joint path; posterior decoding gives the best marginal state at each time step. They answer different questions and can disagree significantly.
  • Ignoring state dwell times: The geometric dwell-time distribution implicit in standard HMMs may be a poor fit for data with long regime durations. Consider hidden semi-Markov models if dwell times are non-geometric.

Related Skills

  • Model Markov Chain -- prerequisite for understanding the transition structure that underlies the hidden layer
  • Simulate Stochastic Process -- can be used to generate synthetic HMM data for testing and to simulate from a fitted model for posterior predictive checks

GitHub репозиторий

pjt222/agent-almanac
Путь: i18n/caveman-lite/skills/fit-hidden-markov-model
0
agentsagentskillsai-assisted-developmentclaude-codeskillsteams

Похожие навыки

content-collections

Мета

Этот навык предоставляет проверенную в продакшене настройку для Content Collections — TypeScript-ориентированного инструмента, который преобразует файлы Markdown/MDX в типобезопасные коллекции данных с валидацией Zod. Используйте его при создании блогов, сайтов документации или контентных приложений на Vite + React для обеспечения типобезопасности и автоматической проверки содержимого. Он охватывает всё: от настройки плагина Vite и компиляции MDX до оптимизации развертывания и валидации схем.

Просмотреть навык

polymarket

Мета

Этот навык позволяет разработчикам создавать приложения на платформе прогнозных рынков Polymarket, включая интеграцию с API для торговли и получения рыночных данных. Он также обеспечивает потоковую передачу данных в реальном времени через WebSocket для отслеживания текущих сделок и рыночной активности. Используйте его для реализации торговых стратегий или создания инструментов, обрабатывающих обновления рынка в реальном времени.

Просмотреть навык

creating-opencode-plugins

Мета

Этот навык помогает разработчикам создавать плагины OpenCode, которые подключаются к более чем 25 типам событий, таким как команды, файлы и операции LSP. Он предоставляет структуру плагина, спецификации API событий и шаблоны реализации для модулей на JavaScript/TypeScript. Используйте его, когда вам нужно перехватывать, отслеживать или расширять жизненный цикл ассистента OpenCode AI с помощью пользовательской событийно-ориентированной логики.

Просмотреть навык

sglang

Мета

SGLang — это высокопроизводительный фреймворк для обслуживания больших языковых моделей (LLM), специализирующийся на быстрой структурированной генерации JSON, regex и рабочих процессов агентов с использованием кэширования префиксов RadixAttention. Он обеспечивает значительно более высокую скорость вывода, особенно для задач с повторяющимися префиксами, что делает его идеальным для сложных структурированных результатов и многократных диалогов. Выбирайте SGLang вместо альтернатив, таких как vLLM, когда вам требуется ограниченное декодирование или вы создаете приложения с интенсивным совместным использованием префиксов.

Просмотреть навык