fit-hidden-markov-model
About
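This skill fits hidden Markov models (HMMs) to time-series data with the Baum-Welch algorithm, for scenarios with unobservable latent states such as market regime segmentation or biological sequence analysis. Its main capabilities are Viterbi decoding to estimate the most likely state path, computation of forward and backward probabilities, and model selection that compares models with different numbers of latent states. Use it when you need to infer latent structure from observed data, compute sequence probabilities, or decode hidden state sequences.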
Quick Install
Claude Code
Recommended: npx skills add pjt222/agent-almanac -a claude-code
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/fit-hidden-markov-model
Copy and paste this command into Claude Code to install the skill.
Documentation
Fit Hidden Markov Model
Fit a hidden Markov model (HMM) to sequential observation data using the Baum-Welch expectation-maximization algorithm, decode the most likely hidden state sequence via Viterbi, and select the number of hidden states using information criteria.
When to Use
- You observe a sequence of emissions, but the underlying generative states are not directly visible
- You suspect your data is generated by a system that switches between a finite number of regimes
- You need to segment a time series into latent phases (e.g., market regimes, speech phonemes, biological sequence annotation)
- You want to compute the probability of an observed sequence under a generative model
- You need the most likely sequence of hidden states given the observations (decoding)
- You want to compare models with different numbers of hidden states to find the best complexity-fit trade-off
Inputs
Required
| Input | Type | Description |
|---|---|---|
| observations | sequence/matrix | Observed data sequence (univariate or multivariate) |
| n_hidden_states | integer | Number of hidden states to fit (or a range for model selection) |
| emission_type | string | Distribution family for emissions: "gaussian", "discrete", "poisson", "multinomial" |
Optional
| Input | Type | Default | Description |
|---|---|---|---|
| initial_params | dict | random/heuristic | Initial transition matrix, emission parameters, and start probabilities |
| n_restarts | integer | 10 | Number of random restarts to mitigate local optima |
| max_iterations | integer | 500 | Maximum EM iterations per restart |
| convergence_tol | float | 1e-6 | Log-likelihood convergence threshold for EM |
| state_range | list of ints | [n_hidden_states] | Range of state counts for model selection |
| covariance_type | string | "full" | For Gaussian emissions: "full", "diagonal", "spherical" |
| regularization | float | 1e-6 | Small constant added to the diagonal of covariance matrices to prevent singularity |
Steps
Step 1: Define Hidden States and Observation Model
1.1. Set the number of hidden states K (or a candidate range for model selection in Step 5).
1.2. Choose the emission distribution family according to the data type:
- Continuous data: Gaussian (univariate or multivariate)
- Count data: Poisson or negative binomial
- Categorical data: discrete/multinomial
1.3. Specify the model components:
- Transition matrix A of size K x K: A[i,j] = P(z_t = j | z_{t-1} = i)
- Emission parameters theta_k for each state k: distribution-specific (e.g., mean and covariance for Gaussian)
- Initial state distribution pi: pi[k] = P(z_1 = k)
1.4. Check that the observation data is correctly formatted: no missing values in the sequence, consistent dimensionality, and sufficient length relative to the number of parameters.
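A minimal sketch of the Step 1 checks, assuming discrete emissions stored as a K x M matrix B. The HMMParams container, the validate helper, and the min_ratio threshold standing in for "T >> K^2" are hypothetical choices for illustration, not part of the skill.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class HMMParams:
    """Hypothetical container for a K-state HMM with discrete emissions."""
    A: List[List[float]]  # K x K, A[i][j] = P(z_t = j | z_{t-1} = i)
    B: List[List[float]]  # K x M, B[k][v] = P(o_t = v | z_t = k)
    pi: List[float]       # length K, pi[k] = P(z_1 = k)


def is_distribution(row, tol=1e-8):
    """True if all entries are non-negative and they sum to 1."""
    return all(x >= 0 for x in row) and abs(sum(row) - 1.0) <= tol


def validate(params, observations, n_symbols, min_ratio=10):
    """Return a list of problems; an empty list means every check passed.

    min_ratio is an assumed threshold standing in for "T >> K^2".
    """
    K = len(params.pi)
    problems = []
    if len(params.A) != K or any(len(row) != K for row in params.A):
        problems.append("A must be K x K")
    elif not all(is_distribution(row) for row in params.A):
        problems.append("rows of A must be probability distributions")
    if len(params.B) != K or not all(
        len(row) == n_symbols and is_distribution(row) for row in params.B
    ):
        problems.append("B must have K rows, each a distribution over the symbols")
    if not is_distribution(params.pi):
        problems.append("pi must be a probability distribution")
    # Missing values are represented here as None or float NaN (an assumption).
    if any(o is None or (isinstance(o, float) and math.isnan(o)) for o in observations):
        problems.append("observations contain missing values")
    if len(observations) < min_ratio * K * K:
        problems.append("sequence is too short relative to K^2")
    return problems
```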
Expected: A clearly specified HMM with K states, a chosen emission family, and clean observation data of length T >> K^2.
On failure: If the data has missing values, impute them or remove the affected segments. If T is too small relative to K, reduce K or collect more data.
Step 2: Initialize Parameters
2.1. Generate initial parameters for each of the n_restarts restarts:
- Transition matrix: Random stochastic matrix (each row drawn from a Dirichlet distribution) or a slightly perturbed uniform matrix.
- Emission parameters: Run K-means clustering on the observations to initialize the means; compute cluster variances for Gaussian emissions.
- Initial distribution: Uniform, or proportional to the K-means cluster sizes.
2.2. For the first restart, use the K-means-informed initialization (usually the strongest starting point). For later restarts, use random perturbations.
2.3. Verify that all initial parameters are valid:
- Transition matrix rows sum to 1, with all entries positive.
- Emission parameters lie in their valid domain (e.g., covariance matrices are positive definite).
- Initial distribution sums to 1.
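The initialization strategy above can be sketched for univariate Gaussian emissions with only the standard library. The helper names, the plain Lloyd's K-means, the fallback variance of 1.0 for tiny clusters, and the add-one smoothing of pi are illustrative assumptions rather than the skill's prescribed implementation.

```python
import random


def _normalize(row):
    total = sum(row)
    return [v / total for v in row]


def dirichlet_row(K, concentration=1.0, rng=random):
    """One row of a random stochastic matrix, drawn from a symmetric Dirichlet."""
    return _normalize([rng.gammavariate(concentration, 1.0) for _ in range(K)])


def kmeans_1d(x, K, n_iter=50, rng=random):
    """Plain Lloyd's K-means on scalar data; returns (centers, labels)."""
    centers = rng.sample(list(x), K)
    labels = [0] * len(x)
    for _ in range(n_iter):
        labels = [min(range(K), key=lambda k: (xi - centers[k]) ** 2) for xi in x]
        for k in range(K):
            members = [xi for xi, lab in zip(x, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = sum(members) / len(members)
    return centers, labels


def init_gaussian_hmm(x, K, restart, reg=1e-6, rng=random):
    """Restart 0 is K-means-informed; later restarts are randomly perturbed."""
    centers, labels = kmeans_1d(x, K, rng=rng)
    variances, sizes = [], []
    for k in range(K):
        members = [xi for xi, lab in zip(x, labels) if lab == k]
        sizes.append(len(members))
        if len(members) > 1:
            m = sum(members) / len(members)
            var = sum((xi - m) ** 2 for xi in members) / len(members)
        else:
            var = 1.0  # assumed fallback for a cluster with 0 or 1 members
        variances.append(var + reg)  # regularization keeps variances positive
    if restart == 0:
        # Slightly perturbed uniform transitions; pi from smoothed cluster sizes.
        A = [_normalize([1.0 + 0.01 * rng.random() for _ in range(K)]) for _ in range(K)]
        pi = _normalize([s + 1 for s in sizes])  # add-one smoothing avoids zeros
    else:
        spread = (max(x) - min(x)) or 1.0
        centers = [c + rng.gauss(0.0, 0.1 * spread) for c in centers]
        A = [dirichlet_row(K, rng=rng) for _ in range(K)]
        pi = dirichlet_row(K, rng=rng)
    return {"A": A, "means": centers, "variances": variances, "pi": pi}
```

Passing a seeded random.Random(r) per restart makes each restart reproducible.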
Expected: n_restarts sets of valid initial parameters, including at least one data-driven initialization.
On failure: If K-means fails to converge, use purely random initialization with more restarts. If covariance matrices are singular, add the regularization constant to the diagonal.
Step 3: Run Baum-Welch EM for Parameter Estimation
3.1. E-step (Forward-Backward algorithm):
- Compute forward probabilities alpha[t,k] = P(o_1,...,o_t, z_t=k | model) using the recursion:
  alpha[1,k] = pi[k] * b_k(o_1)
  alpha[t,k] = sum_j(alpha[t-1,j] * A[j,k]) * b_k(o_t)
- Compute backward probabilities beta[t,k] = P(o_{t+1},...,o_T | z_t=k, model):
  beta[T,k] = 1
  beta[t,k] = sum_j(A[k,j] * b_j(o_{t+1}) * beta[t+1,j])
- Compute state posteriors gamma[t,k] = P(z_t=k | O, model):
  gamma[t,k] = alpha[t,k] * beta[t,k] / P(O | model)
- Compute transition posteriors xi[t,i,j] = P(z_t=i, z_{t+1}=j | O, model) for t = 1,...,T-1:
  xi[t,i,j] = alpha[t,i] * A[i,j] * b_j(o_{t+1}) * beta[t+1,j] / P(O | model)
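The E-step recursions, combined with the per-step scaling described in 3.4, might look like this for a discrete-emission HMM. This is a sketch under that assumption: B[k][v] plays the role of b_k(v), and time and state indices start at 0 instead of 1.

```python
import math


def forward_backward(obs, A, B, pi):
    """Scaled forward-backward for a discrete-emission HMM.

    Returns (gamma, xi, log_likelihood). Each alpha row is renormalized and
    the log scaling factors are summed, so long sequences do not underflow.
    """
    T, K = len(obs), len(pi)
    alpha = [[0.0] * K for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for k in range(K):
            prior = pi[k] if t == 0 else sum(alpha[t - 1][j] * A[j][k] for j in range(K))
            alpha[t][k] = prior * B[k][obs[t]]
        scale[t] = sum(alpha[t])  # c_t = P(o_t | o_1..o_{t-1})
        alpha[t] = [a / scale[t] for a in alpha[t]]
    beta = [[1.0] * K for _ in range(T)]  # last row stays 1
    for t in range(T - 2, -1, -1):
        for k in range(K):
            beta[t][k] = sum(A[k][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(K)) / scale[t + 1]
    # With this scaling, alpha * beta is already normalized: no division by P(O).
    gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / scale[t + 1]
            for j in range(K)] for i in range(K)]
          for t in range(T - 1)]
    log_likelihood = sum(math.log(c) for c in scale)
    return gamma, xi, log_likelihood
```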
3.2. M-step (parameter re-estimation):
- Update the transition matrix (both sums run over t = 1,...,T-1):
  A[i,j] = sum_t(xi[t,i,j]) / sum_t(gamma[t,i])
- Update emission parameters using weighted sufficient statistics:
  - Gaussian mean: mu_k = sum_t(gamma[t,k] * o_t) / sum_t(gamma[t,k])
  - Gaussian covariance: weighted scatter matrix plus regularization, Sigma_k = sum_t(gamma[t,k] * (o_t - mu_k)(o_t - mu_k)^T) / sum_t(gamma[t,k]) + regularization * I
  - Discrete: b_k(v) = sum_t(gamma[t,k] * I(o_t=v)) / sum_t(gamma[t,k])
- Update the initial distribution: pi[k] = gamma[1,k]
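A sketch of the M-step for the discrete case only (the Gaussian mean and covariance updates follow the same weighted pattern). It assumes gamma and xi come from an E-step like the one above, with 0-based indices and xi holding T-1 entries; a state with zero posterior mass would divide by zero, which a real implementation should guard against.

```python
def m_step(obs, gamma, xi, n_symbols):
    """Re-estimate (A, B, pi) for a discrete-emission HMM from E-step posteriors."""
    T, K = len(gamma), len(gamma[0])
    pi = list(gamma[0])
    A = []
    for i in range(K):
        # Transitions out of state i can only start at t < T - 1 (0-based).
        denom = sum(gamma[t][i] for t in range(T - 1))
        A.append([sum(xi[t][i][j] for t in range(T - 1)) / denom for j in range(K)])
    B = []
    for k in range(K):
        denom = sum(gamma[t][k] for t in range(T))
        counts = [0.0] * n_symbols
        for t in range(T):
            counts[obs[t]] += gamma[t][k]  # expected count of symbol obs[t] in state k
        B.append([c / denom for c in counts])
    return A, B, pi
```

With hard (0/1) posteriors this reduces to counting transitions and emissions along a single path, which is a handy sanity check.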
3.3. Compute the log-likelihood: log P(O | model) = log sum_k(alpha[T,k]). Use the log-sum-exp trick to prevent underflow.
3.4. Scaling: For long sequences, use scaled forward-backward variables to prevent numerical underflow. Normalize alpha at each time step and accumulate the log scaling factors; their sum equals log P(O | model).
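As an alternative to scaling, the forward pass can run entirely in log space with log-sum-exp. This sketch again assumes discrete emissions and 0-based indices; the helper names are illustrative.

```python
import math


def _safe_log(p):
    """log that maps 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """log(sum(exp(v))) computed stably by factoring out the maximum."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(obs, A, B, pi):
    """Forward algorithm in log space; returns log P(O | model)."""
    K = len(pi)
    log_alpha = [_safe_log(pi[k]) + _safe_log(B[k][obs[0]]) for k in range(K)]
    for o in obs[1:]:
        log_alpha = [
            logsumexp([log_alpha[j] + _safe_log(A[j][k]) for j in range(K)])
            + _safe_log(B[k][o])
            for k in range(K)
        ]
    return logsumexp(log_alpha)
```

For a sequence of 10,000 symbols the probability is far below the smallest positive double, yet the log-space result stays finite.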
3.5. Repeat the E-step and M-step until the change in log-likelihood falls below convergence_tol or max_iterations is reached.
3.6. Across all restarts, keep the parameter set with the highest final log-likelihood.
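Steps 3.1 through 3.6 can be tied together in one loop. This sketch assumes discrete emissions, draws random positive starting values (rather than the K-means start from Step 2), and adds a tiny floor to expected counts so no probability hits exactly zero. The floor is an assumption and makes each update very slightly different from pure Baum-Welch.

```python
import math
import random


def _normalize(row):
    s = sum(row)
    return [x / s for x in row]


def _e_step(obs, A, B, pi):
    """Scaled forward-backward; returns (gamma, xi, log-likelihood)."""
    T, K = len(obs), len(pi)
    alpha, scale = [], []
    for t, o in enumerate(obs):
        prev = pi if t == 0 else [sum(alpha[-1][j] * A[j][k] for j in range(K)) for k in range(K)]
        a = [prev[k] * B[k][o] for k in range(K)]
        scale.append(sum(a))
        alpha.append([x / scale[-1] for x in a])
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[k][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(K)) / scale[t + 1]
                   for k in range(K)]
    gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / scale[t + 1]
            for j in range(K)] for i in range(K)] for t in range(T - 1)]
    return gamma, xi, sum(math.log(c) for c in scale)


def _m_step(obs, gamma, xi, M, floor=1e-10):
    """Discrete-emission re-estimation; the assumed `floor` keeps entries > 0."""
    K = len(gamma[0])
    A = [_normalize([sum(x[i][j] for x in xi) + floor for j in range(K)]) for i in range(K)]
    B = []
    for k in range(K):
        counts = [floor] * M
        for t, o in enumerate(obs):
            counts[o] += gamma[t][k]
        B.append(_normalize(counts))
    return A, B, _normalize([g + floor for g in gamma[0]])


def baum_welch(obs, K, M, n_restarts=10, max_iterations=500, tol=1e-6, seed=0):
    """EM from several random starts; returns (ll, A, B, pi, history) of the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        A = [_normalize([rng.random() + 0.1 for _ in range(K)]) for _ in range(K)]
        B = [_normalize([rng.random() + 0.1 for _ in range(M)]) for _ in range(K)]
        pi = [1.0 / K] * K
        history = []
        for _ in range(max_iterations):
            gamma, xi, ll = _e_step(obs, A, B, pi)
            history.append(ll)
            fitted = (ll, A, B, pi)  # the parameters that produced this log-likelihood
            if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
                break
            A, B, pi = _m_step(obs, gamma, xi, M)
        if best is None or fitted[0] > best[0]:
            best = fitted + (history,)
    return best
```

The returned history makes the monotonicity check from the Validation section easy to run.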
Expected: The log-likelihood is monotonically non-decreasing across iterations and converges within max_iterations. The final parameters are valid (stochastic matrices, positive-definite covariances).
On failure: If the log-likelihood decreases, there is a bug in the E-step or M-step; check the formulas. If convergence is very slow, try a better initialization or increase max_iterations. If a covariance matrix becomes singular, increase regularization.
Step 4: Apply Viterbi Decoding for Most Likely State Sequence
4.1. Initialize the Viterbi variables:
delta[1,k] = log(pi[k]) + log(b_k(o_1))
psi[1,k] = 0 (no predecessor)
4.2. Recurse forward for t = 2,...,T:
delta[t,k] = max_j(delta[t-1,j] + log(A[j,k])) + log(b_k(o_t))
psi[t,k] = argmax_j(delta[t-1,j] + log(A[j,k]))
4.3. Terminate:
- Final state: z*_T = argmax_k(delta[T,k])
- Best path log-probability: max_k(delta[T,k])
4.4. Backtrace for t = T-1,...,1:
z*_t = psi[t+1, z*_{t+1}]
4.5. Output the decoded state sequence z* = (z*_1, ..., z*_T) and its log-probability.
4.6. Compare the Viterbi path probability with the total sequence probability from the forward algorithm to gauge how dominant the best path is.
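A log-space Viterbi sketch for a discrete-emission HMM, with 0-based time and state indices and B[k][v] standing in for b_k(v). Comparing its returned log-probability with a forward-algorithm log-likelihood gives the check in 4.6.

```python
import math


def _log(p):
    """log that maps 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs, A, B, pi):
    """Most likely state path; returns (path, log_prob) with states 0..K-1."""
    T, K = len(obs), len(pi)
    delta = [[_log(pi[k]) + _log(B[k][obs[0]]) for k in range(K)]]
    psi = [[0] * K]  # no predecessor at the first step
    for t in range(1, T):
        row_delta, row_psi = [], []
        for k in range(K):
            scores = [delta[t - 1][j] + _log(A[j][k]) for j in range(K)]
            best_j = max(range(K), key=scores.__getitem__)
            row_psi.append(best_j)
            row_delta.append(scores[best_j] + _log(B[k][obs[t]]))
        delta.append(row_delta)
        psi.append(row_psi)
    last = max(range(K), key=delta[T - 1].__getitem__)
    path = [last]
    for t in range(T - 1, 0, -1):  # backtrace: z*_{t-1} = psi[t][z*_t]
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, delta[T - 1][last]
```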
Expected: A single most likely state sequence of length T, with each entry in {1,...,K}. The Viterbi log-probability should be less than or equal to the total log-likelihood.
On failure: If the Viterbi path has a log-probability of negative infinity, some transition or emission probability is zero where it should not be. Add floor values to avoid log(0).
Step 5: Perform Model Selection (BIC/AIC Across Model Orders)
5.1. For each candidate number of hidden states K in state_range, fit the full HMM (Steps 2-4).
5.2. Compute the number of free parameters p:
- Transition matrix: K * (K - 1) (each row lies on a simplex)
- Emission parameters: depends on the family (e.g., Gaussian with full covariance in d dimensions: K * (d + d*(d+1)/2))
- Initial distribution: K - 1
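The counting rule above as a small helper. The "diagonal" and "spherical" covariance counts, and the one-rate-per-dimension Poisson count (which assumes independent dimensions), extend the text's full-covariance example and are labelled assumptions here.

```python
def n_free_params(K, emission, d=1, n_symbols=None, covariance_type="full"):
    """Free-parameter count p for a K-state HMM, as used by BIC/AIC."""
    transitions = K * (K - 1)  # each row of A lies on a simplex
    initial = K - 1            # pi lies on a simplex
    if emission == "gaussian":
        cov = {"full": d * (d + 1) // 2, "diagonal": d, "spherical": 1}[covariance_type]
        per_state = d + cov    # mean vector plus covariance parameters
    elif emission in ("discrete", "multinomial"):
        per_state = n_symbols - 1
    elif emission == "poisson":
        per_state = d          # assumption: one rate per independent dimension
    else:
        raise ValueError(f"unsupported emission family: {emission}")
    return transitions + initial + K * per_state
```

For example, a 3-state Gaussian HMM with full covariance in 2 dimensions has 6 + 2 + 3 * 5 = 23 free parameters.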
5.3. Compute the information criteria:
BIC = -2 * log_likelihood + p * log(T)
AIC = -2 * log_likelihood + 2 * p
AICc = AIC + 2*p*(p+1) / (T - p - 1) (small-sample correction)
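The three criteria translate directly into code. Returning infinity for AICc when T - p - 1 <= 0 is an assumed convention for the case where the correction is undefined.

```python
import math


def information_criteria(log_likelihood, p, T):
    """BIC, AIC and AICc for a fit with p free parameters on T observations."""
    bic = -2.0 * log_likelihood + p * math.log(T)
    aic = -2.0 * log_likelihood + 2.0 * p
    if T - p - 1 > 0:
        aicc = aic + 2.0 * p * (p + 1) / (T - p - 1)
    else:
        aicc = math.inf  # correction undefined when p >= T - 1
    return {"BIC": bic, "AIC": aic, "AICc": aicc}
```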
5.4. Select the model with the lowest BIC (preferred for consistency) or AIC (preferred for prediction). Report both.
5.5. Tabulate the results: for each K, show the log-likelihood, number of parameters, BIC, AIC, and convergence status.
5.6. If the best K lies at the edge of state_range, extend the range and refit.
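Steps 5.4 to 5.6 might be organized as below. The results layout (a dict keyed by K with "log_likelihood", "p", criterion values, and "converged") is a hypothetical structure chosen for the sketch; K = 1 at the lower edge is not flagged because there is nothing smaller to try.

```python
def select_model(results, criterion="BIC"):
    """Lowest-criterion K among converged fits, plus whether it sits at a range edge."""
    candidates = {K: r for K, r in results.items() if r["converged"]}
    if not candidates:
        raise ValueError("no converged fits to choose from")
    best_K = min(candidates, key=lambda K: candidates[K][criterion])
    lo, hi = min(results), max(results)
    at_edge = best_K == hi or (best_K == lo and lo > 1)  # if True, extend state_range
    return best_K, at_edge


def format_table(results):
    """Plain-text summary table: one row per K."""
    header = f"{'K':>3} {'logLik':>10} {'p':>4} {'BIC':>10} {'AIC':>10} converged"
    rows = [
        f"{K:>3} {r['log_likelihood']:>10.2f} {r['p']:>4} {r['BIC']:>10.2f} "
        f"{r['AIC']:>10.2f} {r['converged']}"
        for K, r in sorted(results.items())
    ]
    return "\n".join([header] + rows)
```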
Expected: A clear minimum in BIC/AIC identifies the best number of hidden states. The selected model has converged and its states have interpretable meanings.
On failure: If there is no clear minimum (e.g., BIC decreases monotonically), the model may be misspecified; consider a different emission family. If all models have poor log-likelihood, the data may not follow an HMM structure.
Step 6: Validate with Held-Out Data and Posterior Decoding
6.1. Split the data into training and validation sets (e.g., an 80/20 split, or hold out whole sequences if multiple sequences are available).
6.2. Fit the model on the training data. Compute the log-likelihood of the held-out data with the forward algorithm (do not refit the parameters).
6.3. Posterior decoding (an alternative to Viterbi):
- For each time step, assign the state with the highest posterior probability: z^_t = argmax_k(gamma[t,k])
- This maximizes the expected number of correctly decoded states (whereas Viterbi maximizes the joint path probability).
6.4. Compare Viterbi and posterior decoding:
- Compute the agreement rate between the two decoded sequences.
- Regions of disagreement show ambiguous state assignments.
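Posterior decoding and the agreement check in 6.3 and 6.4 need only the gamma matrix and a Viterbi path. This sketch assumes gamma is a list of per-step posterior rows, as produced by a forward-backward pass.

```python
def posterior_decode(gamma):
    """Marginal MAP state at each step: z_hat[t] = argmax_k gamma[t][k]."""
    return [max(range(len(row)), key=row.__getitem__) for row in gamma]


def agreement(path_a, path_b):
    """Fraction of steps where two decodings agree, plus the disagreeing steps."""
    diff = [t for t, (a, b) in enumerate(zip(path_a, path_b)) if a != b]
    return 1.0 - len(diff) / len(path_a), diff
```

The returned indices point directly at the ambiguous regions worth inspecting.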
6.5. Check state interpretability:
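- Inspect the emission parameters of each state (means, variances, discrete distributions).
- Confirm that the states correspond to meaningful regimes in the domain context.
- Check that state dwell times are reasonable; in a standard HMM the expected dwell time in state k is 1 / (1 - A[k,k]), so it is set by the diagonal of A.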
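The dwell-time check reduces to one line per state, since a standard HMM's dwell time in state k is geometric with mean 1 / (1 - A[k][k]). Treating an absorbing state (A[k][k] = 1) as infinite is the convention assumed here.

```python
def expected_dwell_times(A):
    """Mean dwell time per state under the geometric dwell model of a standard HMM."""
    return [float("inf") if row[k] >= 1.0 else 1.0 / (1.0 - row[k])
            for k, row in enumerate(A)]
```

A diagonal entry of 0.9 therefore implies regimes that last about 10 steps on average.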
6.6. Compute the held-out log-likelihood per observation and compare it across model orders to confirm the model selected on the training set.
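One way to score held-out data without refitting, assumed here rather than prescribed by the skill, is to condition on the training segment: log P(test | train) = log P(train + test) - log P(train), computed with two scaled forward passes. The contiguous split keeps temporal order; discrete emissions are assumed.

```python
import math


def forward_log_likelihood(obs, A, B, pi):
    """Scaled forward pass only: log P(obs | fixed params), no re-fitting."""
    K = len(pi)
    alpha, total = None, 0.0
    for t, o in enumerate(obs):
        prev = pi if t == 0 else [sum(alpha[j] * A[j][k] for j in range(K)) for k in range(K)]
        a = [prev[k] * B[k][o] for k in range(K)]
        c = sum(a)
        total += math.log(c)
        alpha = [x / c for x in a]
    return total


def split_sequence(obs, train_fraction=0.8):
    """Contiguous split so the held-out segment keeps its temporal order."""
    cut = int(len(obs) * train_fraction)
    return obs[:cut], obs[cut:]


def held_out_ll_per_obs(train, test, A, B, pi):
    """log P(test | train) / len(test), with parameters held fixed."""
    conditional = forward_log_likelihood(train + test, A, B, pi) - forward_log_likelihood(train, A, B, pi)
    return conditional / len(test)
```

Comparing this value with forward_log_likelihood(train, ...) / len(train) puts training and held-out fit on the same per-observation scale.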
Expected: The held-out log-likelihood per observation is reasonably close to the training value (no severe overfitting). Viterbi and posterior decoding agree on 90% or more of time steps. States have distinct, interpretable emission distributions.
On failure: If the held-out likelihood is much worse than the training likelihood, the model is overfit; reduce K or increase regularization. If states are not interpretable, try different initializations or a different emission family.
Validation
- Log-likelihood is monotonically non-decreasing across Baum-Welch iterations for each restart
- Transition matrix is row-stochastic (rows sum to 1, all entries non-negative)
- Emission parameters lie in their valid domain (positive-definite covariances, valid probability distributions)
- The Viterbi path log-probability does not exceed the total sequence log-probability
- BIC/AIC curves show a clear minimum at the selected model order
- Held-out log-likelihood confirms that the model generalizes beyond the training set
- Forward and backward probability computations agree:
P(O) = sum_k(alpha[T,k]) = sum_k(pi[k] * b_k(o_1) * beta[1,k])
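The last check can be automated on short test sequences with unscaled alpha and beta. The sketch assumes discrete emissions and also tests the stronger identity sum_k(alpha[t,k] * beta[t,k]) = P(O) at every t, which catches indexing bugs the endpoint check can miss.

```python
import math


def forward_backward_probs(obs, A, B, pi):
    """Unscaled alpha and beta; only suitable for short sequences."""
    T, K = len(obs), len(pi)
    alpha = [[pi[k] * B[k][obs[0]] for k in range(K)]]
    for t in range(1, T):
        alpha.append([sum(alpha[-1][j] * A[j][k] for j in range(K)) * B[k][obs[t]]
                      for k in range(K)])
    beta = [[1.0] * K]
    for t in range(T - 2, -1, -1):
        # beta[0] always holds the row for t + 1 while we build backwards.
        beta.insert(0, [sum(A[k][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(K))
                        for k in range(K)])
    return alpha, beta


def check_consistency(obs, A, B, pi, rel_tol=1e-9):
    """True if forward, backward and per-step totals all give the same P(O)."""
    alpha, beta = forward_backward_probs(obs, A, B, pi)
    p_forward = sum(alpha[-1])
    p_backward = sum(pi[k] * B[k][obs[0]] * beta[0][k] for k in range(len(pi)))
    per_step = [sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(obs))]
    return all(math.isclose(p, p_forward, rel_tol=rel_tol) for p in [p_backward] + per_step)
```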
Pitfalls
- Local optima in EM: Baum-Welch converges to a local maximum, not necessarily the global one. Always use multiple random restarts and keep the best.
- Numerical underflow: Forward-backward probabilities shrink exponentially with sequence length. Use log-space computation or scaled variables to prevent underflow to zero.
- Overfitting with too many states: Each extra hidden state adds O(K + d^2) parameters. Use BIC (not raw likelihood) for model selection and validate on held-out data.
- Label switching: Hidden states are identifiable only up to permutation. When comparing models across restarts, match states by their emission parameters, not by index.
- Degenerate states: A state may collapse onto a single observation (a Gaussian with near-zero variance). Regularizing the covariance matrices prevents this.
- Confusing Viterbi and posterior decoding: Viterbi gives the single best joint path; posterior decoding gives the best marginal state at each time step. They answer different questions and can disagree substantially.
- Ignoring state dwell times: The geometric dwell-time distribution built into standard HMMs may fit poorly for data with long regime durations. Consider hidden semi-Markov models if dwell times are non-geometric.
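To handle label switching, states from two runs can be aligned by their emission parameters. This sketch assumes scalar emission means and a squared-distance cost, and brute-forces all K! permutations, so it only suits small K; multivariate emissions would need a vector distance.

```python
import itertools


def match_states(ref_means, other_means):
    """Permutation perm minimizing sum_k (ref_means[k] - other_means[perm[k]])^2."""
    K = len(ref_means)

    def cost(perm):
        return sum((ref_means[k] - other_means[perm[k]]) ** 2 for k in range(K))

    return min(itertools.permutations(range(K)), key=cost)


def relabel(path, perm):
    """Rewrite a decoded path from the other run using the reference labels."""
    inverse = {old: new for new, old in enumerate(perm)}
    return [inverse[z] for z in path]
```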
See Also
- Model Markov Chain -- prerequisite for understanding the transition structure underlying the hidden layer
- Simulate Stochastic Process -- can generate synthetic HMM data for testing and simulate from a fitted model for posterior predictive checks
