fit-hidden-markov-model
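About
This skill covers training a hidden Markov model (HMM) with the Baum-Welch EM algorithm and segmenting a time series into latent regimes (e.g., market states or phonemes). It provides Viterbi decoding of the most likely hidden state path and forward-backward probabilities for sequence analysis. Use it when you need to model observations generated by unobservable states, or to compare models with different numbers of hidden states.
Quick Install
Claude Code
- Recommended: `npx skills add pjt222/agent-almanac -a claude-code`
- `/plugin add https://github.com/pjt222/agent-almanac`
- `git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/fit-hidden-markov-model`
Copy and paste the command into Claude Code to install the skill.
Documentation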
Fit Hidden Markov Model
Fit HMM via Baum-Welch EM, decode most likely hidden state sequence via Viterbi, select optimal N hidden states via information criteria.
Use When
- Observe sequence emissions but underlying generative states not observable
- Data generated by system switching between finite regimes
- Segment time series into latent phases (market regimes, speech phonemes, biological annotation)
- Compute prob of observed sequence under generative model
- Most likely sequence hidden states given observations (decoding)
- Compare models w/ diff N hidden states → complexity-fit trade-off
In
Required
| Input | Type | Desc |
|---|---|---|
| `observations` | sequence/matrix | Observed data (univariate/multivariate) |
| `n_hidden_states` | integer | N hidden states (or range for selection) |
| `emission_type` | string | "gaussian", "discrete", "poisson", "multinomial" |
Optional
| Input | Type | Default | Desc |
|---|---|---|---|
| `initial_params` | dict | random/heuristic | Init transition matrix, emission params, start probs |
| `n_restarts` | integer | 10 | Random restarts to mitigate local optima |
| `max_iterations` | integer | 500 | Max EM iterations per restart |
| `convergence_tol` | float | 1e-6 | Log-likelihood convergence threshold |
| `state_range` | list of ints | [n_hidden_states] | Range of state counts for selection |
| `covariance_type` | string | "full" | Gaussian: "full", "diagonal", "spherical" |
| `regularization` | float | 1e-6 | Diagonal constant preventing singularity |
Do
Step 1: Define Hidden States + Obs Model
1.1. Specify N hidden states K (or candidate range Step 5).
1.2. Emission distribution by data type:
- Continuous: Gaussian (uni/multivariate)
- Count: Poisson or negative binomial
- Categorical: discrete/multinomial
1.3. Components:
- Transition matrix `A` size `K x K`: `A[i,j] = P(z_t = j | z_{t-1} = i)`
- Emission params `theta_k` each `k`: distribution-specific (mean + covariance Gaussian)
- Initial distribution `pi`: `pi[k] = P(z_1 = k)`
1.4. Verify data: no missing, consistent dim, length T >> K^2.
→ HMM arch w/ K states, chosen emission family, clean data T >> K^2.
If err: missing → impute or remove. T too small → reduce K or get more data.
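The data checks in 1.4 can be sketched as a small guard function. This is only an illustration: the name `check_observations` and the `min_ratio` threshold (a stand-in for "T >> K^2") are assumptions, not part of the skill.

```python
import numpy as np

def check_observations(obs, n_states, min_ratio=10.0):
    """Return a (T, d) float array or raise ValueError.

    min_ratio is a hypothetical rule of thumb standing in for "T >> K^2".
    """
    x = np.asarray(obs, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # treat univariate data as T x 1
    if x.ndim != 2:
        raise ValueError("observations must be 1-D or 2-D")
    if not np.isfinite(x).all():
        raise ValueError("missing or non-finite values: impute or remove")
    T = x.shape[0]
    if T < min_ratio * n_states ** 2:
        raise ValueError(f"T={T} too small for K={n_states}: reduce K or get more data")
    return x
```

Raising early keeps NaNs and ragged input out of the E-step, where they would otherwise surface as a silent NaN log-likelihood.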
Step 2: Initialize Params
2.1. Gen initial each of n_restarts:
- Transition: Random stochastic (Dirichlet rows) or perturbed uniform
- Emission: K-means clustering → init means; cluster variances Gaussian
- Initial distribution: Uniform or proportional to cluster sizes
2.2. First restart: K-means-informed (strongest). Subsequent: random perturbations.
2.3. Verify valid:
- Transition rows sum 1, positive
- Emission in valid domain (PD covariance)
- Initial sums 1
→ n_restarts sets of valid params, ≥1 data-driven.
If err: K-means fails → purely random w/ more restarts. Singular covariance → add regularization to diagonal.
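A minimal initialization sketch for a univariate Gaussian HMM follows. It makes two loud assumptions: quantile bins replace K-means to keep the example NumPy-only, and the function name `init_params` and its `informed` flag are hypothetical.

```python
import numpy as np

def init_params(x, n_states, rng, informed=True, reg=1e-6):
    """Initial (pi, A, means, variances) for a univariate Gaussian HMM."""
    x = np.asarray(x, dtype=float)
    K = n_states
    # Rows drawn from a Dirichlet are positive and sum to 1.
    A = rng.dirichlet(np.ones(K), size=K)
    if informed:
        # Quantile bins stand in for K-means clusters (assumption).
        edges = np.quantile(x, np.linspace(0, 1, K + 1))
        labels = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, K - 1)
    else:
        # Random assignment for the perturbed restarts.
        labels = rng.integers(0, K, size=x.size)
    means = np.empty(K)
    variances = np.empty(K)
    pi = np.empty(K)
    for k in range(K):
        members = x[labels == k]
        if members.size == 0:               # empty cluster: fall back to global stats
            members = x
        means[k] = members.mean()
        variances[k] = members.var() + reg  # regularization keeps variance > 0
        pi[k] = members.size                # proportional to cluster size
    pi /= pi.sum()
    return pi, A, means, variances
```

Calling it once with `informed=True` and then `n_restarts - 1` times with `informed=False` mirrors the "first restart data-driven, rest random" scheme in 2.2.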
Step 3: Baum-Welch EM
3.1. E-step (Forward-Backward):
- Forward `alpha[t,k]` = P(o_1,...,o_t, z_t=k | model):
  - `alpha[1,k] = pi[k] * b_k(o_1)`
  - `alpha[t,k] = sum_j(alpha[t-1,j] * A[j,k]) * b_k(o_t)`
- Backward `beta[t,k]` = P(o_{t+1},...,o_T | z_t=k, model):
  - `beta[T,k] = 1`
  - `beta[t,k] = sum_j(A[k,j] * b_j(o_{t+1}) * beta[t+1,j])`
- State posterior `gamma[t,k]` = P(z_t=k | O, model):
  - `gamma[t,k] = alpha[t,k] * beta[t,k] / P(O | model)`
- Transition posterior `xi[t,i,j]` = P(z_t=i, z_{t+1}=j | O, model):
  - `xi[t,i,j] = alpha[t,i] * A[i,j] * b_j(o_{t+1}) * beta[t+1,j] / P(O | model)`
3.2. M-step (re-estimate):
- Transition: `A[i,j] = sum_t(xi[t,i,j]) / sum_t(gamma[t,i])` (both sums over t = 1,...,T-1)
- Emission weighted sufficient stats:
  - Gaussian mean: `mu_k = sum_t(gamma[t,k] * o_t) / sum_t(gamma[t,k])`
  - Gaussian covariance: weighted scatter matrix + regularization
  - Discrete: `b_k(v) = sum_t(gamma[t,k] * I(o_t=v)) / sum_t(gamma[t,k])`
- Initial: `pi[k] = gamma[1,k]`
3.3. Log-likelihood: log P(O | model) = log sum_k(alpha[T,k]). Log-sum-exp → prevent underflow.
3.4. Scaling: Scaled forward-backward → prevent underflow long sequences. Normalize alpha each step + accumulate log scaling factors.
3.5. Repeat E + M until log-likelihood change < convergence_tol or max_iterations.
3.6. Across restarts → keep params w/ highest final log-likelihood.
→ Monotonically non-decreasing log-likelihood, converge w/in max. Final valid (stochastic matrices, PD covariances).
If err: log-likelihood decreases → bug E/M, verify formulas. Very slow → better init or increase max. Singular covariance → increase regularization.
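Steps 3.1 to 3.5 can be sketched for the discrete-emission case as below. This is an illustrative sketch, not the skill's reference implementation: the function names are assumptions, `B[k, v]` stands for `b_k(v)`, and observations are assumed to be integer symbols starting at 0.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward for a discrete HMM; returns gamma, xi, log P(O)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    c = np.zeros(T)                              # per-step scaling factors
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                         # scaled variables: rows sum to 1
    # xi[t,i,j] = alpha[t,i] * A[i,j] * b_j(o_{t+1}) * beta[t+1,j] / c[t+1]
    xi = (alpha[:-1, :, None] * A[None]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / c[1:, None, None]
    return gamma, xi, np.log(c).sum()            # log-likelihood = sum of log scales

def baum_welch(obs, pi, A, B, max_iterations=500, convergence_tol=1e-6):
    """EM loop; history[i] is the log-likelihood of the parameters entering iteration i."""
    obs = np.asarray(obs)
    history = []
    for _ in range(max_iterations):
        gamma, xi, loglik = forward_backward(obs, pi, A, B)      # E-step
        history.append(loglik)
        if len(history) > 1 and history[-1] - history[-2] < convergence_tol:
            break
        pi = gamma[0]                                             # M-step
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == v].sum(axis=0) for v in range(B.shape[1])], axis=1)
        B = B / gamma.sum(axis=0)[:, None]
    return pi, A, B, history
```

Running `baum_welch` once per initial parameter set and keeping the result with the largest `history[-1]` corresponds to step 3.6. The `history` list also gives a direct way to check the non-decreasing log-likelihood expectation.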
Step 4: Viterbi Decoding
4.1. Init:
- `delta[1,k] = log(pi[k]) + log(b_k(o_1))`
- `psi[1,k] = 0` (no predecessor)
4.2. Recurse t = 2,...,T:
- `delta[t,k] = max_j(delta[t-1,j] + log(A[j,k])) + log(b_k(o_t))`
- `psi[t,k] = argmax_j(delta[t-1,j] + log(A[j,k]))`
4.3. Terminate:
- `z*_T = argmax_k(delta[T,k])`
- Best path log-prob: `max_k(delta[T,k])`
4.4. Backtrace t = T-1,...,1:
- `z*_t = psi[t+1, z*_{t+1}]`
4.5. Output decoded sequence z* = (z*_1, ..., z*_T) + log-prob.
4.6. Compare Viterbi path log-prob to total sequence log-likelihood from forward → how strongly the best path dominates.
→ Single most-likely sequence length T, each in {1,...,K}. Viterbi log-prob ≤ total log-likelihood.
If err: Viterbi -inf log-prob → transition/emission prob zero where shouldn't. Add floor values preventing log(0).
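The recursion in 4.1 to 4.4 can be sketched in log space as follows. The helper `safe_log` and its floor value are assumptions added to illustrate the "floor values" advice. The two-state, three-symbol parameters are a made-up toy example, not data from the skill.

```python
import numpy as np

def safe_log(p, floor=1e-300):
    """Log with a floor so zero probabilities do not produce -inf (assumed helper)."""
    return np.log(np.maximum(p, floor))

def viterbi(log_pi, log_A, log_B):
    """log_B[t, k] = log b_k(o_t). Returns (path, best path log-prob)."""
    T, K = log_B.shape
    delta = np.empty((T, K))
    psi = np.zeros((T, K), dtype=int)
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[j, k]: come from j, go to k
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):               # backtrace
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()

# Toy example (illustrative numbers only).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # B[k, v] = b_k(v)
obs = np.array([0, 1, 2])
path, log_prob = viterbi(safe_log(pi), safe_log(A), safe_log(B[:, obs].T))
# path -> [0, 0, 1]; exp(log_prob) is about 0.01512
```

For this toy model the forward probability of the same sequence is about 0.03628, so the Viterbi path accounts for a bit over 40% of the total probability mass.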
Step 5: Model Selection (BIC/AIC)
5.1. Each candidate K in state_range → fit full HMM (Steps 2-4).
5.2. Free params p:
- Transition: `K * (K - 1)` (rows simplex)
- Emission: family-dependent (Gaussian full covariance `d` dim: `K * (d + d*(d+1)/2)`)
- Initial: `K - 1`
5.3. Information criteria:
- `BIC = -2 * log_likelihood + p * log(T)`
- `AIC = -2 * log_likelihood + 2 * p`
- `AICc = AIC + 2*p*(p+1) / (T - p - 1)` (small-sample)
5.4. Select lowest BIC (consistency) or AIC (prediction). Report both.
5.5. Tabulate each K: log-likelihood, # params, BIC, AIC, convergence.
5.6. Optimal K at boundary → extend range + re-fit.
→ Clear min BIC/AIC → optimal N hidden states. Selected converged + interpretable.
If err: no clear min (monotonically decreasing BIC) → misspecified, try diff emission family. All poor log-likelihood → data may not follow HMM structure.
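The parameter count in 5.2 and the criteria in 5.3 can be computed as sketched here for Gaussian emissions. The function names are hypothetical, and the "diagonal" and "spherical" counts (`d` and `1` variance parameters per state) are filled in as an assumption, since the text only spells out the full-covariance case.

```python
import math

def n_free_params(K, d, covariance_type="full"):
    """Free parameters of a K-state Gaussian HMM with d-dimensional observations."""
    cov = {"full": d * (d + 1) // 2, "diagonal": d, "spherical": 1}[covariance_type]
    transition = K * (K - 1)        # each row lives on a simplex
    emission = K * (d + cov)        # mean + covariance per state
    initial = K - 1
    return transition + emission + initial

def information_criteria(log_likelihood, p, T):
    """AIC, BIC, and small-sample AICc for a fitted model."""
    aic = -2 * log_likelihood + 2 * p
    bic = -2 * log_likelihood + p * math.log(T)
    # AICc is undefined when T - p - 1 <= 0; report inf in that case.
    aicc = aic + 2 * p * (p + 1) / (T - p - 1) if T - p - 1 > 0 else float("inf")
    return {"AIC": aic, "BIC": bic, "AICc": aicc}
```

As a quick check, a 2-state univariate model has `n_free_params(2, 1) == 7`: two transition, four emission, one initial.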
Step 6: Validate Held-Out + Posterior
6.1. Split training/validation (80/20 or multiple sequences).
6.2. Fit training. Compute held-out log-likelihood via forward (no re-fit).
6.3. Posterior decoding (alt to Viterbi):
- Each step → state w/ highest posterior: `z^_t = argmax_k(gamma[t,k])`
- Maximizes expected # correctly decoded (vs Viterbi maximizing joint path).
6.4. Compare Viterbi + posterior:
- Agreement rate between sequences
- Disagreement regions → ambiguous assignments
6.5. State interpretability:
- Examine emission params each state (means, variances, discrete)
- Verify states correspond meaningful regimes in domain
- Dwell times (diagonal `A`) reasonable
6.6. Held-out log-likelihood per observation + compare across orders → confirm training selection.
→ Held-out reasonably close to training (no severe overfit). Viterbi + posterior agree 90%+. States distinct + interpretable.
If err: held-out much worse than training → overfit, reduce K or increase regularization. States not interpretable → diff init or emission family.
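Two of the diagnostics above are easy to sketch: the Viterbi/posterior agreement rate from 6.4 and the dwell-time check from 6.5. The function names are assumptions. The dwell-time formula uses the standard fact that the time spent in state `k` is geometric with mean `1 / (1 - A[k,k])`.

```python
import numpy as np

def agreement_rate(viterbi_path, gamma):
    """Fraction of time steps where posterior decoding matches the Viterbi path."""
    posterior_path = np.argmax(gamma, axis=1)    # z^_t = argmax_k gamma[t, k]
    return float(np.mean(posterior_path == np.asarray(viterbi_path)))

def expected_dwell_times(A):
    """Mean number of consecutive steps spent in each state (geometric dwell)."""
    return 1.0 / (1.0 - np.diag(A))
```

If a state's expected dwell time is implausibly short or long for the domain, that is a signal to revisit K or to consider the hidden semi-Markov alternative.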
Check
- Log-likelihood monotonically non-decreasing Baum-Welch each restart
- Transition row-stochastic (rows sum 1, non-negative)
- Emission in valid domain (PD covariances, valid prob distributions)
- Viterbi log-prob ≤ total log-prob
- BIC/AIC clear min at selected order
- Held-out confirms generalization
- Forward + backward agree: `P(O) = sum_k(alpha[T,k]) = sum_k(pi[k] * b_k(o_1) * beta[1,k])`
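The forward/backward consistency check can be sketched on a short sequence with unscaled recursions, which is fine for a test of this size but would underflow on long sequences. The function name and the toy parameters are illustrative assumptions.

```python
import numpy as np

def likelihood_both_ways(obs, pi, A, B):
    """Return P(O) computed from the forward pass and from the backward pass."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_forward = alpha[-1].sum()
    p_backward = (pi * B[:, obs[0]] * beta[0]).sum()
    return p_forward, p_backward

# Toy discrete HMM (illustrative numbers only).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
p_fwd, p_bwd = likelihood_both_ways([0, 1, 2], pi, A, B)
# Both values are about 0.03628.
```

A mismatch between the two numbers usually points to an off-by-one in the backward recursion or a transposed transition matrix.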
Traps
- Local optima EM: Baum-Welch → local max not global. Always multiple restarts + pick best.
- Numerical underflow: Forward-backward probs shrink exponential w/ length. Log-space or scaled variables.
- Overfit too many states: Each adds `O(K + d^2)` params. Use BIC not likelihood + validate held-out.
- Label switching: States identifiable only up to permutation. Compare across restarts → match by emission params not index.
- Degenerate states: State collapses to explain single observation (Gaussian near-zero variance). Regularization prevents.
- Confuse Viterbi + posterior: Viterbi = single best joint path; posterior = best marginal state each step. Different questions, can disagree significantly.
- Ignore dwell times: Geometric dwell-time in standard HMM may be poor fit for long regime durations. Consider hidden semi-Markov if non-geometric.
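One simple way to handle label switching, sketched here for univariate Gaussian emissions, is to relabel every fitted model so that states are ordered by emission mean. The function name is hypothetical. For multivariate or discrete emissions a matching step (e.g., on distances between emission parameters) would be needed instead.

```python
import numpy as np

def canonical_order(pi, A, means):
    """Relabel states by ascending emission mean so restarts are comparable."""
    order = np.argsort(means)
    # Permute rows and columns of A with the same ordering.
    return pi[order], A[np.ix_(order, order)], means[order], order
```

After relabeling, parameters from different restarts can be compared index by index.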
Related
- Model Markov Chain — prereq for transition structure underlying hidden layer
- Simulate Stochastic Process — gen synthetic HMM data + simulate fitted for posterior predictive
