• HMM states ← triphones ← words
  – each triphone → 3-5 states + connection pattern
  – phone sequence from pronouncing dictionary
  – clustering for estimation
• Acoustic features
  – RASTA-PLP etc.
  – Vocal tract length normalization, speaker clustering
• Output pdf for each state as mixture of Gaussians
• Language model as N-gram model over words
  – recency/topic effects
• Empirical weighting of language vs. acoustic models
• etc. etc.
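
The chain words → triphones → HMM states from the first bullet can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary entries, the `left-center+right` triphone notation, the `sil` padding at utterance edges, and the choice of 3 states per triphone are all hypothetical and not taken from any particular recognizer.

```python
# Hypothetical toy pronouncing dictionary; entries are illustrative only.
PRON_DICT = {
    "cat": ["k", "ae", "t"],
    "at": ["ae", "t"],
}

# Assumed value; the outline allows 3-5 states per triphone.
STATES_PER_TRIPHONE = 3


def words_to_phones(words):
    """Look up each word and concatenate the phone sequences."""
    phones = []
    for word in words:
        phones.extend(PRON_DICT[word])
    return phones


def phones_to_triphones(phones):
    """Give each phone its left and right neighbour as context.

    Utterance edges are padded with 'sil' (an assumption of this sketch).
    """
    padded = ["sil"] + phones + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def triphones_to_states(triphones, n_states=STATES_PER_TRIPHONE):
    """Expand each triphone into n_states named HMM states, in order."""
    return [f"{tri}[{s}]" for tri in triphones for s in range(n_states)]


phones = words_to_phones(["cat"])
triphones = phones_to_triphones(phones)
states = triphones_to_states(triphones)
```

The sketch leaves out the connection pattern between states (typically some form of left-to-right topology) and the clustering step, which would map many of these state names onto a smaller set of shared models.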
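
One common way to realize the empirical weighting of language vs. acoustic models is to scale the language-model log probability by a tuned factor before adding it to the acoustic log likelihood. The sketch below assumes that form; the bigram table, the probability floor for unseen pairs, and the acoustic scores are made-up numbers for illustration only.

```python
import math

# Hypothetical bigram probabilities P(word | previous word); illustrative only.
BIGRAM = {
    ("<s>", "recognize"): 0.2,
    ("recognize", "speech"): 0.5,
    ("<s>", "wreck"): 0.1,
    ("wreck", "a"): 0.4,
    ("a", "nice"): 0.3,
    ("nice", "beach"): 0.1,
}


def lm_logprob(words, floor=1e-6):
    """Bigram log probability of a word sequence.

    Unseen bigrams get a small floor probability (an assumption of this
    sketch; real systems use proper smoothing or backoff).
    """
    prev = "<s>"
    total = 0.0
    for word in words:
        total += math.log(BIGRAM.get((prev, word), floor))
        prev = word
    return total


def combined_score(acoustic_logprob, words, lm_weight):
    """Acoustic log likelihood plus weighted language-model log probability."""
    return acoustic_logprob + lm_weight * lm_logprob(words)


# Two competing hypotheses with made-up acoustic log likelihoods.
hyp_a = (-105.0, ["recognize", "speech"])
hyp_b = (-100.0, ["wreck", "a", "nice", "beach"])


def best(lm_weight):
    """Return the word sequence with the highest combined score."""
    return max([hyp_a, hyp_b], key=lambda h: combined_score(h[0], h[1], lm_weight))[1]
```

With these numbers, a weight of 0 lets the acoustically better hypothesis win, while a sufficiently large weight hands the decision to the language model. In practice the weight is tuned on held-out data, which is why the outline calls it empirical.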