1


2

- Task: available signals → model of the world around
- Signals are mostly accidental, inadequate
  - sometimes disguised or falsified
  - always mixed up and ambiguous
- Reasoning about the source of signals:
  - Integration of context: what do you expect?
  - "Sensor fusion": integration of vision, sound, smell, etc.
  - Source (and noise) separation: there's more than one thing out there
  - Variable perspective, source variation, etc.
    - depends on the type of signal
    - depends on the type of object
- Much harder than chess or calculus!

3

- Thomas Bayes (1702–1761)
  - Minister of the Presbyterian Chapel at Tunbridge Wells
  - Amateur mathematician
  - "Essay towards solving a problem in the doctrine of chances", published (posthumously) in 1764
- Crucial idea:
  - background (prior) knowledge about the plausibility of different theories
  - can be combined with knowledge about the relation of theories to evidence
  - in a mathematically well-defined way
  - even if all knowledge is uncertain
  - to reason about the most likely explanation of the available evidence
- Bayes' theorem:
  - "the most important equation in the history of mathematics" (?)
  - a simple consequence of basic definitions, or
  - a still-controversial recipe for the probability of alternative causes for a given event, or
  - the implicit foundation of human reasoning
  - a general framework for solving the problems of perception
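The prior/likelihood combination described above can be sketched numerically. A minimal illustration, with hypothetical numbers: a theory with a lower prior can still win if it explains the evidence much better.

```python
# Bayes' theorem on a toy example (hypothetical numbers, purely illustrative):
# posterior P(theory | evidence) ∝ P(evidence | theory) * P(theory)

priors = {"theory_A": 0.7, "theory_B": 0.3}       # background (prior) knowledge
likelihoods = {"theory_A": 0.2, "theory_B": 0.9}  # P(evidence | theory)

unnormalized = {t: likelihoods[t] * priors[t] for t in priors}
total = sum(unnormalized.values())
posterior = {t: p / total for t, p in unnormalized.items()}

print(posterior)  # theory_B overtakes theory_A despite its smaller prior
```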

4


5


6


7


8

- P(W|S) ∝ P(S|W)P(W)
  - where W is "Word(s)" (i.e. message text), S is "Sound(s)" (i.e. speech signal)
- "Noisy channel model" of communications engineering, due to Shannon 1949
- New algorithms, especially relevant to speech recognition, due to L.E. Baum et al. ~1965–1970
- Applied to speech recognition by Jim Baker (CMU PhD 1975), Fred Jelinek (IBM speech group >>1975)
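The noisy-channel rule above amounts to choosing the W that maximizes P(S|W)P(W). A toy sketch, with made-up probability tables (real systems estimate both models from data):

```python
# Noisy-channel decoding sketch: argmax over W of P(S|W) * P(W)
# Both tables below are hypothetical, for illustration only.

p_word = {"recognize speech": 0.6,            # language model P(W)
          "wreck a nice beach": 0.4}
p_sound_given_word = {"recognize speech": 0.3,   # acoustic model P(S|W)
                      "wreck a nice beach": 0.35}

best = max(p_word, key=lambda w: p_sound_given_word[w] * p_word[w])
print(best)  # the language model tips the decision toward "recognize speech"
```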

9

- A consistent framework for integrating previous experience and current evidence
- A quantitative model for "abduction" = reasoning about the best explanation
- A general method for turning a generative model into an analytic one = "analysis by synthesis"
  - helpful where categories << signals

10

- 1. Bayes' Rule: P(W|S) ∝ P(S|W)P(W)
- 2. Approximate P(S|W)P(W) as a Hidden Markov Model:
  - a probabilistic function [to get P(S|W)]
  - of a Markov chain [to get P(W)]
- 3. Use the Baum-Welch (= EM) algorithm to "learn" HMM parameters
- 4. Use Viterbi decoding to find the most probable W given S, in terms of the estimated HMM
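Step 4 above can be sketched directly. A minimal Viterbi decoder for a tiny two-state HMM with hypothetical parameters (state names, transition and emission tables are all invented for illustration):

```python
import math

# Viterbi decoding sketch for a toy HMM (all parameters hypothetical).
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}                # P(first state)
trans = {"s1": {"s1": 0.7, "s2": 0.3},        # Markov chain -> models P(W)
         "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"a": 0.5, "b": 0.5},           # probabilistic function -> models P(S|W)
        "s2": {"a": 0.1, "b": 0.9}}

def viterbi(obs):
    # delta[s] = log-prob of the best path ending in state s
    delta = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, back = {}, {}
        for s in states:
            prev, score = max(((p, delta[p] + math.log(trans[p][s])) for p in states),
                              key=lambda x: x[1])
            new_delta[s] = score + math.log(emit[s][o])
            back[s] = prev
        delta = new_delta
        backpointers.append(back)
    # trace back the most probable state sequence
    last = max(delta, key=delta.get)
    path = [last]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return list(reversed(path))

print(viterbi(["a", "b", "b"]))  # -> ['s1', 's2', 's2']
```

In a real recognizer the states correspond to (sub-)phone units and the path is mapped back to words; Baum-Welch (step 3) would estimate the tables above from data instead of fixing them by hand.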

11


12


13


14

- HMM states ← triphones ← words
  - each triphone → 3–5 states + connection pattern
  - phone sequence from pronouncing dictionary
  - clustering for estimation
- Acoustic features
  - RASTA-PLP etc.
  - Vocal tract length normalization, speaker clustering
- Output pdf for each state as mixture of Gaussians
- Language model as N-gram model over words
- Empirical weighting of language vs. acoustic models
- etc., etc.
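The N-gram language model mentioned above can be sketched with a bigram maximum-likelihood estimate (toy corpus, no smoothing; real systems use smoothed higher-order N-grams):

```python
from collections import Counter

# Bigram language-model sketch: P(w2 | w1) estimated by counting (toy corpus).
corpus = "the cat sat on the mat the cat ran".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w2, w1):
    # maximum-likelihood estimate; unsmoothed, so unseen bigrams get probability 0
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_bigram("cat", "the"))  # 2 of the 3 "the" tokens are followed by "cat"
```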

15

- Problems with Markovian assumptions
- Modeling trajectory effects
- Variable coordination of articulatory dimensions
- ...
