Fundamental theorem of speech recognition
P(W|S) ∝ P(S|W) P(W)
  where W is “Word(s)”  (i.e. message text)
             S is “Sound(s)” (i.e. speech signal)
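
A minimal sketch of how this relation is used to pick a transcription, assuming a toy setting with only two candidate word sequences. The probability values and the names `language_model`, `acoustic_likelihood`, `posterior`, and `decode` are hypothetical, chosen only for illustration. Because P(S) is the same for every W, the best W can be found from the product P(S|W) P(W) alone.

```python
# Hypothetical toy values, chosen only for illustration.
# Language model P(W): prior probability of each candidate word sequence.
language_model = {
    "recognize speech": 0.6,
    "wreck a nice beach": 0.4,
}

# Acoustic model P(S|W): likelihood of the observed sound S given each W.
acoustic_likelihood = {
    "recognize speech": 0.2,
    "wreck a nice beach": 0.25,
}


def posterior(prior, likelihood):
    """Return P(W|S) for every W by normalizing P(S|W) * P(W)."""
    joint = {w: likelihood[w] * prior[w] for w in prior}
    evidence = sum(joint.values())  # P(S), identical for every W
    return {w: p / evidence for w, p in joint.items()}


def decode(prior, likelihood):
    """Return the W that maximizes P(S|W) * P(W); P(S) is not needed."""
    return max(prior, key=lambda w: likelihood[w] * prior[w])


post = posterior(language_model, acoustic_likelihood)
best = decode(language_model, acoustic_likelihood)
print(best)
```

In this toy case the second candidate fits the sound slightly better, but the prior P(W) outweighs that difference, so the first candidate is chosen.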
“Noisy channel model” of communications engineering
due to Shannon 1949
New algorithms, especially relevant to speech recognition
   due to L.E. Baum et al. ~ 1965-1970
Applied to speech recognition by Jim Baker (CMU PhD 1975),
     Fred Jelinek (IBM speech group >>1975)