In discussing his recent post about aspirated initial /w/ in Japanese pronunciation of English, Victor Mair asked about the historical phonetics of the strange English spelling 'wh':
I've tried repeatedly to pronounce the H part *after* the W and it seems to be virtually impossible to make such a sequence of sounds. What is it about the evolution of these WH- words in English that has led to this peculiar spelling? Weren't they all Q- words in Latin? Are they WH- words throughout Germanic? What would they have been in Proto-Indo-European?
Let's start by clarifying the nature of /h/, which involves noise created by turbulent flow of air through a small V-shaped opening at the rear of the vocal folds, with the front portion kept closed. In utterance-initial American English /h/, there's generally a short but well-defined voiceless period, and then the larynx is adjusted so that the turbulent flow is replaced by regular oscillation (i.e. "voicing") in the body of the vowel. But this laryngeal maneuver doesn't constrain the rest of the vocal tract — the lips, tongue, velum etc. are free to do whatever. And in an utterance-initial pre-vowel /h/, "whatever" means forming the pattern needed to make the vowel. In other words, the vowel articulation above the larynx is already completely in place before the /h/ noise starts.
Thus the /h/ doesn't really "precede" the vowel, except in the sense that the glottal frication occurs at the beginning of the syllable. Rather, the /h/ is a feature of the way that the vowel starts. You can see this in the two plots below, showing spectrograms of citation-form pronunciations of heel and haul (from the Merriam-Webster online site):
The situation is exactly the same for initial aspiration in a syllable starting with /hw/. There's a period of glottal frication during which the mouth is already in position for the /w/; essentially, it's a voiceless /w/ with some turbulent-flow noise at the glottis. Here's *why* from the American Heritage dictionary at dictionary.reference.com (I didn't use the Merriam-Webster pronunciation since it's unaspirated):
So when Victor tried "to pronounce the H part *after* the W", he was trying to produce a voiced labiovelar approximant /w/, then to stop the voicing in favor of a period of glottal frication, and then to start voicing again, all within a tenth of a second or so. The ordinary /hw/ needs just a simple voice onset after 50 or 60 msec of laryngeal turbulence. The biomechanics of voicing would make this rapid start/stop/start sequence a rather hard thing to do, I think; but in any case, it would violate the usual ordering of articulations, in which sonority increases from the start of a syllable to its core.
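For readers who like to see such an argument spelled out mechanically, here is a minimal Python sketch of the two points just made. Everything in it is an assumption for illustration: the numeric sonority ranks, the decision to treat /h/ as a fricative (its place on the scale is debated), the stand-in vowel "a", and the function names are all hypothetical rather than taken from any standard tool.

```python
# Illustrative sonority ranks (assumed for this sketch; published scales differ,
# and the placement of /h/ in particular is debated).
SONORITY = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4, "glide": 5, "vowel": 6}

# Hypothetical mapping that covers only the sounds discussed here.
SEGMENT_CLASS = {"h": "fricative", "w": "glide", "a": "vowel"}


def sonority_profile(segments):
    """Return the sonority rank of each segment, in order."""
    return [SONORITY[SEGMENT_CLASS[s]] for s in segments]


def rises_to_nucleus(segments):
    """True if sonority strictly increases across the sequence (onset up to the vowel)."""
    profile = sonority_profile(segments)
    return all(a < b for a, b in zip(profile, profile[1:]))


def laryngeal_switches(states):
    """Count the changes between adjacent laryngeal states."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# /hw/ + vowel: frication first, then a single voice onset.
print(rises_to_nucleus(["h", "w", "a"]), laryngeal_switches(["frication", "voicing"]))

# Attempted /wh/ + vowel: voicing, then frication, then voicing again.
print(rises_to_nucleus(["w", "h", "a"]), laryngeal_switches(["voicing", "frication", "voicing"]))
```

Under these assumptions, the h-then-w order rises steadily in sonority and needs one laryngeal change, while the w-then-h order dips in the middle and needs two.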
Gene Buckley observed:
The spelling hw in Old English reflects the phonetic reality quite nicely in its ordering, even if it's really a digraph for a single segment — phonetically it resembles a sequence of [h] plus [w], and so this would be a sensible way to spell it. I've long assumed that the re-spelling as wh in early Middle English was motivated by all the other digraphs that have h as their second element, e.g. ch, th, sh, due to Norman French orthographic influence. In other words, it was an orthographic analogy in spite of the better phonetic match in the older spelling.
And Don Ringe explained that
"wh" is a *purely* orthographic convention, and an especially stupid one at that. In Old English they consistently spelled this unit hw, which makes sense. I think it was a consonant cluster in OE, parallel to hr, hl, and hn (which became simple r, l, n in the 12th-13th c.), but in Proto-Germanic it was certainly a coarticulated sound–i.e., the "h" and the "w" were simultaneous– and the PIE labiovelar that it developed out of was also a coarticulated sound. (Opinions are divided about whether Latin "qu" was also a labiovelar or a sequence of /k/+/w/.)
Unfortunately, the phonological distinction between a doubly-articulated consonant and a cluster is not always phonetically plain — most consonant clusters are heavily co-articulated, and things that seem to be clearly single segments on phonotactic grounds (like aspirated stops in English, or /k͡p/ and /ɡ͡b/ in many African languages) nevertheless often have reliably sequenced sub-parts which correspond to things that might be independent segments in another context. This is one of many ways in which the "discrete beads on a string" nature of phonetic symbol sequences is articulatorily and acoustically misleading.
I should add that in the middle of utterances between vowels or other voiced sounds, English /h/ sounds are usually voiced rather than voiceless, i.e. IPA [ɦ] rather than [h]. This involves maintaining rapid quasi-periodic opening and closing in the anterior (front) portion of the vocal folds, while simultaneously creating noise via turbulent flow through a chink in the posterior portion. Again, the rest of the vocal tract is free to take on the configuration appropriate for the following vowel.
This is often true even in careful citation-form pronunciations, for example the online Merriam-Webster performance of inhibit, in which you can see that the noise comes after the release of the [n], and indeed is fully established only after the formant transition to [ɪ] is largely complete: