A two-syllable "Mom" is commoner, with the parts supporting a downstepped sequence of distinct pitch targets. And of course the same intonation can be used with any other vocative. The extra syllable(s) only arise if the vocative doesn't have at least two syllables to start with: thus "Mo-ther!" can bear exactly the same pattern, without the need to subdivide any syllables.
I've often wondered whether this particular intonational gesture — here expressing exasperation — is the same in all English varieties, including those (for example) where the unmarked intonational pattern is rising rather than falling. (For background discussion, see "Uptalk anxiety", 9/7/2008; "The phonetics of uptalk", 9/13/2008; "Uptalk vs. UNBI again", 11/23/2008.) In Glasgow and Belfast, for example, do daughters produce an exasperated "Mo-ther!" on two rising pitch levels?
And what about other languages, including pitch-accent languages like Japanese, or various types of languages with lexical tone?
Another interesting question is what distinguishes this intonational gesture from the similarly stylized two-step fall used to call to people with whom the speaker is not already in contact (the "vocative chant"). The two gestures are pragmatically and emotionally very different, and are likely to be associated with different facial expressions, voice qualities and so on, but the pitch contours involved seem at least to come from overlapping distributions.
[Update: if commenters will send me audio clips of the patterns that they write about, I'll post them in an accessible form. Any audio format is OK: .wav, .aiff, .mp3, .aac, etc.]