Spontaneities

Over the years, I've posted several times about the problematic word (and concept) disfluency — there's a too-long list at the end of "Spontaneous (dis)fluency" (8/27/2025). Among other ideas, I've suggested using the term interpolations (see this 2019 post for example). But as far as I can tell, this suggestion has had no impact on other people's usage.

So here's another try: How about making a list of the ways in which fluent spontaneous speech is not like fluent reading, and calling them all spontaneities?

Normal spontaneous speech is full of

  • Filled pauses (uh, um)
  • Filler words (you know, I mean, so, like… )
  • Silent pauses (and pseudo-pauses)
    not in a reading-style relation to message structure
  • Rapid initial repetitions (in- in- in the- the, …)
  • False starts (“that was my= uh the last time”…)
  • Non-speech vocalizations (laughs, sighs, tongue clicks, … )

To enhance readability, transcriptions generally omit most or all of such things. And to study spontaneities, we need easy-to-use notations that can be easily counted. In my own practice, I use

  • __ for silent pauses in talking-typical places
    (e.g. between an article and the following adjective or noun)
    optionally with duration in milliseconds, e.g. _856_
  • final - for rapid initial repetition
  • final = for false starts
  • {laugh}, {sigh}, etc. for non-speech vocalizations,
    or {NSV} if there isn't a standard term

(It would be better to use a structure-oriented rather than string-oriented notation, but that would be much harder to use in typed transcripts…)

When normal spontaneous speech is accurately transcribed, 10-30% of all tokens typically represent such events. You can check the transcriptions in posts like this one or this one, among many others — though in those examples I've omitted the _N_ notations, e.g.

had lot of influence on how I think about- _534_
about history
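
Counting these notations is easy to automate. Here is a minimal sketch in Python, not a description of any existing tool: it assumes whitespace-separated tokens, treats a token ending in a hyphen (as in the about- example) as a rapid repetition, and counts each marked token as one spontaneity. The names PATTERNS, classify and count_spontaneities are made up for illustration, and multi-word fillers like "you know" are left out because they can't be identified from the string alone.

```python
import re
from collections import Counter

# Hypothetical patterns for the notation described above (illustrative only).
PATTERNS = [
    ("silent_pause", re.compile(r"^_\d*_$")),            # __ or _856_
    ("vocalization", re.compile(r"^\{[^{}]+\}$")),        # {laugh}, {sigh}, {NSV}
    ("filled_pause", re.compile(r"^(uh|um)$", re.I)),     # uh, um
    ("repetition", re.compile(r"^.+-$")),                 # final hyphen: about-
    ("false_start", re.compile(r"^.+=$")),                # final equals sign: my=
]

def classify(token):
    """Return a spontaneity label for one token, or None for an ordinary word."""
    # Drop ordinary punctuation, but keep the final - or = markers.
    core = token.strip('.,;:!?"“”…')
    for label, pattern in PATTERNS:
        if pattern.match(core):
            return label
    return None

def count_spontaneities(transcript):
    """Count labeled tokens and return (counts, share of all tokens)."""
    tokens = transcript.split()
    counts = Counter()
    for token in tokens:
        label = classify(token)
        if label is not None:
            counts[label] += 1
    share = sum(counts.values()) / len(tokens) if tokens else 0.0
    return counts, share

counts, share = count_spontaneities(
    "had lot of influence on how I think about- _534_\nabout history"
)
print(dict(counts), f"{share:.0%}")
```

On the two-line example above, this counts one repetition and one silent pause among twelve tokens. Whether a marked word like about- should count as a word, a spontaneity, or both is a choice the sketch makes arbitrarily.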

My belief is that spontaneities are best seen as part of prosody, along with timing and phrasing, emphasis, voice quality, intonation – all the stuff that is left out of written text, but is a normal and inevitable part of spontaneous speech.

Of course, there are genuine memory failures, like forgetting what you were going to say, or blanking on a word; and genuine “slips of the tongue”, like exchanges, substitutions, anticipations, perseverations of words, syllables, segments, features; and genuine problems at the level of articulation and sound. But I'll argue that most spontaneities are not like that, or at least not entirely like that. And even genuine blanks and slips should not be seen as a deviation from perfect elocution, but as part of the normal process of talking. Furthermore, our perception of spontaneity-full talking is the normal mode of speech perception.

In fact, if one of your friends started talking at you as if they were reading, your reaction would be "Who are you and what have you done with my friend?"

One reason that it's hard to evaluate these claims is that most (or at least too many) linguists and psychologists study only reading, not talking. In particular, almost 100% of empirical studies on speech production and perception use read speech.

Someday, that ought to change.

 



23 Comments »

  1. David Marjanović said,

    October 20, 2025 @ 8:38 am

    "Who are you and what have you done with my friend?"

    I once read a comment by someone who'd done maintenance work on cables in an NSA building. Reportedly, when you ask NSA employees anything, they always pause, and then they always speak "in complete sentences".

  2. Philip Taylor said,

    October 20, 2025 @ 11:14 am

    Which should, surely, be everyone's aim, should it not ?

  3. Haamu said,

    October 20, 2025 @ 1:22 pm

    Not always.

  4. Julian said,

    October 20, 2025 @ 4:25 pm

    @philip Taylor
    Often no, because in many situations talking in complete sentences would brand you as an outsider/pedant/weirdo in ways that you might not want. As per the implications of DM's comment – that's the way the NSA people came across to his informant, whether or not that was their intention.

  5. Mark Liberman said,

    October 20, 2025 @ 4:38 pm

    @Julian "Often no, because in many situations talking in complete sentences would brand you as an outsider/pedant/weirdo in ways that you might not want."

    It's not so much "talking in complete sentences" as it's "talking as if you were fluently reading a prepared text." Spontaneous speech can have plenty of complete sentences full of "spontaneities". And read speech can have plenty of fluently-read noun phrases and so forth.

  6. Philip Taylor said,

    October 20, 2025 @ 4:52 pm

    Well, I am (needless to say) surprised at Haamu's and Julian's responses. I cannot envisage a situation in which I would seek to appear "disfluent" (as opposed to "spontaneous", that is). If, as a result, I might come across as "an outsider/pedant/weirdo", then so be it; better that, surely, than come across as someone who is incapable of formulating an utterance in his own mind before launching (prematurely) into its verbal realisation.

  7. Charles Antaki said,

    October 20, 2025 @ 5:03 pm

    Worth mentioning that over in the conversation analysis literature there's a long tradition of such attempts, going back to the pioneering work of Gail Jefferson (who invented the transcription system from scratch) which tries to capture all of those elements, and more.

    This chapter is a lively account of her thinking ("Why put all of that stuff in? Well, as they say, because it's there").

  8. Haamu said,

    October 20, 2025 @ 5:08 pm

    I certainly wasn't seeking to appear disfluent.

  9. D.O. said,

    October 20, 2025 @ 5:18 pm

    I guess there is also a substantial difference in how people speak with their kith and kin and in more formal situations (but still not requiring one to emulate reading of the written word). In the "intermediate" situations speaking in full sentences is probably the norm (at least an aspirational norm). Even worse is turn taking, talking over each other etc. I don't see how it can all be analyzed within one approach.

  10. Mai Kuha said,

    October 20, 2025 @ 6:04 pm

    I'm totally on board with this "spontaneities" proposal. My feeling is that missing spontaneities aren't a problem just in terms of speakers appearing pedantic, but that, at least in some cases, information goes missing as well. I could be wrong, but I've long had a hunch that spontaneities get edited out of the PBS NewsHour's "Brief but Spectacular" pieces, and often I feel that I come away understanding only about 80% of how the whole thing hangs together. https://www.youtube.com/watch?v=UA2uHb5t1Oc

  11. Anthony said,

    October 20, 2025 @ 8:34 pm

    When I was a kid, my father was often on the radio. Much later in life I was able to find, online, recordings of him interviewing people. I didn't think it sounded like him at all, but of course it was his "radio voice."

  12. Jonathan Smith said,

    October 20, 2025 @ 9:16 pm

    "Disfluency" is funny as mostly these are exactly the things one uses/learns in order to sound… more fluent. Very noticeable in the best second etc. language learners.

    As for "spontaneity"… thing is this marks spontaneous speech as special/distinct when what is really peculiar is (e.g.) much text.

    Maybe most of the above are simply "discourse markers"? Whether a particular thingy happens to have been snatched from the lexicon proper or not shouldn't be relevant — and surely it's only in the case of "rapid initial repetitions", "false starts" etc. that speakers would acknowledge not having said exactly what they meant?

  13. Julian said,

    October 21, 2025 @ 1:09 am

    @mark Liberman
    Yes I think that's what I really meant.
    I used to work producing the edited transcripts of speeches in parliament and the evidence of parliamentary committees. That includes both prepared speeches and quite a lot of off-the-cuff banter (for example answering questions).
    Emphasis on 'editing' – the original first step of taking shorthand and copytyping is now done in an instant by the voice recognition program.
    The task is to correct the mistakes that are introduced by the imperfect voice recognition; to correct the voice recognition's hit and miss punctuation; to remove the disfluencies/spontaneities; to correct the speaker's grammar; and customarily to go a bit further in removing "inelegancies" that are irrelevant to the information content – for example, the speaker repeats a complete correct sentence while gathering their thoughts for the next one.
    To edit 5 minutes of audio in this way usually took 30 to 40 minutes.
    People are often surprised by how long it takes. I then say: "you and I, like almost everyone, make lots of grammatical mistakes [by which I mean to include disfluencies] in our normal conversation. If you saw a truly verbatim transcript of your own casual speech you would probably find it almost unreadable."
    They're often surprised by that too.

  14. Jarek Weckwerth said,

    October 21, 2025 @ 4:28 am

    @Julian "for example, the speaker repeats a complete correct sentence while gathering their thoughts for the next one" This is a useful thing to notice.

    I think the (other) major fault line is between monologue and dialogue. In monologue, the "spontaneities" proposed by Mark in the OP are mainly the effect of real-time processing and can be seen as disfluencies much more often. In dialogue, there are many cases of them serving discourse-structuring purposes such as trying to hold the floor. As such, they are more akin to traditionally construed prosody (I dunno, contrastive focus or specific intonation for old vs. new info).

  15. Alvin said,

    October 21, 2025 @ 5:27 am

    This is not directly related to the post content, but for some reason the Atom feed isn't loading correctly in Thunderbird. The validator (https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Flanguagelog.ldc.upenn.edu%2Fnll%2F%3Ffeed%3Datom) says the feed does not validate:

    line 58, column 68: XML parsing error: :58:68: not well-formed (invalid token)

    My belief is that spontaneities are best seen as part of prosody,along w …

  16. Daniel Deutsch said,

    October 21, 2025 @ 5:36 am

    @Mark Liberman
    Can you give a brief example of what you mean by this?:

    “(It would be better to use a structure-oriented rather than string-oriented notation, but that would be much harder to use in typed transcripts…)”

  17. Philip Taylor said,

    October 21, 2025 @ 7:52 am

    I have a feeling, Daniel (nothing more) that Mark is thinking in terms of something like XML markup.

  18. Roscoe said,

    October 21, 2025 @ 8:02 am

    “Reportedly, when you ask NSA employees anything, they always pause, and then they always speak ‘in complete sentences’.”

    Reminds me of Lt. Dangle from “Reno 911” commenting on the visiting FBI agents: “They speak in paragraph form.”

  19. Rodger C said,

    October 21, 2025 @ 9:14 am

    A lot of these supposed disfluencies are rhetorical devices with pretty Greek names. Epanorthosis, for one.

  20. Cervantes said,

    October 21, 2025 @ 9:23 am

    In my own research, I divide transcripts into speech acts. I have a rigorous definition for what constitutes a completed speech act, and a rigorous set of labels for them (e.g. various forms of interrogatives, kinds of assertions, directives, expressives, etc.) Segments of speech which are not completed speech acts may be labeled as false starts, or fillers, and some are meaningless qualifiers. Richard Nixon would often start a response to a question with "Let me say this about that," which literally seems to ask for permission but of course all he's doing is buying time to formulate his response. Note that non-lexical utterances can be speech acts — they are often meaningful.

    People vary considerably in the proportion of their speech which does not consist of completed speech acts. Some people habitually use specific fillers. For example I have a colleague who inserts "sort of" at apparently random places in most sentences, which is not helping his chances for tenure. I was walking past a construction worker once who said, literally, "So I fucking says to the fucking guy, what the fuck are you fucking talking about . . . " Anyway, by parsing transcripts and labeling in this way you can quantify these patterns. You could choose to add additional category labels if it interested you.

  21. Mark Liberman said,

    October 21, 2025 @ 10:27 am

    Interesting comments! I'm traveling and won't be able to respond fully until tomorrow. For now I'll just note that there's indeed a long tradition of attention to "spontaneities" in the conversation analysis literature, and in some other (somewhat siloed) subfields. The surprising thing is that quantitative empirical studies of speech production and perception are nearly 100% focused on read (or at least scripted) speech.

    As for the question about structure-aware transcription, I didn't have in mind the syntax (like xml or json or whatever) but rather the interpretation. For example, false starts and self-corrections have a (often multi-word and perhaps-layered) "reparandum" as well as a maybe-multi-word repair — just marking the end of the reparandum is an efficient way to count the frequency of self-corrections, but it isn't enough.

    And along with the compositional and motoric aspects of prosodic phrasing, there are simultaneously many syntactic, semantic, pragmatic, and conversational aspects. So when we observe that in spontaneous speech there's often (for example) a silent pause between an article and the rest of the following noun phrase, while that almost never happens in reading, we're leaving many communicative as well as compositional issues open.

  22. Julian said,

    October 21, 2025 @ 4:13 pm

    @cervantes on "sort of" and similar fillers.
    I once was listening to a committee witness of that type. If I recall correctly, at one point he uttered the phrase "the unisortaversity." Or something like that. I may be embellishing this in the memory.

  23. Julian said,

    October 21, 2025 @ 5:29 pm

    There's an interesting gradient between false start/abandonment –

    I wonder if I could…
    You see the thing that concerns me is…
    we had evidence from the department that….

    – and repair –

    I wonder if I could
    If you could clarify
    Something you said earlier

    Where in the first case there's a thematic through line ("I'm winding myself up to ask a question" ) but no syntactic connection between the parts; while in the second case there is a syntactic echo.

    [Made up examples that are absolutely typical of a not very articulate parliamentarian, outside the comfort zone of the prepared speech, working up to asking a question]
