Search Results

English syllable detection

In "Syllables" (2/24/2020), I showed that a very simple algorithm finds syllables surprisingly accurately, at least in good quality recordings like a soon-to-be-published corpus of Mandarin Chinese. Commenters asked about languages like Berber and Salish, which are very far from the simple onset+nucleus pattern typical of languages like Chinese, and even about English, which has […]

Comments (7)

mmhmm etc.

Kumari Devarajan, "Ready For A Linguistic Controversy? Say 'Mmhmm'", NPR 8/17/2018: Once upon a time, English speakers didn't say "mmhmm." But Africans did, according to Robert Thompson, an art history professor at Yale University who studies Africa's influence on the Americas. In a 2008 documentary, Thompson said the word spread from enslaved Africans into Southern […]

Comments (43)

Slips of the finger vs. slips of the tongue

There's an interesting and understudied way that typing errors and speaking errors are different. From Gary Dell, "Speaking and Misspeaking", Ch. 7 in Introduction to Cognitive Science: Language, 1995: One of the most striking facts about word slips, such as exchanges, anticipations, perseverations, and noncontextual substitutions, is that they obey a syntactic category rule. When […]

Comments (40)

Recording-stable acoustic proxy measures

Behind yesterday's post about possible cultural differences in conversational loudness ("Ask Language Log: Loud Americans?" 11/25/2017), there's a set of serious issues in an area that's too frequently ignored: the philosophy of phonetics. [This is an unusually wonkish post on an eccentric topic — you have been warned.]

Comments (11)

A prosodic difference

In "Political sound and silence II" I noted a large difference in measures of speaking rate across the Weekly Addresses of the past three American presidents:

            N   Speech (sec.)  Silence (sec.)  Total (sec.)  Mean Duration  % Speech  Words  WPM (overall)  WPM (excl. silence)
Bush 2008   48  8262           1976            10237         213            0.807     24483  166.9          206.9
Obama […]

Comments (1)

Age, sex, and f0

I've recently been working with Naomi Nevler and others from Penn's Frontotemporal Degeneration Center on quantifying the diverse effects in speech and language of various neurodegenerative conditions. As part of an effort to establish baselines, I turned to the English-language part of the "Fisher" datasets of conversational telephone speech (LDC2004S13, LDC2004T19, LDC2005S13, LDC2005T19), where we have basic demographic […]

Comments (3)

Dr. Dolittle springs eternal

Nicola Davis, "Bat chat: machine learning algorithms provide translations for bat squeaks", The Guardian 12/22/2016: It turns out you don’t need to be Dr Doolittle to eavesdrop on arguments in the animal kingdom. Researchers studying Egyptian fruit bats say they have found a way to work out who is arguing with whom, what they […]

Comments (8)

Long Johns

From Faith Jones: I recently had the need to buy my elderly mother some long johns as she is finding even our wimpy, West Coast winters hard to take. In a thank you email she refused to call the tops "long johns," as to her that is only for the pants, but didn't know another […]

Comments (42)

Q. Pheevr's Law

In a comment on one of yesterday's posts ("Adjectives and Adverbs"), Q. Pheevr wrote: It's hard to tell with just four speakers to go on, but it looks as if there could be some kind of correlation between the ADV:ADJ ratio and the V:N ratio (as might be expected given that adjectives canonically modify nouns […]

Comments (17)

Adjectives and adverbs

A puzzling note arrived in my inbox a few days ago: I came across an article you wrote about the use of adverbs and adjectives.  To count the use of adverbs and adjectives you actually wrote a program. Is this something you would be willing to share or give me some advice on how to create […]

Comments (6)

More about UM/UH on the Autism Spectrum

At a workshop in June, a group of us will be presenting a report that includes this graph: The x axis is the relative frequency of "filled pauses" UM and UH, from 0% to 8%, and the y axis is the proportion of filled pauses that are UM, from 0% to 100%. The individual plotting […]

Comments (4)

More political text analytics

I spent a few minutes this morning getting transcripts for all 12 Republican and all 9 Democratic debates, and over the next few days I'll do some additional Breakfast Experiments™ on the results. One trivial thing is a complete type-token plot, from texts constructed by concatenating all the transcript pieces attributed to each remaining candidate […]

Comments off

R2D2

Now that there are effectively just two Republican and two Democratic presidential candidates left, I'm starting to get questions about comparing speaking styles across party boundaries. One simple approach is a type-token plot — this is a measure of the rate of vocabulary display, where the horizontal axis is the sequentially increasing number of words ("tokens"), […]

Comments (8)
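
For readers who want to try a type-token plot like the one described in "R2D2" and "More political text analytics", here is a minimal sketch of the underlying count. It is not the program used for those posts. It assumes the usual definition (running token count on the horizontal axis, number of distinct word types seen so far on the vertical axis), and the function name `type_token_curve` and the crude letters-and-apostrophes tokenizer are illustrative choices of mine.

```python
import re


def type_token_curve(text):
    """Return (tokens_seen, distinct_types_seen) pairs, one per token."""
    # Assumption: lowercase everything and treat runs of letters and
    # apostrophes as words. Real transcripts would need more careful cleanup.
    tokens = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)                  # a repeated word adds no new type
        curve.append((n, len(seen)))   # x = tokens so far, y = types so far
    return curve


sample = "the cat saw the dog and the dog saw the cat"
curve = type_token_curve(sample)
print(curve[-1])  # (11, 5): eleven tokens, five distinct types
```

Plotting the pairs for each speaker's concatenated transcript, with any plotting tool you like, gives one curve per speaker. A curve that climbs faster means new vocabulary is being introduced at a higher rate.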