Archive for Computational linguistics

Spectral slices of overtone singing, animated

As part of my ongoing exploration of the many ways in which F0 is not pitch and pitch is not F0, I did a little demo/experiment with a sample of Anna-Maria Hefele's "Polyphonic Overtone Singing" video:
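For readers who want to see what a spectral slice is, here is a minimal sketch: the magnitude spectrum of one short windowed stretch of a signal, which is what gets recomputed frame by frame to make the animation. The signal below is a synthetic harmonic tone, not the Hefele recording.

```python
import numpy as np

fs = 16000                       # sampling rate, Hz
t = np.arange(2048) / fs         # one analysis frame (128 ms)
f0 = 200.0                       # fundamental frequency
# harmonic-rich tone: partials at k * f0 with 1/k amplitudes
sig = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 9))

# one spectral slice: window the frame and take the magnitude spectrum
window = np.hanning(len(sig))
spectrum = np.abs(np.fft.rfft(sig * window))
freqs = np.fft.rfftfreq(len(sig), 1 / fs)
peak_hz = freqs[np.argmax(spectrum)]  # strongest partial, near f0
```

Animating the slices is then just a matter of recomputing `spectrum` as the 2048-sample window slides along the recording.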

Comments (15)

Talking is like living

…and ending a sentence is like dying.

What do I mean by this weird and even creepy statement?

Short answer: Your probability of continuing to live through the next instant is not constant; the chance of dying in that instant grows exponentially as you get older. (Actuaries know this as the Gompertz-Makeham Law of Mortality, usually expressed in terms of the probability of dying.)

A generative model of this type, on a shorter time scale, is a surprisingly good fit to the distributions of speech- and silence-segment durations in speech, and also to the distribution of sentence lengths in text. A shockingly good fit, in most cases.
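A minimal sketch of such a generative model, assuming a Gompertz-Makeham hazard h(t) = a·exp(b·t) + c; the constants below are illustrative placeholders, not values fitted to any speech or mortality data:

```python
import math
import random

def sample_duration(a=0.01, b=0.5, c=0.05, dt=0.005, rng=random):
    """Draw one duration from a Gompertz-Makeham process: the hazard
    (the instantaneous chance of "dying", i.e. of the segment ending)
    is h(t) = a * exp(b * t) + c, growing exponentially with age t."""
    t = 0.0
    while True:
        hazard = a * math.exp(b * t) + c
        if rng.random() < hazard * dt:   # discrete-time approximation
            return t
        t += dt

random.seed(0)
durations = [sample_duration() for _ in range(500)]
mean_duration = sum(durations) / len(durations)
```

With the hazard's exponential term switched off (a = 0), the same code draws from a plain exponential distribution, which is the comparison that makes the quality of the Gompertz-Makeham fit interesting.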

Long answer: See below, if you have the patience…

Comments (15)

More on conversational dynamics

Following up on "The dynamics of talk maps" (9/30/2022), I created and parameterized such representations for the published CallHome conversations in Egyptian Arabic, American English, German, Japanese, Mandarin, and Spanish. The goal was mostly just to set up and debug an analysis pipeline, including the extraction of 14 first-guess parameters per conversation, on the way to analyzing the much larger set of much more diverse conversational data that's available.

But just for fun, I used t-SNE to reduce the 14 dimensions to 2 for visualization purposes. I didn't expect much, but some differences emerged in the distribution of points for conversations in the different languages:
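The reduction step can be sketched as follows, with random stand-in vectors in place of the actual 14 CallHome parameters:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 14))   # 120 conversations x 14 parameters (stand-ins)

# t-SNE maps the 14-dimensional points to 2-D for plotting
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```

With real features, each row of `emb` would be one conversation, plotted and colored by language.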

Comments (3)

The dynamics of talk maps

Over the years, across many disciplines, there have been many approaches to the analysis of conversational dynamics. For glimpses of a few corners of this topic, see the list of related posts at the end of this one — today I want to sketch the beginnings of a new way of thinking about it.

Comments (2)

Against physics

Or rather: Against the simplistic interpretation of physics-based abstractions as equivalent to more complex properties of the physical universe. And narrowing the focus further, it's a big mistake to analyze signals in terms of such abstractions while pretending that we're analyzing the processes creating those signals, or our perceptions of those signals and processes. This happens in many ways in many disciplines, but it's especially problematic in speech research.

The subject of today's post is one particular example, namely the use of "Harmonic to Noise Ratio" (HNR) as a measure of hoarseness and such-like aspects of voice quality. Very similar issues arise with all other acoustic measures of speech signals.

I'm not opposed to the use of such measures. I use them myself in research all the time. But there can be serious problems, and it's easy for things to go badly off the rails. For example, HNR  can be strongly affected by background noise, room acoustics, microphone frequency response, microphone placement, and so on. This might just add noise to your data. But if different subject groups are recorded in different places or different ways, you might get serious artefacts.
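For concreteness, here is one common way HNR is estimated, in the spirit of the autocorrelation method: the height r of the normalized autocorrelation peak at the pitch period gives HNR = 10·log10(r / (1 − r)). The frame below is synthetic; a real analysis would add windowing, frame selection, and a careful pitch search.

```python
import math
import random

def hnr_db(frame, min_lag, max_lag):
    """Estimate harmonics-to-noise ratio from the height r of the
    normalized autocorrelation peak at the pitch period."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    power = sum(v * v for v in x) / n
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag) / power
        best = max(best, r)
    best = min(best, 1 - 1e-9)   # guard against log(0)
    return 10 * math.log10(best / (1 - best))

# synthetic "voiced frame": 100 Hz sine at 8 kHz sampling, plus mild noise
random.seed(0)
fs, f0 = 8000, 100
frame = [math.sin(2 * math.pi * f0 * i / fs) + 0.05 * random.gauss(0, 1)
         for i in range(1600)]
estimate = hnr_db(frame, min_lag=60, max_lag=120)
```

Changing the noise level, or adding a second "room reverberation" copy of the signal, shifts `estimate` substantially, which is exactly the kind of recording-condition sensitivity at issue.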

Comments (6)

Our Lady of the Highway: A linguistic mystery

Current text-to-speech systems are pretty good. Their output is almost always comprehensible, and often pretty natural-sounding. But there are still glitches.

This morning, Dick Margulis sent an example of one common problem: inconsistent (and often wrong) stressing of complex nominals:

We have a winding road that we drive with our Google Maps navigator on, to keep us from taking a wrong turn in the woods. We have noticed that "West Woods Road" is rendered with a few different stress patterns as we go from turn to turn, and we can't come up with a hypothesis explaining the variation. Attached is a recording. It's a few minutes long because that's how long the trip takes. The background hum is the car.

I've extracted and concatenated the 11 Google Maps instructions from the four minutes and five seconds of the attached recording:
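For anyone who wants to replicate that kind of extraction, a minimal standard-library sketch; the file names and time spans in the commented call are placeholders, not the actual instruction offsets:

```python
import wave

def concatenate_spans(src, dst, spans):
    """Copy the given (start_sec, end_sec) spans of src into dst, back to back."""
    with wave.open(src, "rb") as w:
        params = w.getparams()
        rate = w.getframerate()
        frames = b""
        for start, end in spans:
            w.setpos(round(start * rate))
            frames += w.readframes(round((end - start) * rate))
    with wave.open(dst, "wb") as out:
        out.setparams(params)    # frame count is corrected as frames are written
        out.writeframes(frames)

# hypothetical call: cut two instruction spans out of the drive recording
# concatenate_spans("drive.wav", "instructions.wav", [(11.0, 14.5), (42.0, 45.0)])
```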

Comments (30)

Micro- Nano-Stylistic Variation

"Don't miss the most loved conference by Delphists like you!"

Philip Taylor wrote to complain about that phrase, which apparently arrived in an email advertisement:

"The most loved conference …" ? I would have written "The conference most loved …".

But his preference apparently disagrees, not only with the author of that flyer, but also with most other writers of English. And it's wonderful how easily we can now check such things. As Yogi Berra (may have) said, "Sometimes you can see a lot just by looking".

Comments (26)

When more data makes things worse…

The mantra of machine learning, as Fred Jelinek used to say, is "The best data is more data" — because in many areas, there's a Long Tail of relevant cases that are hard to classify or predict without either a valid theory or enough examples.

But a recent meta-analysis of machine-learning work in digital medicine shows, convincingly, that more data can lead to poorer reported performance. The paper is Visar Berisha et al., "Digital medicine and the curse of dimensionality", npj Digital Medicine, 2021, and one of the pieces of evidence they present is shown in the figure reproduced below:

This analysis considers two types of models: (1) speech-based models for classifying between a control group and patients with a diagnosis of Alzheimer’s disease (Con vs. AD; blue plot) and (2) speech-based models for classifying between a control group and patients with other forms of cognitive impairment (Con vs. CI; red plot).
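The underlying statistical trap is easy to reproduce. In the toy example below (my illustration, not the paper's analysis), the labels are pure coin flips, so no feature has any real predictive value; yet selecting the best of 1000 random features on the full dataset, and then scoring that same feature on the same data, reports accuracy far above the true 50% chance level:

```python
import random

random.seed(0)
n_samples, n_features = 20, 1000
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [random.randrange(2) for _ in range(n_samples)]   # pure coin flips

def accuracy(j):
    """Fraction of samples where sign(feature j) matches the label."""
    return sum((X[i][j] > 0) == (y[i] == 1) for i in range(n_samples)) / n_samples

# the leak: feature selection sees all the data, including the "test" samples
best_feature = max(range(n_features), key=accuracy)
reported = accuracy(best_feature)   # optimistic, despite chance-level truth
```

With many features and few samples, some feature matches the noise by luck, which is one face of the curse of dimensionality the paper documents.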

Comments (8)

Word frequency variation: elicit vs. illicit

In the comments on yesterday's post about a slip of the fingers or brain ("Elicit → illicit"), there was some discussion about which of the two words is more common.

Obviously, the answer to such questions depends on where you look.

So I looked in a bunch of places. Overall, illicit tends to be more common than elicit — but the relative frequency varies widely, and sometimes it's the other way round.
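The simplest version of this kind of check is direct counting; a minimal sketch, with a made-up sample sentence standing in for a real corpus:

```python
import re
from collections import Counter

def count_variants(text, stems=("elicit", "illicit")):
    """Count tokens beginning with each stem (so 'elicited' and
    'illicitly' are included). A real check would also guard
    against false prefixes."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for stem in stems:
            if word.startswith(stem):
                counts[stem] += 1
    return counts

sample = "Illicit trade may elicit outrage; illicitly obtained goods elicited fines."
counts = count_variants(sample)
```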

Comments (4)


0:11 TING
0:36 SE.

That's the start of the automatically generated transcript on YouTube for "See George Conway's reaction to Trump's reported plan if he wins again", CNN 7/24/2022.

Comments (3)

Conversations with GPT-3

In a recent presentation, I noted that generic statements can be misleading, though it's not easy to avoid the problem:

The limitations and complexities of ordinary language in this area pose difficult problems for scientists, journalists, teachers, and everyone else.

But the problems are especially hard to avoid for AI researchers aiming to turn large text collections into an understanding of the world that the texts discuss.

And to illustrate the point, I used a couple of conversations with GPT-3.

Comments (14)

Sentient AI

Comments (7)

Comparing phrase lengths in French and English

In a comment on "Trends in French sentence length" (5/26/2022), AntC raised the issue of cross-language differences in word counts: "I was under the impression French needed ~20% more words to express the same idea as an English text." And in response, I promised to "check letter-count and word-count relationships in some English/French parallel text corpora, when I have a few minutes".

I found a few minutes yesterday, and ran (a crude version of) this check on the data in Alex Franz, Shankar Kumar & Thorsten Brants, "1993-2007 United Nations Parallel Text", LDC2013T06.
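The crude version of the check amounts to this, sketched here with two made-up sentence pairs standing in for the UN parallel text:

```python
pairs = [
    ("The committee adopted the resolution.",
     "Le comité a adopté la résolution."),
    ("The meeting was postponed until Monday.",
     "La séance a été reportée à lundi."),
]

def length_ratios(pairs):
    """French/English ratios of total word count and total character count."""
    en_words = sum(len(en.split()) for en, fr in pairs)
    fr_words = sum(len(fr.split()) for en, fr in pairs)
    en_chars = sum(len(en) for en, fr in pairs)
    fr_chars = sum(len(fr) for en, fr in pairs)
    return fr_words / en_words, fr_chars / en_chars

word_ratio, char_ratio = length_ratios(pairs)
```

On a real parallel corpus, `word_ratio` is the quantity AntC's ~20% impression is about, and comparing it with `char_ratio` separates "more words" from "longer text".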

Comments (10)