Archive for Computational linguistics

Cumulative syllable-scale power spectra

Babies start making speech-like vocalizations long before they start to produce recognizable words — various stages of these sounds are variously described as cries, grunts, coos, goos, yells, growls, squeals, and "reduplicated" or "variegated" babbling. Developmental progress is marked by variable mixtures of variable versions of these noises, and their analysis may provide early evidence of later problems. But acoustic-phonetic analysis of infant vocalizations is hindered by the fact that many sounds (and sound-sequences) straddle category boundaries. And even for clear instances of "canonical babbling", annotators often disagree on syllable counts, making rate estimation difficult.

In "Towards automated babble metrics" (5/26/2019), I toyed with the idea that an antique work on instrumental phonetics — Potter, Kopp and Green's 1947 book Visible Speech — might have suggested a partial solution:

By recording speech in such a way that its energy envelope only is reproduced, it is possible to learn something about the effects of recurrences such as occur in the recital of rimes or poetry. In one form of portrayal, the rectified speech envelope wave is speeded up one hundred times and translated to sound pattern form as if it were an audible note.

Obituary: Petr Sgall (1926-2019)

Professor Emeritus Petr Sgall, professor of Indo-European, Czech studies, and general linguistics at Charles University in Prague, passed away on May 28, 2019 in Prague, the day after his 93rd birthday.

Over a lifetime of distinguished work in theoretical, mathematical and computational linguistics, he did more than any other single person to keep the Prague School linguistic tradition alive and dynamically flourishing. He was the founder of mathematical and computational linguistics in the Czech Republic, and the principal developer of the Praguian theory of Functional Generative Description as a framework for the formal description of language, which has been applied primarily to Czech, but also to English and in typological studies of a range of languages.

Cat names from GPT-2

Janelle Shane, "Once again, a neural net tries to name cats", 6/3/2019:

Last year I trained a neural net to generate new names for kittens, by giving it a list of over 8,000 existing cat names to imitate. Starting from scratch, with zero knowledge of English or any context for the words and letter combinations it was trying out, it tried to predict what letters might be found in cat names, and in which order. Its names ranged from the strange to the completely nonsensical to the highly unfortunate (Retchion, Hurler, and Trickles were some of its suggestions). Without knowledge of English beyond its list of cat names, it didn't know what letter combinations to avoid.

So I decided to revisit the cat-naming problem, this time using a neural net that had a lot more context. GPT-2, trained by OpenAI on a huge chunk of the internet, knows which words and letter combinations tend to be used together on the English-language internet. It also has (mostly) figured out which words and letter combinations to avoid, at least in some contexts (though it does tend to suddenly switch contexts, and then, yikes).

Read the whole thing — with pictures! Apparently the Morris Animal Refuge is using this algorithm to name the animals it offers for adoption.

One law to rule them all?

Power-law distributions seem to be everywhere, and not just in word-counts and whale whistles. Most people know that Vilfredo Pareto found them in the distribution of wealth, two or three decades before Udny Yule showed that stochastic processes like those in evolution lead to such distributions, and George Kingsley Zipf found his eponymous law in word frequencies. Since then, power-law distributions have been found all over the place — Wikipedia lists

… the sizes of craters on the moon and of solar flares, the foraging pattern of various species, the sizes of activity patterns of neuronal populations, the frequencies of words in most languages, frequencies of family names, the species richness in clades of organisms, the sizes of power outages, criminal charges per convict, volcanic eruptions, human judgements of stimulus intensity …

My personal favorite is the noise that a sheet makes when you crumple it up, as discussed by Eric Kramer and Alexander Lobkovsky, "Universal Power Law in the Noise from a Crumpled Elastic Sheet", 1995 (referenced in "Zipf and the general theory of wrinkling", 11/15/2003).

Contradicting the Central Limit Theorem's implications for what is "normal", power-law distributions seem to be everywhere you look.

Or maybe not?

Moby Zipf

Accidental art

Syllable-scale wheelbarrow spectrogram

Following up on Saturday's post "Towards automated babble metrics", I thought I'd try the same technique on some adult speech, specifically William Carlos Williams reading his poem "The Red Wheelbarrow".

Why might some approach like this be useful? It's a way of visualizing syllable-scale frequency patterns (roughly 1 to 8 Hz or so) without having to do any phonetic segmentation or classification. And for early infant vocalizations, where speech-like sounds gradually mix in with coos and laughs and grunts and growls and fussing, it might be the basis for some summary statistics that would be useful in tracing a child's developmental trajectory.

Is it actually good for anything? I don't know. The basic idea was presented in a 1947 book as a way to visualize the performance of metered verse. Those experiments didn't really work, and the idea seems to have been abandoned afterwards — though the authors' premise was that verse "beats" should be exactly periodic in time, which was (and is) false. In contrast, my idea is that the method might let us characterize variously-inexact periodicities.
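To make the general idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the procedure used for these posts: the function name `envelope_spectrum`, the 100 Hz envelope rate, the frame-averaging smoother, and the Hann window are all choices made for this example. It rectifies the waveform, smooths it into an amplitude envelope, and takes the power spectrum of that envelope in the syllable-scale range, with no segmentation or classification step.

```python
import numpy as np

def envelope_spectrum(signal, sample_rate, env_rate=100, max_hz=8.0):
    """Power spectrum of the rectified amplitude envelope, up to max_hz.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    # Full-wave rectify, then average over non-overlapping frames to get
    # a smoothed envelope sampled at (roughly) env_rate Hz.
    frame = int(sample_rate // env_rate)
    n_frames = len(signal) // frame
    rectified = np.abs(signal[: n_frames * frame])
    envelope = rectified.reshape(n_frames, frame).mean(axis=1)

    # Remove the mean so the DC term does not dominate the spectrum.
    envelope = envelope - envelope.mean()

    # Taper with a Hann window and take the one-sided power spectrum.
    windowed = envelope * np.hanning(n_frames)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(n_frames, d=frame / sample_rate)

    # Keep only the syllable-scale band (above 0 Hz, up to max_hz).
    keep = (freqs > 0) & (freqs <= max_hz)
    return freqs[keep], power[keep]

# Synthetic check: white noise whose amplitude is modulated at 4 Hz,
# standing in for a recording with about four "syllables" per second.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(0, 5.0, 1 / sample_rate)
carrier = rng.standard_normal(t.size)
signal = (1 + np.cos(2 * np.pi * 4 * t)) * carrier

freqs, power = envelope_spectrum(signal, sample_rate)
peak_hz = freqs[np.argmax(power)]
print(f"strongest envelope periodicity near {peak_hz:.1f} Hz")
```

On the synthetic input the strongest component should fall close to the 4 Hz modulation rate. Real speech would give a broader, messier spectrum, which is the point: the spread of energy across the band is what might characterize inexact periodicities.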

Towards automated babble metrics

There are lots of good reasons to want to track the development of infant vocalizations — see e.g. Zwaigenbaum et al. "Clinical assessment and management of toddlers with suspected autism spectrum disorder" (2009). But existing methods are expensive and time-consuming — see e.g. Nyman and Lohmander, "Babbling in children with neurodevelopmental disability and validity of a simplified way of measuring canonical babbling ratio" (2018). (It's also unfortunately true that there's not yet any available dataset documenting the normal development of infant vocalizations from cooing and gooing to "canonical babbling", but that's another issue…)

People are starting to make and share extensive recordings of infant vocal development — see e.g. Frank et al., "A collaborative approach to infant research: Promoting reproducibility, best practices, and theory‐building" (2017). But automatic detection and classification of vocalization sources and types is still imperfect at best. And if we had reliable detection and classification methods, that would open up a new set of questions: Are the standard categories (e.g. "canonical babbling") really well defined and well separated? Do infant vocalizations of whatever type have measurable properties that would help to characterize and quantify normal or abnormal development?

"Unparalleled accuracy" == "Freud as a scrub woman"

A couple of years ago, in connection with the JSALT2017 summer workshop, I tried several commercial speech-to-text APIs on some clinical recordings, with very poor results. Recently I thought I'd try again, to see how things have progressed. After all, there have been recent claims of "human parity" in various speech-to-text applications, and (for example) Google's Cloud Speech-to-Text tells us that it will "Apply the most advanced deep-learning neural network algorithms to audio for speech recognition with unparalleled accuracy", and that "Cloud Speech-to-Text accuracy improves over time as Google improves the internal speech recognition technology used by Google products."

So I picked one of the better-quality recordings of neuropsychological test sessions that we analyzed during that 2017 workshop, and tried a few segments. Executive summary: general human parity in automatic speech-to-text is still a ways off, at least for inputs like these.

Eliza Strickland, "How IBM Watson Overpromised and Underdelivered on AI Health Care", IEEE Spectrum 4/2/2019 (subhead: "After its triumph on Jeopardy!, IBM's AI seemed poised to revolutionize medicine. Doctors are still waiting"):

In 2014, IBM opened swanky new headquarters for its artificial intelligence division, known as IBM Watson. Inside the glassy tower in lower Manhattan, IBMers can bring prospective clients and visiting journalists into the "immersion room," which resembles a miniature planetarium. There, in the darkened space, visitors sit on swiveling stools while fancy graphics flash around the curved screens covering the walls. It's the closest you can get, IBMers sometimes say, to being inside Watson's electronic brain.

Coherence Quiz answers

As promised, the results of yesterday's little experiment on "Coherence of sentence sequences" are here.

A tabular summary:

Question   Correct      Wrong
1          166 (98%)     4 (2%)
2          135 (80%)    33 (20%)
3          167 (99%)     2 (1%)
4          158 (93%)    12 (7%)
5          113 (67%)    56 (33%)
6          152 (90%)    17 (10%)
7          165 (97%)     5 (3%)
8          115 (68%)    55 (32%)
9          169 (99%)     1 (1%)
10         167 (98%)     3 (2%)
11         163 (96%)     7 (4%)
12         137 (81%)    32 (19%)

So the survey respondents (as a whole) guessed the original order of all twelve sentence-pairs correctly — though the margins varied from 2-to-1 to 99-to-1. The overall percent correct was 89%, though of course that percentage will depend on the particular mix of examples.

(The counts don't all sum to the same row-wise value because a couple of participants left some answers blank — there's probably a way to get Qualtrics to prevent that, but I didn't figure it out in time…)
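For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the per-question percentages and the overall percent correct from the counts in the table. The variable and function names are made up for this example; the counts themselves are copied from the table.

```python
# (correct, wrong) counts for questions 1-12, copied from the table
counts = [
    (166, 4), (135, 33), (167, 2), (158, 12), (113, 56), (152, 17),
    (165, 5), (115, 55), (169, 1), (167, 3), (163, 7), (137, 32),
]

def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

# Per-question percentages, using each row's own total as the denominator
# (rows differ slightly because some answers were left blank).
for question, (correct, wrong) in enumerate(counts, start=1):
    total = correct + wrong
    print(f"{question:>2}  {correct:>3} ({percent(correct, total)}%)  "
          f"{wrong:>2} ({percent(wrong, total)}%)  n={total}")

# Overall percent correct, pooling all answers across the twelve questions.
total_correct = sum(c for c, _ in counts)
total_answers = sum(c + w for c, w in counts)
overall = percent(total_correct, total_answers)
print(f"overall: {total_correct}/{total_answers} = {overall}%")
```

Pooling the answers gives 1807 correct out of 2034, which rounds to the 89% quoted above, and the row totals range from 168 to 170, reflecting the blank answers.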

Coherence of sentence sequences

Here are two successive sentences from The Wizard of Oz, presented in two different orders:

  First order (blue):
  1. "How strange it all is! But, comrades, what shall we do now?"
  2. "We must journey on until we find the road of yellow brick again," said Dorothy, "and then we can keep on to the Emerald City."

  Second order (red):
  1. "We must journey on until we find the road of yellow brick again," said Dorothy, "and then we can keep on to the Emerald City."
  2. "How strange it all is! But, comrades, what shall we do now?"

The first order (in blue) is easier to construe as a coherent sequence, because in that order, sentence 2 answers a question posed by sentence 1. The version in red could be rescued by a more complicated set of contextual assumptions or a more complicated theory of the interaction — but in fact it's the blue version that's the original.

The first conversing automaton

An article I'm writing led me to wonder when the idea of a conversing automaton first arose, or at least was first published. I'm ruling out magical creations like golems and divine statuary; brazen heads seem to have been either magical or created using arcane secrets of alchemy; and I don't know enough to evaluate the legend of King Mu and Yen Shih's automaton, whose conversational abilities are not clearly described in the texts I've found.

There are many early documented automata doing things like playing music, and plenty of Enlightenment philosophizing about what human abilities might or might not be clockwork-like, so I would have thought that there would be plenty of fictional conversing automata over the past four or five hundred years.

But apparently not: it's possible that the first real example was as late as 1907 or even 1938.
