Archive for Prosody

REAPER

A couple of days ago, I mentioned ("Sarah Koenig", 2/5/2015) that David Talkin was releasing a new pitch tracking program called REAPER (available from GitHub at the link). After a few minor improvements in documentation, it's ready for the general public.

The reaper program uses the EpochTracker class to simultaneously estimate the location of voiced-speech "epochs" or glottal closure instants (GCI), voicing state (voiced or unvoiced) and fundamental frequency (F0 or "pitch"). We define the local (instantaneous) F0 as the inverse of the time between successive GCI.
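The definition of local F0 in that description is easy to make concrete. Here is a minimal sketch, assuming you already have a list of GCI times in seconds; the function name and the input format are my own illustration, not REAPER's code or its output format, and the sketch ignores voicing state (a gap spanning an unvoiced stretch would be treated as one long period).

```python
def f0_from_gci(gci_times):
    """Return (time, f0_hz) pairs, one per pair of successive GCIs.

    gci_times: glottal closure instants in seconds, strictly increasing.
    Each F0 value is the inverse of the interval between two successive
    GCIs, reported at the later of the two instants.
    """
    pairs = []
    for earlier, later in zip(gci_times, gci_times[1:]):
        period = later - earlier
        if period <= 0:
            raise ValueError("GCI times must be strictly increasing")
        pairs.append((later, 1.0 / period))
    return pairs


# Hypothetical epochs: 10 ms apart, then 8 ms apart.
example = f0_from_gci([0.100, 0.110, 0.118])
```

With those made-up epochs, the two local F0 values come out at roughly 100 Hz and 125 Hz.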

After trying it out, I can recommend it whole-heartedly — it's robust and accurate and fast. It's my new standard pitch tracker.

Read the rest of this entry »

Comments (5)

Vocal creak and fry, exemplified

There are several different sorts of things involved on the perceptual side of the phenomena that people call "vocal fry" and (less often but more appropriately) "vocal creak".

One perceptual issue is the auditory equivalent of the visual "flicker fusion threshold". If regular impulse-like oscillations in air pressure are fast enough, we hear them as a tone; as they get slower and slower, we can increasingly separate the individual pressure pulses as independent events. The threshold at which the pulses fuse into a tonal percept is called "auditory flutter fusion" or sometimes "auditory flicker fusion". The transition between separation and fusion is a gradual one, and in the boundary region, we can hear the pattern in both ways, sometimes as what is called a "creak" sound, because it sounds like the creaking of a sticky hinge.

The other issue is the perceptual effect of pressure oscillations that are irregular as well as relatively low in frequency. Large amounts of random local variation in period sound like the sound of frying food, as bubbles of steam randomly form and pop here and there.

Both creak and fry can occur in the vocal-cord oscillation of human speech. But what people generally call "vocal fry" is, more often than not, actually "vocal creak".
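The contrast between regular and irregular low-frequency pulses can be sketched as two impulse trains with the same mean period, one perfectly regular and one with random period-to-period variation. This is only an illustration: the function names, the 20 ms period, and the jitter amount are arbitrary assumptions, not perceptual thresholds or values taken from the post.

```python
import random


def pulse_times(mean_period, n_pulses, jitter=0.0, seed=0):
    """Return n_pulses onset times (seconds) for an impulse train.

    jitter is the maximum fractional deviation of each period from
    mean_period: 0.0 gives perfectly regular pulses, 0.5 lets each
    period vary uniformly between 50% and 150% of the mean.
    """
    if not 0.0 <= jitter < 1.0:
        raise ValueError("jitter must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    times, t = [], 0.0
    for _ in range(n_pulses):
        times.append(t)
        t += mean_period * (1.0 + rng.uniform(-jitter, jitter))
    return times


def periods(times):
    """Intervals between successive pulse onsets."""
    return [b - a for a, b in zip(times, times[1:])]


# Arbitrary illustrative settings: 20 ms mean period, 50 pulses.
regular = pulse_times(0.020, 50)               # evenly spaced pulses
irregular = pulse_times(0.020, 50, jitter=0.5)  # large random local variation
```

Rendering each list as clicks in an audio buffer would let you hear the difference; that step is left out to keep the sketch dependency-free.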

Read the rest of this entry »

Comments (5)

Jazz Dispute

Just in case you haven't seen this:

[h/t Taylor Jones]

Comments (16)

Phrasal trends in pitch, or, the lab subject's moan

It's been a while since I posted a Breakfast Experiment™ (things have been hectic here), but yesterday in a discussion with some phonetics students, I learned that certain old ideas about (linguistic) intonation have passed out of memory. And in trying to explain these ideas, I posed a problem for myself that is a suitable subject for a little hacking during this morning's breakfast hour. Attention Conservation Notice: We're going to wander in the history-of-phonetics weeds for a while here.

Read the rest of this entry »

Comments off

Combating stereotypes — with stereotypes

Laura Starecheski, "Can Changing How You Sound Help You Find Your Voice?", NPR All Things Considered 10/14/2014:

Just having a feminine voice means you're probably not as capable at your job.  

At least, studies suggest, that's what many people in the United States think.

There's a gender bias in how Americans perceive feminine voices: as insecure, less competent and less trustworthy.  This can be a problem — especially for women jockeying for power in male-dominated fields, like law.

Read the rest of this entry »

Comments (9)

The shape of a spoken phrase in Mandarin

A few years ago, with Jiahong Yuan and Chris Cieri, I took a look at variation in English word duration by phrasal position, using data from the Switchboard conversational-speech corpus ("The shape of a spoken phrase", LLOG 4/12/2006; Jiahong Yuan, Mark Liberman, and Chris Cieri, "Towards an Integrated Understanding of Speaking Rate in Conversation", InterSpeech 2006). As is often the case for simple-minded analysis of large speech datasets, this exercise showed a remarkably consistent pattern of variation — the plot below shows mean duration by position for phrases from 1 to 12 words long:

The Mandarin Broadcast News collection discussed in a recent post ("Consonant effects on F0 in Chinese", 6/12/2014) lends itself to a similar analysis of phrase-position effects on speech timing. So for this morning's Breakfast Experiment™, I ran a couple of scripts to take a first look.
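The tabulation behind a duration-by-position plot can be sketched in a few lines. This is not the script used for either analysis; it assumes a hypothetical input format (one list of word durations per phrase) and simply averages durations for each combination of phrase length and position.

```python
from collections import defaultdict


def mean_duration_by_position(phrases, max_len=12):
    """Average word duration for each (phrase_length, position) cell.

    phrases: iterable of lists of word durations in seconds, one list
    per phrase (a hypothetical format chosen for this sketch).
    Positions are 1-based; phrases longer than max_len are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for durations in phrases:
        n = len(durations)
        if not 1 <= n <= max_len:
            continue
        for pos, dur in enumerate(durations, start=1):
            sums[(n, pos)] += dur
            counts[(n, pos)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up durations, only to show the shape of the result.
toy = mean_duration_by_position([[0.2, 0.4], [0.4, 0.6], [0.3]])
```

The result is a dictionary keyed by (phrase length, position), which maps directly onto one line per phrase length in a plot.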

Read the rest of this entry »

Comments (3)

Consonant effects on F0 in Chinese

Following up on two earlier Breakfast Experiments™ ("Consonant effects on F0 of following vowels", 6/5/2014; "Consonant effects on F0 are multiplicative", 6/6/2014), here are some semi-comparable measurements of consonant effects on fundamental frequency (F0) in Mandarin Chinese broadcast news speech.

[As I warned potential readers of those earlier posts, this is considerably more wonkish than most LLOG offerings.]

Why do people care about the effects of consonant features on F0? The main reason is that tonogenesis — the historical development of lexical tones — often arises from re-interpretation of "micromelodies" of this kind, typically driven by laryngeal features of consonants such as voiceless vs. voiced (e.g. p,t,k,s vs. b,d,g,z). So it's natural to wonder whether languages where this has already happened, like Mandarin Chinese, retain or suppress such effects.

Read the rest of this entry »

Comments (3)

Consonant effects on F0 are multiplicative

[Warning: an unusually nerdy follow-up to an unusually nerdy post…] In the comments on yesterday's post "Consonant effects on F0 of following vowels", the question came up whether the effect of consonant voicing on vowel pitch is additive (e.g. plus or minus N Hz) or multiplicative (up or down by M percent). The fact that I calculated the effects in proportional terms indicates that I assumed, without checking, that the effects are multiplicative.

One easy way to check this assumption is to redo the calculations for female vs. male speakers independently, since we expect the overall F0 patterns of female speakers to be about 65-70% higher on average. So for this morning's Breakfast Experiment™ I did just that — it required changing just two characters in the scripts I wrote yesterday, so this was the easiest experiment ever…
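The logic of that check: if the effect is additive, the difference in Hz should be about the same for both groups; if it is multiplicative, the ratio should be about the same and the Hz difference should scale with overall pitch. A minimal sketch with made-up numbers (constructed to be exactly multiplicative, and NOT the measurements from the post) shows what the second pattern looks like.

```python
def compare_effects(groups):
    """Return {group: (difference_hz, ratio)} for each speaker group.

    groups maps a group name to a pair of mean F0 values in Hz:
    (after voiceless consonants, after voiced consonants).
    """
    return {
        name: (after_voiceless - after_voiced, after_voiceless / after_voiced)
        for name, (after_voiceless, after_voiced) in groups.items()
    }


# Toy values, invented for illustration only. The "female" baseline is
# about 67% above the "male" one, in line with the 65-70% expectation.
toy = compare_effects({
    "male": (126.0, 120.0),
    "female": (210.0, 200.0),
})
```

In the toy data the Hz differences are 6 and 10, while both ratios are about 1.05. An additive effect would show the opposite pattern: equal differences and unequal ratios.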

Read the rest of this entry »

Comments off

Consonant effects on F0 of following vowels

I spent the past couple of days at a workshop on lexical tone, organized by Kristine Yu at UMass. A topic that came up several times was the question of whether "segmental" influences on pitch — for instance, the fact that voiceless consonants are typically associated with a higher pitch in the first part of a following vowel — might be diminished or even eliminated in languages with lexical tone. Several participants observed that the evidence for this is not very strong: the classical paper on the subject studied a small number of utterances from one speaker in Thai, for example.

So for this morning's Breakfast Experiment™, I wrote a little script that calculates and displays (one way of looking at) these effects in the TIMIT dataset, which includes 10 English sentences spoken by each of 630 speakers. (Specifically, there are two sentences spoken by all 630 speakers;  450 sentences spoken by 7 speakers each; and 1890 sentences spoken by a single speaker.)
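As a quick sanity check, the breakdown in the parenthesis does add up to 10 sentences for each of 630 speakers. The variable names below are just for illustration.

```python
speakers = 630
sentences_per_speaker = 10

# (number of distinct sentences, speakers reading each one), as listed above
breakdown = [(2, 630), (450, 7), (1890, 1)]

# 2*630 + 450*7 + 1890*1 = 1260 + 3150 + 1890 = 6300 utterances
total_utterances = sum(n * k for n, k in breakdown)

# 2 + 450 + 1890 = 2342 distinct sentence texts
distinct_sentences = sum(n for n, _ in breakdown)

assert total_utterances == speakers * sentences_per_speaker
```

So the dataset has 6300 utterances in all, drawn from 2342 distinct sentences.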

I had to go to a meeting before I had a chance to write up the results, but the meeting ended early enough for me to find 15 minutes before lunch, so:

Read the rest of this entry »

Comments (8)

Final rises

As Eric Baković recently noted, there's been a lot of buzz over a presentation on "uptalk" by Amanda Ritchart and Amalia Arvaniti at the 2013 Acoustical Society meeting. All we have so far is a sort of press release ("Do We All Speak Like Valley Girls? Uptalk in Southern Californian English", ASA Lay Language Papers, 12/5/2013), but this is enough to see that Ritchart and Arvaniti have made a valuable contribution.

They based their work on systematic analysis of a good-sized recorded dataset (23 "native speakers of SoCal English", who were asked to describe a muted video clip and to participate in a "map task" interaction). They distinguished among different interactional functions ("simple statement", "question", "floor holding", "confirmation request"), they systematically noted aspects of the location and extent of rises, and they based their conclusions on a statistical analysis of the interrelationship of these features.

Read the rest of this entry »

Comments (7)

Media uptake on uptalk

Yesterday afternoon, UC San Diego Linguistics grad student Amanda Ritchart presented her research (joint with Amalia Arvaniti) on the use and realization of uptalk in Southern California English at the 166th Acoustical Society of America meeting. This work is profiled in the ASA's press room, and has thus far received a fair amount of attention. You can hear and/or read about it on KPBS (San Diego's public radio station), at WBUR's Here & Now, on BBC News, and in the Washington Post. (See also this shout-out on the Linguistic Society of America website.)

Uptalk has been discussed many times here on Language Log, so regular readers are probably not unfamiliar with it. But one of the most recent Language Log posts on the topic ("Uptalk awakening", 9/29/2013) shows how relatively unaware of this long-standing feature of many varieties of English some folks still are. So the media coverage of Ritchart & Arvaniti's work is welcome — and on the whole pretty good, if a little biased toward a "wow, it's spreading to men!" interpretation of the research results, which kinda misses the point. But of course, if you scroll down to the comments (why oh why do I ever scroll down to the comments???), you'll see that many appear to think that the use of rising intonation at the ends of (some!) statements is the clearest evidence we have of the decline of western civilization. Sigh.

Update — more here.

Comments (29)

English prosodic phrasing

We can read a 10-digit sequence in the style of an American telephone number, 3+3+4 — e.g. 752-955-0354:

Or we could read the same sequence in a 3+2+3+2 pattern, 752-95-503-54:

It won't surprise you to learn that this changes the pattern of average digit durations:
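For anyone who wants to set up the same comparison, the two groupings are easy to generate from one digit string. This is a small sketch with a hypothetical helper name; it only produces the written groupings and says nothing about the durations themselves.

```python
def group_digits(digits, pattern):
    """Split a digit string into hyphen-separated groups.

    pattern is a sequence of group sizes, e.g. (3, 3, 4); the sizes
    must add up to the number of digits.
    """
    if sum(pattern) != len(digits):
        raise ValueError("pattern sizes must sum to the number of digits")
    groups, start = [], 0
    for size in pattern:
        groups.append(digits[start:start + size])
        start += size
    return "-".join(groups)


phone_style = group_digits("7529550354", (3, 3, 4))     # "752-955-0354"
alt_style = group_digits("7529550354", (3, 2, 3, 2))    # "752-95-503-54"
```

The same group boundaries could then be used to label each digit by its position within its group before averaging durations.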

Read the rest of this entry »

Comments (10)

The message

This year's Penn Reading Project book is Adam Bradley's Book of Rhymes: The Poetics of Hip Hop.  In my discussion group yesterday afternoon, several participants complained that some important things about the "poetics" of rap are lost in a purely textual presentation of the lyrics. One student observed that in pieces he knows, the rhythm is there in the written form — but the lyrics for pieces that he doesn't know seem flat and lifeless in comparison.

There are good reasons that this is more true for the works of Melle Mel or Jay Z than for Elizabeth Barrett Browning or W.H. Auden, I think.

One of the advantages of the weblog format is the combination of text, images, and audio or video clips, so for this morning's Breakfast Experiment™ I decided to present a small exploration of the "poetics of hip hop" in a multimedia — and somewhat quantitative — framework.

This exercise will clarify why transcriptions of the lyrics, even with bold-face indications of stress, are missing an important dimension. The lines' scansion depends not only on the syllable sequence and on where the performer puts phrasal stresses, but also on the alignment of the syllables with the musical meter. This alignment is not automatic or always obvious — it has artistically-relevant degrees of freedom beyond those available in most other genres of text setting.

For those whose appraisal of Bradley's book was (interpreting freely) "not enough vampires and car chases", this will probably make things worse — you have been warned.

Read the rest of this entry »

Comments (20)