The 2016 Blizzard Challenge

The Blizzard Challenge needs you!

Every year since 2005, an ad hoc group of speech technology researchers has held a "Blizzard Challenge", under the aegis of the Speech Synthesis Special Interest Group (SYNSIG) of the International Speech Communication Association.

The general idea is simple: Competitors take a released speech database, build a synthetic voice from the data and synthesize a prescribed set of test sentences. The sentences from each synthesizer are then evaluated through listening tests.

Why "Blizzard"? Because the early competitions used the CMU ARCTIC datasets, which began with a set of sentences read from James Oliver Curwood's novel Flower of the North.

Anyhow, if you have an hour of your time to donate towards making speech synthesis better, sign up and be a listener!


Writing Sinitic languages with phonetic scripts

This morning I was awakened by a bird calling outside my window, "m*ll*n*y m*l*rk*y", or maybe it was some squirrel chattering (I was half asleep and couldn't be sure which it was).  Since I was unable to distinguish the vowels clearly, I couldn't tell exactly what the call / chatter was, but the bird / squirrel kept repeating it over and over, so at least I was able to transcribe the general lineaments: "m*ll*n*y m*l*rk*y m*ll*n*y m*l*rk*y m*ll*n*y m*l*rk*y".

Read the rest of this entry »


Needless words

I know I've been a long-time critic of everything in The Elements of Style, not least William Strunk's platitude that you should omit needless words. "Needless" is not defined even vaguely; nobody really writes in a way that sticks to the absolute minimum word count; and if neophyte writers could tell what was needless they wouldn't have to be handed this platitude (which they don't really know how to use anyway). But every now and then one really does see a case of a word that screams at you that it should have been left out. The University of Oxford has an official form on which this is the heading:

CLAIM FOR REIMBURSEMENT OF ALLOWABLE EXPENSES

Read the rest of this entry »


Political TV Ad Archive

The Political TV Ad Archive:

The Political TV Ad Archive is a project of the Internet Archive. This site provides a searchable, viewable, and shareable online archive of 2016 political TV ads, married with fact-checking and reporting citizens can trust.  Political TV ad spending is expected to be in the billions. Yet the same local stations that air the ads provide very little solid reporting on politics. Even fewer correct political misinformation. In partnership with trusted journalistic organizations, the new Political TV Ad Archive provides a free service for journalists, civic organizations, academics and the general public to track these ads in context.  The project is open source and available on github: this site and the Duplitron.

For an introduction to the Political TV Ad Archive and how to use it, check out this video.

As of March 23, 2016, the Political TV Ad Archive is wrapping up the first phase of the project, where we tracked 20 markets in nine key primary states. The project will continue to track ads playing in the New York, Philadelphia, and San Francisco television market areas. Project staff are gathering lessons learned, which will inform planning and fundraising for the second phase of the project: tracking political ads in key 2016 general election battleground states.

Read the rest of this entry »


Wikipedia article length

For several reasons I recently downloaded snapshots of Wikipedia in a number of languages, and I'd like to share some of what I found, starting with article length in the English Wikipedia.

Read the rest of this entry »


Too like the gender

Is this the future of English pronouns? Ada Palmer's Too Like the Lightning takes place in a world where he/she is as quaintly obsolete as thee/thou. From the book's opening:

You will criticize me, reader, for writing in a style six hundred years removed from the events I describe, but you came to me for explanation of those days of transformation which left your world the world it is, and since it was the philosophy of the Eighteenth Century, heavy with optimism and ambition, whose abrupt revival birthed the recent revolution, so it is only in the language of the Enlightenment, rich with opinion and sentiment, that those days can be described. You must forgive me my ‘thee’s and ‘thou’s and ‘he’s and ‘she’s, my lack of modern words and modern objectivity. It will be hard at first, but whether you are my contemporary still awed by the new order, or an historian gazing back at my Twenty-Fifth Century as remotely as I gaze back on the Eighteenth, you will find yourself more fluent in the language of the past than you imagined; we all are.

Read the rest of this entry »


Q. Pheevr's Law

In a comment on one of yesterday's posts ("Adjectives and Adverbs"), Q. Pheevr wrote:

It's hard to tell with just four speakers to go on, but it looks as if there could be some kind of correlation between the ADV:ADJ ratio and the V:N ratio (as might be expected given that adjectives canonically modify nouns and adverbs canonically modify verbs). Of course, there are all sorts of other factors that could come into this, but to the extent that speakers are choosing between alternatives like "caused prices to increase dramatically" and "caused a dramatic increase in prices," I'd expect some sort of connection between these two ratios.

So since I have a relatively efficient POS tagging script, and an ad hoc collection of texts lying around, I thought I'd devote this morning's Breakfast Experiment™ to checking the idea out.
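To make Q. Pheevr's conjecture concrete, here is a minimal Python sketch of the comparison: count part-of-speech tags for each speaker, compute the ADV:ADJ and V:N ratios, then correlate the two ratios across speakers. This is not the tagging script mentioned above, which isn't shown. It assumes the text has already been tagged with Penn Treebank tags (RB*, JJ*, VB*, NN*), and the function names `tag_counts`, `ratios` and `pearson` are hypothetical. With only a handful of speakers, any coefficient it produces should be read with a lot of caution.

```python
from collections import Counter
from math import sqrt


def tag_counts(tagged_tokens):
    """Count adverbs, adjectives, verbs and nouns by Penn Treebank tag prefix.

    Assumption: input is a list of (word, tag) pairs produced by some
    external POS tagger using Penn Treebank tags.
    """
    counts = Counter()
    for _word, tag in tagged_tokens:
        if tag.startswith("RB"):      # RB, RBR, RBS
            counts["ADV"] += 1
        elif tag.startswith("JJ"):    # JJ, JJR, JJS
            counts["ADJ"] += 1
        elif tag.startswith("VB"):    # VB, VBD, VBG, VBN, VBP, VBZ
            counts["V"] += 1
        elif tag.startswith("NN"):    # NN, NNS, NNP, NNPS
            counts["N"] += 1
    return counts


def ratios(tagged_tokens):
    """Return (ADV:ADJ, V:N) for one speaker's text.

    Raises ZeroDivisionError if the text has no adjectives or no nouns.
    """
    c = tag_counts(tagged_tokens)
    return c["ADV"] / c["ADJ"], c["V"] / c["N"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy tagged fragment in the verbal style Q. Pheevr describes.
    toy = [("caused", "VBD"), ("prices", "NNS"), ("to", "TO"),
           ("increase", "VB"), ("dramatically", "RB"), ("sharp", "JJ")]
    print(ratios(toy))
```

In practice one would call `ratios` on each speaker's tagged text, collect the ADV:ADJ values into one list and the V:N values into another, and pass both lists to `pearson`. A positive coefficient would be consistent with the conjecture, though it would not show that the "dramatically increase" vs. "dramatic increase" choice is what drives it.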

Read the rest of this entry »


Trump's nickname for me

…is "Tardy Mark", at least according to one roll of the dice by The Daily Show's Trump Nickname Generator:

Read the rest of this entry »


Backward Thinking about Orientalism and Chinese Characters

This is a guest post by David Moser of Beijing Capital Normal University.

For those of us who teach and research the Chinese language, it is often difficult to describe how the Chinese characters function in conveying meaning and sound, and it’s always a particular challenge to explain how the writing system differs from the alphabetic systems we are more familiar with. The issues are complex and multi-layered, and have important implications for basic literacy and the teaching of Chinese to both native speakers and foreign learners. Tom Mullaney, a professor of history at Stanford University, has lately been muddying these pedagogical waters in a series of articles and interviews that seriously misrepresent the merits and relative advantages of the alphabet over the Chinese script.

Read the rest of this entry »


But what did they feed them?


Adjectives and adverbs

A puzzling note arrived in my inbox a few days ago:

I came across an article you wrote about the use of adverbs and adjectives.  To count the use of adverbs and adjectives you actually wrote a program. Is this something you would be willing to share or give me some advice on how to create myself? I am looking for a tool that our marketing team can use to keep the puffery to a minimum.

It was puzzling because the cited article was "Stop Hating on Adjectives and Adverbs", Slate 9/10/2013. And as the title suggests, my attitude towards eliminating adjectives and adverbs was a skeptical one:

Calculating the relative percentages of adjectives and adverbs in texts tells us nothing useful about their readability, clarity, or efficiency.

Read the rest of this entry »


Singlish: alive and well

We've mentioned that special brand of Singaporean English on Language Log from time to time, most recently just a few days ago:

"New Singaporean and Hong Kong terms in the OED" (5/12/16)

So what is it, really?

Read the rest of this entry »


Two dozen, two thousand, whatever

For Times Insider, David W. Dunlap has an article about some of the more entertaining errors and corrections that have graced the pages of The New York Times: "The Times Regrets the Error. Readers Don't."

Among the goofs is this one from a Q&A with Ivana Trump that appeared in the Oct. 15, 2000, New York Times Magazine:

Read the rest of this entry »
