Siri in Korea

"The bizarre political scandal that just led to the impeachment of South Korea's president" (Jennifer Williams, Vox, 3/9/17)


Protestors wearing masks of South Korean President Park Geun-Hye (R) and her confidante Choi Soon-Sil (L) pose for a performance during a rally denouncing a scandal over President Park's aide in Seoul on October 27, 2016. JUNG YEON-JE/AFP/Getty Images



What a woman can't do with their body

Mark Meckes noticed a tweet about an interview with Emma Watson, who was being discussed in this Language Log post, and mentioned it in a comment thereto. It was completely off topic (and thus violated the Language Log comments policy), but I felt it was too interesting to be left languishing down there in a comment on a post about preposition doubling, so I'm repeating it here, where it can have its own post:

If you think @EmmaWatson is a hypocrite, maybe consider you shouldn't be telling a woman what they can and can't do with their own body.

Two occurrences of singular they (they and their), with the phrase a woman as antecedent!



Synesthesia and Chinese characters

Leo Fransella asks:

I'm curious to know whether, in your years studying and teaching written Chinese, you've ever come across synaesthesia as applied to Chinese characters (zi) or words (ci)?

The most common form of synaesthesia (~1% of people, I think) involves the systematic assignment of colours to letters, numbers or (sometimes) whole words. I have this 'grapheme-colour' quite strongly: when I hear a phone number or see a number written on a page, for example, I automatically sense it as bands of colour. Much the same for words: it literally bothers me when I don't know how to spell someone's name, as their associated colours can be so different (Catherine is bluey-green with a dash of red; Kathryn is green-yellow). Sounds a bit loopy to people who don't do this, but it's a very useful mnemonic trick when learning French vocab or Latin verb conjugations and noun declensions.



What's hot at ICASSP

This week I'm at IEEE ICASSP 2017 in New Orleans (that's the "Institute of Electrical and Electronics Engineers International Conference on Acoustics, Speech and Signal Processing", pronounced /aɪ 'trɪ.pl i 'aɪ.kæsp/). I've had joint papers at all the ICASSP conferences since 2010, though I'm not sure that I've attended all of them.

This year the conference distributed its proceedings on a nifty little guitar-shaped USB key, which I promptly copied to my laptop for easier access. I seem to have deleted my local copies of most of the previous proceedings, but ICASSP 2014 escaped the reaper, so I decided to while away the time during one of the many parallel sessions here by running all the .pdfs (1703 in 2014, 1316 this year) through pdftotext, removing the REFERENCE sections, tokenizing the result, removing (some of the) unwordlike strings, and creating overall lexical histograms for comparison. The result is about 5 million words for 2014 and about 3.9 million words this year.
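The counting step described above can be sketched roughly as follows. This is only an illustration, not the script actually used for the post: it assumes the text has already been extracted (for example with pdftotext), and the heading pattern `REF_HEADING`, the `WORDLIKE` filter, and both function names are hypothetical choices made for the example.

```python
import re
from collections import Counter

# Assumption: the reference list starts at a line that reads "REFERENCES",
# optionally preceded by a section number such as "6." or "6".
REF_HEADING = re.compile(r"^\s*(?:\d+\.?\s*)?REFERENCES\s*$", re.MULTILINE)

# Assumption: a "wordlike" token is letters with optional internal
# hyphens or apostrophes; everything else is discarded.
WORDLIKE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def strip_references(text):
    """Drop everything from the last REFERENCES heading onward."""
    matches = list(REF_HEADING.finditer(text))
    return text[:matches[-1].start()] if matches else text


def lexical_histogram(texts):
    """Count wordlike tokens over an iterable of extracted paper texts."""
    counts = Counter()
    for text in texts:
        body = strip_references(text).lower()
        for token in body.split():
            # Trim surrounding punctuation before applying the filter.
            token = token.strip(".,;:!?()[]{}\"'")
            if WORDLIKE.fullmatch(token):
                counts[token] += 1
    return counts
```

Calling `lexical_histogram` once per year's set of extracted texts gives the two word-count tables that are then compared.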

And to compare the lists, I used the usual "weighted log-odds-ratio, informative Dirichlet prior" method, as described for example in "The most Trumpish (and Bushish) words", 9/5/2015.
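For readers who have not seen the method, here is a minimal sketch of a weighted log-odds-ratio comparison with an informative Dirichlet prior, in the formulation usually attributed to Monroe, Colaresi and Quinn (2008). It is not the code behind the post. The function name, the `prior_mass` parameter, and the choice to build the prior from the pooled counts of the two lists are all assumptions made for the example.

```python
import math
from collections import Counter


def weighted_log_odds(counts_a, counts_b, prior_mass=1000.0):
    """Return a z-score per word: positive favors A, negative favors B.

    Assumes the combined vocabulary has at least two distinct words.
    """
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    pooled_total = n_a + n_b

    scores = {}
    for word in vocab:
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Assumed prior: pooled relative frequency scaled to prior_mass,
        # so the prior pseudo-counts sum to prior_mass over the vocabulary.
        alpha = prior_mass * (y_a + y_b) / pooled_total
        log_odds_a = math.log((y_a + alpha) / (n_a + prior_mass - y_a - alpha))
        log_odds_b = math.log((y_b + alpha) / (n_b + prior_mass - y_b - alpha))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the log-odds difference.
        variance = 1.0 / (y_a + alpha) + 1.0 / (y_b + alpha)
        scores[word] = delta / math.sqrt(variance)
    return scores
```

Sorting the returned dictionary by score puts the words most characteristic of the first list at one end and those most characteristic of the second list at the other. Dividing by the estimated standard deviation keeps rare words with noisy ratios from dominating either end.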



No Japanese, South Koreans, or dogs

Here we go again.  Image trending on WeChat, a sign on a Beijing bus:



Involuntary immigrants

Below is a guest post by Larry Horn, based on a note submitted to the American Dialect Society's mailing list. The topic is the slaves-as-immigrants flap occasioned by Ben Carson’s recent remarks characterizing slaves as immigrants who worked particularly hard for particularly low wages.



Hate

There are multilingual signs all over Swarthmore (where I live) that say "Hate Has No Home Here".  The signs are printed in six languages:  English, Urdu, Hebrew, Korean, Arabic, and Spanish.  I wondered about the choice of languages, but — with a little googling — I found that these are apparently the languages most commonly spoken at Petersen Elementary School in the North Park neighborhood of Chicago, where the campaign to post these signs originated.  It's interesting that the linguistic mix of an elementary school in Chicago determined the multilingualism of signs that are being posted all over the country.

Incidentally, there is also a #LoveThyNeighbor (No Exceptions) campaign going on, and here I wondered about the archaism of the "Thy".  It seems to me that the King Jamesian language of these signs conveys clear Christian overtones, which may account for the fact that there are far fewer of these signs around than the HHNHH signs.

"Hate" is also a hot topic in China these days.



Mistakes

Yesterday's post "A stick with which to beat other women with" discussed the duplication of prepositions in the title phrase, and a commenter complained that

The woman interviewed has a pretty mediocre command of English (she doesn't pronounce a single coherent sentence and keeps stuttering) although she is an actress speaking in her native language. That she would make mistakes in her own language is thus regrettable but not especially surprising. I am not unaware that the concept "mistake" does not enjoy stellar prestige among linguists, but why is that particular error worthy of a blog entry?

As another commenter observed, my original post used the phrase "performance error" to describe the possibility that Emma Watson's preposition doubling was a mistake rather than a bona fide syntactic variant.

But my point today is that verbatim transcripts of spontaneous speech are often full of filled pauses, self-corrections, and other things that must be edited out in order to create what that commenter would count as a "coherent sentence". And this is true even for people who have risen far in the world on the basis of their ability to impress others in spontaneous verbal interaction.



A stick with which to beat other women with

There have been dozens of articles in the news recently about Emma Watson's Vanity Fair photo shoot, the reaction to it, and her reaction to the reaction. For example, Cherry Wilson, "Is Emma Watson anti-feminist for exposing her breasts?", BBC News 3/6/2017; or Jessica Samakow, "26 Tweets Prove #WhatFeministsWear Is ‘Anything They F*cking Want’", Huffington Post 3/6/2017; or Travis Andrews, "‘Feminism is not a stick with which to beat other women’: Emma Watson tells off critics of revealing photo", Washington Post 3/6/2017.

What's the linguistic angle? Well, the quote in that WaPo headline is not exactly what she said.



Two tons of creamed corn

Today's xkcd:

Mouseover title: "Sure, you could just ask, but this also takes care of the host gift thing."

Unless of course they have Google Home. In which case apparently the thing to do is to ask about the communist coup that Obama is planning…



Topolectal traffic sign

This has apparently been around for a while, but I'm seeing it now for the first time:



The shape of a LibriVox phrase

Here's what you get if you align 11 million words of English-language audiobooks with the associated texts, divide it all into phrases by breaking at silent pauses greater than 150 milliseconds, and average the word durations by position in phrases of lengths from one word to fifteen words:

The audiobook sample in this case comes from LibriSpeech (see Vassil Panayotov et al., "Librispeech: An ASR corpus based on public domain audio books", IEEE ICASSP 2015). Neville Ryant and I have been collecting and analyzing a variety of large-scale speech datasets (see e.g. "Large-scale analysis of Spanish /s/-lenition using audiobooks", ICA 2016; "Automatic Analysis of Phonetic Speech Style Dimensions", Interspeech 2016), and as part of that process, we've refactored and realigned the LibriSpeech sample, resulting in 5,832 English-language audiobook chapters from 2,484 readers, comprising 11,152,378 words of text and about 1,571 hours of audio. (This is a small percentage of the English-language data available from LibriVox, which is somewhere north of 50,000 hours of English audiobook at present.)
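The phrase-averaging procedure described at the top of this post can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used here: it assumes each aligned word is available as a `(word, start, end)` tuple with times in seconds, and the function names and that input format are hypothetical.

```python
from collections import defaultdict


def split_into_phrases(words, min_pause=0.150):
    """Break a time-ordered word list wherever the silent gap exceeds min_pause."""
    phrases, current = [], []
    for word in words:
        # Gap between this word's start and the previous word's end.
        if current and word[1] - current[-1][2] > min_pause:
            phrases.append(current)
            current = []
        current.append(word)
    if current:
        phrases.append(current)
    return phrases


def mean_duration_by_position(phrases, max_len=15):
    """Map (phrase length, 1-based position) to mean word duration in seconds."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [duration sum, word count]
    for phrase in phrases:
        length = len(phrase)
        if length > max_len:
            continue  # only phrases of one to max_len words are averaged
        for position, (_, start, end) in enumerate(phrase, start=1):
            cell = totals[(length, position)]
            cell[0] += end - start
            cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Plotting the result with one line per phrase length, position on the horizontal axis and mean duration on the vertical axis, would give a figure of the kind described above.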



Difficult languages and easy languages

People often ask me questions like these:

What's the easiest / hardest language you ever learned?

Isn't Chinese really difficult?

Which is harder, Chinese or Japanese?  Sanskrit or German?

Without a moment's hesitation, I always reply that Mandarin is the easiest spoken language I have learned and that Chinese is the most difficult written language I have learned.  I learned to speak Mandarin fluently within about a year, but I've been studying written Chinese for half a century and it's still an enormous challenge.  I'm sure that I'll never master it even if I live to be as old as Zhou Youguang.
