The paucity of curse words in Japanese

In "Ichiro Suzuki Uncensored, en Español: Between the Lines, Japanese Star Is Known as a First-Class Spanish Trash Talker", via Andy Cheung, the Yankees outfielder is quoted thus: "…we don't really have curse words in Japanese, so I like the fact that the Western languages allow me to say things that I otherwise can't."


Comments (72)


The paucity of two-letter words

The number of possible two-letter lower-case strings over the English alphabet (not including the apostrophe) is 26² = 676. This morning I ran a script to test which two-letter sequences show up as words in the standard 25,143-word list supplied with many Unix-derived systems (usually at /usr/share/dict/words). I found that roughly 9 percent of the two-letter sequences are two-letter words (59/676 ≈ 0.09). That is, more than 90 percent of the logically possible two-letter combinations from aa to zz do not occur as spellings of common English words.

You might think a lot of the explanation lies in phonetics: vowelless combinations like pq or bn are unpronounceable. But I then did the same thing for two-letter standard Unix commands: bc (basic calculator), cp (copy files), ls (list files), mv (move or rename files), etc. These arbitrarily adopted program names do not have to be pronounceable, and usually aren't. And I found that the proportion of two-letter sequences that are Unix commands (more precisely, two-letter commands that have manual entries on Apple OS X version 10.6.8) is almost exactly the same (62/676 ≈ 0.09).

Why? Could it be that some kind of natural law discourages packing too many meanings into character strings (or phoneme sequences) of a given length, because doing so is likely to give rise to confusion or mnemonic problems? Does every language waste (as it were) at least 90 percent of the space available in the length-N sequences of letters or sounds that it uses, possibly for every N > 1?
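The post doesn't show the script it mentions, so what follows is only a minimal Python sketch of the same kind of count, not the author's code. It assumes the word list is a plain file with one entry per line, and it counts only entries that are exactly two lower-case letters (capitalized entries such as names are ignored, which is itself an assumption). The function name two_letter_proportion is hypothetical.

```python
import string
from itertools import product
from pathlib import Path


def two_letter_proportion(words):
    """Return (hits, total, proportion) for lower-case two-letter strings found in `words`."""
    # All 26 * 26 = 676 logically possible strings from "aa" to "zz".
    candidates = {a + b for a, b in product(string.ascii_lowercase, repeat=2)}
    # Strip newlines and surrounding whitespace from each entry.
    vocabulary = {w.strip() for w in words}
    # Intersecting with the candidate set keeps only entries that are exactly two
    # lower-case letters, so capitalized entries (e.g. proper names) are not
    # counted; this case handling is an assumption of the sketch.
    hits = candidates & vocabulary
    return len(hits), len(candidates), len(hits) / len(candidates)


if __name__ == "__main__":
    # The path is an assumption: word lists vary in location and size across systems.
    path = Path("/usr/share/dict/words")
    if path.exists():
        with path.open(encoding="utf-8", errors="ignore") as f:
            hits, total, prop = two_letter_proportion(f)
        print(f"{hits}/{total} = {prop:.3f}")
    else:
        print(f"no word list at {path}")
```

Because the function accepts any iterable of strings, the same count can be run over a list of command names (for instance ["bc", "cp", "ls", "mv", ...] gathered from the manual pages). The figure it prints depends entirely on which word list or command set is installed, so it will not necessarily reproduce 59/676 or 62/676.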


Comments (42)


Technicality Club

Comments (12)


Metaphors and the brain: check it out

"Your Brain on Metaphors", at The Chronicle of Higher Education's site, is interesting non-technical reading for anyone curious about experimental work on metaphors and idioms, and on the way the brain processes them. I recommend reading the whole thing.

Comments off


Text analytics applied to applications of things like text analytics

South by Southwest (SXSW) uses a web-based voting method to choose panels, and so Jason Baldridge took a look at the titles submitted for Phil Resnik's "Putting a Real-Time Face on Polling" session, to

… see whether some straight-forward Unix commands, text analytics and natural language processing can reveal anything interesting about them.

He describes the results in "Titillating Titles: Uncoding SXSW Proposal Titles with Text Analytics", 9/2/2014.
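The simplest end of this kind of analysis is a word-frequency count over the titles. Here is a minimal Python sketch of such a count, offered only as an illustration: it is not Baldridge's code, the tokenizer and stopword handling are assumptions, and the function name title_word_counts is hypothetical.

```python
import re
from collections import Counter


def title_word_counts(titles, stopwords=frozenset()):
    """Count lower-cased word tokens across a collection of titles."""
    counts = Counter()
    for title in titles:
        # Crude tokenizer (an assumption): runs of ASCII letters, digits and apostrophes.
        for token in re.findall(r"[a-z0-9']+", title.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical titles for illustration only; not drawn from the SXSW data.
    titles = ["The Future of Data", "Data and the Web"]
    for word, n in title_word_counts(titles, {"the", "of", "and"}).most_common():
        print(n, word)
```

This is similar in spirit to a shell pipeline built from tr, sort and uniq -c; anything more interesting (collocations, topic models, part-of-speech patterns) would be layered on top of counts like these.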

 

Comments (1)


Can you spell "bus"?

I have commented before on the psycholinguistics of signs painted on roads: in the USA it is apparently assumed that drivers will read the words in the order in which their front wheels reach them, so that what appears to be a display with "ONLY" above "LANE" above "BIKE" is supposed to be read as "BIKE LANE ONLY". In the UK, the opposite assumption is made: that drivers will read the whole display as a text that starts at the top. However, in one startling recent case in Bristol, south-west England, the people who painted the sign on the road warning of a bus stop never read it at all, in either order. They just stencilled "BUP STOP" on the roadway and packed up and left. Photographic evidence supplied herewith, just in case you cannot believe anyone capable of holding down a local government job could be unable to spell "BUS".

Comments off


Somebody

Yesterday I was skimming the digital New York Times and clicked on the second-from-the-right item in the panel below, without noticing the "paid post" superscript:

This took me to an article about a new smartphone app called Somebody:

Here’s how Somebody works: when you send your friend or loved one a message through the app, it doesn’t go directly to them, but uses GPS to locate the Somebody user nearest to him or her. This person (probably a stranger) delivers the message verbally, acting as your stand-in.


Comments (15)


More on tonal variation in Sinitic

In a number of posts, we have discussed departures from stipulated tonal configurations in speech, e.g.:

"Dissimilation, stress, sandhi, and other tonal variations in Mandarin"

"When intonation overrides tone"

"Where did Chinese tones come from and where are they going?"

In this post, we will focus on the wide tonal variation in names for some family relationships.


Comments (13)


Nth Xest

In the course of writing about the "fourth highest of five levels", I looked around at how the pattern "Nth Xest" is used in general. I found that uses of such expressions overwhelmingly count from the "top" where X names a top-oriented scale (high, big, long, etc.), and count from the "bottom" where X names a bottom-oriented scale (low, small, short, etc.). In other words, unsurprisingly, "Nth Xest" normally counts (up or down) from whatever end of the scale "Xest" names.

Another (less logically necessary but still unsurprising) thing I noticed is that top-oriented counts are always much larger than the corresponding bottom-oriented counts, and that counts decrease almost proportionately as N increases. Thus from Google Books ngrams:

           second   third   fourth   fifth   sixth
highest     34447    9692     3148    1411     784
lowest       6006    1455      491     293     138
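One way to check how closely the two rows track each other is to divide each top-oriented count by the bottom-oriented count for the same N. A minimal Python sketch using only the Google Books figures quoted here; the helper names ratios and is_decreasing are hypothetical.

```python
# Google Books ngram counts quoted in the post, for N = second .. sixth.
ORDINALS = ["second", "third", "fourth", "fifth", "sixth"]
HIGHEST = [34447, 9692, 3148, 1411, 784]
LOWEST = [6006, 1455, 491, 293, 138]


def ratios(top, bottom):
    """Ratio of the top-oriented count to the bottom-oriented count at each N."""
    return [t / b for t, b in zip(top, bottom)]


def is_decreasing(xs):
    """True if every count is strictly smaller than the one before it."""
    return all(a > b for a, b in zip(xs, xs[1:]))


if __name__ == "__main__":
    for n, r in zip(ORDINALS, ratios(HIGHEST, LOWEST)):
        print(f"{n:>6}: highest/lowest = {r:.2f}")
    print("both rows decrease:", is_decreasing(HIGHEST) and is_decreasing(LOWEST))
```

On these figures the highest/lowest ratio stays between roughly 4.8 (fifth) and 6.7 (third), and both rows fall steadily as N grows.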


Comments (1)


Poetic contrastive focus reduplication

Comments (35)


Fourth highest, less empty

We culturally evolved plains apes often have problems dealing with scalar predicates, flipping direction even when negation isn't involved. Here's the UK "terror threat level" scale:

On Friday, the British government raised the level from "substantial" to "severe". Several news outlets described this as "the fourth highest" level, for example Laura Smith-Spark, Andrew Carey and Greg Botelho, "UK raises terror threat level, citing risks out of Syria, Iraq", CNN 8/30/2014:

The UK government raised its terror threat level Friday from "substantial" to "severe," the fourth highest of five levels, in response to events in Iraq and Syria, where ISIS militants have seized a large swath of territory.
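The arithmetic behind the complaint is easy to make explicit. A minimal Python sketch, assuming the five UK levels in ascending order are low, moderate, substantial, severe and critical; the function names are hypothetical.

```python
# The UK terror threat scale, lowest to highest (assumed ordering of the five
# levels in use in 2014).
LEVELS = ["low", "moderate", "substantial", "severe", "critical"]


def rank_from_top(level, scale=LEVELS):
    """Position counting down from the highest level (1 = highest)."""
    return len(scale) - scale.index(level)


def rank_from_bottom(level, scale=LEVELS):
    """Position counting up from the lowest level (1 = lowest)."""
    return scale.index(level) + 1


if __name__ == "__main__":
    for name in ("substantial", "severe"):
        print(f"{name}: #{rank_from_top(name)} highest, #{rank_from_bottom(name)} lowest")
```

On that ordering "severe" comes out as the second highest level, or equivalently the fourth lowest; "fourth highest" only works if you count up from the bottom.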


Comments (31)


Is Hello Kitty not a cat?

There's been a to-do over whether Hello Kitty is a cat or a human, a massive uproar of tweets and retweets:

Some folks believe that the confusion over whether Hello Kitty is a feline or a human may stem from the misapplication or mistranslation of the term gijinka 擬人化 ("anthropomorphization"). See "Hello Kitty isn’t a cat!? We called Sanrio to find out!" (Rocket News 24, 8/28/14).


Comments (46)


Too close for comfort

Today's Zits:

Comments (7)