Archive for Data bases

A new look at sperm whale communication

For as long as I can remember, I've been aware that whales, dolphins, porpoises, and other large mammals of the seas (the cetaceans) make whistles, clicks, calls, groans, songs, and other sounds and noises.  These vocalizations are manifestly complex and nuanced, leading people to believe that they are communicating content, emotions, and so forth.  What exactly they are conveying and how they do it have remained a mystery, but researchers never stop trying to figure out cetacean "language".  A new study at MIT claims to have made progress in analyzing sperm whale sound systems.

Scientists document remarkable sperm whale 'phonetic alphabet'
By Will Dunham, Reuters (May 7, 2024)
[with 2:58 video]

I was hesitant to read this article at all because of the mention of a "phonetic alphabet".  Even with the quotation marks around it, attributing this ability to sperm whales was a bit much for me.

Yet, since it was "scientists" doing the documenting, I forced myself to read the first two paragraphs:

The various species of whales inhabiting Earth's oceans employ different types of vocalizations to communicate. Sperm whales, the largest of the toothed whales, communicate using bursts of clicking noises – called codas – sounding a bit like Morse code.
A new analysis of years of vocalizations by sperm whales in the eastern Caribbean has found that their system of communication is more sophisticated than previously known, exhibiting a complex internal structure replete with a "phonetic alphabet." The researchers identified similarities to aspects of other animal communication systems – and even human language.

The language of spices

Sino-Platonic Papers is pleased to announce the publication of its three-hundred-and-thirty-eighth issue:

“Mapping the Language of Spices: A Corpus-Based, Philological Study on the Words of the Spice Domain,” by Gábor Parti.

Most of the existing literature on spices is to be found in the areas of gastronomy, botany, and history. This study instead investigates spices on a linguistic level. It aims to be a comprehensive linguistic account of the items of the spice trade. Because of their attractive aroma and medicinal value, at certain points in history these pieces of dried plant matter have been highly desired, and from early on, they were ideal products for trade. Cultural contact and exchange and the introduction of new cultural items beget situations of language contact and linguistic acculturation. In the case of spices, not only do we have a set of items that traveled around the world, but also a set of names. This language domain is very rich in loanwords and Wanderwörter. In addition, it supplies us with myriad cases in which spice names are innovations. Still more interesting is that examples in English, Arabic, and Chinese—languages that represent major powers in the spice trade at different times—are here compared.

Data, information, knowledge, insight, wisdom, and Conspiracy Theory, part 2

From Phillip Remaker:

The one that claimed authorship clipped the edge of the unicorn tail.

The only version I have found that doesn't clip the edge of the unicorn tail is this one from farhan.
I don't know if that means I found the original or if the author touched it up. The page is not archived on the Internet Archive.
It seems consistent with his other art.

Vignettes of quality data impoverishment in the world of PRC AI

Some snippets:

Limited data sets a hurdle as China plays catch-up to ChatGPT

Lack of high-quality Chinese texts on Internet a barrier to training AI models.

Ryan McMorrow, Nian Liu, Eleanor Olcott, and Madhumita Murgia, FT, Ars Technica (2/21/23)

Baidu struggled with its previous attempt at a chatbot, known as Plato, which analysts said could not even answer a simple question such as: “When is Alibaba co-founder Jack Ma’s birthday?”

Analysts point to the lack of high-quality Chinese-language text on the Internet and in other data sets as a barrier for training AI software.

GPT, the program underlying ChatGPT, sucked in hundreds of thousands of English academic papers, news articles, books, and social media posts to learn the patterns that form language. Meanwhile, Baidu’s Ernie has been trained primarily on Chinese-language data as well as English-language data from Wikipedia and Reddit.

Sinitic ideophones

I have always felt that binoms are a key to studying early vernacular Sinitic.  (See "Selected readings" below for useful references on this topic.)  Now we have a valuable research tool for access to and analysis of premodern Sinitic binoms, which fall within the purview of the tabulated listings introduced here:

The Chinese Ideophone Database (CHIDEOD)
L’ensemble de données des idéophones chinois (CHIDEOD)

In: Cahiers de Linguistique Asie Orientale (Brill)

Authors: Thomas VAN HOEY and Arthur Lewis THOMPSON

Online Publication Date: 26 Oct 2020

Arabic and the vernaculars, part 3

For Arabic diglossia references, see the works of Mohamed Maamouri, e.g., here, here, here, here, here, here, and here (pdf).

Also consult the various Arabic datasets of the LDC (Linguistic Data Consortium), both MSA and colloquial.

An important point to make is that the regional Arabic "colloquials" have been developing in separate directions nearly as long as the regional Romance varieties have. So Moroccan Arabic is roughly as different from Gulf Arabic as (say) French is from Portuguese….

Language meets literature; rationality vs. experience; fiction vis-à-vis nonfiction

New article in PNAS (Proceedings of the National Academy of Sciences of the United States of America), "The rise and fall of rationality in language", Marten Scheffer, Ingrid van de Leemput, Els Weinans, and Johan Bollen (12/21/21)
