Archive for Data bases

The language of spices

Sino-Platonic Papers is pleased to announce the publication of its three-hundred-and-thirty-eighth issue:

“Mapping the Language of Spices: A Corpus-Based, Philological Study on the Words of the Spice Domain,” by Gábor Parti.

ABSTRACT

Most of the existing literature on spices is to be found in the areas of gastronomy, botany, and history. This study instead investigates spices on a linguistic level. It aims to be a comprehensive linguistic account of the items of the spice trade. Because of their attractive aroma and medicinal value, at certain points in history these pieces of dried plant matter have been highly desired, and from early on, they were ideal products for trade. Cultural contact and exchange and the introduction of new cultural items beget situations of language contact and linguistic acculturation. In the case of spices, not only do we have a set of items that traveled around the world, but also a set of names. This language domain is very rich in loanwords and Wanderwörter. In addition, it supplies us with myriad cases in which spice names are innovations. Still more interesting is that examples in English, Arabic, and Chinese—languages that represent major powers in the spice trade at different times—are here compared.


Data, information, knowledge, insight, wisdom, and Conspiracy Theory, part 2

From Phillip Remaker:

The one that claimed authorship clipped the edge of the unicorn tail.

 
The only version I have found that doesn't clip the edge of the unicorn tail is this one from farhan.
 
I don't know if that means I found the original or if the author touched it up. The page is not archived on the Internet Archive.
 
It seems consistent with his other art.


Vignettes of quality data impoverishment in the world of PRC AI

Some snippets:

Limited data sets a hurdle as China plays catch-up to ChatGPT

Lack of high-quality Chinese texts on Internet a barrier to training AI models.

Ryan McMorrow, Nian Liu, Eleanor Olcott, and Madhumita Murgia, FT, Ars Technica (2/21/23)

Baidu struggled with its previous attempt at a chatbot, known as Plato, which analysts said could not even answer a simple question such as: “When is Alibaba co-founder Jack Ma’s birthday?”

Analysts point to the lack of high-quality Chinese-language text on the Internet and in other data sets as a barrier for training AI software.

GPT, the program underlying ChatGPT, sucked in hundreds of thousands of English academic papers, news articles, books, and social media posts to learn the patterns that form language. Meanwhile, Baidu’s Ernie has been trained primarily on Chinese-language data as well as English-language data from Wikipedia and Reddit.


Sinitic ideophones

I have always felt that binoms are a key to studying early vernacular Sinitic. (See "Selected readings" below for useful references on this topic.) Now we have a valuable research tool for access to and analysis of premodern Sinitic binoms, which fall within the purview of the tabulated listings introduced here:

The Chinese Ideophone Database (CHIDEOD)
L’ensemble de données des idéophones chinois (CHIDEOD)

In: Cahiers de Linguistique Asie Orientale (Brill)

Authors: Thomas VAN HOEY and Arthur Lewis THOMPSON

Online Publication Date: 26 Oct 2020


Arabic and the vernaculars, part 3

For Arabic diglossia references, see the works of Mohamed Maamouri, e.g., here, here, here, here, here, here, and here (pdf).

Also consult the various Arabic datasets of the LDC (Linguistic Data Consortium), both MSA and colloquial.
 
An important point to make is that the regional Arabic "colloquials" have been developing in separate directions nearly as long as the regional Romance varieties have. So Moroccan Arabic is roughly as different from Gulf Arabic as (say) French is from Portuguese….


Language meets literature; rationality vs. experience; fiction vis-à-vis nonfiction

New article in PNAS (Proceedings of the National Academy of Sciences of the United States of America), "The rise and fall of rationality in language", Marten Scheffer, Ingrid van de Leemput, Els Weinans, and Johan Bollen (12/21/21)
