Data, information, knowledge, insight, wisdom, and Conspiracy Theory, part 2
From Phillip Remaker:
Some snippets:
Limited data sets a hurdle as China plays catch-up to ChatGPT
Lack of high-quality Chinese texts on Internet a barrier to training AI models.
Ryan McMorrow, Nian Liu, Eleanor Olcott, and Madhumita Murgia, FT, Ars Technica (2/21/23)
…
Baidu struggled with its previous attempt at a chatbot, known as Plato, which analysts said could not even answer a simple question such as: “When is Alibaba co-founder Jack Ma’s birthday?”
Analysts point to the lack of high-quality Chinese-language text on the Internet and in other data sets as a barrier for training AI software.
GPT, the program underlying ChatGPT, sucked in hundreds of thousands of English academic papers, news articles, books, and social media posts to learn the patterns that form language. Meanwhile, Baidu’s Ernie has been trained primarily on Chinese-language data as well as English-language data from Wikipedia and Reddit.
…
I have always felt that binoms are a key to studying early vernacular Sinitic. (See "Selected readings" below for useful references on this topic.) Now we have a valuable research tool for access to and analysis of premodern Sinitic binoms, which fall within the purview of the tabulated listings introduced here:
The Chinese Ideophone Database (CHIDEOD)
L’ensemble de données des idéophones chinois (CHIDEOD)
In: Cahiers de Linguistique Asie Orientale (Brill)
Authors: Thomas Van Hoey and Arthur Lewis Thompson
Online Publication Date: 26 Oct 2020
For Arabic diglossia references, see the works of Mohamed Maamouri, e.g., here, here, here, here, here, here, and here (pdf).
New article in PNAS (Proceedings of the National Academy of Sciences of the United States of America): "The rise and fall of rationality in language".