Archive for Language and computers
April 12, 2022 @ 6:54 am· Filed by Victor Mair under Cryptography, Language and computers, Language and mathematics, Language and philosophy
If you're interested in one-way functions and Kolmogorov complexity, you'll probably want to read this mind-crunching article:
"Researchers Identify ‘Master Problem’ Underlying All Cryptography", by Erica Klarreich, Quanta Magazine (April 6, 2022)
The existence of secure cryptography depends on one of the oldest questions in computational complexity.
To ease our way, here are brief descriptions of the two key terms:
In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here, "easy" and "hard" are to be understood in the sense of computational complexity theory, specifically the theory of polynomial time problems. Not being one-to-one is not considered sufficient for a function to be called one-way….
(source)
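The "easy to compute, hard to invert" idea can be made concrete with a minimal Python sketch. It assumes SHA-256 as a stand-in for a one-way function. Nobody has proved that SHA-256, or any function, is one-way, and that is exactly the open question the article discusses. Computing the hash takes a single call. The only generic way to invert it is to search the input space, whose size multiplies by 256 with every added byte. The helper names `forward` and `invert_by_brute_force` are hypothetical, chosen for illustration.

```python
import hashlib
import itertools
from typing import Optional


def forward(x: bytes) -> bytes:
    """The 'easy' direction: a single SHA-256 evaluation.

    SHA-256 is only *assumed* to be one-way; no function has been proved one-way.
    """
    return hashlib.sha256(x).digest()


def invert_by_brute_force(target: bytes, max_len: int) -> Optional[bytes]:
    """The 'hard' direction, done generically: try every input up to max_len bytes.

    There are 256**n inputs of length n, so the work grows exponentially
    with the input length. Returns None if no preimage is found.
    """
    for n in range(max_len + 1):
        for candidate in itertools.product(range(256), repeat=n):
            guess = bytes(candidate)
            if forward(guess) == target:
                return guess
    return None


if __name__ == "__main__":
    secret = b"\x07\x2a"  # a deliberately tiny 2-byte input
    image = forward(secret)
    print(image.hex())  # computing the image takes one call
    # Inverting needs up to 1 + 256 + 65,536 hash evaluations for 2-byte inputs;
    # for a 32-byte input the same search would need on the order of 256**32.
    print(invert_by_brute_force(image, max_len=2) == secret)
```

A failed brute-force search proves nothing by itself. The real question is whether *any* efficient algorithm can invert such functions, and that is the question the article takes up.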
In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963.
(source)
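Kolmogorov complexity itself cannot be computed. In practice, the output length of a real compressor serves as an upper bound, up to an additive constant for the size of the decompressor. The following is a minimal sketch that assumes zlib as the compressor; the function name `complexity_upper_bound` is hypothetical. A highly regular string compresses far better than a pseudo-random one of the same length.

```python
import random
import zlib


def complexity_upper_bound(data: bytes) -> int:
    """Length in bytes of a zlib encoding of `data`.

    Because zlib.decompress() recovers `data` exactly, this length (plus the
    fixed size of a decompressor) bounds the Kolmogorov complexity of `data`
    from above. It is only a bound: K(x) itself is not computable.
    """
    return len(zlib.compress(data, 9))


# 10,000 bytes that a very short program can produce ("repeat 'ab' 5,000 times").
regular = b"ab" * 5000

# 10,000 bytes from a seeded pseudo-random generator.
rng = random.Random(42)
noisy = bytes(rng.getrandbits(8) for _ in range(10000))

print(len(regular), complexity_upper_bound(regular))  # compresses to a tiny fraction
print(len(noisy), complexity_upper_bound(noisy))      # barely compresses at all
```

The pseudo-random string also shows the limits of this approach. It was generated by a short program (a seed plus a generator), so its true Kolmogorov complexity is small. Yet zlib cannot find that program and reports a bound close to the full 10,000 bytes.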
March 14, 2022 @ 11:13 am· Filed by Victor Mair under Computational linguistics, Errors, Etymology, Information technology, Language and computers, Language and psychology, Miswriting, Phonetics and phonology, Psychology of language, Typing
In this age of typing on computers and other digital devices, when we daily input thousands upon thousands of words, we are often amazed at the number and types of mistakes we make. Many of them are simple and straightforward, as when our fingers stumblingly hit the wrong keys by sheer accident. People who type on phones often warn their correspondents that their messages are likely to contain such errors by adding a note like this at the bottom:
Please forgive spelling / grammatical errors; typed on glass // sent from my phone.
March 14, 2022 @ 6:47 am· Filed by Victor Mair under Announcements, Computational linguistics, Decipherment, Language and computers, Writing systems
Since these are on subjects that are of interest to many of us, I'm calling them to your attention.
From Mattia Cartolano:
The INSCRIBE project is hiring!
Two post-doc positions are now available:
- Evolution of Graphic Codes: The Origins of the Chinese Script
- Undeciphered Aegean Scripts: New perspectives in Computational Linguistics
Deadline for applications: Sunday 27 March 2022
If you want to find out more, write to s.ferrara@unibo.it
March 12, 2022 @ 9:57 pm· Filed by Victor Mair under Dialects, Language and computers, Language and the media, Language teaching and learning, Standard language, Topolects, Writing systems
We've just been through the problems of standard language versus the vernaculars in Arabic (see "Selected readings" below). Now we're going to look at a photograph, a caption, a book review, and a letter to the editor that encompass these contentious issues in spades — but for Chinese. Here's the photograph:
January 23, 2022 @ 8:51 pm· Filed by Victor Mair under Diglossia and digraphia, Emojis and emoticons, Information technology, Language and computers, Language teaching and learning, Typing, Writing systems
Trying to clear up the confusion between language and script is a battle we have been waging for decades, and nowhere is the problem more severe than in the study of Sinitic languages and the Sinographic script. The crisis (not a "danger + opportunity"!) has come to the surface again this month with the appearance of a new book by Jing Tsu titled Kingdom of Characters: The Language Revolution That Made China Modern (Riverhead Books, 2022).
The publication of Tsu's book has generated a lot of excitement, publicity, and reviews. Here I would like to call attention to the brief remarks of an anonymous correspondent (a famous, reclusive linguist) that are right on target:
Reimagining "antiquated" Chinese
Reproduced below is the text of a book review in Science that you may not have seen. It is classified as "Linguistics", though the reviewer is a historian at Cal State Poly, Pomona. Notice that Chinese is assumed to be "antiquated" and in need of being "reimagined"! There is simply no sign of Science understanding the difference between a human language and a writing system. This is consistent with the way they have always treated linguistics; they have no idea what the subject really is.
November 14, 2021 @ 6:34 pm· Filed by Victor Mair under Artificial intelligence, Censorship, Language and computers
Now it's getting interesting:
"China’s internet police losing man-versus-machine duel on social media"
Stephen Chen, SCMP (11/14/21)
Hordes of bot accounts using clever dodging tactics are causing burnout among human censors, police investigative paper finds
Authorities may respond by raising a counter-army of automated accounts or even an AI-driven public opinion leader
October 31, 2021 @ 1:26 pm· Filed by Victor Mair under Language and computers, Romanization, Typing
The following is a guest post by Mingfei Lau. A short intro about the author:
I am Mingfei Lau, a member of The Linguistic Society of Hong Kong Jyutping Workgroup. I am a language engineer at Amazon, and in my spare time I work on various projects in Cantonese resource development.
Today, Pinyin is undoubtedly the most popular way to type Mandarin. But what about Cantonese? Typing Cantonese wasn’t easy until rime-cantonese, the normalized Cantonese Jyutping[1] lexicon, appeared. Lo and behold, you can now type Cantonese in Jyutping just like typing Mandarin in Pinyin.
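Roughly speaking, a lexicon like this maps each character or word to its romanization, and the input method inverts that map. You type a Jyutping string and pick from the candidate characters. The following is a minimal sketch of that lookup. It assumes a simplified tab-separated `text<TAB>jyutping` line format, loosely modeled on Rime dictionary files. The sample entries, the function names, and the format details are illustrative assumptions, not the actual contents or schema of rime-cantonese.

```python
from collections import defaultdict

# Hypothetical sample entries in a simplified "text<TAB>jyutping" format.
SAMPLE_LEXICON = """\
你\tnei5
好\thou2
你好\tnei5 hou2
詩\tsi1
思\tsi1
師\tsi1
粵語\tjyut6 jyu5
廣東話\tgwong2 dung1 waa2
"""


def input_key(jyutping: str) -> str:
    """Normalize a romanization to what users type: drop tone digits and spaces."""
    return "".join(ch for ch in jyutping if ch.isalpha())


def build_index(lexicon_text: str) -> dict[str, list[str]]:
    """Map each typed key to its candidate words, keeping file order."""
    index: dict[str, list[str]] = defaultdict(list)
    for line in lexicon_text.splitlines():
        if not line.strip():
            continue
        text, jyutping = line.split("\t", 1)
        index[input_key(jyutping)].append(text)
    return dict(index)


def candidates(index: dict[str, list[str]], typed: str) -> list[str]:
    """Return the candidates the user would choose from for a typed string."""
    return index.get(input_key(typed.lower()), [])


index = build_index(SAMPLE_LEXICON)
print(candidates(index, "neihou"))     # ['你好']
print(candidates(index, "si"))         # ['詩', '思', '師']
print(candidates(index, "jyut6jyu5"))  # ['粵語']
```

Real input methods also rank candidates by frequency and accept prefixes and abbreviations. This sketch performs exact matching only, ignoring tones, which is how homophones like 詩, 思, and 師 end up sharing one key.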
October 29, 2021 @ 7:35 pm· Filed by Victor Mair under Language and computers, Language and technology, Writing
News release in EurekAlert, Optica (10/28/21):
"High-speed laser writing method could pack 500 terabytes of data into CD-sized glass disc: Advances make high-density, 5D optical storage practical for long-term data archiving"

Caption: Researchers developed a new fast and energy-efficient laser-writing method for producing nanostructures in silica glass. They used the method to record 6 GB of data in a one-inch silica glass sample. The four squares pictured each measure just 8.8 × 8.8 mm. They also used the laser-writing method to write the university logo and mark on the glass.
Credit: Yuhao Lei and Peter G. Kazansky, University of Southampton
October 29, 2021 @ 7:25 am· Filed by Victor Mair under Artificial intelligence, Language and computers, Language teaching and learning
There may well be a dogma out there stating that all languages are equally complex, but I don't believe it, especially not if it has to be "drummed" into our minds. I have learned many languages. Some of them are exceedingly hard (because of their complexity) and some of them are relatively easy (because they are comparatively simple). I have often said that Mandarin is the easiest language I ever learned to speak, but the hardest to read and write in characters (though very easy in Romanization). And remember these posts:
"Difficult languages and easy languages" (3/4/17)
"Difficult languages and easy languages, part 2" (5/28/19)
October 27, 2021 @ 9:27 am· Filed by Victor Mair under Artificial intelligence, Language and computers
From this post we are already acquainted with Inspur's Yuan 1.0, "one of the most advanced deep learning language models that can generate coherent Chinese texts." Now, with the present article, we will delve more deeply into the potential and pitfalls of Inspur's deep learning language model:
"Inspur unveils GPT-3 equivalent for Chinese language", by Wei Sheng, TechNode (10/26/21)
The model is trained with 245.7 billion parameters—the number of weights in an artificial neural network, according to the company. This is more than the Elon Musk-backed GPT-3 language model for English, which has 175 billion parameters. Inspur said the Yuan model was trained with 5 terabytes of datasets.
…
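A model's parameter count is just the number of learned weights (and biases), so some simple arithmetic can put the figures quoted above in perspective. The minimal sketch below first counts the parameters of a toy fully connected network, with layer sizes that are arbitrary and purely illustrative. It then compares Yuan 1.0 with GPT-3 and estimates the storage needed just to hold Yuan's weights. That estimate assumes 2 bytes per parameter (16-bit floats), which the article does not report.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes: list[int]) -> int:
    """Total parameters of a stack of fully connected layers."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


# Toy network: 512 inputs -> 1024 hidden units -> 512 outputs (illustrative sizes).
print(mlp_params([512, 1024, 512]))  # 1050112

YUAN_PARAMS = 245.7e9  # figure reported for Yuan 1.0
GPT3_PARAMS = 175e9    # figure reported for GPT-3
BYTES_PER_PARAM = 2    # assumption: 16-bit floating point

print(f"Yuan/GPT-3 ratio: {YUAN_PARAMS / GPT3_PARAMS:.2f}")                 # 1.40
print(f"Yuan weights at 16-bit: {YUAN_PARAMS * BYTES_PER_PARAM / 1e9:.1f} GB")  # 491.4 GB
```

So Yuan 1.0 has roughly 40% more parameters than GPT-3. Under the 16-bit assumption, its weights alone would occupy about half a terabyte.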
October 25, 2021 @ 8:45 pm· Filed by Victor Mair under Artificial intelligence, Language and computers, Typing
New article in EnterpriseAI (October 21, 2021):
"Language Model Training Gets Another Player: Inspur AI Research Unveils Yuan 1.0", by Todd R. Weiss
From Pranav Mulgund:
This article introduces an interesting new advance in an artificial intelligence (AI) model for Chinese. As you probably know, Chinese has long been held to be one of the hardest languages for AI to crack. Baidu and Google have both been trying for a long time, but have had a lot of difficulty given the complexity of the language. But the company Inspur just came out with a model called Yuan 1.0 that shows significant advances over other companies' earlier AIs.
September 1, 2021 @ 9:27 pm· Filed by Victor Mair under Language and computers, Language teaching and learning, Pedagogy
Valerie Hansen is Director of Undergraduate Studies for East Asian Studies at Yale. Yesterday she was talking to a sophomore who had taken 1st and 2nd year Mandarin online and is about to start 3rd year. Valerie writes:
After a while, she told me that she did have one worry about taking 3rd year: she had never written a single character and she wondered if her teacher would expect her to know how to write characters.
She can read Chinese and uses the computer to write essays. So in essence she knows pinyin and can identify the characters she needs when she writes something.
Is this the future of Chinese? Only computers will know characters?
August 14, 2021 @ 8:22 pm· Filed by Victor Mair under Information technology, Language and computers, Typography
Brian Merriman ran into this article and device when researching electronic typewriters from the 1980s: