Quirky speech-to-text, weird diarization

From Daniel Deutsch:

We had a long drive yesterday, so we listened to a “robot” reading the entire indictment. It certainly isn’t flawless, but I was surprised by how good it is, especially when it gets “excited” while enacting dialogue.

Indeed, the text-to-speech quality is quite good — though unfortunately they don't tell us which TTS software they used.

Here's the opening, which is entirely clear and even nearly natural-sounding:



Apostrophes in Hanyu Pinyin

The most famous instance of the use of an apostrophe in Hanyu Pinyin romanization is in the place name "Xi'an", the capital of Shaanxi Province (the doubled "a" is another story).

Xī'ān 西安 — two characters signifying "Western Peace"

If you don't use an apostrophe to separate the syllables, you end up with the monosyllable "xian", which — depending upon the tone and the character it is meant to represent — could mean dozens of different things.
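The apostrophe rule is mechanical enough to sketch in code. Assuming the standard orthographic rule (an apostrophe precedes any non-initial syllable beginning with a, o, or e), here is a minimal, hypothetical Python sketch; the function name is my own, and tone diacritics are normalized away so that ā, ǎ, etc. match:

```python
import unicodedata

def join_pinyin(syllables):
    """Join Pinyin syllables into one word, inserting an apostrophe
    before any non-initial syllable that begins with a, o, or e,
    so that e.g. xi + an gives "xi'an" rather than ambiguous "xian"."""
    def first_base_letter(syl):
        # Strip tone diacritics: NFD splits "ā" into "a" + combining mark.
        return unicodedata.normalize("NFD", syl)[0].lower()

    word = syllables[0]
    for syl in syllables[1:]:
        if first_base_letter(syl) in "aoe":
            word += "'" + syl
        else:
            word += syl
    return word

print(join_pinyin(["Xī", "ān"]))     # Xī'ān
print(join_pinyin(["Cháng", "ān"]))  # Cháng'ān
print(join_pinyin(["Běi", "jīng"]))  # Běijīng
```

The same rule explains why "Chang'an" needs the apostrophe but "Beijing" does not: only syllables starting in a, o, or e can be misparsed as the tail of the preceding syllable.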

Mark Swofford has carried out an interesting investigation of "Mandarin words with more than one apostrophe", Pinyin News (6/11/23).



ChatGPT has a sense of humor (sort of)

Benj Edwards has a mirthful article in Ars Technica (6/9/23):

Researchers discover that ChatGPT prefers repeating 25 jokes over and over

When tested, "Over 90% of 1,008 generated jokes were the same 25 jokes."

[includes an AI-generated image of "a laughing robot"]

On Wednesday, two German researchers, Sophie Jentzsch and Kristian Kersting, released a paper that examines the ability of OpenAI's ChatGPT-3.5 to understand and generate humor. In particular, they discovered that ChatGPT's knowledge of jokes is fairly limited: During a test run, 90 percent of 1,008 generations were the same 25 jokes, leading them to conclude that the responses were likely learned and memorized during the AI model's training rather than being newly generated.
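The statistic the researchers report is essentially a duplicate-frequency count over the model's outputs. A hypothetical sketch of how such a measurement could be made (the function name and joke strings below are stand-ins, not the paper's actual data or code):

```python
from collections import Counter

def repetition_rate(generations, top_n=25):
    """Return the fraction of generations accounted for by the
    top_n most frequent distinct outputs, plus those outputs.
    This is the shape of the paper's finding: the top 25 jokes
    covered about 90 percent of 1,008 generations."""
    counts = Counter(generations)
    top = counts.most_common(top_n)
    covered = sum(n for _, n in top)
    return covered / len(generations), top

# Toy illustration with invented outputs:
gens = ["chicken joke"] * 6 + ["atom joke"] * 3 + ["novel joke"]
frac, top = repetition_rate(gens, top_n=2)
print(f"top-2 outputs cover {frac:.0%} of generations")  # 90%
```

A high rate on a large sample, as here, is what supports the memorization interpretation: a model generating jokes freshly each time would show a much flatter frequency distribution.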

The two researchers, associated with the Institute for Software Technology, German Aerospace Center (DLR), and Technical University Darmstadt, explored the nuances of humor found within ChatGPT's 3.5 version (not the newer GPT-4 version) through a series of experiments focusing on joke generation, explanation, and detection. They conducted these experiments by prompting ChatGPT without having access to the model's inner workings or data set.

[Jentzsch and Kersting] listed the top 25 most frequently generated jokes in order of occurrence. Below, we've listed the top 10 with the exact number of occurrences (among the 1,008 generations) in parentheses:



"Steak the First"

Enlightening article by Peter Backhaus in The Japan Times (6/9/23):

"Za grammar notes: How to properly handle the 'the' in Japanese"

Japanese seems to be able to assimilate any English word, including the ubiquitous definite article "the", which is unlike anything in Japanese itself.

If there’s something like a Murphy’s Law for syntax, the name of this restaurant near my school is a pretty good example of it. Reading “Steak The First,” it always makes me wonder how these three words came to be aligned in just that order. “The first steak,” “first the steak,” “the steak first” — all of these seem safe for consumption. But “steak the first”?

In order to understand what’s going on here, we need to appreciate the very specific way the little word “the” is used in Japanese, where it is normally pronounced ザ (za). Note that the reading may change to ジ (ji) when the following word starts with a vowel, as in the name of the invincible Japanese rock band The Alfee, which officially reads ジ・アルフィー (ji arufī).

But since Japanese is a language that normally gets along perfectly well without articles, it’s a bit challenging to understand what use it can make of ザ in the first place. Even more puzzling is that, more often than not, ザ shows up in places where English syntax wouldn’t want you to put an article at all.



"The beautiful mind paper boxes"

The most recent Trump indictment reproduces this exchange of text messages (p. 11) :

Trump Employee 2:

We can definitely make it work if we move his
papers into the lake room?

Trump Employee 1:

There is still a little room in the shower where his
other stuff is. Is it only his papers he cares about?
Theres some other stuff in there that are not papers.
Could that go to storage? Or does he want everything
in there on property

Trump Employee 2:

Yes – anything that's not the beautiful mind paper
boxes can definitely go to storage. Want to take a
look at the space and start moving tomorrow AM?



InternLM

As I am about to deliver a keynote address to an international conference on Chinese language pedagogy, I receive news of this new LLM that knocks my socks off:

InternLM is a multilingual large language model jointly developed by Shanghai AI Lab and SenseTime (with equal contribution), in collaboration with the Chinese University of Hong Kong, Fudan University, and Shanghai Jiaotong University.

Technical report: [PDF]

Note: Please right-click the link above to download the PDF file directly.

Abstract

We present InternLM, a multilingual foundational language model with 104B parameters. InternLM is pre-trained on a large corpus with 1.6T tokens with a multi-phase progressive process, and then fine-tuned to align with human preferences. We also developed a training system called Uniscale-LLM for efficient large language model training. The evaluation on a number of benchmarks shows that InternLM achieves state-of-the-art performance in multiple aspects, including knowledge understanding, reading comprehension, mathematics, and coding. With such well-rounded capabilities, InternLM achieves outstanding performances on comprehensive exams, including MMLU, AGIEval, C-Eval and GAOKAO-Bench, without resorting to external tools. On these benchmarks, InternLM not only significantly outperforms open-source models, but also obtains superior performance compared to ChatGPT. Also, InternLM demonstrates excellent capability of understanding Chinese language and Chinese culture, which makes it a suitable foundation model to support Chinese-oriented language applications. This manuscript gives a detailed study of our results, with benchmarks and examples across a diverse set of knowledge domains and tasks.



ChatGPT does Emily Dickinson writing a recipe for Pad Thai (and haiku too)

From Scott D. Seligman via Facebook:

  ChatGPT is really creeping me out. I asked it for a recipe for Pad Thai in the form of an Emily Dickinson poem. I'm no poetry maven, but the damned thing seems to have the ability to turn a phrase, at least some of the time.

Below is what I got in response. [Note to Jeanne Larsen, Jenny Shepherd and any other poets or poetesses with whom I am acquainted: I hear Starbucks may be hiring baristas].



Self-owning peeve of the week: Compersion

Email from Florent Moncomble [links added]:

A few months ago, the distinguished member of the Académie française Alain Finkielkraut was featured in a video where he deplored the loss of “a word which used to exist in the [French] language and disappeared from it”, i.e. “compersion”. Little does he know that “compersion” was actually coined in the 1970s by the Kerista Community of San Francisco, in the context of polyamory, to describe the joy felt in knowing that your better half finds pleasure and happiness with other sexual partners! So that, far from being the old French word that he thinks it is, it is actually an English borrowing from the late 20th century… in other words, the very nemesis of the Académie — not to mention the moral overtones of the term, quite the antithesis of the conservatism of that institution…

Laelia Veron, a colleague from the Université d’Orléans, Christophe Benzitoun from the Université de Lorraine and I worked together on debunking Finkielkraut’s claim for an academically informed yet humorous biweekly spot that Laelia has on French public radio France Inter.



Transliterations aplenty

From Simon Cartoon:

Here's something I just saw at a local bakery in Berkeley, CA.



What is the logical form of that?

This post wanders down a series of rabbit holes, from a couple of dead economists, to a dead philosopher, to a dead Supreme Court justice. It all started with Eric Rahim's obituary in the Guardian, which links to the British Academy's obituary for Piero Sraffa, which includes this passage:

He also formed a close friendship with the Austrian-born philosopher Ludwig Wittgenstein, the founder of linguistic philosophy, who was a Fellow of Trinity College and later became Professor of Philosophy at the University. They met regularly on afternoon walks and engaged in endless discussions during the time that Wittgenstein prepared his second book entitled The Nature of Philosophical Investigations, in which he considerably modified his original position put forward in his first book, the Tractatus Logico-Philosophicus. In the introduction to the later work Wittgenstein paid the most generous tribute to Sraffa's unceasing interest in philosophical problems and to his capacity and readiness to engage in endless discussions. He stated in the Introduction to his second book (translated from the German original) that 'it was this stimulus to which I owe the most momentous ideas of this book' (italics in the original). [1]

[1] It was a question of Sraffa's which convinced him that language and reality do not necessarily have a common logical form.



Greco-Sinitic ψάμμος / ʃˠa mɑk̚ ("desert")

[This is a guest post by Chau Wu]

The psammo- component of the winning word in this year's Scripps National Spelling Bee, psammophile, is of interest to me because it is a good example of European-Sinitic lexical correspondence. The Ancient Greek word psámmos (ψάμμος) means ‘sand’. When used together with a definite article (ἡ ψάμμος), it also means ‘the sandy desert’. Examples can be found in Herodotus: ‘the sandy desert’ of Libya (4.173), Ethiopia (3.25), and Egypt (3.26). In Sinitic, ‘sandy desert’ is 沙漠 (MSM shāmò / Tw soa-bô·). From psammos to shāmò, it is easy to see three processes of simplification that may have taken place to transform the Greek loan: simplification of the initial cluster ps- > s-, that of the medial -mm- > -m-, and the loss of the final -s. The simplification of ps- > s- is also seen in Greek-derived English words such as psyche, pseudo-, and psalm.
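The three simplifications listed above can be expressed as string transformations. A purely illustrative Python sketch (historical sound change is of course not string substitution, and the function name is my own):

```python
def simplify_greek_loan(word):
    """Apply the three simplifications described above:
    initial ps- > s-, medial -mm- > -m-, loss of final -s."""
    if word.startswith("ps"):
        word = word[1:]             # ps- > s-  (cf. psyche, psalm)
    word = word.replace("mm", "m")  # -mm- > -m-
    if word.endswith("s"):
        word = word[:-1]            # final -s lost
    return word

print(simplify_greek_loan("psammos"))  # samo, cf. MSM shāmò 沙漠
```

Applied to psammos, the three steps yield samo, which lines up syllable-for-syllable with shāmò once the s- > sh- correspondence is granted.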



Unknown language #15

Yuan (?) dynasty (1271-1368) jade seal in the Bristol Museum:



Sinitic exclamations in English speech

Listen to Malaysian comedian Nigel Ng (aka "Uncle Roger"), who has had his Weibo and bilibili social media accounts banned due to "violation of relevant regulations":
