Archive for Computational linguistics

Parsing puzzle of the week

"Short Wave: A Physics Legend", NPR Up First 4/3/2022 [emphasis added]:

In the 1950's, a particle physicist made a landmark discovery that changed what we thought we knew about how our universe operates. Chien-Shiung Wu did it while raising a family and an ocean away from her relatives in China. In this episode from NPR's daily science podcast Short Wave, we delve into the life and impact of Chien-Shiung Wu, widely considered the "queen of nuclear physics."

Read the rest of this entry »

Comments (17)

The weirdness of typing errors

In this age of typing on computers and other digital devices, when we daily input thousands upon thousands of words, we are often amazed at the number and types of mistakes we make. Many of them are simple and straightforward, as when our fingers stumblingly hit the wrong keys by sheer accident. People who type on phones often warn their correspondents that their messages are likely to contain such errors by appending a note like this at the bottom:

Please forgive spelling / grammatical errors; typed on glass // sent from my phone.

Read the rest of this entry »

Comments (37)

Postdocs on ancient scripts: Chinese and Aegean

Since these are on subjects that are of interest to many of us, I'm calling them to your attention.

From Mattia Cartolano:

The INSCRIBE project is hiring!

Two post-doc positions are now available:

  1. Evolution of Graphic Codes: The Origins of the Chinese Script
  2. Undeciphered Aegean Scripts: New perspectives in Computational Linguistics

Deadline for applications: Sunday 27 March 2022
If you want to find out more, write to s.ferrara@unibo.it

Read the rest of this entry »

Comments off

Turing Complete

Today's xkcd:

The mouseover title: "Thanks to the ForcedEntry exploit, your company's entire tech stack can now be hosted out of a PDF you texted to someone."

Read the rest of this entry »

Comments (13)

Who is brian?

Email from a colleague in computer science, listing some of the mistranscriptions in the Zoom captions of his office hours:

timing problem -> tiling problem
bulletin annually -> boolean formulae
satisfy your ability -> satisfiability
fire patterns -> tile patterns
inquisition -> position
valuables -> variables
double fines -> double prime
double poison -> ?
amen -> m, n
wine is in the continent of age -> ???
I do not want a diet climb to brian -> ???

I will stop here and I hope that you can all satisfy your ability with no double fines
and avoid inquisition.
Who is brian?

Read the rest of this entry »

Comments (8)

Robotic anaerobic Rodak erotic rotisserie

In yesterday's "Lively Blind Men" post, Ben Zimmer was appropriately amused by Zoom's speech-to-text mis-recognition of Lila Gleitman's name. But as everyone now has opportunities to learn, speech-to-text systems continue to make strange (and often amusing) mistakes in transcribing words and phrases that they haven't been trained to recognize. There are plenty of examples in pretty much any automatic transcription, and the 10/26 edition of the "Spectacular Vernacular" podcast, which Ben co-hosts with Nicole Holliday, doesn't disappoint.

Read the rest of this entry »

Comments (6)

Lively Blind Men

Last weekend, there was a memorial service at Penn for Lila Gleitman, who passed away in August. The hundreds of people physically present were joined by a large crowd on Zoom, where the automatic closed captioning was turned on. And so the audience got to see a large sample of speech-to-text versions of Lila's name, of which this was my favorite:

Read the rest of this entry »

Comments (2)

English as Afrikaans?

Language identification from digital text has been a solved problem for many years, so I was surprised yesterday to see Gmail offering to translate from Afrikaans an email written in perfectly idiomatic English, which started this way:

Read the rest of this entry »

Comments (10)

Deep fake audio

Helen Rosner, "A Haunting New Documentary About Anthony Bourdain", The New Yorker 7/15/2021:

It’s been three years since Anthony Bourdain died, by suicide, in June of 2018, and the void he left is still a void. […]

In 2019, about a year after Bourdain’s death, the documentary filmmaker Morgan Neville began talking to people who had been close to Bourdain: his family, his friends, the producers and crew of his television series. “These were the hardest interviews I’ve ever done, hands down,” he told me. “I was the grief counsellor, who showed up to talk to everybody.” […]

There is a moment at the end of the film’s second act when the artist David Choe, a friend of Bourdain’s, is reading aloud an e-mail Bourdain had sent him: “Dude, this is a crazy thing to ask, but I’m curious” Choe begins reading, and then the voice fades into Bourdain’s own: “. . . and my life is sort of shit now. You are successful, and I am successful, and I’m wondering: Are you happy?” I asked Neville how on earth he’d found an audio recording of Bourdain reading his own e-mail. Throughout the film, Neville and his team used stitched-together clips of Bourdain’s narration pulled from TV, radio, podcasts, and audiobooks. “But there were three quotes there I wanted his voice for that there were no recordings of,” Neville explained. So he got in touch with a software company, gave it about a dozen hours of recordings, and, he said, “I created an A.I. model of his voice.” In a world of computer simulations and deepfakes, a dead man’s voice speaking his own words of despair is hardly the most dystopian application of the technology. But the seamlessness of the effect is eerie. “If you watch the film, other than that line you mentioned, you probably don’t know what the other lines are that were spoken by the A.I., and you’re not going to know,” Neville said. “We can have a documentary-ethics panel about it later.”

Read the rest of this entry »

Comments (3)

Publication penalties

Amanda D'Ambrosio, "Mayo Physician Fired Over COVID Book", MedPage Today 6/24/2021:

After publishing a book about his experience on the front lines during the COVID-19 pandemic, a physician was fired from his position at the Mayo Clinic this month, he confirmed to MedPage Today.

Steven Weiss, MD, an internist who practiced at the clinic's Eau Claire, Wisconsin location for 32 years, stated that he was terminated because he identified himself as an employee of the Mayo Clinic in his new book, called Carnage in America: COVID-19, Racial Injustice, and the Demise of Donald Trump.

According to a June 4 termination letter shared with MedPage Today, Mayo Clinic administrators told Weiss, 62, that his actions violated the health system's publishing policy, as he did not submit his manuscript to the institution for review before it was printed.

"I'm still in shock that I was terminated for this," Weiss said in an interview with MedPage Today. "I had no idea that they would claim a right to pre-vet a book before publication."

Read the rest of this entry »

Comments (35)

Stochastic parrots

Long, but worth reading — Tom Simonite, "What Really Happened When Google Ousted Timnit Gebru", Wired 6/8/2021.

The crux of the story is this paper, which is now available on the ACM's website: Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?🦜" In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610-623. 2021.

As a result of a (somewhat strange) review process, described at length in the Wired article, Timnit Gebru and Margaret Mitchell were fired (or declared to have resigned) from their leadership roles in Google's Ethical AI group.

Read the rest of this entry »

Comments (18)

"This massive monster of incomprehensibility"

Atul Gawande, "Why doctors hate their computers", The New Yorker 11/5/2018, underlines the often-noted difficulty of working with badly designed software:

I’ve come to feel that a system that promised to increase my mastery over my work has, instead, increased my work’s mastery over me. I’m not the only one. A 2016 study found that physicians spent about two hours doing computer work for every hour spent face to face with a patient—whatever the brand of medical software. In the examination room, physicians devoted half of their patient time facing the screen to do electronic tasks. And these tasks were spilling over after hours. 

But the most interesting part of the article, at least for me, was the discussion of reading the records rather than writing them.

Read the rest of this entry »

Comments (17)

Meta-methodology

Comments (5)