DARPA/Dartmouth one/won …


Despite the evidence of my most recent relevant post, the best current speech-to-text systems still make mistakes that a literate and informed human wouldn't.

In this recent YouTube video on the history of robotics research, the automatic closed-captioning system renders "DARPA" as "Dartmouth":

The audio is clear enough:

My first thought was that the system just doesn't have DARPA in its lexicon, but later in the transcript we get:

The rev.ai system I praised in Friday's post transcribes "DARPA" correctly from the start:

The Defense Advanced Research Project Agency, DARPA for Short was created in 1958, almost immediately after the launch of Sputnik won by the U S S R.

But the result is not yet quite perfect. Its capitalization of "Short" is a bit weird. (Though otherwise the deployment of capital letters is as it should be, unlike the YouTube caption's even weirder choices.)

There's a missing comma between "Short" and "was" — punctuation is definitely a new frontier for these systems.

And of course it should be "Sputnik 1", not "Sputnik won". Since "won" and "one" are perfect homophones [Update — at least for the voice-over speaker], the correct choice depends on knowing something about history (that Sputnik 1's name announced its status as the first artificial satellite), and also something about the meaning of the word "launch" (that launches are not normally won or lost). Of course, both of those bits of "knowledge" could be imitated by a Large (enough) Language Model…
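
To make the idea concrete, here is a minimal sketch of how word-sequence statistics alone can pick between homophones. It is a toy bigram rescorer, not how rev.ai or YouTube actually work (I have no knowledge of their internals). The six-sentence corpus is invented purely for illustration, and the function names `bigram_counts` and `pick_homophone` are hypothetical.

```python
from collections import Counter

# Invented mini-corpus, for illustration only (not real training data).
CORPUS = [
    "the launch of sputnik one by the ussr",
    "sputnik one was the first artificial satellite",
    "apollo one never flew",
    "the team won the race",
    "she won the election",
    "they won by a wide margin",
]


def bigram_counts(sentences):
    """Count adjacent word pairs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def pick_homophone(prev_word, next_word, candidates, counts):
    """Return the candidate that fits best between its two neighbours.

    Each candidate is scored by the product of its left and right bigram
    counts, with 1 added to each so an unseen pair does not zero the score.
    """
    def score(word):
        left = counts[(prev_word, word)] + 1
        right = counts[(word, next_word)] + 1
        return left * right

    return max(candidates, key=score)


counts = bigram_counts(CORPUS)
choice = pick_homophone("sputnik", "by", ["won", "one"], counts)
```

With this corpus, `choice` is "one", because "sputnik one" has been seen and "sputnik won" has not. The same function returns "won" for the context "they ... by". A large language model does something loosely analogous over far longer contexts and vastly more text, which is why it can imitate the historical and lexical "knowledge" involved.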

Anyhow, speech-to-text is pretty good these days, but there's still some headroom.

24 Comments

  1. Andrew McCarthy said,

    August 14, 2023 @ 7:00 am

    I think it should be considered "winning the launch" if the rocket in question doesn't experience a "rapid unscheduled disassembly", as SpaceX put it a few months back. ;)

  2. Victor Mair said,

    August 14, 2023 @ 7:41 am

    I will be at the real Dartmouth in a couple of days. I'll see if I can find where DARPA is hanging out there these days.

    Memories of John G. Kemeny, co-developer of the BASIC programming language in 1964 and President of Dartmouth College from 1970 to 1981.

    https://en.wikipedia.org/wiki/John_G._Kemeny

    https://en.wikipedia.org/wiki/BASIC

  3. postmortes said,

    August 14, 2023 @ 12:59 pm

    '…"won" and "one" are perfect homophones…' In which dialect of English is that? As they're quite distinct for me.

    Cryptic crossword clues sometimes have homophone-clues, which almost always leads to discussion (and complaint) from the solvers that the setter's idea of what words sound alike does not match their own, so I'm not actually convinced there are any perfect homophones in the 'ideal' English language.

  4. Taylor, Philip said,

    August 14, 2023 @ 1:49 pm

    « '…"won" and "one" are perfect homophones…' In which dialect of English is that? » — In British English at least (tho' not necessarily the whole of the British Isles). The LPD has "won" : /wʌn/ and "one" : /wʌn/, although the Korean currency "won" is different — /wɒn/.

  5. Robot Therapist said,

    August 14, 2023 @ 1:53 pm

    "In which dialect of English is that? As they're quite distinct for me."

    They are homophones where I am (in an upper middle class part of London).

  6. Ginny Bear said,

    August 14, 2023 @ 5:05 pm

    They are perfect homophones for this native speaker, raised in Southern California and living in the Pacific Northwest since 1990.

  7. djw said,

    August 14, 2023 @ 11:29 pm

    They're perfect homonyms in central Texas, too.

    Of course, pin, pen, and sometimes pan are homonyms there, too.

  8. djw said,

    August 14, 2023 @ 11:30 pm

    Um…late at night and I have a headache….homophones, not homonyms….

  9. Chester Draws said,

    August 15, 2023 @ 1:27 am

    Won and one are perfect homophones in NZ English too, all varieties.

    Pan, pen, pin and pun are not homophones in NZ English, but are so similar that they sound like they are to much of the rest of the world.

    "the best current speech-to-text systems still make mistakes that a literate and informed human wouldn't"

    Most Kiwis, even quite literate ones, would not know what DARPA is, and so would likely make a very similar mistake, although not Dartmouth, since most would not know that either.

    I've seen a discussion about smuggled golf clubs where the person transcribing wrote "What about the sandwich?", to my utter bafflement. It turns out, it was actually "What about the sand wedge?".

  10. David L said,

    August 15, 2023 @ 10:19 am

    If I can recall my father's voice correctly, he pronounced 'one' like 'gone' and 'won' with a vowel similar to 'put.' He was from Derbyshire.

  11. David Deterding said,

    August 16, 2023 @ 3:37 am

    For whether 'won' and 'one' are homophones in British English, it is useful to refer to some research that considers how 'one' is pronounced. Traditionally, 'one' rhymed with 'fun', so 'won' and 'one' were indeed homophones; but increasingly, 'one' rhymes with 'con' and 'don', and not with 'fun' and 'won' (it has the LOT vowel and not the STRUT vowel). John Wells (Longman Pronunciation Dictionary, 2008) shows that this trend for 'one' to have the same vowel as 'con' (i.e. the LOT vowel) is increasing, occurring with 45% of younger speakers but only 18% of older speakers, but one might assume that the numbers have increased since then.

  12. Philip Anderson said,

    August 16, 2023 @ 3:37 am

    @postmortes
    Could you tell us where they are NOT homophones, please? And how they differ, or what they rhyme with (sun, son, con). AFAIK they are homophones in most dialects, including the recognised standards.

  13. Taylor, Philip said,

    August 16, 2023 @ 4:44 am

    David, can you quote the exact text to which you refer, please ? I have the LPD (1993 edition) open at page 495, main entry for "one", and can find no mention of the text you cite, so this may have been a late-breaking emendation by John Wells.

  14. Jarek Weckwerth said,

    August 16, 2023 @ 8:51 am

    @Philip Taylor: Wells's 2008 edition (15 years ago!) has a separate poll panel on this, and there's a comparatively thorough discussion in Geoff Lindsey's English after RP.

  15. Taylor, Philip said,

    August 16, 2023 @ 9:15 am

    Thank you Jarek — I have ordered a copy of the 2008 edition for £4·99, post free. As I find the whole idea of "English after RP" rather depressing, I don't think I will also be ordering a copy of Geoff Lindsey's book …

  16. postmortes said,

    August 16, 2023 @ 10:10 am

    @PhilipAnderson I grew up in Liverpool and have lived around the world, but am currently in London. I note a commentor above believes that Londoners treat 'won' and 'one' as perfect homophones — that's not my experience at all, from East to West London. For me, 'one' rhymes with 'con' and 'won' with 'fun'.
    The observation that this might be a generational one is interesting, though, as much of my working life has been and is spent around people aged 20-30.

  17. J.W. Brewer said,

    August 16, 2023 @ 3:22 pm

    I note that Lindsey's book has a subtitle: "English After RP: Standard British Pronunciation Today." This is helpful in conveying the book's narrow (I won't say "provincial") focus. Speaking of AmEng pronunciation "after RP" would be either comical or incoherent, since RP was never spoken (except by the occasional pretentious weirdo) on this side of the Atlantic and as I understand it largely evolved on the other side of the Atlantic after AmEng and BrEng had already diverged although perhaps before AmEng had definitively outpaced BrEng in total number of speakers.

    Are there other words where for at least some younger British speakers a historic STRUT vowel has evolved into a current LOT vowel, or is "one" an, as it were, one-off?

  18. Jarek Weckwerth said,

    August 16, 2023 @ 5:02 pm

    @postmortes: On the prevalence of one with the vowel of LOT, I would recommend trying Youglish, as usual. (I'm not associated with them in any way; I don't even know who they are and it's not easy to find out; it's just a good resource for this kind of thing; and I'm not giving a link since I've had posts sucked into moderation as a result of including links.) From a quick scan of the first 20-odd examples, I would say the vowel of STRUT totally dominates. To me, LOT in one feels decidedly dialectal.

  19. Philip Anderson said,

    August 16, 2023 @ 5:21 pm

    @postmortes
    I also live in (West) London, after many years in South Wales, but am from an older generation. To me, they are homophones, and I haven’t noticed an ‘-on’ pronunciation for ‘one’, but maybe I need to listen carefully to the younger people in the office. I assume you still have the initial ‘w’, so are ‘one’ and ‘wan’ homophones for you?

  20. Bloix said,

    August 16, 2023 @ 10:35 pm

    When my dad began working there in 1968, it was ARPA. The Nixon folks turned it into DARPA in 1972 because, as my dad saw it, they were concerned that the eggheads were too interested in pure science and needed to be reminded that they were supposed to be funding research with clear military applications.

  21. postmortes said,

    August 17, 2023 @ 6:49 am

    @PhilipAnderson It seems like, based on the anecdata here, that my pronunciation is less than dialectal and heading towards unique! Unfortunately I don't have much of a dialect, or a accent either; to the extent that when I'm speaking a foreign language I'm often mistaken for a native speaker.
    As for 'one' and 'wan' — nope, they're not homophones either. 'Wan' is pronounced like 'pan' (and if we hadn't been having this discussion I might well have added 'to distinguish it from one and won' :) )

  22. Taylor, Philip said,

    August 17, 2023 @ 12:53 pm

    "Unfortunately I don't have much of a dialect, or a accent either; to the extent that when I'm speaking a foreign language I'm often mistaken for a native speaker." — would it be reasonable to ask if you are a native speaker of English ? I ask because someone pronouncing "wan" (/wɒn/) to rhyme with "pan" (/pæn/) would sound distinctly non-native to my (native-speaker) ears.

  23. David Deterding said,

    August 17, 2023 @ 6:47 pm

    J. W. Brewer asked if there are words other than 'one' where LOT now occurs in place of the traditional STRUT vowel.

    Upward & Davidson (The History of English Spelling, 2011, p. 193), in a section entitled 'Spelling Pronunciation', give the following list of words in which this shift has occurred: colander, combat, comparable, comrade, conduit, constable.

    I am now in Australia, and I note 'tonne' being pronounced with LOT, not STRUT. I don't know if this shift is also occurring in the UK.

    You might wonder why 'one' was written with 'o' rather than 'u'. John Algeo (The Origins and Development of the English Language, 6th ed, 2005, p. 118) suggests it is because, in cursive script in the Middle English period, 'u' could not easily be distinguished from 'm', 'n' and 'v'. That explains why many words with one of these consonants following the vowel have 'o' not 'u': come, some, none, done, son, won, love, dove, etc.

  24. Taylor, Philip said,

    August 18, 2023 @ 3:32 am

    "colander, combat, comparable, comrade, conduit, constable" — of these, there is just one in which I invariably use the STRUT vowel : "conduit", which I was taught at school (at the age of about ten) should be pronounced /ˈkʌn·dɪt/, and which is therefore how I continue to pronounce it to this day.

    With "colander", this I learned from my mother as /ˈkʌl·ən·də/, but when I later encountered the word in its written form, I adopted /kɒl·ˈæn·də/, which I may well still use today (it is not a word of which I have frequent need, unlike "conduit"), rather as I switched from saying /ˈɪŋ·ɡlənd/ to saying /ˈeŋ·ɡlənd/ after first seeing the word in print.

    "Constable" I first heard as /ˈkʌn·stə·bəl/ but now invariably pronounce as /ˈkɒn·stə·bəl/ — as teenagers, we routinely joked about saying "Good evening, CUNSTable", but never did, of course.
