Touring the Turing Test again

The buzz about Large Language Models has re-ignited interest in Alan Turing's famous 1950 article "Computing Machinery and Intelligence". Two interesting recent discussions: Jessica Riskin, "A Sort of Buzzing Inside My Head", NYRB 6/25/2023, and Mustafa Suleyman, "The Coming Wave: Technology, Power, and the Twenty-first Century's Greatest Dilemma", Random House 9/5/2023.

Suleyman's book won't be released until 9/5/2023, so it's interesting that several outlets have blurbed one of its ideas ten weeks early: Brad Stone, "AI Leader Proposes a New Kind of Turing Test for Chatbots", Bloomberg 6/20/2023, and Sawdah Bhaimiya, "DeepMind's co-founder suggested testing an AI chatbot's ability to turn $100,000 into $1 million to measure human-like intelligence", Business Insider 6/20/2023. Based just on Business Insider's title, Suleyman's proposal puzzled me, since we don't usually think of machine-trading systems as measuring intelligence, at least not the intelligence of the system rather than its designer. But in fact Suleyman has something different in mind, more along the lines of an extended "shark tank" competition:

In describing his proposal, Suleyman argues that there’s a misplaced focus in the tech industry on the distant possibility of achieving artificial general intelligence, or AGI: algorithms with cognitive abilities that match or exceed humans’. Instead, he said the more achievable and meaningful short-term goal is what he calls artificial capable intelligence, or ACI: programs that can set goals and achieve complex tasks with minimal human intervention.

To measure whether a machine has achieved ACI, he describes a “modern Turing test” — a new north star for researchers — in which you give an AI $100,000 and see if it can turn the seed investment into $1 million. To do so, the bot must research an e-commerce business opportunity, generate blueprints for a product, find a manufacturer on a site like Alibaba and then sell the item (complete with a written listing description) on Amazon or Walmart.com.

Suleyman expects AI will pass this more practical threshold sometime in the next two years. “We don’t just care about what a machine can say; we also care about what it can do,” he writes. And when that happens, he says, “The consequences for the world economy are seismic.”

It's not clear to me that success in a product development game is a good general measure of "intelligence" or even "capability": lots of intelligent and capable people would choose not to participate, or would be bad at the game if they joined it. And folks in other fields might be tempted to define "ACI" in terms of success at the kind of thing they do (or want to do): managing a lawsuit; designing, running, interpreting, writing up, and publishing a psychology experiment; negotiating, planning, and managing a home remodeling project; etc. etc.

Jessica Riskin's article is organized around the idea of exploring ChatGPT's responses to questions taken from Turing's 1950 article — and interestingly, ChatGPT does not do well. Read the whole thing (though I guess it's behind a paywall — maybe worth a trial subscription). More on this later…

Some relevant past posts:

"Our love was real", 9/7/2009
"The case of the missing spamularity", 12/23/2010
"Ways To Be More Interesting In Conversation", 9/21/2021
"Unfair Turing Test handicaps", 6/12/2014


21 Comments

  1. KeithB said,

    June 28, 2023 @ 7:59 am

    I have always thought that a good test would be if an AI can initiate a conversation. Something with a bit more context than Eliza's "Tell me about yourself."

  2. SP said,

    June 28, 2023 @ 8:27 am

    At least for me, the dollar signs in the article title and block quote have been interpreted as equation delimiters: "…ability to turn 100,0001 million…"

  3. SP said,

    June 28, 2023 @ 9:34 am

    (seems to be fixed now – thanks)

  4. Paul Topping said,

    June 28, 2023 @ 10:22 am

    Turing's Test is still meaningful as long as the human involved is an AI expert who knows how to detect LLMs like ChatGPT. Suleyman's particular test doesn't impress me but perhaps he's moving in the right direction. While true AGI is interesting to contemplate, in the near term we should be looking at machine intelligence that complements our own. We've pretty much always used computers to do things for us that we are not well-equipped to do. Near-term AI will just be more of the same but with increasing intelligence. We will have to learn how best to configure this intelligence to serve our needs. ChatGPT and its ilk can be considered an experiment with lots of things we need to deal with.

  5. Scott P. said,

    June 28, 2023 @ 10:30 am

    While true AGI is interesting to contemplate, in the near term we should be looking at machine intelligence that complements our own.

    In this sense, then, is a slide rule intelligent?

  6. Mark Young said,

    June 28, 2023 @ 11:25 am

    @Scott P.
    It's not that complementing human intelligence is intelligence; it’s that we should be trying to build intelligence that complements our own.

  7. David L said,

    June 28, 2023 @ 11:36 am

    I find it very dispiriting that the exciting new test for AI should be 'can it turn a profit?' But then any assessment of what AI can do is likely to reflect our cultural values rather than any original characteristics of the AI itself.

  8. Kenny Easwaran said,

    June 28, 2023 @ 11:45 am

    The idea of being able to turn $100,000 into a million dollars isn't that this is a reasonable test for human-level intelligence – it's rather that this is a test for whether a system is going to, by itself, make a big difference in the world. Anyone or anything that can reliably turn $100,000 into a million dollars is going to quickly amass a lot of wealth and power.

  9. Kenny Easwaran said,

    June 28, 2023 @ 11:46 am

    Looks like the comments are taking these dollar signs as LaTeX math mode. Can I do something good with that? $\int_0^\infty e^{-x}dx=1$

  10. Seth said,

    June 28, 2023 @ 11:55 am

    It seems like a weird test given that product success depends on so many other factors. How many failed products does it get before and after a one-hit wonder? (fashion or "gag" gifts would be fertile ground for this – almost everything fails, but hitting a hot trend can make a huge amount of money).

    I'm sure right now plenty of people not just in finance trading, but for example venture capital, are trying to use these programs to get a market edge. If one ever succeeds, it'll probably be obvious.

  11. Richard Hershberger said,

    June 28, 2023 @ 12:00 pm

    KeithB: How 'bout them Yankees?

  12. Cervantes said,

    June 28, 2023 @ 12:06 pm

    I happen to have a subscription. It turns out the bot's sonnets aren't really all THAT bad. Viz:

    ) ChatGPT’s response to the possibility of substituting “a winter’s day”

    Certainly! Here’s an alternative version with “a winter’s day” in place of “a summer’s day”:

    Sonnet: A Winter’s Day’s Comparison

    Shall I compare thee to a winter’s day?
    Thou art as pure, serene, and coldly bright.
    When frosty breath hangs in the air, at play,
    And nature rests in peaceful, snowy light.

    Thy grace, a tranquil landscape dressed in white,
    Brings stillness to the world, a hush profound.
    Thy icy touch, a kiss of chill and bite,
    Creates a beauty, silent and profound.

    Though summer’s warmth, with vibrant hues and sound,
    May stir the senses, and inspire cheer,
    In winter’s calm, a quiet can be found,
    A time of peace, when all seems crystal-clear.

  13. KeithB said,

    June 28, 2023 @ 2:27 pm

    Richard Hershberger:
    This fails the context part, unless the bot would have some reason to believe I care about the Yankees.

    I have heard tell that Trump would fail the investment challenge. I will just leave the obvious implication on the floor right here.

  14. Chester Draws said,

    June 28, 2023 @ 5:01 pm

    I suspect the intelligence Suleyman is seeking will always be 2 years into the future (in the same way that functional fusion power has been 20 years away for my whole life).

    Anyone who solved it would be stupid to release it anyway.

    I imagine Trump would fail the investment challenge — unless it permitted real estate as one of the options. But then Einstein would have failed it even more so. Saying people are useless outside their areas of expertise is trite.

  15. John Swindle said,

    June 28, 2023 @ 6:33 pm

    Today's news of the on-again, off-again capture of some Amazon e-book bestseller lists by AI-generated drivel may be relevant here.

  16. bks said,

    June 29, 2023 @ 8:41 am

    Why can't it be something directly useful like cleaning the bathroom?

  17. Taylor, Philip said,

    June 29, 2023 @ 2:48 pm

    Or weeding the garden — I don't mind a few false negatives, but one false positive would have me put it straight in the bin …

  18. Taylor, Philip said,

    June 29, 2023 @ 4:02 pm

    I think that Scott P, with his "In this sense, then, is a slide rule intelligent?", gets to the very heart of the issue. Is AI in any way intelligent (qua intelligent), or is it simply fast, powerful, and capable of seemingly simulating intelligence without actually possessing it?

  19. Seth said,

    June 29, 2023 @ 4:53 pm

    @ bks, Taylor, Philip – tasks involving unconstrained physical motion have been extremely resistant to AI. I think a weed/flower classifier might be acceptable in terms of low errors, but actually picking the weeds is another matter. That being said, robot vacuum cleaners (hoovers across the pond?) have been one of the great success stories taken for granted.

  20. Taylor, Philip said,

    June 30, 2023 @ 3:14 am

    "Across the pond", Seth, we say robotic vacuum cleaners — Hoovers are, as far as I am aware, never robotic. I have two of the former, and a robotic lawn mower as well. None demonstrate any intelligence, but are nonetheless pretty good at their tasks.

  21. Rod Johnson said,

    June 30, 2023 @ 8:19 am

    @Cervantes: but, unless I'm misremembering, that's not a sonnet, so also not that good.
