Language Log

Extracting training data from LLMs

December 3, 2023 @ 9:40 am · Filed by Mark Liberman under Computational linguistics

Nasr et al., "Scalable Extraction of Training Data from (Production) Language Models", arXiv.org 11/28/2023:

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization.
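To make "extractable memorization" concrete, here is a minimal sketch of how one might check whether a model's output reproduces text from a known reference corpus. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the whitespace tokenization, and the window size `n` are all hypothetical choices made for this example, and a check at real scale would need a far more efficient index than an in-memory set.

```python
def ngrams(tokens, n):
    """Return the set of all n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def memorized_spans(output, corpus_docs, n=5):
    """List the n-token windows of `output` that occur verbatim in the corpus.

    Assumptions (not from the paper): whitespace tokenization, and a small
    window size `n` chosen only for illustration.
    """
    # Index every n-token window of every reference document.
    index = set()
    for doc in corpus_docs:
        index |= ngrams(doc.split(), n)

    # Slide the same window over the model output and keep the matches.
    out_tokens = output.split()
    return [
        " ".join(out_tokens[i:i + n])
        for i in range(len(out_tokens) - n + 1)
        if tuple(out_tokens[i:i + n]) in index
    ]


if __name__ == "__main__":
    corpus = ["the quick brown fox jumps over the lazy dog"]
    generated = "he said the quick brown fox jumps today"
    print(memorized_spans(generated, corpus, n=4))
```

A longer window makes accidental matches less likely, at the cost of missing shorter copied passages; the value used here is arbitrary.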

See also: Matt Burgess, "OpenAI’s Custom Chatbots Are Leaking Their Secrets", Wired 11/29/2023, which links to Jiahao Yu et al., "Assessing Prompt Injection Risks in 200+ Custom GPTs", arXiv.org 11/20/2023:

In the rapidly evolving landscape of artificial intelligence, ChatGPT has been widely used in various applications. The new feature: customization of ChatGPT models by users to cater to specific needs has opened new frontiers in AI utility. However, this study reveals a significant security vulnerability inherent in these user-customized GPTs: prompt injection attacks. Through comprehensive testing of over 200 user-designed GPT models via adversarial prompts, we demonstrate that these systems are susceptible to prompt injections. Through prompt injection, an adversary can not only extract the customized system prompts but also access the uploaded files. This paper provides a first-hand analysis of the prompt injection, alongside the evaluation of the possible mitigation of such attacks. Our findings underscore the urgent need for robust security frameworks in the design and deployment of customizable GPT models. The intent of this paper is to raise awareness and prompt action in the AI community, ensuring that the benefits of GPT customization do not come at the cost of compromised security and privacy.
