AI hype #∞


In social and even mass media, you may have seen coverage of a recent paper by Joshua Harrison et al., "A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards". Some samples of the clickbait:

"A.I. can identify keystrokes by just the sound of your typing and steal information with 95% accuracy, new research shows", Fortune
"Do not type passwords in offices, new AI tool can steal your password by listening to your keyboard clicks", India Today
"AI Can Now Crack Your Password by ‘Listening’ to Your Keyboard Sounds", Beebom
"AI tools can steal passwords by listening to keystrokes during Zoom calls, study says", Khaleej Times
"How your keyboard sounds can expose your data to AI hackers", Interesting Engineering

But if you read the paper, you'll find very little to be concerned about — or at least nothing much new to add to your cybersecurity worries.

The first thing you'll find in the paper is a survey of the long history of "Acoustic Side Channel Attacks" (ASCAs). As the authors note, "Acoustic Side Channel Effects […] are not a new concept to the field of cybersecurity. Encryption devices have been subject to emanation-based attacks since the 1950s, with British spies utilising the acoustic emanations of Hagelin encryption devices (of very similar design to Enigma) within the Egyptian embassy. Additionally, the earliest paper on emanation-based SCAs found by this review was written for the United States' National Security Agency (NSA) in 1972."

And a brief web search turns up a similar story from a couple of decades ago — "Acoustic Keyboard Eavesdropping", NYT 12/12/2004:

When it comes to computer security, do you have faith in firewalls? Think passwords will protect you? Not so fast: it is now possible to eavesdrop on a typist's keystrokes and, by exploiting minute variations in the sounds made by different keys, distinguish and decipher what is being typed.

The 2004 work was documented in Dmitri Asonov and Rakesh Agrawal, "Keyboard acoustic emanations", IEEE Symposium on Security and Privacy, 2004:

We investigate acoustic emanations of a PC keyboard, the clicks, to eavesdrop upon what is being typed. This attack is based on the hypothesis that the sound of clicks can differ slightly from key to key, although the clicks of different keys sound very similar to the human ear. Our experiments show that a neural network can be trained to differentiate the keys to successfully carry out this attack.

So what's new in the Harrison et al. 2023 paper? They say they "present a practical fully–automated ASCA which deploys cutting edge deep learning models".
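For readers unfamiliar with how this family of attacks works, the pipeline is roughly: record audio, isolate individual keypress sounds, convert each to a time-frequency representation, and train a classifier on labeled examples. Here is a minimal sketch of that idea, with synthetic "clicks" and a nearest-centroid classifier standing in for the paper's deep learning model — the per-key resonance frequencies and all other specifics below are illustrative, not taken from the paper:

```python
import numpy as np

SR = 44_100           # sample rate (Hz), an assumption for this sketch
WIN = int(0.33 * SR)  # 0.33 s analysis window, as in the paper

def make_click(freq, rng):
    """Synthetic keypress: a decaying burst at a key-specific resonance."""
    t = np.arange(WIN) / SR
    return np.exp(-40 * t) * np.sin(2 * np.pi * freq * t) + 0.01 * rng.standard_normal(WIN)

def spectrogram_features(x, n_fft=1024, hop=512):
    """Magnitude spectrogram of windowed frames, flattened to a feature vector."""
    frames = [np.abs(np.fft.rfft(x[i:i + n_fft] * np.hanning(n_fft)))
              for i in range(0, len(x) - n_fft, hop)]
    return np.concatenate(frames)

rng = np.random.default_rng(0)
keys = {"a": 900.0, "b": 1400.0, "c": 2100.0}  # illustrative per-key resonances

# 25 repetitions per key, mirroring the paper's data collection
train = {k: [spectrogram_features(make_click(f, rng)) for _ in range(25)]
         for k, f in keys.items()}
centroids = {k: np.mean(v, axis=0) for k, v in train.items()}

def classify(x):
    """Assign a sound to the key whose training centroid is nearest."""
    feats = spectrogram_features(x)
    return min(centroids, key=lambda k: np.linalg.norm(feats - centroids[k]))

print(classify(make_click(keys["b"], rng)))  # a fresh "b" press should come back as "b"
```

The point of the sketch is that classification is easy precisely because everything is held fixed — one "keyboard", one "microphone", no background noise, no overlapping keystrokes — which is also, as the list below argues, the weakness of the paper's experimental design.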

That's sort of true, except maybe for the "practical" part. Some problems:

  1. They used one single laptop in testing and training, resting it on a cloth mat to eliminate table vibrations.
  2. They recorded keypress sounds using one single smartphone in a fixed location (or one particular Zoom session and connection).
  3. They recorded repetitions of a single isolated keystroke at a time, on a single occasion, using the results for both testing and training.
  4. They tested sounds from 36 keys (a-z, 0-9), with no shift keys, punctuation, etc., "with each being pressed 25 times in a row".
    So 36*25=900 sounds altogether, perhaps times 2 for local and Zoom versions.
  5. They trained and tested the smartphone recordings and the zoom recordings separately.
  6. "The keystrokes isolated for this data were of fixed length 14400 (0.33s)" — but if you type 35 wpm, that's 35*7=245 keystrokes per minute, or 60/245=0.245 seconds/keystroke on average, so that their analysis window would often contain overlapping keystroke sounds in real-world typing.
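The arithmetic in point 6 is easy to check. (The 44.1 kHz sample rate is my inference, since 14400 samples at that rate comes to roughly the 0.33 s the paper quotes.)

```python
# Figures from point 6: 35 words per minute at ~7 keystrokes per word.
wpm = 35
keystrokes_per_word = 7           # average word plus trailing space
kpm = wpm * keystrokes_per_word   # 245 keystrokes per minute
mean_gap = 60 / kpm               # mean seconds between keystroke onsets

# The paper's fixed window: 14400 samples, assumed to be at 44.1 kHz.
window = 14_400 / 44_100

print(f"mean inter-keystroke gap: {mean_gap:.3f} s")  # 0.245 s
print(f"analysis window:          {window:.3f} s")    # 0.327 s
print("window longer than mean gap:", window > mean_gap)
```

So at even a modest 35 wpm, the average gap between keystrokes is shorter than the analysis window, and adjacent keypress sounds would routinely bleed into one another.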

I would bet a large sum of money that if the authors tested their same model on recordings of real-world typing from many different computer keyboards, recorded on different devices at different distances in different acoustic environments, the accuracy would fall nearly to zero.

Could a more sophisticated version of their method be made to work? "More sophisticated" would have to mean: trained on a MUCH larger body of data, including multiple recordings of real-world typing from multiple keyboards in multiple acoustic environments; and also using more sophisticated techniques for dealing with sounds from overlapping bursts of keypresses.

I'm guessing that there are companies and government agencies around the world who know the answer to that question. And I'm guessing that the answer is "sort of", though this is not something I'd bet money on, one way or the other.

The 2004 Asonov and Agrawal paper was in some ways stronger than the 2023 paper — at least they tested multiple keyboards, and multiple microphones recording at various distances, though they didn't engage the problem of overlapping keypress sounds. Oddly, the 2023 paper includes the 2004 paper in its bibliography, but doesn't mention it otherwise.

Update — We should also remember the (I think more practical) problem of Van Eck phreaking:

Van Eck phreaking, also known as Van Eck radiation, is a form of eavesdropping in which special equipment is used to pick up side-band electromagnetic emissions from electronic devices that correlate to hidden signals or data to recreate these signals or data to spy on the electronic device. Side-band electromagnetic radiation emissions are present in (and with the proper equipment, can be captured from) keyboards, computer displays, printers, and other electronic devices.

In 1985, Wim van Eck published the first unclassified technical analysis of the security risks of emanations from computer monitors.[1][2] This paper caused some consternation in the security community, which had previously believed that such monitoring was a highly sophisticated attack available only to governments; van Eck successfully eavesdropped on a real system, at a range of hundreds of metres, using just $15 worth of equipment plus a television set.



  1. Tam said,

    August 18, 2023 @ 3:33 pm

    Thank you very much for calling out fear-mongering. Much appreciated!!

  2. maidhc said,

    August 18, 2023 @ 7:11 pm

    I remember reading about this back when the James Bond books first became popular. With typewriters, though.

  3. Chester Draws said,

    August 18, 2023 @ 8:55 pm

    Not to mention, they have to know that you are typing a password at the time, in order to analyse it.

    The only time anyone would know that I am typing a password is when I start my computer. But since that isn't when anything else is connected, good luck with that. In any case, my start password isn't used on anything else.
