Language Log

Turing Complete

December 20, 2021 @ 8:26 am · Filed by Mark Liberman under Computational linguistics

Today's xkcd:

The mouseover title: "Thanks to the ForcedEntry exploit, your company's entire tech stack can now be hosted out of a PDF you texted to someone."

As often with xkcd, some technical background is assumed. For a start, you need to know what "Turing Completeness" is:

In computability theory, a system of data-manipulation rules (such as a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be used to simulate any Turing machine. […] The concept is named after English mathematician and computer scientist Alan Turing.

This explains the "dishwasher" reference, which presumably means that the machine's control system, being Turing Complete, can be programmed to carry out any computable function (except for issues of memory, I/O, etc. — but never mind, it's a joke).

The "under attack by a nation-state" part is less clear. My interpretation: a ransomware attack has encrypted your systems using a code that is effectively unbreakable, because the best available algorithms require serial search through an excessively large space. The usual way to prove this involves showing that the problem is "NP Complete":

The name "NP-complete" is short for "nondeterministic polynomial-time complete". In this name, "nondeterministic" refers to nondeterministic Turing machines, a way of mathematically formalizing the idea of a brute-force search algorithm. […]

Although a solution to an NP-complete problem can be verified "quickly", there is no known way to find a solution quickly. That is, the time required to solve the problem using any currently known algorithm increases rapidly as the size of the problem grows.

This concept is obviously related to "Turing Completeness", especially because of the mathematical background of Turing's work:

In the early 20th century, David Hilbert led a program to axiomatize all of mathematics with precise axioms and precise logical rules of deduction that could be performed by a machine. Soon it became clear that a small set of deduction rules are enough to produce the consequences of any set of axioms. These rules were proved by Kurt Gödel in 1930 to be enough to produce every theorem.

The actual notion of computation was isolated soon after, starting with Gödel's incompleteness theorem. This theorem showed that axiom systems were limited when reasoning about the computation that deduces their theorems. Church and Turing independently demonstrated that Hilbert's Entscheidungsproblem (decision problem) was unsolvable, thus identifying the computational core of the incompleteness theorem.

But it's not clear to me what the referent of "this" in the phrase "this is actually Turing-Complete", if the xkcd expert is talking about a ransomware (or other digital) attack. The mouseover title suggests the idea that maybe .pdf-internal functions can be programmed so that "your company's entire tech stack can now be hosted"?

The previous xkcd strip also has a digital systems context, this time a more transparent one:

The mouseover title: "It's 10:34 PM for this user. They really need to get going, they have a thing early tomorrow. Are you sure you want to notify?"

December 20, 2021 @ 8:26 am · Filed by Mark Liberman under Computational linguistics

Permalink

13 Comments

Michael P said,

December 20, 2021 @ 8:55 am

The nation-state reference is an allusion to the customers, and perhaps other backers, of the NSO Group, which was recently discovered to have used a Turing complete feature in a surprising "zero-click" exploit featuring a PDF interpreter and a PDF file pretending to be an animated GIF: https://www.zdnet.com/article/google-this-zero-click-iphone-attack-was-incredible-and-terrifying/

"As often with xkcd, some technical background is assumed" indeed!

[(myl) Thanks! I hadn't caught up with the NSO-group's pdf hack, "ForcedEntry". More about it here…]
Camil said,

December 20, 2021 @ 9:13 am

"this" in "this is actually Turing-Complete" refers to the computational system that is being exploited. It can be the dishwasher logic, internal PDF functions, Apache rewrite rules, …

The nation-state part does not necessarily refer to a ransomware attack. I think the point is that Turing-Completeness is generally a theoretical exploit, because (1) you still have limitations like memory and IO which you mentioned and (2) depending on the thing you are exploiting it still takes considerable effort to get it to do something remotely interesting. The suggestion seems to be that only Mario enthusiasts and nation states have the persistence that it takes.
SusanC said,

December 20, 2021 @ 10:45 am

I agree it’s a reference to the NSO group attack.

But also, this is an example of a more general class of attacks, in which in application receives input off the Internet that is supposed to be in a “safe” restricted format that doesn’t allow the source of the data to do anything malicious … but then it turns out that the format is Turing Complete after all, and someone uses it to do something malicious.
SusanC said,

December 20, 2021 @ 10:51 am

If you’re being pedantic, there’s a difference between a language being Turing complete and it having access to protected resources (e.g. files, areas of memory) on the host computer. A sandboxed Java program can be Turing complete without having access to the file system.

But the cartoon is, after all, just a joke,
jaap said,

December 20, 2021 @ 10:56 am

Whenever there is an xkcd that I don't fully get, I take a look at explainxkcd, which will go into excessive detail:
https://www.explainxkcd.com
Tim K said,

December 20, 2021 @ 2:00 pm

The NSO group is indeed the reference. They used image manipulation rule in image rendering libraries to create a Turing complete exploitation language/computation for iOS where there wasn’t one before.

It’s worth reading up on: it’s insane what they did to get their zero-click exploit. It wasn't just some buffer overflow.
Aristotle Pagaltzis said,

December 20, 2021 @ 2:00 pm

Yes, the “this” in “this is actually Turing-complete” refers to the decoder for images originally designed for fax machines. This is what the NSO Group attack referenced in the mouseover title made use of.

To be actually useful a computer generally comes with many additional amenities beyond the bare ability to perform computation: things like the ability to display the results of computation to a user; to accept inputs from them; to store and retrieve data somewhere persistent; etc.

The joke is that “this is actually Turing-complete” almost by definition implies an incredibly low-amenity environment for doing computation – which is to say you wouldn’t normally want to use such a thing to do anything resembling actual work, because of how painfully difficult and tedious it would be (in various ways).

So the only people who will actually try are either the ones doing it as a “look what I can do” challenge – hence, you get people getting games like Mario or Doom to run on devices never intended for such purposes – or else attackers trying to slip instructions a sanitised communication channel (that’s what an attack is, essentially) who have no other, simpler avenue for doing so.

And this is what NSO Group did – they used the fact that decoding some kinds of images inside PDFs “turns out to be Turing-complete” to ship PDFs with images that cause the PDF image decoder to behave like a whole simulated processor running an included program.
Brett said,

December 20, 2021 @ 4:37 pm

I just want to point out that the claim about first-order logic (what is being described in the first paragraph of the last block quote) being “enough to produce every theorem” is a colossal howler. Indeed, the result that made Gödel famous can be formulated as stating that there are statements that are provably true using meta-logic that cannot be proven in first-order logic!
Jay Daigle said,

December 21, 2021 @ 1:10 am

@Brett I'm guessing they're talking about the Gödel completeness theorem, not the incompleteness theorem. He proved that first-order systems are complete, and that every logically valid formula is provable.

A couple years later he proved the incompleteness theorems about second-order systems. Those are necessarily at least one of incomplete or inconsistent.

That said, the exposition is indeed confusing and not great.
Pau Amma said,

December 21, 2021 @ 1:13 am

Some infosec context: https://www.schneier.com/tag/advanced-persistent-threats/ and specifically https://www.schneier.com/blog/archives/2011/11/advanced_persis.html. They're often deployed against targets (opposition figures, journalists, human rights activists) by state actors, and rely on very complex delivery mechanisms (how a device or application is infected or compromised), usually only available at prices and from suppliers that put them out of reach of anyone but sovereign countries. Citizen Lab (https://citizenlab.ca/) is a good source of in-depth news and forensic analysis for people interested.
Peter Taylor said,

December 21, 2021 @ 4:23 am

The "under attack by a nation-state" part is less clear. My interpretation: a ransomware attack has encrypted your systems

Leaving aside the specific reference to ForcedEntry, nation-state attacks wouldn't generally involve ransomware (with the possible exception of North Korea, which is rumoured to do them for profit because it's a source of money which bypasses sanctions). Ransomware draws attention to itself, and most nation-state attacks are about espionage and want to go unnoticed for as long as possible.
The only high-profile exception I can think of was the use of nation-state resources arguably for a private vendetta, when Mohammed bin Salman apparently hacked Jeff Bezos and attempted to blackmail him in order to improve the coverage of Saudi Arabia in the Washington Post, leading to his divorce when he refused to comply.
Peter Taylor said,

December 21, 2021 @ 4:48 am

Oops, I meant to respond to the next part of that paragraph too.

…using a code that is effectively unbreakable, because the best available algorithms require serial search through an excessively large space. The usual way to prove this involves showing that the problem is "NP Complete"

I don't believe this to be true. The current state-of-the-art for symmetric ciphers is still AES, whose hardness is not based on a reduction to another problem. Similarly for hash functions SHA2 and SHA3. For asymmetric ciphers, RSA and its variants are based on the assumed hardness of the hidden subgroup problem, and hence possibly NP-intermediate. McEliece is based on the hardness of decoding a general linear code, which is known to be NP-hard, but it's the exception rather than the rule.

On the general question, Chia, Hallgren and Song write

The pursuit of basing cryptographic primitives on NP-hardness has largely ended up negative. For instance, if one can reduce NP-complete problems to inverting one-way permutations, size-verifiable one-way functions, single-server single-round private information retrieval, or some weak fully homomorphic encryption scheme, then the polynomial hierarchy collapses.

(References elided from the quote for legibility. It's generally assumed that the polynomial hierarchy doesn't collapse).
rpsms said,

December 21, 2021 @ 12:41 pm

This particular attack is super interesting in that the image decoder being exploited is *not* turing complete: the exploit *builds a turing-complete virtual machine using the decoder*

RSS feed for comments on this post

Turing Complete

13 Comments

Michael P said,

Camil said,

SusanC said,

SusanC said,

jaap said,

Tim K said,

Aristotle Pagaltzis said,

Brett said,

Jay Daigle said,

Pau Amma said,

Peter Taylor said,

Peter Taylor said,

rpsms said,

Follow us on Twitter

Archives [+/–]

Blogroll [+/–]

Meta