Why "deep learning" (sort of) works

A recent SMBC implicitly calls Aristotelian taxonomies into question:

Mouseover title: "Like look at these leaves. 14 oakitudes, minimum."

The Aftercomic:

This reminds me of a formative experience.

When I was a small child, my Russian grandfather took me for mushroom-hunting walks in the woods. So when I was in my 30s, long after his death, I bought some field guides to mushroom identification, and went on some of the New Jersey Mycological Association's guided forays into local state parks. The complex dichotomous keys in the guidebooks were frustratingly difficult, especially since a wrong choice could be fatal. And I also found it hard to find the mushrooms in the first place, at least compared to some of the more experienced participants.

An elderly Russian woman took pity on me, and invited me to accompany her into the woods. After a bit, she suggested that we sit down on a fallen log. I thought perhaps she was tired, and started to make conversation, but she said "sshh, you'll scare the mushrooms". After a few minutes of peaceful silence, she pointed to a spot 5 or 10 yards away, where even I could then see a mushroom cap peeking out from under the carpet of dead leaves. "Look," she said, "boletus edulis".

I fetched it, brought it back to her, and asked "So how do you know what it is?" She pointed out the smooth brown cap, the tiny yellowish-white pores making up the cap's undersurface, the plump stem, the lack of a pendant ring. "And the spore print will be brown", she explained.

"But wait a minute", I said. "You identified it when all you could see was the cap. And to get the spore print, we'd need to leave the cap overnight on a piece of paper under a bowl, right? So really, how did you know what it was?"

"Well", she answered, "it has a boletoid stature".

At that time, I still more or less believed what I'd learned at M.I.T. in the 1970s era of classical AI, which saw pattern recognition as applied logic. The basic idea was to recognize all the relevant traits, and then apply an Aristotelian taxonomy to make the classification. But I'd found, along with everyone else, that this didn't really work, among other things because it's at least as hard to recognize the relevant traits as to recognize the final category.
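
To make the classical recipe concrete, here is a minimal Python sketch of a toy dichotomous key. The `Specimen` type, its trait names, and the three-question key are all invented for illustration; they come from no real field guide and should not be used to identify anything. The traits are roughly the ones my companion listed for me.

```python
from dataclasses import dataclass


@dataclass
class Specimen:
    # Hypothetical traits, assumed to have been recognized already.
    has_pores: bool   # pores rather than gills under the cap
    has_ring: bool    # pendant ring on the stem
    cap_color: str    # e.g. "brown"


def classify(s: Specimen) -> str:
    """Walk a toy key: each step is a yes/no question about one trait."""
    if not s.has_pores:
        return "not a bolete"
    if s.has_ring:
        return "pored mushroom with a ring"
    if s.cap_color == "brown":
        return "candidate Boletus edulis"
    return "some other bolete"


print(classify(Specimen(has_pores=True, has_ring=False, cap_color="brown")))
```

The key itself is trivial. All of the difficulty hides in filling out the `Specimen` fields, which is exactly the part the logic takes for granted.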

This was about the same time that the first wave of connectionist research arrived, under the name of "Parallel Distributed Processing", and also the flowering of "machine learning" more generally. My mushroom-identification experience made me more sympathetic to those ideas. Of course, connectionism has turned out not to solve all the problems either. But the oaky-like oaksome oakness issue remains a challenge, and we can still learn from Jorge Luis Borges' 1952 essay "El Idioma Analítico de John Wilkins".



  1. Philip Taylor said,

    October 10, 2020 @ 1:19 pm

    Bird-watchers speak of the "jizz" of a bird, and it is this, rather than any more specific feature, that enables an experienced bird watcher to identify most of the birds with which he or she is familiar at a single glance.

    But if I may add a mycological anecdote to Mark's — some 20+ years ago I attended a conference in Sobieszewo, Poland, an island on which Boletus edulis can easily be found at the right time of year (and in the right location). Soon delegates from all over the world were trekking through the local woods and forests in search of mycological delicacies, but the organisers, knowing that for many of these delegates this would possibly be a first-ever fungal foray, made an announcement at the start of the next day's proceedings —

    "Ladies and gentlemen, we are delighted to learn that so many of you are taking advantage of your free time from this conference to search for mushrooms, but we are concerned that not all of you may have the expertise necessary to identify the good from the bad. Please, before you eat any mushroom that you have found, ask a Pole, or a Russian, or a Ukrainian, to identify it for you".

    Polite applause.

    Then a voice from the back — "Mr Chairman, may I make a suggestion ? Delegates seeking to identify the fungi that they have found should ask an old Pole, or an old Russian, or an old Ukrainian …".

    A massive round of applause (and not a little laughter) from the auditorium acknowledged the undoubted wisdom of this latter suggestion.

  2. Kristian said,

    October 10, 2020 @ 3:30 pm

    "It has a boletoid stature"

    This reminds me of the Finnish expression "istua/seisoa kuin tatti" (lit. sit/stand like a boletus, that is sit/stand fixed in place).

    The leaf in the comic doesn't actually look much like an oak leaf.

  3. Bob Ladd said,

    October 10, 2020 @ 4:47 pm

    Umberto Eco also wrote an essay on Wilkins (in his book, published in English as "The Search for the Perfect Language") and made some of the same points as Borges. But Eco was more concerned with the fact that in the artificial language, each separate sound in any given word denoted a category in Wilkins's classification of the universe, which meant that there was no redundancy. Change one sound and you change the meaning to some other possible combination of taxonomic dimensions.

  4. Andrew said,

    October 10, 2020 @ 9:46 pm

    "You can tell that it's an aspen tree because of the way it is"

  5. Y said,

    October 10, 2020 @ 10:13 pm

    The hardest to identify are LBMs, "little brown mushrooms", even at close inspection. I once heard a famous mycologist trying very hard to sound modest telling the story of identifying one of them, a rare and unexpected one, growing by the side of the road—from a car at 50 MPH.

  6. David Marjanović said,

    October 11, 2020 @ 4:53 am

    Bird-watchers speak of the "jizz" of a bird

    …I'm pretty sure American birdwatchers don't, because that word means something else there.

  7. Cervantes said,

    October 11, 2020 @ 7:02 am

    Even more compelling, I think, is that humans can instantly recognize an animal as a dog, although selective breeding has made them immensely variable in appearance, much more so than oak leaves. (There are actually two broad categories of oak leaves, of red and white oaks, which look quite different from each other, BTW. But I digress.) We have no trouble at all recognizing that a bulldog, a chihuahua, a golden retriever, and a St. Bernard are all the same kind of animal. That is mysterious.

  8. Mark P said,

    October 11, 2020 @ 7:54 am

    A friend who lives in New Mexico told me about walking around a large, rocky plain west of Albuquerque with an experienced pot hunter. The pot hunter repeatedly pointed out shard after shard that my friend had trouble seeing even after it was pointed out.

    Kristian, the leaf in the cartoon looks a lot like a chestnut oak. We have a lot of them around here.

  9. Robert Coren said,

    October 11, 2020 @ 9:54 am

    As @Cervantes says, there are two broad categories of oak, and the leaf in the cartoon does in fact look rather like a white oak type. I think that most non-botanists think of the leaves of red oaks, with their deeper and more pointed lobes, as more representative of "oakiness", so I can see why @Kristian would say that it "doesn't actually look much like an oak leaf".

  10. Michael Watts said,

    October 11, 2020 @ 10:00 am

    Here is an oak leaf of the type being referenced in the comic. As a person unhindered by direct experience with oaks of any variety, even I could recognize the leaf in the comic as looking like an oak leaf. (If you do an image search for "oak leaf", you'll notice a lot of results are depictions — paintings, crochet, and such — rather than photographs. They all depict this type of leaf.)

    Here is another oak leaf, which I agree looks quite different from the stereotypical oak leaf.

  11. Michael Watts said,

    October 11, 2020 @ 10:03 am

    I think that most non-botanists think of the leaves of red oaks, with their deeper and more pointed lobes, as more representative of "oakiness"

    The evidence seems to disagree with this claim fairly sharply. (See: the comic above; the image search results.) Could this be a regional difference?

  12. Philip Taylor said,

    October 11, 2020 @ 10:19 am

    Nationality and/or place of residence may also affect what one thinks of as "an oak tree/leaf" — being British, I immediately think of (and picture) the Pedunculate oak (Quercus robur), with the Sessile oak (Quercus petræa) as my second choice.

  13. Michael Watts said,

    October 11, 2020 @ 10:41 am

    (I left a comment with two links to images of oak leaves, but it appears to have been swallowed.)

  14. cameron said,

    October 11, 2020 @ 10:58 am

    Several years ago I bought an old Alamo guitar amplifier, made in Texas in the mid 60s. I was trying to determine who had made the speaker. There are numeric codes on old speakers that you can use to determine the manufacturer and date of manufacture for old speakers. On this speaker there was both a code and a logo. The logo was a stylized lightning bolt through a leaf. The numeric code was somewhat scratched and hard to read, but I thought that the first three digits might be 918, which would indicate the speaker was made by Oaktron. Only at that point did the logo snap into focus and I realized that of course it was an oak leaf and lightning bolt. So yeah, I didn't recognize the leaf in the logo until I looked up the numeric code in a list.

  15. John Shutt said,

    October 11, 2020 @ 11:22 am

    Having grown up in poison ivy country, I remember being asked by someone who'd grown up elsewhere, how to recognize it, and the best I could come up with was "I know it when I see it" (followed shortly by warning them to avoid a patch of it as we stepped across a verge). I'd likely know it during the winter, when it has no leaves. Our field guide to trees and shrubs (Petrides, 1958) says "It grows as an erect shrub, trailing vine, or climber. Leaves […] may be stiff and leathery or merely thin, somewhat hairy beneath or not, shiny or dull, coarse-toothed and wave-edged, or neither."

  16. Philip Taylor said,

    October 11, 2020 @ 11:33 am

    John, may I ask for clarification of "I'd likely know it during the winter, when it has no leaves" ? Does this mean (a) "I'd probably recognise it in winter, but would have difficulty in summer", or (b) "I'd recognise it even in winter, despite the lack of leaves" ?

  17. John Shutt said,

    October 11, 2020 @ 11:43 am

    The classic rule for recognizing poison ivy is, it has three leaves (technically, those are leaflets rather than leaves, but whatever). "Leaves of three, leave it be." I mean I'd likely recognize it even in winter.

  18. Kristian said,

    October 11, 2020 @ 1:20 pm

    I think of an archetypical ("oaky") oak leaf as having deeper lobulations (the Quercus robur type of leaf). I also think of this as being the typical oak leaf pattern in art or decoration.

    This is the kind of leaf I get pictures of if I google "oak leaf" or "oak leaf pattern" or "oak leaf wallpaper", so maybe there are regional differences in search results as well. The chestnut oak is an American tree.

  19. Michael Watts said,

    October 11, 2020 @ 3:18 pm

    I agree that Quercus robur is a more typical oak leaf, but you'll note that the example I gave of "a leaf of the type referenced in the comic" is of that type. I am not familiar with the chestnut oak leaf type — and didn't interpret the comic that way, though I agree the chestnut oak leaf is an even closer match.

    The people look nothing like actual people, either. It's a comic.

  20. Mark P said,

    October 11, 2020 @ 3:52 pm

    I was not familiar with chestnut oak until I moved to a mountaintop in NW Georgia. “Quercus montana” gives a hint why — it’s a mountaintop oak. But it has a large range, from Maine to Mississippi. It’s definitely quite common, but you do have to look in the right places.

  21. Philip Taylor said,

    October 11, 2020 @ 4:32 pm

    Thank you for the clarification, John. I remember "Leaves of three, leave them be" from the time I spent in Ontario (1985–1990, for a few weeks each year) but had forgotten (or perhaps never knew) that it was poison ivy that I was avoiding …

  22. Andreas Johansson said,

    October 11, 2020 @ 11:38 pm

    Michael Watts wrote:
    The people look nothing like actual people, either. It's a comic.

    Which is of course interesting in itself. It's possible to draw people such that they don't actually look much like real people, yet possess an unmistakable peoplishness.

  23. KeithB said,

    October 12, 2020 @ 8:29 am

    "Even more compelling, I think, is that humans can instantly recognize an animal as a dog, "

    There are some corner cases which challenge "dogness". For example: hyenas, thylacines and meerkats.

  24. wanda said,

    October 12, 2020 @ 7:14 pm

    re dogness: My 2-year-old at some point called nearly every 4 legged hairy thing a "dog." He would also call ducks "albatross" because we would show him our Antarctica pictures and point out the albatross. I guess they sit on the ground the same way, and in pictures you can't tell the size difference.
    After I realized that, I put a lot more effort into teaching him to distinguish animals. But it is fascinating to see how his classifications of objects develop.

  25. JOHN C SWINDLE said,

    October 12, 2020 @ 10:00 pm

    My son spotted a hippopotamus on Oahu when he was little. We were going up the Pali Highway, he was in the back seat of the car, and there it was beside the road. We assured him that he must have seen a wild boar. He wasn't having it. He'd been to the zoo. He'd seen pigs. He'd seen hippos. This was a hippo. He knew what hippos looked like. But we knew what lived in that forest. Who was right? We hadn't even seen it.

    My late mother grew up on the Plains, studied biology, and liked birds. One of my first inklings that she was getting dementia was when we were in a car in Western Kansas and saw a bird on a telephone pole. She wondered what it was. I speculated that it might be a juvenile Golden Eagle. She agreed that that must be right.

    Whoa. Her recognizing an unexpected bird at a glance would have been normal. Not recognizing it would have been fine too. But my recognizing an unexpected bird would have been far from normal. And her agreeing with my unlikely identification? Something was wrong.

    And yet it might well have been a juvenile Golden Eagle.

  26. Julian said,

    October 12, 2020 @ 10:41 pm

    On dogness and Aristotelian types: among my toddler's first dozen words was 'ger' (='bird'). For a while he used it for almost anything that ran around on legs: dogs, cats, birds and people.

  27. JTL said,

    October 13, 2020 @ 1:02 am

    "hyenas, thylacines and meerkats"?

    The first two: yeah, sure. But meerkats?

  28. KeithB said,

    October 13, 2020 @ 8:19 am

    Yeah, meerkats are a stretch, but I originally had binturong, which was even more off base.

  29. David Marjanović said,

    October 13, 2020 @ 4:30 pm

    We have no trouble at all recognizing that a bulldog, a chihuahua, a golden retriever, and a St. Bernard are all the same kind of animal. That is mysterious.

    It's not at all mysterious, it's learned. Imagine someone who is used to wolves, red wolves, coyotes, and perhaps golden jackals and dholes, but has never seen a domestic dog – are they really going to classify all domestic dogs together?

  30. Andrew Usher said,

    October 13, 2020 @ 7:44 pm

    Well, if you have a category including all those wild canids, it's not so much a stretch to imagine putting all domestic dogs in it, as well – they do, indeed, all have 'dogness'. Finer distinctions, though, are not precise from observation alone.

    But of course, I agree in general that classification of organisms, especially in words, is not instinctive but learned, again: mostly not from personal experience. This is true of oaks as in the original – not many would put white oaks and red oaks together knowing nothing but their appearance. I would say that classification of trees is _notoriously_ learned – one hears poplar/aspen called 'birch' as long as it has white bark, though they aren't closely related.

    As for the term 'jizz', I don't use the word in any sense, but I don't know any alternative in the bird-watcher's use. It does seem kind of funny that there's no word for a rather common human ability.

    k_over_hbarc at yahoo.com

  31. John Swindle said,

    October 13, 2020 @ 8:31 pm

    Bears look like dogs, at least more than either one looks like an oak leaf. Anybody go for the clickbait about the family who brought home a puppy and got a big surprise? I'll bet it never turns out to be a baby oak tree.

  32. Dara Connolly said,

    October 14, 2020 @ 5:39 am

    Andrew Usher: "As for the term 'jizz', I don't use the word in any sense, but I don't know any alternative in the bird-watcher's use. It does seem kind of funny that there's no word for a rather common human ability."

    I think the word "gestalt" is useful here.

    More interesting than the ability to group all dogs together, perhaps, is the ability to distinguish a dog from a cat. We can all do this at a glance with near-100% specificity, even though we would be challenged to describe any feature that is visibly present in all dogs and absent in all cats, or vice versa.

  33. Alyssa said,

    October 14, 2020 @ 2:58 pm

    For "jizz", is it possible that it's just a mis-hearing or corruption of "gist"?

  34. Alex said,

    October 14, 2020 @ 3:31 pm

    When an archaeologist starts working on a new area or time period, they have to get the typologies in their head. You can do this with pictures or museum collections, but the best way is to sit down with someone who already works in the area and have them show you several dozen of each type of whatever it is, and have you sort through and type a batch, correct you, and do it again. ("No, that's not narrow enough to be a Savannah River point." "No, these are salt-glazed, and these are tin-glazed.") Experienced archaeologists can glance at a dirt-covered sherd or projectile from their area and tell you immediately what group made it and when.

    If you're working in an area that has no prior work (rarer these days), you have to train yourself to recognize the distinct types that are present and seriate them over time. If you're not good at it, you're really going to struggle, especially at reconnaissance. It does seem like one of the few areas where machine learning could actually be useful in the social sciences rather than just recapitulating biased inputs.

  35. Philip Taylor said,

    October 14, 2020 @ 3:42 pm

    Alyssa, Wikipedia says :

    Jizz — Etymology

    The term was first used in print in 1922, in Thomas Coward's "Country Diary" column for the Manchester Guardian of 6 December 1921 – the piece was subsequently included in his 1922 book Bird Haunts and Nature Memories.[7] He attributed it to "a west-coast Irishman",[7] and explained:[8]

    If we are walking on the road and see, far ahead, someone whom we recognise although we can neither distinguish features nor particular clothes, we may be certain that we are not mistaken; there is something in the carriage, the walk, the general appearance which is familiar; it is, in fact, the individual's jizz.

    Jeremy Greenwood concludes that the term was further popularised by its use by Miss E.I. Turner, "a popular author", in the journal Open Air in 1923.[9][10]

    There is a theory that it comes from the World War II RAF acronym GISS for "General Impression of Size and Shape (of an aircraft)"[11], but the use of the term in 1922 precludes that.[7][12] Another theory claims that jizz is a corruption of gestalt, a German word that roughly means form or shape.[13] Other possibilities include the word gist, or a contraction of just is. These theories were debunked by Jeremy Greenwood and his brother Julian in 2018.[7]

    [7], the debunking article, is Greenwood, Jeremy J.D.; Greenwood, Julian G. (May 2018). "The Origin of the Birdwatching Term "Jizz"". British Birds. 111 (5): 292-294. Sadly I do not have access to a copy.

  36. Andrew Usher said,

    October 16, 2020 @ 6:53 pm

    Well, that doesn't tell us where 'jizz' _does_ come from. I always thought the contraction of 'just is' the most plausible of the suggestions, as those words might well be used by someone trying to explain his identification; but that usage seems to cast doubt there, too.

  37. Philip Taylor said,

    October 17, 2020 @ 11:19 am

    Andrew — " that doesn't tell us where 'jizz' _does_ come from". No, but the paper cited in reference [7] does, one assumes. So all we need is a British ornithologist who is also a keen student of linguistics. Until such identifies him or herself, the best we can do is to consider the abstract —

    Abstract: The term ‘jizz’ was introduced to ornithology by T. A. Coward in 1921. There is no evidence for any other etymology. In this paper we explain the origin and spread of the word. We also examine some false explanations of its origin. In particular, it is simply untrue that GIS or GISS – ‘General Impression and Shape’ (and Size) – were ever used for aircraft recognition and were transferred to birds as ‘jizz’.

  38. Philip Taylor said,

    October 17, 2020 @ 11:32 am

    Jizz — a couple more links :

    T. A. Coward and the origins of ‘jizz’ ?

    Canberra Bird Notes, Vol. 41, No. 2, June 2016 (see article "THE ETYMOLOGY OF “JIZZ”, REVISITED", by David McDonald).

  39. mg said,

    October 17, 2020 @ 2:58 pm

    I once went mushrooming when I was at a retreat with someone experienced at it. She warned that it was important to be aware that your expertise at knowing safe from unsafe wouldn't reliably translate to other geographic areas, and that a lot of accidental mushroom poisonings happened among immigrants who didn't realize that a safe mushroom in Russia looked just about the same as a poisonous one in New England or vice versa.

  40. Andrew Usher said,

    October 17, 2020 @ 5:09 pm

    If that article is right, Philip, and it was a misinterpretation by Coward 1922, then it is after all the same word as the 'American sense' of jizz=semen, which is known to come from the same jizz/jazz/jism/jasm, as does of course 'jazz' music. And this is said to be originally American, first attested 1842.

    That'd be astonishing! I can't find anything more, either, and it looks like the Greenwood and Greenwood article would have if there were; most likely it finds the same thing.

  41. Philip Taylor said,

    October 18, 2020 @ 2:45 am

    To be honest, Andrew, I am not convinced that Coward was mistaken at all — in coming to this conclusion, David McDonald writes :

    I now suggest that it is clear that Coward made a mistake in saying that, when the West Coast Irishman told him that it was possible to identify a particular bird by its ‘jizz’, the chap meant identifying it by the general impression that it gave. Rather, the Irishman meant that he could identify that particular species of bird because it had lots of ‘jizz’, that is, it was characteristically full of energy or exuberance.

    but that is not what occurred — the Irishman was not speaking of any one particular (species of) bird, but rather of "wild creatures" in general — the actual text reads as follows (link valid for 30 days from 18-Oct-2020; permalink available on request) :

    A West Coast Irishman was familiar with the wild creatures which dwelt on or visited his rocks and shores; at a glance he could name them, usually correctly, but if asked how he knew them would reply " By their 'jizz' ".

    I am therefore inclined to think that T A Coward's analysis was correct, and that David McDonald was mistaken, since whilst one species (or indeed, many) may exhibit "jizz" in the "jazz" sense, all "the wild creatures which dwelt on or visited [the West Coast Irishman's] rocks and shores" could hardly be expected to do so, and, even if by some miracle they did, it would then be useless as an identification aid.

    I may just have to fork out £6 to purchase the Greenwood and Greenwood article !

  42. Rodger C said,

    October 18, 2020 @ 9:18 am

    A West Coast Irishman was familiar with the wild creatures which dwelt on or visited his rocks and shores; at a glance he could name them, usually correctly, but if asked how he knew them would reply " By their 'jizz' ".

    People who know their local birds etc. can often identify them at a distance by behavioral cues helped out by only the most general impression of appearance.

  43. Philip Taylor said,

    October 18, 2020 @ 10:39 am

    It is interesting to conjecture, Rodger, whether if one were to observe a tree-creeper climbing down a tree trunk, or a nut-hatch climbing up one, one might briefly confuse one with the other. I think it possible, but only if "the most general impression of appearance" were almost completely lacking.

  44. Andrew Usher said,

    October 18, 2020 @ 10:18 pm

    Yes, I'm inclined to agree – McDonald probably started with the idea that that was the only sense of 'jizz' recorded before, and that it existed in Ireland at the time, and having formed the 'obvious' conclusion didn't examine the original more carefully. In addition, if I were to find someone using an unfamiliar word to me like that, I'd surely want to ask what was meant by it – so perhaps Coward did, and some of what follows is a paraphrase of his answer. But it's clear he expanded on it, finding it a useful word because he had heard none other for the concept, and apparently did introduce it to the larger community; whether he even knew of the other 'jizz' I can't say, his spelling it like that is not evidence because that's how we naturally spell /z/ after a lax/short vowel in new words (I couldn't tell you why, though).

    As for the last I imagine that a reply from someone as lacking in bird-watching experience as I would not be useful.

