Radial dendrograms


From Sarah Gao and Andrew Gao, "On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models", arxiv.org 7/19/2023:

That's not a vinyl record: it's a "radial dendrogram", showing the evolutionary tree of nearly 6,000 Large Language Models posted at Hugging Face. Here's a close-up of one quadrant, so you can read the labels:

(If it's not clear to you what a "dendrogram" is, check out the Wikipedia entry…)

Radial (and non-radial) dendrogram graphics can be created in many programming languages, e.g. R, Python, Matlab, …
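For the curious, here's a minimal, library-free sketch of the geometry behind the radial version. It only computes coordinates; drawing is left to whatever plotting tool you like. It assumes a hypothetical input format (a leaf is a string label, an internal node is a tuple of subtrees), and the names radial_layout and to_xy are made up for illustration. It also ignores branch lengths: each node's distance from the centre comes from its depth in the tree, whereas a dendrogram from hierarchical clustering would normally use merge heights.

```python
import math

# Hypothetical input format: a leaf is a string label, and an internal node
# is a tuple of child subtrees. Branch lengths are ignored; each internal
# node's distance from the centre is set by its depth in the tree.

def count_leaves(tree):
    """Number of leaf labels under this node."""
    return 1 if isinstance(tree, str) else sum(count_leaves(c) for c in tree)

def depth(tree):
    """Length of the longest root-to-leaf path, counted in edges."""
    return 0 if isinstance(tree, str) else 1 + max(depth(c) for c in tree)

def radial_layout(tree):
    """Return a list of (node, angle_in_radians, radius) triples.

    Leaves are spaced evenly around the full circle on the outer ring
    (radius 1.0); each internal node sits at the mean angle of its children,
    at radius depth_from_root / max_depth, so the root is at the centre.
    Nodes are listed in post-order, so the root comes last.
    """
    n_leaves = count_leaves(tree)
    max_depth = depth(tree)
    positions = []
    next_leaf = 0

    def place(node, level):
        nonlocal next_leaf
        if isinstance(node, str):
            # Leaves get consecutive, evenly spaced angles in traversal order.
            angle = 2 * math.pi * next_leaf / n_leaves
            next_leaf += 1
            radius = 1.0
        else:
            child_angles = [place(child, level + 1) for child in node]
            angle = sum(child_angles) / len(child_angles)
            radius = level / max_depth
        positions.append((node, angle, radius))
        return angle

    place(tree, 0)
    return positions

def to_xy(angle, radius):
    """Convert polar coordinates to Cartesian (x, y) for plotting."""
    return radius * math.cos(angle), radius * math.sin(angle)

# Usage: a toy five-leaf tree with made-up labels.
tree = ((("A", "B"), "C"), ("D", "E"))
for node, angle, radius in radial_layout(tree):
    x, y = to_xy(angle, radius)
    print(node, f"{math.degrees(angle):.1f} deg", f"r={radius:.2f}",
          f"x={x:+.2f}", f"y={y:+.2f}")
```

Spacing the leaves evenly around the whole circle, rather than along a line, is what lets thousands of labels fit in one figure, and also what gives these plots their record-like look.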

The whole "historical tree" idea started in the late 18th century as a story about the history of languages. It was then explicitly borrowed by Darwin, who extended it to ideas about the "origin of species" (see "Gall in the family", 11/7/2003). Since then, it's been applied in many other areas, a few of which are illustrated below.

Here's an example applied to the evolutionary tree of Austronesian languages, from A. Bouchard-Côté et al., "Automated reconstruction of ancient languages using probabilistic models of sound change", PNAS 2013.

Further afield, here's the history of the banjo — Ethan Fulwood, "Quantitative similarities between the banjo and a diverse collection of West African lutes", Social Sciences Communications 2022:

And here's an evolutionary tree of programming languages (from various places on the web, e.g. here):

In all of these areas, of course, the true history is not entirely (or even mostly) tree-like. There are always lots of "lateral" influences that violate strict tree-structured inheritance. A terrific set of discussions, in the linguistic-history area, can be found in a special issue of the Journal of Historical Linguistics: Siva Kalyan, Alexandre François and Harald Hammarström, Eds., "Understanding language genealogy: Alternatives to the tree model", 2019.

Here's the start of that issue's introduction:

Ever since it was popularized by August Schleicher (1853, 1873), the family-tree model has been the dominant paradigm for representing historical relations among the languages in a family. There have been many other proposals for representing language histories: for example, Johannes Schmidt’s (1872) “Wave Model” (as illustrated, e.g., in Schrader 1883:99 and Anttila 1989:305); Southworth’s (1964) “tree-envelopes” (which seem to predate the “species trees” of phylogeography, e.g. Goodman et al. 1979; Maddison 1997); Hock’s (1991: 452) “‘truncated octopus’-like tree”; and, more recently, NeighborNet (Hurles et al. 2003; Bryant et al. 2005) and Historical Glottometry (Kalyan & François 2018). However, none of these representations reaches the simplicity, formalization, or historical interpretability of the family tree model.


2 Comments

  1. bks said,

    July 26, 2023 @ 5:00 pm

    Browse the tree of life:
    https://www.onezoom.org/

  2. Richard Kettlewell said,

    July 30, 2023 @ 11:24 am

    I'm not sure what the programming language dendrogram represents, but it's certainly nothing to do with the evolutionary history of programming languages.
