Radial dendrograms
From Sarah Gao and Andrew Gao, "On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models", arxiv.org 7/19/2023:
That's not a vinyl — it's a "radial dendrogram" — showing the evolutionary tree of nearly 6,000 Large Language Models posted at Hugging Face. Zeroing in on one quadrant, so you can read the labels:
(If it's not clear to you what a "dendrogram" is, check out the Wikipedia entry…)
Radial (and non-radial) dendrogram graphics can be created in many programming languages, e.g. R, Python, Matlab, …
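For readers curious about what goes into the "radial" part, here is a minimal Python sketch of the layout step only, using nothing beyond the standard library. It is not the method used in any of the papers cited here: the toy tree `TREE` and the helper names `count_leaves`, `height`, `radial_layout` and `to_xy` are all invented for illustration. The assumption is the simplest possible convention: leaves are spaced evenly around the outer circle, each internal node sits at the mean angle of its children, and its radius is its depth divided by the height of the tree.

```python
import math

# Hypothetical toy tree for illustration: (label, [children]).
# Leaves have an empty child list.
TREE = ("root", [
    ("A", [("a1", []), ("a2", [])]),
    ("B", [("b1", []), ("b2", [])]),
])

def count_leaves(node):
    """Number of leaves under (and including) this node."""
    _, children = node
    return 1 if not children else sum(count_leaves(c) for c in children)

def height(node):
    """Number of edges on the longest path from this node down to a leaf."""
    _, children = node
    return 0 if not children else 1 + max(height(c) for c in children)

def radial_layout(tree):
    """Return {label: (angle_in_radians, radius)} for every node.

    Leaves get evenly spaced angles and radius 1.0 (the outer circle).
    Internal nodes get the mean angle of their children and a radius
    proportional to their depth, so the root lands at the center.
    """
    n_leaves = count_leaves(tree)
    max_depth = height(tree)
    positions = {}
    next_leaf = [0]  # mutable counter shared with the nested function

    def place(node, depth):
        label, children = node
        if not children:
            angle = 2 * math.pi * next_leaf[0] / n_leaves
            next_leaf[0] += 1
            radius = 1.0
        else:
            child_angles = [place(c, depth + 1) for c in children]
            angle = sum(child_angles) / len(child_angles)
            radius = depth / max_depth
        positions[label] = (angle, radius)
        return angle

    place(tree, 0)
    return positions

def to_xy(angle, radius):
    """Convert polar coordinates to Cartesian, ready for any plotting tool."""
    return (radius * math.cos(angle), radius * math.sin(angle))

LAYOUT = radial_layout(TREE)
```

Feeding each `(angle, radius)` pair through `to_xy` gives points that a plotting library can connect with lines or arcs. Real dendrogram software typically also derives the tree itself from a distance matrix by hierarchical clustering, which this sketch skips.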
The whole "historical tree" idea started in the late 18th century as a story about the history of languages. It was then copied explicitly by Darwin, in extension to ideas about the "origin of species" (see "Gall in the family", 11/7/2003). And since then, it's been applied in many other areas, a few of which are illustrated below.
Here's an example applied to the evolutionary tree of Austronesian languages, from A. Bouchard-Côté et al., "Automated reconstruction of ancient languages using probabilistic models of sound change", PNAS 2013.
Further afield, here's the history of the banjo — Ethan Fulwood, "Quantitative similarities between the banjo and a diverse collection of West African lutes", Social Sciences Communications 2022:
And here's an evolutionary tree of programming languages (from various places on the web, e.g. here):
In all of these areas, of course, the true history is not entirely (or even mostly) tree-like. There are always lots of "lateral" influences that violate strict tree-structured inheritance. A terrific set of discussions, in the linguistic-history area, can be found in a special issue of the Journal of Historical Linguistics: Siva Kalyan, Alexandre François and Harald Hammarström, Eds., "Understanding language genealogy: Alternatives to the tree model", 2019.
Here's the start of that issue's introduction:
Ever since it was popularized by August Schleicher (1853, 1873), the family-tree model has been the dominant paradigm for representing historical relations among the languages in a family. There have been many other proposals for representing language histories: for example, Johannes Schmidt’s (1872) “Wave Model” (as illustrated, e.g., in Schrader 1883:99 and Anttila 1989:305); Southworth’s (1964) “tree-envelopes” (which seem to predate the “species trees” of phylogeography, e.g. Goodman et al. 1979; Maddison 1997); Hock’s (1991: 452) “‘truncated octopus’-like tree”; and, more recently, NeighborNet (Hurles et al. 2003; Bryant et al. 2005) and Historical Glottometry (Kalyan & François 2018). However, none of these representations reaches the simplicity, formalization, or historical interpretability of the family tree model.
bks said,
July 26, 2023 @ 5:00 pm
Browse the tree of life:
https://www.onezoom.org/
Richard Kettlewell said,
July 30, 2023 @ 11:24 am
I'm not sure what the programming language dendrogram represents, but it's certainly nothing to do with the evolutionary history of programming languages.