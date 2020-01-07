

The current xkcd:

There's a long history of similar diagrams in language-related areas.

A century and a half ago, C.S. Peirce was fond of triangles, like this one:

And this one:

About a century ago, we get the "semantic triangle", from Charles Ogden and Ivor A. Richards, The Meaning of Meaning, 1923:

Here's a three-dimensional "semantic differential" subspace, based on the ideas in Charles Osgood, "The Nature and Measurement of Meaning", 1952:

Another one:

A similar set of ideas lies behind Valence-Arousal theories of emotion:

Joe Kruskal's non-metric Multi-Dimensional Scaling (along with similar dimensionality-reduction techniques) was in large part an attempt to extend Osgood's ideas about using factor analysis to learn semantic-differential dimensions from observational data of various kinds. Doug Biber used MDS to learn the dimensions of register/style/genre:

"Latent Semantic Analysis", originally developed by Tom Landauer, Sue Dumais and others in the late 1980s, has a similar historical relationship to Osgood's Semantic Differential ideas, and similarly learns to place words in a multi-dimensional space based on Singular Value Decomposition of a term-by-document matrix. For some pictures, see Landauer et al., "From paragraph to graph: Latent semantic analysis for information visualization", PNAS 2004. A relevant quote:

[W]e conjecture that verbal meaning is irreducibly high dimensional. Thus, the value of automatic reductions to two or three best dimensions may be inherently limited; although they may be valuable for some purposes, they must often provide only an impoverished and possibly misleading impression of the relations in a dataset. Different researchers and scholars are often interested in different aspects of articles, only some of which may have been indexed, key-worded, the object of citation, or shown in a particular view. The alternative we have explored here is a combination of measuring similarity of the entire content of articles with high dimensional visualizations that support search for projections that are of special interest to the user. […]

Despite decades of highly creative and sophisticated innovation, and a plethora of claims for obvious superiority of the visualization approach, we do not see visual maps of verbal information in popular and effective use. It is, of course, possible that visualizing verbal information is in large part just an appealing bad idea. A more optimistic view is that the application of more user testing to understand what does and doesn't help people do what, will steer innovations in more effective directions.

Of course, more recent "word embedding" methods are variations on the same theme as LSA, and thus also descend from Osgood and other mid-20th-century psychologists.
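For readers who want to see the SVD step in LSA concretely, here is a minimal sketch in plain Python. It is not Landauer and Dumais's implementation: the toy term-by-document matrix, the names `TERM_DOC`, `lsa_document_coords` and `cosine`, and the choice of two dimensions are all assumptions made up for illustration. To stay dependency-free, it gets the top singular values and right singular vectors by power iteration with deflation on the document-by-document matrix A^T A, rather than calling a library SVD routine.

```python
import math

# Hypothetical toy term-by-document count matrix (rows = terms, columns = documents).
TERMS = ["cat", "dog", "pet", "stock", "market", "trade", "price"]
TERM_DOC = [
    [1, 1, 0, 0],  # cat
    [1, 1, 0, 0],  # dog
    [1, 1, 0, 0],  # pet
    [0, 0, 1, 1],  # stock
    [0, 0, 1, 1],  # market
    [0, 0, 1, 0],  # trade
    [0, 0, 0, 1],  # price
]


def gram(A):
    """Return the document-by-document matrix G = A^T A."""
    n_terms, n_docs = len(A), len(A[0])
    return [
        [sum(A[t][i] * A[t][j] for t in range(n_terms)) for j in range(n_docs)]
        for i in range(n_docs)
    ]


def lsa_document_coords(A, k, iters=500):
    """Place each document in a k-dimensional space.

    The eigenvectors of A^T A are the right singular vectors of A, and the
    eigenvalues are the squared singular values. Each is found by power
    iteration, then removed from G (deflation) before finding the next.
    Document j gets coordinate sigma_c * v_c[j] on dimension c.
    """
    G = gram(A)
    n = len(G)
    coords = [[] for _ in range(n)]
    for c in range(k):
        v = [1.0 + 0.1 * (i + c) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(row, v)) for row in G]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T G v estimates the eigenvalue (v has unit length).
        lam = sum(vi * sum(g * x for g, x in zip(row, v)) for vi, row in zip(v, G))
        sigma = math.sqrt(max(lam, 0.0))
        for j in range(n):
            coords[j].append(sigma * v[j])
        # Deflate: subtract the component just found.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def cosine(a, b):
    """Cosine similarity between two coordinate vectors (0.0 if either is zero)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


if __name__ == "__main__":
    docs = lsa_document_coords(TERM_DOC, k=2)
    for j, point in enumerate(docs):
        print(f"doc {j}: {[round(x, 3) for x in point]}")
```

In this toy example the last two documents share only two of their three terms, so their raw cosine similarity is 2/3, yet in the reduced two-dimensional space they land on essentially the same point. That is the kind of smoothing LSA is meant to provide, and also a small-scale reminder of the quote's warning that a projection onto two or three dimensions throws information away.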

