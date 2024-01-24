

The implicit slogan of language-model research is J.R. Firth's dictum, "You shall know a word by the company it keeps", from his 1957 paper "A synopsis of linguistic theory, 1930-1955":





As the Wikipedia article explains,

His theory that "you shall know a word by the company it keeps" / "a word is characterized by the company it keeps" inspired works on word embedding hence add [sic] a major impact in natural language processing. Many techniques were designed to build dense vectors representing words semantics based on their neighbors (e.g. Word2vec, GloVe).
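To make "representing words by their neighbors" concrete, here is a minimal sketch of the distributional idea. It is a deliberate simplification: it builds sparse co-occurrence counts rather than the dense learned vectors of Word2vec or GloVe, and the toy corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are all invented for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus, chosen only to illustrate the idea.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
    "the judge reads statutes",
]

vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))       # words that keep similar company
print(cosine(vecs["cat"], vecs["statutes"]))  # words that share no neighbors
```

In this toy corpus "cat" and "dog" come out as similar because they appear next to the same words ("the", "drinks", "chases"), while "cat" and "statutes" share no neighbors at all. Real embedding methods start from the same kind of evidence but compress it into dense, low-dimensional vectors trained on very large corpora.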

Firth's 1957 paragraph footnotes Wittgenstein's Philosophical Investigations, but the cited passages deal with more general questions about the nature of meaning, based on analogies to games and so on. The phrase "you shall know a word by the company it keeps" seems more strikingly reminiscent of the old legal maxim "noscitur a sociis". Thus from Broom's 1845 Legal Maxims:

That's Sir Francis Bacon, the father of empiricism…

The same idea has been taken up many times since, e.g. in Maxwell's 1875 On the Interpretation of Statutes: "When two or more words, susceptible of analogous meaning, are coupled together, noscuntur a sociis; they are understood to be used in their cognate sense. They take, as it were, their colour from each other."

