{"id":62325,"date":"2024-01-24T07:27:14","date_gmt":"2024-01-24T12:27:14","guid":{"rendered":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=62325"},"modified":"2024-01-24T07:40:05","modified_gmt":"2024-01-24T12:40:05","slug":"back-to-bacon","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=62325","title":{"rendered":"Back to Bacon"},"content":{"rendered":"<p>The implicit slogan of language-model research is <a href=\"https:\/\/en.wikipedia.org\/wiki\/John_Rupert_Firth\" target=\"_blank\" rel=\"noopener\">J.R. Firth<\/a>'s dictum, \"You shall know a word by the company it keeps\", from his 1957 paper \"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/myl\/Firth1957.pdf\" target=\"_blank\" rel=\"noopener\">A synopsis of linguistic theory, 1930-1955<\/a>\":<br \/>\n<a href=\"https:\/\/languagelog.ldc.upenn.edu\/myl\/FirthWordCompany.png\"><img decoding=\"async\" title=\"Click to embiggen\" src=\"https:\/\/languagelog.ldc.upenn.edu\/myl\/FirthWordCompany.png\" width=\"490\" \/><\/a><br \/>\n<!--more--><br \/>\nAs <a href=\"https:\/\/en.wikipedia.org\/wiki\/John_Rupert_Firth\" target=\"_blank\" rel=\"noopener\">the Wikipedia article<\/a> explains,<\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000080;\">His theory that \"you shall know a word by the company it keeps\" \/ \"a word is characterized by the company it keeps\" inspired works on <a style=\"color: #000080;\" href=\"https:\/\/en.wikipedia.org\/wiki\/Word_embedding\" target=\"_blank\" rel=\"noopener\"><u>word embedding<\/u><\/a> hence add <em>[sic]<\/em> a major impact in <a style=\"color: #000080;\" href=\"https:\/\/en.wikipedia.org\/wiki\/Natural_language_processing\" target=\"_blank\" rel=\"noopener\">natural language processing<\/a>. Many techniques were designed to build dense vectors representing words semantics based on their neighbors (e.g. <a style=\"color: #000080;\" href=\"https:\/\/en.wikipedia.org\/wiki\/Word2vec\" target=\"_blank\" rel=\"noopener\"><u>Word2vec<\/u><\/a>, <a style=\"color: #000080;\" href=\"https:\/\/en.wikipedia.org\/wiki\/GloVe\" target=\"_blank\" rel=\"noopener\"><u>GloVe<\/u><\/a>).<\/span><\/p>\n<p>Firth's 1957 paragraph footnotes Wittgenstein's <em>Philosophical Investigations<\/em>, but the cited passages deal with more general questions about the nature of meaning, based on analogies to games and so on. The phrase \"you shall know a word by the company it keeps\" seems more strikingly reminiscent of the old legal maxim \"<a href=\"https:\/\/www.merriam-webster.com\/legal\/noscitur%20a%20sociis\" target=\"_blank\" rel=\"noopener\"><em>noscitur a sociis<\/em><\/a>\". Thus from Broom's 1845 <a href=\"https:\/\/www.google.com\/books\/edition\/Selection_of_Legal_Maxims\/-KxBAAAAYAAJ?hl=en&amp;gbpv=1&amp;dq=noscitur+a+sociis&amp;pg=PA192&amp;printsec=frontcover\" target=\"_blank\" rel=\"noopener\"><em>Legal Maxims<\/em><\/a>:<\/p>\n<p><a href=\"https:\/\/languagelog.ldc.upenn.edu\/myl\/BroomNosciturMaxim.png\"><img decoding=\"async\" title=\"Click to embiggen\" src=\"https:\/\/languagelog.ldc.upenn.edu\/myl\/BroomNosciturMaxim.png\" width=\"490\" \/><\/a><\/p>\n<p>That's\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Francis_Bacon\" target=\"_blank\" rel=\"noopener\">Sir Francis Bacon<\/a>, the father of empiricism&#8230;<\/p>\n<p>The same idea has been taken up many times since, e.g. in Maxwell's 1875 <a href=\"https:\/\/books.google.com\/books?id=OkkyAAAAIAAJ\" target=\"_blank\" rel=\"noopener\"><i>On the Interpretation of Statutes<\/i><\/a>: \"When two or more words, susceptible of analogous meaning, are coupled together, noscuntur a sociis; they are understood to be used in their cognate sense. They take, as it were, their colour from each other.\"<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The implicit slogan of language-model research is J.R. Firth's dictum, \"You shall know a word by the company it keeps\", from his 1957 paper \"A synopsis of linguistic theory, 1930-1955\":<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[60],"tags":[],"class_list":["post-62325","post","type-post","status-publish","format-standard","hentry","category-computational-linguistics"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/62325","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=62325"}],"version-history":[{"count":7,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/62325\/revisions"}],"predecessor-version":[{"id":62332,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/62325\/revisions\/62332"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=62325"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=62325"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=62325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}