Thirty years ago, Don Hindle explored the idea of calculating semantic similarity on the basis of predicate-argument relations in text corpora, and in the context of that work, I remember him noting that we tend to purchase wine but buy beer. He didn't have a lot of evidence for that insight, since he was working with a mere six-million-word corpus of Associated Press news stories, in which the available counts were small:
| | wine | beer |
| purchase | 1 | 0 |
| buy | 0 | 3 |
So for today's lecture on semantics for ling001, I thought I'd check the counts in one of the larger collections available today, as an example of the weaker types of connotational meaning.
In the 14-billion-word iWeb corpus, the counts are:
| | wine | art | diamonds | beer | popcorn | shoes |
| [purchase] | 333 | 379 | 191 | 216 | 36 | 380 |
| [buy] | 1223 | 1491 | 717 | 1329 | 202 | 2250 |
The square brackets mean that I searched for all the inflected forms of the lemma, e.g. purchase, purchases, purchased, purchasing. The less formal term buy is overall more common in this less formal set of sources, but the statistical tendency is still strongly there, as shown in the [buy]/[purchase] ratio for some representative words:
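The [buy]/[purchase] ratio can be recomputed directly from the iWeb counts given above. What follows is only a minimal Python sketch: the numbers are copied from the table, while the names `counts` and `buy_purchase_ratio` are made up for illustration and are not part of any corpus tool.

```python
# iWeb counts copied from the table above; [lemma] covers all inflected forms.
counts = {
    "purchase": {"wine": 333, "art": 379, "diamonds": 191,
                 "beer": 216, "popcorn": 36, "shoes": 380},
    "buy": {"wine": 1223, "art": 1491, "diamonds": 717,
            "beer": 1329, "popcorn": 202, "shoes": 2250},
}


def buy_purchase_ratio(noun):
    """Return how many times more often [buy] than [purchase] occurs with noun."""
    return counts["buy"][noun] / counts["purchase"][noun]


# Compute the ratio for every noun in the table.
ratios = {noun: buy_purchase_ratio(noun) for noun in counts["buy"]}

# Print the nouns from the lowest ratio (most "purchase"-leaning) to the highest.
for noun, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{noun:<10}{ratio:5.2f}")
```

On these counts, wine, diamonds and art come out between roughly 3.7 and 3.9, while popcorn, shoes and beer come out between roughly 5.6 and 6.2.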
Cervantes said,
October 19, 2020 @ 2:21 pm
I wonder if this isn't related to price. I would expect people might buy cheap wine and purchase expensive shoes. Expensive popcorn and cheap diamonds don't really exist, though . . .