Chris is puzzled by these Google counts, for famous quotations with and without quotation marks flanking the search string:
Gone With The Wind
about 797,000 for "Frankly, my dear, I don't give a damn!"
about 163,000 for Frankly, my dear, I don't give a damn!
about 17,500,000 for "You talkin' to me?"
about 7,450,000 for You talkin' to me?
As he explains: "I discovered something weird. In some cases, the more restrictive, double-quoted query returned more hits than the unquoted query. A lot more."
Here's a plausible theory about what's going on. Google stores (and indeed has published) counts of common high-order n-grams. Famous quotations are likely to include common n-grams, for large-ish values of n, and the quotation marks cause the search algorithm to check the n-gram lists and make some use of the counts. This method will perhaps yield somewhat truthful results, depending on the details.
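To make the n-gram lookup idea concrete, here is a minimal sketch in Python. The toy corpus, the function names, and the choice of n = 4 are all assumptions made for illustration; nothing here reflects how Google actually stores or consults its counts.

```python
from collections import Counter

# Hypothetical toy corpus standing in for the web.
DOCS = [
    "frankly my dear i don't give a damn",
    "he said frankly my dear i don't give a damn",
    "my dear friend i give a talk",
    "frankly i don't know",
]

def ngram_counts(docs, n):
    """Count every run of n consecutive tokens across all documents."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def lookup(phrase, table):
    """Return the stored count for a phrase; 0 if it is not in the table."""
    return table[tuple(phrase.lower().split())]

# Precompute the table once, then answer quoted queries by direct lookup.
FOURGRAMS = ngram_counts(DOCS, 4)
quoted_hits = lookup("frankly my dear i", FOURGRAMS)
```

Because the table records observed sequences directly, a quoted query answered this way needs no extrapolation from the individual words.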
Without the counts, the basic approach (however modulated) is to look up the individual words, intersect the most highly-ranked hits for each of them, and extrapolate in some semi-clever way to the total count that would be expected for the whole set. This method is certain to underestimate the counts for famous multi-word phrases, since such sequences are MUCH commoner than you would predict simply on the basis of their constituent unigram (or even bigram or trigram) counts.
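The underestimation can be illustrated with a deliberately crude stand-in for the word-by-word approach: treat each word's document frequency as independent and multiply. This is an assumption chosen for simplicity, not a description of any real search engine's extrapolation, and the corpus and function names below are hypothetical.

```python
from math import prod

# Hypothetical toy corpus, already tokenized; the phrase is "famous" here
# in the sense that it occurs far more often than its words would suggest.
DOCS = [doc.split() for doc in [
    "you talkin to me",
    "are you talkin to me",
    "well you talkin to me then",
    "you talkin to me again",
    "give it to me",
    "you went to the store",
    "the cat sat on the mat",
    "rain is coming today",
    "she went home early",
    "nothing to see here",
]]
PHRASE = "you talkin to me".split()

def doc_frequency(word, docs):
    """Number of documents that contain the word at least once."""
    return sum(1 for tokens in docs if word in tokens)

def independence_estimate(phrase, docs):
    """Expected number of documents containing every word of the phrase,
    if the words occurred independently of one another."""
    n = len(docs)
    return n * prod(doc_frequency(w, docs) / n for w in phrase)

def actual_count(phrase, docs):
    """Number of documents containing the phrase as a contiguous run."""
    k = len(phrase)
    return sum(
        1 for tokens in docs
        if any(tokens[i:i + k] == phrase for i in range(len(tokens) - k + 1))
    )

estimate = independence_estimate(PHRASE, DOCS)
actual = actual_count(PHRASE, DOCS)
```

On this toy data the independence estimate comes to roughly 0.7 documents, while the phrase actually appears in 4. The gap arises for the reason given above: the words of a well-known phrase co-occur far more often than their separate frequencies predict.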