Language Log

Long is good, good is bad, nice is worse, and ! is questionable

June 12, 2013 @ 8:23 am · Filed by Mark Liberman under Computational linguistics

Sanette Tanaka, "Fancy Real-Estate Listing, Fancier Verbiage", WSJ 6/6/2013:

Savvy real-estate agents know it's not just what you say. It's how long it takes you to say it.

More-expensive homes go hand-in-hand with longer real-estate agents' remarks—the language written by the agent that supplements the house description and photos in a listing. Agents use a median 250 characters for homes listed under $100,000, according to an analysis for The Wall Street Journal by real-estate listings company Zillow. For homes priced over $1 million, they go nearly twice as long, with a median 487 characters. (That's about the length of this paragraph.)

"Generally, what you find is that regardless of the region, the more expensive the home is, the more characters are used to describe that home," says Stan Humphries, chief economist at Zillow.

Here's a plot of the relationship between description-length and price, from the cited article:

Ms. Tanaka contacted me while she was researching this story, and I sent her a link to "The quality of quantity", 4/24/2012, in which I noted a similar relationship in the cases of wine reviews and letters of recommendation:

The longer it is, the higher the rating:

We're talking about the length of wine reviews, measured in words, and the numerical rating given to the associated wine. (Well, actually, the length of the reviews is measured in terms of the output of a tokenizer that sets off punctuation as well as alphanumeric strings…).

Unfortunately, wine reviews and letters of recommendation were too far off topic, and so my contribution to the article ended up being some speculations on the generation and interpretation of real-estate listings, a topic that I had never specifically studied:

Writing long for pricier homes has become standard practice in real estate, Mr. Liberman says. In fact, a short remark or a lack of hyped-up adjectives could suggest that there's something wrong with the home, he adds. "Given that all the descriptions of better properties are full of these empty-enthusiasm words, it might be interpreted by readers as an indication of problems if they're absent," he says.

Given this published commitment to empirically-unsupported common sense, I felt obliged to look into the facts. So with the kind permission of the PR department at trulia.com, I've downloaded their listings for Philadelphia, Boston, Los Angeles, New York City, Las Vegas, Miami-Dade, Denver, and Chicago, a total of 52565 descriptions of properties for sale.

It's taken my computer a week, since their server hands out pages at a slow pace, saving me the trouble of slowing the process down on my end so as not to abuse their bandwidth. But eight cities is enough to support a Breakfast Experiment™ more or less on the topic of the WSJ story.

To start with, this dataset from Trulia qualitatively replicates the findings from Zillow:

I was worried that part of the effect might be due to the fact that lower-priced listings are more likely to lack a description entirely — and indeed, 12.3% of the listings in the $100-200K price range are descriptionless, compared to just 2.4% of the listings in the $1M+ range. But this doesn't affect the trend much — the red line in the plot above gives the results for listings that actually contain a description.
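A minimal sketch of this kind of calculation, not the code actually used for the post. It assumes each listing is a `(price, description)` pair and that prices are grouped into $100,000-wide bands with everything from $1M upward lumped together; the names `price_band` and `summarize` are hypothetical. For each band it reports the median description length in characters over all listings, the median over listings that actually have a description, and the share of descriptionless listings.

```python
from statistics import median

BAND_WIDTH = 100_000  # assumed width of each price band, in dollars
TOP_BAND = 10         # bands from $1M upward are lumped together


def price_band(price):
    """Label the price band that contains `price`."""
    band = min(int(price // BAND_WIDTH), TOP_BAND)
    if band == TOP_BAND:
        return "$1M+"
    return f"${band * 100}K-{(band + 1) * 100}K"


def summarize(listings):
    """listings: iterable of (price, description) pairs.

    A missing or whitespace-only description counts as length 0.
    Returns one summary dict per price band.
    """
    lengths_by_band = {}
    for price, description in listings:
        text = (description or "").strip()
        lengths_by_band.setdefault(price_band(price), []).append(len(text))

    summary = {}
    for band, lengths in lengths_by_band.items():
        described = [n for n in lengths if n > 0]
        summary[band] = {
            "median_all": median(lengths),
            "median_described": median(described) if described else None,
            "share_descriptionless": 1 - len(described) / len(lengths),
        }
    return summary


# Made-up toy listings, for illustration only.
toy = [
    (150_000, "Nice starter home."),
    (150_000, ""),
    (180_000, None),
    (1_500_000, "Exquisite estate with soaring ceilings."),
]
for band, stats in summarize(toy).items():
    print(band, stats)
```

Comparing `median_all` with `median_described` in each band is the check described above: if missing descriptions were driving the trend, the two would diverge most in the cheaper bands.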

However, when things are calculated this way, the effects look very different in different cities. Here's a plot showing Miami (M), Chicago (C), New York City (N), and Philadelphia (P):

This is partly but not entirely due to the large differences in price distributions among the cities — for instance:

Anyhow, I'll leave for another day the problem of separating region effects from price effects, and turn briefly to the issue of what description-writers are doing with those extra characters. Returning to common sense, it seems likely that some of the extra length is due to more expensive properties having a larger number of positive features to describe, while some of it is due to describing more expensive things at greater length, for example by using more empirically-empty evaluative adjectives like "stunning" or "spectacular".

And indeed there's evidence for both of these effects. More expensive listings are more likely to talk about decks, offices, fireplaces, and so on, because more expensive properties are more likely to have those features. Thus the word "fireplaces" is ten times more likely to occur in the pricier half of each city's listings than in the cheaper half.

And there are many evaluative adjectives that are more likely to be used in describing more expensive properties, for example:

Adjective	Rate per MW in top 50%	Rate per MW in bottom 50%	Ratio
exquisite	235	53	4.5
dramatic	247	57	4.3
soaring	215	54	4.0
expansive	361	97	3.7
sophisticated	149	40	3.7
luxurious	512	143	3.6
lush	183	60	3.0
breathtaking	256	90	2.8
prestigious	236	90	2.6

On the other hand, there are some evaluative adjectives, even adjectives that are positively-evaluated in general, that go in the opposite direction:

Adjective	Rate per MW in top 50%	Rate per MW in bottom 50%	Ratio
cute	9	75	0.12
nice	173	1196	0.14
good	205	819	0.25
clean	100	292	0.34
convenient	147	391	0.38
fresh	65	164	0.46
lovely	301	551	0.55
excellent	518	873	0.59
charming	262	419	0.63
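Rates like those in the two tables can be computed along the following lines. This is a sketch under stated assumptions, not the post's actual code: listings are `(city, price, description)` triples, each city is split at its own median price, "MW" is taken to mean a million tokens, and the tokenizer sets off punctuation as separate tokens so that "!" can be counted like a word. The function names are hypothetical.

```python
import re
from collections import Counter
from statistics import median

# Alphanumeric runs are tokens; each punctuation mark is its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def split_by_city_median(listings):
    """listings: iterable of (city, price, description).

    Returns (top, bottom): descriptions of listings priced above,
    or at or below, the median price of their own city.
    """
    listings = list(listings)
    prices = {}
    for city, price, _ in listings:
        prices.setdefault(city, []).append(price)
    medians = {city: median(values) for city, values in prices.items()}

    top, bottom = [], []
    for city, price, description in listings:
        (top if price > medians[city] else bottom).append(description)
    return top, bottom


def rate_per_million(descriptions, term):
    """Occurrences of `term` per million tokens in `descriptions`."""
    counts = Counter()
    for description in descriptions:
        counts.update(tokenize(description))
    total = sum(counts.values())
    return 1_000_000 * counts[term] / total if total else 0.0


# Made-up toy listings, for illustration only.
toy = [
    ("A", 100, "cute nice home!"),
    ("A", 300, "exquisite home"),
    ("B", 1000, "nice condo"),
    ("B", 3000, "exquisite luxurious condo"),
]
top, bottom = split_by_city_median(toy)
for term in ("exquisite", "nice", "!"):
    print(term, rate_per_million(top, term), rate_per_million(bottom, term))
```

The Ratio column is then the top-half rate divided by the bottom-half rate; for "luxurious", 512 / 143 rounds to 3.6. Splitting at each city's own median keeps an expensive city from dominating the top half.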

In the tables above, the listings are divided into top and bottom price quantiles city by city, rather than across the board. But there seem to be some overall price effects as well, so that the frequency of price-associated terms varies across cities in a way that's partly explained by the city-to-city differences in price distributions. Thus the exclamation point is more likely to be used in lower-priced listings — if we calculate the rates based on city-by-city price quantiles, we find that the rate in the top half of listings is 7124 per MW, while the rate in the bottom half is 12310 per MW.

But here's a scatterplot showing the relationship between city-wise exclamation-point frequency and log city-wise median listing price:

Here's what happens if we break the cities up into their pricewise top half (NYC1, LA1, …) and bottom half (NYC2, LA2, …):

I believe that both the absolute (across-city) price and the relative (within-city) price are playing a role in these distributions, but proving that will have to wait for another breakfast time.
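One way to start on that question is to put each city's halves on a common scale and see how the exclamation-point rate tracks log price across all of them. The sketch below is only an illustration under assumptions that are not from the post: listings are `(city, price, description)` triples, the rate is "!" characters per million whitespace-separated words (a cruder count than a real tokenizer would give), and the helper names are hypothetical.

```python
import math
from statistics import median


def exclamation_rate(descriptions):
    """'!' characters per million whitespace-separated words."""
    words = sum(len(d.split()) for d in descriptions)
    marks = sum(d.count("!") for d in descriptions)
    return 1_000_000 * marks / words if words else 0.0


def city_half_points(listings):
    """listings: iterable of (city, price, description).

    Returns {label: (log10 of the group's median price, '!' rate)},
    with label "NYC1" for a city's pricier half and "NYC2" for the rest.
    """
    by_city = {}
    for city, price, description in listings:
        by_city.setdefault(city, []).append((price, description))

    points = {}
    for city, rows in by_city.items():
        cutoff = median(price for price, _ in rows)
        halves = {
            "1": [row for row in rows if row[0] > cutoff],
            "2": [row for row in rows if row[0] <= cutoff],
        }
        for suffix, group in halves.items():
            if group:
                points[city + suffix] = (
                    math.log10(median(price for price, _ in group)),
                    exclamation_rate([text for _, text in group]),
                )
    return points


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy listings, for illustration only.
toy = [
    ("NYC", 100_000, "Great deal!"),
    ("NYC", 10_000_000, "Exquisite penthouse"),
    ("LV", 50_000, "Must see! Must see!"),
    ("LV", 500_000, "Lovely pool home!"),
]
points = city_half_points(toy)
log_prices = [x for x, _ in points.values()]
rates = [y for _, y in points.values()]
print(points)
print(pearson(log_prices, rates))
```

A single correlation over these points cannot separate the two effects; doing that would take something like a regression with both log city median and within-city rank as predictors, which is beyond this sketch.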

16 Comments

Bobbie said,

June 12, 2013 @ 9:11 am

Is it too simplistic to note that the words used to describe the higher-end expensive properties tend to be longer and have more syllables than words used to describe those in the lower half?

[(myl) I think that this is generally true — it's certainly true, on average, of the examples in my list, which I created by writing down all the relevant evaluative adjectives that came to mind in a minute or two (70 of them), sorting them by the ratio of counts in more- and less-expensive properties (dividing listings by each city's distribution of prices), and displaying the top and bottom of the list.]

I have been in real estate for over 20 years. There are certain descriptive words that have connotations in the real estate market. For instance, cozy is a "nicer" word than small. Really awful properties are often presented as "opportunities" or "fixer-uppers", never as the smelly nasty dumps that they are!

CM said,

June 12, 2013 @ 9:46 am

I believe there's a study about journal articles in psychology that shows that the longer the article title, the more often the article is cited. Can't seem to find it though.

GeorgeW said,

June 12, 2013 @ 9:59 am

@Bobbie: I would point out that these evaluations are all relative. Your "smelly, nasty dump" could well be considered exquisite by a homeless person and an "exquisite," million-dollar home a "smelly, nasty dump" to Donald Trump.

Theodore said,

June 12, 2013 @ 11:36 am

Sort of a corollary to the "deck, fireplace, etc." phenomenon: for the hypothetical adjectiveless situation where descriptions consist only of a list of the rooms contained in each property (e.g. "kitchen, bedroom, bedroom, bath"), descriptions will end up including more words for larger (and likely more expensive on average) homes.

I wonder about the character count graph by city: Do regional realtors' boards have a "house style" for descriptions that recommends a maximum word count, causing the graphs to level off in the middle as they all do to some extent, with the superstar agent allowed to bend the rules to close the deal on a mansion?

Jonathan said,

June 12, 2013 @ 1:09 pm

Holding the city constant, aren't more expensive homes larger homes? And don't larger homes have more things to describe than smaller homes?

Chris said,

June 12, 2013 @ 2:25 pm

I wonder whether terms like cute, nice, good, clean and convenient might appeal to a different demographic group than terms like exquisite or luxury — and coincidentally or not, to a demographic group that has much less money to spend on housing.

Or perhaps some of those terms are perceived as damning with faint praise, especially "cute", "nice", and "good." If that's the best you can say about a house….

Also, I'd expect that in a luxury-homes market, it might be taken for granted that all homes are "clean" and "fresh", and therefore these would not be desirable things to have in an ad. They might even be regarded suspiciously.

Sister_Ray said,

June 12, 2013 @ 2:45 pm

Just like with holiday brochures (lively hotel = you can't sleep because people party all night in the hallways) I suspect that some of these 'positive' adjectives that are more often found in ads for the cheaper places are code for something not so desirable:

Charming = has a certain appeal but has a big flaw, like absolutely impractical lay-out of the rooms/mice in the attic/rotting porch.
Cute sounds like a euphemism for tiny.
Good = functional but bare.
Nice = previous owner had weird ideas about decorating and painted everything in limegreen and mauve.
Convenient sounds like inner-city place close to the station or right next to a big road; great for commuting but terribly noisy.
Lovely = the place is a dump but there is one redeeming feature, like a lovely view/kitchen etc.
Excellent = excellent facilities: there is a communal dryer/washer.
Clean and fresh = the neighbourhood is really shitty, but this place is slightly better than the rest. It's old but at least it doesn't smell.

On the other hand: I don't know anything about American real estate.

J.W. Brewer said,

June 12, 2013 @ 3:25 pm

In the old pre-internet days when real estate was primarily advertised via classifieds in a special section of the Sunday paper, you'd think agents listing more expensive properties would be more likely to splurge on larger ads (in column inch terms) and thus had more words to play with (unless they were just going to use a larger font). This could have created habits that outlasted the original constraint that gave rise to them. (At the other extreme, ads for apartment rentals in the NY Times were sufficiently expensive proportionate to the value of the transaction that an extreme character-saving shorthand was developed like "rvr vu" for "river view.") My favorite euphemism in this genre is "jewel-box" meaning "really really small; definitely smaller than merely 'cozy.'"

J.W. Brewer said,

June 12, 2013 @ 3:40 pm

There was a realtor (BrEng "estate agent," I think) in London who floruit circa 1960 named Roy Brooks who became a minor celebrity for writing ads so much at variance with the blandly-positive conventions of the genre that people looked forward to reading the relevant section of the newspaper just to be amused by his prose, whether or not they were in the market to buy or rent. There's an anthology of his best work called "Brothel In Pimlico" which is referenced (with the titular ad set forth in full) here: http://stuff4restaurants.com/blog3/2007/10/25/brothel-in-pimlico-the-best-book-on-how-to-write-real-estate-advertising/. It's hard to find in the U.S. but not impossible (i.e. someone managed to get me a copy as a gift a year or two back).

Mae said,

June 12, 2013 @ 5:39 pm

Short descriptions can be effective also — consider this current listing from Sotheby's in Aspen, Colorado:
"Situated on 504 totally private acres
Main house, guest house, pastures, & views
Barn, horse stalls, riding arena, & washing stall
Quintessential outdoor lifestyle
$50,000,000 Furnished"
[Punctuation and line breaks are as in original text; this is from a printed ad, but the internet version is more detailed]

Gregory Kusnick said,

June 12, 2013 @ 7:16 pm

I wonder to what extent buyer bandwidth plays a role. A high-end buyer has fewer properties to choose from, and therefore more time to spend on each. A middle-class buyer has more choices, and (I'm guessing) a correspondingly lower TL;DR threshold.

zythophile said,

June 13, 2013 @ 8:23 am

So is the leap in the number of words at $1m+ values an artefact of some sort or is there a good reason for it?

Kenny Easwaran said,

June 13, 2013 @ 11:12 am

Zythophile – I wonder if this might be an artifact of lumping all $1m+ prices together. If that group were broken up into bands of $100,000 width like the lower ones are, maybe we'd see a more continuous distribution.

Of course, that will probably depend on the city – in a place like Las Vegas, probably anything at $2m or above will be pretty much the top of the market, while in Los Angeles, or especially New York, you don't get anywhere near the top of the market until you're in the 8 digit range.

J.W. Brewer said,

June 13, 2013 @ 12:42 pm

There's also the possibility that different realtors/copywriters specialize in different segments of the market and (for whatever industry culture reasons) have different approaches. What might be interesting would be to find a (hopefully representative) group of individual agents who list properties over a fairly wide range of price points and see if their individual practices vary from listing to listing along this terseness-to-prolixity dimension in a price-sensitive way.

Ken Brown said,

June 14, 2013 @ 5:10 pm

Why would it be surprising that someone selling things puts more effort into the jobs that make them more money? It might just be a rational allocation of resources.

Would it be surprising if falcons were willing to work harder to catch a duck than a sparrow?

Someone has to put some paid time into writing those descriptions. Or more likely assembling them from boilerplate.

Audrey W. said,

June 28, 2013 @ 10:23 am

Some of the 'lower-end' words are actually useful instead of empty. Having spent a lot of time in and around Philadelphia looking for inexpensive apartments, 'clean' is hard to find. 'Fresh' implies it doesn't smell. 'Convenient' means public transportation or grocery stores are nearby. 'Charming' means it has some horrible flaw (it may have no heating or air conditioning, but that might 'add to its 1850s charm' or something similar). 'Nice' means livable but bland.

Clean and fresh are a given in luxury housing, though, and maybe convenience doesn't matter if you have a car or take taxis everywhere. For a million dollars, you don't want 'nice.'

The 'high-end' adjectives are mostly 'status' words. While the poorer people have to focus on the basics of a decent apartment, the richer people are thinking about how other people will perceive them, or so the real estate agents think. This looks like Maslow's Hierarchy applied to real estate.
