{"id":8576,"date":"2013-11-19T01:56:58","date_gmt":"2013-11-19T06:56:58","guid":{"rendered":"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=8576"},"modified":"2013-11-19T01:56:58","modified_gmt":"2013-11-19T06:56:58","slug":"more-on-the-statistics-of-real-estate-listings","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=8576","title":{"rendered":"More on the statistics of real-estate listings"},"content":{"rendered":"<p>Early last summer, an inquiry from Sanette Tanaka at the WSJ led me to do a Breakfast Experiment\u2122 on the relationship between the language of real-estate listings and the price of the associated properties (\"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=4681\" target=\"_blank\">Long is good, good is bad, nice is worse, and ! is questionable<\/a>\", 6\/12\/2013; \"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=4686\" target=\"_blank\">Significant (?) relationships everywhere<\/a>\", 6\/14\/2013; \"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=4946\" target=\"_blank\">City of the big disjunctions<\/a>\", 6\/20\/2013).<\/p>\n<p><!--more--><\/p>\n<p>Since then, Bob Stine and Dean Foster (Wharton Statistics) and I have done a more serious investigation in this area. Bob has put a draft paper up on his <a href=\"http:\/\/www-stat.wharton.upenn.edu\/~stine\/\" target=\"_blank\">web site<\/a> (Foster, Liberman, and Stine, \"<a href=\"http:\/\/www-stat.wharton.upenn.edu\/~stine\/research\/regressor.pdf\" target=\"_blank\">Featurizing text: Converting text into predictors for regression analysis<\/a>\") along with a <a href=\"http:\/\/www-stat.wharton.upenn.edu\/~stine\/research\/regressor_slides.pdf\" target=\"_blank\">set of slides<\/a>. 
There's also a video of Bob giving a <a href=\"http:\/\/datamining.ws.gc.cuny.edu\/2013\/10\/31\/robert-stine-seminar\/\" target=\"_blank\">talk at CUNY last month<\/a> about this work.<\/p>\n<p>I don't have time this morning for a longer explanation (it's morning in Paris, where I am at the moment), but here's how Bob headlines the paper:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">This draft manuscript (really more of a working paper) describes fast methods for the construction of numerical regressors from text using spectral methods related to the singular value decomposition (SVD). An example uses these methods to build regression models for the price of Chicago real estate using nothing but the text of a property listing. Topic models (LDA) provide some explanation for why these methods work so well as they do. For example, our model for real estate explains some 70% of the variation in prices using just the text of the listing with no attempt to use location or related demographics.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Early last summer, an inquiry from Sanette Tanaka at the WSJ led me to do a Breakfast Experiment\u2122 on the relationship between the language of real-estate listings and the price of the associated properties (\"Long is good, good is bad, nice is worse, and ! is questionable\", 6\/12\/2013; \"Significant (?) 
relationships everywhere\", 6\/14\/2013; \"City of [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[60],"tags":[],"class_list":["post-8576","post","type-post","status-publish","format-standard","hentry","category-computational-linguistics"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/8576","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8576"}],"version-history":[{"count":1,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/8576\/revisions"}],"predecessor-version":[{"id":8577,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/8576\/revisions\/8577"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8576"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fw
p%2Fv2%2Fcategories&post=8576"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8576"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}