Pete Warden, "Why you should never trust a data scientist", 7/18/2013:
The wonderful thing about being a data scientist is that I get all of the credibility of genuine science, with none of the irritating peer review or reproducibility worries. [...]
I’ve never ceased to be disturbed at how the inclusion of numbers and the mention of large data sets numbs criticism. The articles live in a strange purgatory between journalism, which most readers have a healthy skepticism towards, and science, where we sub-contract verification to other scientists and so trust the public output far more. If a sociologist tells you that people in Utah only have friends in Utah, you can follow a web of references and peer review to understand if she’s believable. If I, or somebody at a large tech company, tells you the same, there’s no way to check. The source data is proprietary, and in a lot of cases may not even exist any more in the same exact form as databases turn over, and users delete or update their information. Even other data scientists outside the team won’t be able to verify the results. The data scientists I know are honest people, but there’s no external checks in the system to keep them that way.