Scrubbing for information


Rob Cox and Anthony Currie, "Glencore I.P.O. Mimics Blackstone and Draws Skeptics", NYT 5/3/2011:

Is Glencore the new Blackstone? It has become a theme from Wall Street to the City and beyond that the commodity trader’s planned $12 billion stock offering signals the top of its industry’s cycle, just as Blackstone’s did for private equity. But investors should watch for other similarities when scrubbing Glencore’s prospectus, due out Wednesday.

Say what?

The OED's possibly-relevant glosses for scrub as a verb are "To clean (esp. a floor, wood, etc.) by rubbing with a hard brush and water. Also fig."; "To rub with something bristly"; "To wash (usually with a brush) and disinfect the hands and forearms prior to performing or assisting at a surgical operation"; "Of a horse-rider: to rub the arms and legs urgently upon a horse's neck and flanks to urge the horse to move faster"; "To cancel, scrap, call off; to eliminate, erase; to reject, dismiss"; "To manage with difficulty, to ‘scrape’ along"; "To treat (a material, esp. a gas or vapour) so as to remove impurities, usu. by bringing it into contact with a liquid".

There are plenty of other kinds of metaphorical scrubbing, including things that happen to skidding tires and rock guitars. In the example quoted above, Cox and Currie seem to have used scrub to mean something like "examine carefully", which is not a sense that I've been able to find in any dictionaries. And in connection with business information, scrub seems more likely to evoke metaphorical extensions of meanings like "disinfect", "remove impurities", etc., which presumably is not at all what they wanted.

But scrub = "examine carefully" is Out There. Thus Mark Mazzetti and Scott Shane, "Data Show Bin Laden Plots; C.I.A. Hid Near Raided House", NYT 5/5/2011:

The aggressive effort across the intelligence community to translate and analyze the documents seized from the hide-out has as its top priority discovering any clues about terrorist attacks that might be in the works. Intelligence analysts also were scrubbing the files for any information that might lead to identifying the location of Al Qaeda’s surviving leadership.

I wonder whether this use is borrowed by analogy from scour, another sc- word that can mean either "To cleanse or polish … by hard rubbing with some detergent substance" or "to traverse in quest of something", "To run over in the mind, with the eye, etc.".

To my surprise, the OED informs us that these represent the historical conflation of two different words, scour v.1 with the core meaning "To move about hastily or energetically, esp. to range about in search of something", and scour v.2 with the core meaning "To cleanse or polish by hard rubbing":

In some of the senses explained below there may have been coalescence of words of identical form but etymologically unconnected; it is difficult in some uses to distinguish between this verb and scour v.2, by association with which its sense-development has certainly been influenced.

For another example of such historical conflation, see "Counting poles", 2/24/2004.

Ron Stack, who sent me the Mazzetti & Shane link, suggested a connection to "the usage in audio editing in which a track can be scrubbed (think of manually moving tape reels back and forth) to find a specific location", and linked to the documentation for the "Scrub tool" in Soundtrack Pro 3:

Scrubbing an audio file lets you hear the audio at the playhead position as you drag the playhead so you can quickly find a particular sound or event in the audio file. The Soundtrack Pro Scrub tool provides detailed scrubbing that realistically approximates the “rock-the-reels” scrubbing on analog tape decks.



24 Comments

  1. Shoe said,

    May 7, 2011 @ 7:44 am

    Scrubbing is the term given to moving the timeline of a video to get a quick preview of the contents, as in this definition from Microsoft:

    http://msdn.microsoft.com/en-us/library/cc295169.aspx

    It is probable that scrubbing as previewing is the intended meaning in the passages quoted above.

  2. Chris said,

    May 7, 2011 @ 8:40 am

    The Mazzetti use is particularly odd since scrub in the intelligence domain already has a common and highly salient use meaning redact. Could there be interference from scour?

  3. Martin J Ball said,

    May 7, 2011 @ 9:00 am

    Frankly, I would suggest the entire quote is totally obscure to most people :)

  4. Marc Cenedella said,

    May 7, 2011 @ 9:01 am

    Very interesting catch! I use this expression in my profession, so I had the pleasant jolt of recognition when you flagged it as an unusual formulation.

    The phrase derives from metaphorical expressions surrounding the handling and processing of data. I think you may be on to something with “scour”.

    In order for data to be useful, it must be "clean". Over time, the concept of "clean" has spread from "being free from simple mechanical errors" such as typos or misplaced characters, to the logical, such that "values are consistent with the characteristics of data expected in the field", for example, if a zipcode is required all values need to be valid zipcodes; to the normative, such as a particular set of data exhibiting the characteristic of being complete or whole.

    These activities are captured in idiomatic expressions "to clean the data", or "to scrub the data."

    From this, the concept “to clean the data” broadened to include the concept of preparing data for analysis, and thence to manipulating data in such a manner as to make it useful for further calculation, analysis or understanding.

    One might say:
    "Did you scrub the data?", or
    "Did you scrub the data files?", or
    "Did you scrub the stuff we got from marketing?", or
    "Did you scrub through the materials?", or
    "Did you scrub the prospectus?"

    What is happening when the phrase changes from "scrub the x" to "scrub through the x"? Is that simply another formulation or does the addition of through indicate a semantic change?

    Comparing “did you scrub through the materials?” to “did you scour through the materials?”, I think it may be the case that scour, in this case, has acquired the specialized meaning of “looking for a specific instance” or “looking for examples of a particular type”, and can therefore no longer mean a generalized manipulation and / or consumption of data or a document.

    One might say
    “I scoured the entire prospectus looking for a mention of the patent lawsuit”, or
    “I scoured the document, there’s nothing in there on foreign subsidiaries”, but not
    “I scoured the document to get a complete understanding of his proposal.”

    So perhaps, in the context of data analysis, “scour” has narrowed in its meaning, abandoning a semantic space that it has in general purpose usage, thereby creating an opening for scrub to acquire a preposition and new, specialized meaning.

  5. Bill Walderman said,

    May 7, 2011 @ 9:31 am

    As a lawyer, I hear, and use myself, the expression "scrub down" in the sense of "examine carefully," "go over carefully" or "research thoroughly" all the time, to the point where I no longer have any sense there's anything remarkable about the idiom–if I ever did.

  6. Aaron Binns said,

    May 7, 2011 @ 10:35 am

    In software development, the "bug scrub" is the triage of all outstanding bugs/defect reports.

  7. Charles in Vancouver said,

    May 7, 2011 @ 11:52 am

    I agree with the poster who mentioned video scrubbing. In addition, for anyone who has an iPhone or iPod Touch, it uses the term "scrubbing" when you move back and forth in a song or podcast.

  8. Jan Freeman said,

    May 7, 2011 @ 11:54 am

    I learned this surprising new "scrub" as an editor at the Globe in 1994, when I moved to a daily news department. It was generally applied to stories with a lot of controversial allegations or details — the ones that had to be watertight before publication, and might even need to go to the lawyers for further scrubbing. Could be that the editors picked up the usage from the lawyers, I suppose.

  9. John Lawler said,

    May 7, 2011 @ 12:04 pm

    The sound, of course, contributes.
    Scour is part of the two-dimensional SK- assonance class, and has semantics shared with the more complex SKR- (which seems to have scratch as its prototype). And, of course, the /r/ at the end of scour echoes initial skr- and helps conflate the two. This is undoubtedly not the first time this kind of thing has happened, judging from the lexical evidence.

  10. Brian said,

    May 7, 2011 @ 12:16 pm

    I've always been confused when I hear co-workers use "scrubbing" to mean close examination of data (but no suggestion of cleaning or redacting). At the time I took it to be a bad analogy with "scraping" (i.e. to have a program extract data from web pages that were never intended to be read by non-humans). But your analogy with "scour" feels much closer and makes me less upset with my fellow human beings.

  11. GeorgeW said,

    May 7, 2011 @ 2:04 pm

    I don't find 'scrub' particularly notable. My image is one of cleaning the material thoroughly to remove the 'dirty' (superfluous) data and get down to the 'clean' (useful) information.

    As suggested, I can see a possible conflation with 'scour.' There could be influence of 'scan' as well.

    [(myl) You've missed the point — in the cited passages, it's the wanted information that's playing the role of the "dirt".]

  12. GeorgeW said,

    May 7, 2011 @ 4:01 pm

    myl: "it's the wanted information that's playing the role of the 'dirt'."

    Hmm, maybe. In any event it is a process of carefully separating the useful from the uninteresting.

  13. Adrian said,

    May 7, 2011 @ 7:12 pm

    Do the people who use scrub in this way also use it in other contexts with the (somewhat contradictory) "wipe clean" meaning?

  14. Erik said,

    May 7, 2011 @ 7:34 pm

    I know that where I worked, HR and other people who did freedom of information requests often talked about "scrubbing" files prior to release (by removing names and other identifying information, for example.) I'm not sure whether that falls in the "wipe clean," "make usable" or "examine carefully" definitions, although it could be any of them, I guess. That makes the article referring to information analysts "scrubbing the files" especially ambiguous to me, as it could be anything from censoring the data to getting it into shape!

  15. Michael C. Dunn said,

    May 7, 2011 @ 7:58 pm

    I've also heard it used to mean a thorough copy editing, where again "scour" would perhaps be more appropriate, although you are cleaning up the language.

  16. Vireya said,

    May 7, 2011 @ 9:45 pm

    Scrubbing a hard disk in my professional life has meant completely erasing it, then overwriting the disk several times to ensure nothing can ever be retrieved from it. So rather than the intelligence agents scrubbing the files, it sounds like something bin Laden might have wanted done to those disks before the Americans arrived.

  17. a George said,

    May 8, 2011 @ 2:11 am

    To me "scrubbing" immediately brought the connotation "the action you perform by means of a pan when putting river gravel and water into it scrubbing for gold". This connotation fits perfectly with being transferred to other raw material, such as data.

  18. Peter G. Howland said,

    May 8, 2011 @ 2:19 am

    Without an insider telling me what was meant by the term, if I were asked to scrub a prospectus I’d scour it to remove the hype and look for the hard data.

    As an aside, who else but the NYT hyphenates “hideout” as Mazzetti and Shane’s editor did in their 5/5/11 story? Hasn’t this been a one-word term for decades?

  19. The Ridger said,

    May 8, 2011 @ 10:35 am

    "Scrub" wouldn't be the only word with opposite meanings – "dust" or "trim" are examples. It could easily mean "scour to remove" and "scour to find".

  20. John Burgess said,

    May 8, 2011 @ 12:36 pm

    In my experience with the US State Dept., 'scrubbing' a text–whether a press release, a report, a draft of a speech, etc.–would involve both looking for technical errors and checking the underlying logic and assumptions. It could involve redactions of certain kinds of information, but also just making sure there wasn't some glaring error that the first-drafter let through.

  21. Erik Hetzner said,

    May 8, 2011 @ 5:36 pm

    In computing, hard disks can be "scrubbed" (completely clean, see above). But also the ZFS file system has a command to "scrub" the disk (check for checksum consistency). And others have proposed to use scrubbing to mean reading each byte to check that it works (see Schwarz et al., "Disk Scrubbing in Large Archival Storage Systems"). A very overloaded term.

  22. Rob Van Dam said,

    May 9, 2011 @ 8:24 pm

    I think this is another case of the move to digital causing a word to slightly migrate from its original, literal meaning to more and more figurative meanings. Each step taken is only a small one from the previous figurative meaning but if you haven't followed each of those steps you might find the end result quite surprising.

    I don't believe that the use of scrubbing in this context has any metaphorical meaning related to cleaning the report/data. Instead, I think the original meaning of audio scrubbing on real tape decks transferred analogously to digital audio. At that point, it lost its connection to the tape head literally scrubbing (mostly in the "scraping" sense) the tape but maintained the functional meaning of manually controlled movement back and forth through a recording (rather than simply consuming at the audio's natural speed and direction). From digital audio it easily migrated to digital video, especially with so many video interfaces maintaining a slider that updated in real time.

    At that point, the only core meaning left is the means of consumption of some media, so transferring to digital documents or even ironically back out to non-digital documents is a very easy step. Combine that with the obvious phonetic parallel between scouring and scrubbing and I think you have an almost completely non-metaphorical transfer.

    Obviously this is all just anecdotal but I've heard the term scrubbing used at each point along the sequence I've described and never got the sense that any of those uses (even going back to true audio tapes) were maintaining any analogy to cleaning, but instead only to the implied movement.

  23. Troy S. said,

    May 10, 2011 @ 7:31 pm

    I would suggest this is military slang that is entering into general civilian use.
    I heard it all the time in the Navy and don't notice anything exceptional about it anymore, if I ever did.

  24. Chris Helzer said,

    May 11, 2011 @ 6:52 am

    We use this term a lot in my industry (telecom), and I completely agree with Marc Cenedella's analysis above. It has gone from a specific meaning of cleaning up & validating data used by technical people to basically a synonym for "review" used throughout the business.
