Language Log

Intellectual automation

March 7, 2011 @ 7:01 am · Filed by Mark Liberman under Changing times, Computational linguistics

Following up on the recent discussion of legal automation, I note that Paul Krugman has added a blog post ("Falling Demand for Brains?", 3/5/2011) and an Op-Ed column ("Degrees and Dollars", 3/6/2011), pushing an idea that he first suggested in a 1996 NYT Magazine piece ("White Collars Turn Blue", 9/29/1996), where he wrote as if from the perspective of 2096:

When something becomes abundant, it also becomes cheap. A world awash in information is one in which information has very little market value. In general, when the economy becomes extremely good at doing something, that activity becomes less, rather than more, important. Late-20th-century America was supremely efficient at growing food; that was why it had hardly any farmers. Late-21st-century America is supremely efficient at processing routine information; that is why traditional white-collar workers have virtually disappeared.

Prof. Krugman is by no means the only one to have suggested that computer technology will eventually replace many white-collar jobs — worry (or jubilation) about the automation of intellectual labor has been widespread for at least a half a century. But so far, despite enormous changes in how these jobs are defined and carried out, the expected massive reductions in numbers have not generally occurred. For an early review and discussion, see Erik Brynjolfsson, "The Productivity Paradox of Information Technology", CACM 1993, for which Google Scholar lists 1,581 citations:

Delivered computing-power in the US economy has increased by more than two orders of magnitude since 1970 (figure 1) yet productivity, especially in the service sector, seems to have stagnated (figure 2). Given the enormous promise of IT to usher in "the biggest technological revolution men have known" (Snow, 1966), disillusionment and even frustration with the technology is increasingly evident in statements like "No, computers do not boost productivity, at least not most of the time" (Economist, 1990).

Perhaps machine learning, AI, and computational linguistics will finally bring about the long-predicted migration from white-collar cubicle farms to the urban reservoirs of unemployment. But history suggests that it would be prudent to wait and see. Farming delivers a relatively well-defined and consistent product: a bushel of wheat is a bushel of wheat, however radically the process of creating and delivering it changes. The services that (for example) lawyers deliver are more abstract, and technological innovation (whether in the form of typewriters, copy machines, word processors, document retrieval, or document-understanding systems) is likely to lead to changes in the product as well as changes in the way the product is created and delivered.

In considering these questions, we might look back even further, to Norbert Wiener's 1950 book The Human Use of Human Beings:

It is the thesis of this book that society can only be understood through a study of the messages and the communication facilities which belong to it; and that in the future development of these messages and communication facilities, messages between man and machines, between machines and man, and between machine and machine, are destined to play an ever-increasing part.

18 Comments

Josh Bowles said,

March 7, 2011 @ 8:39 am

"…history suggests that it would be prudent to wait and see. Farming delivers a relatively well-defined and consistent product: a bushel of wheat is a bushel of wheat, however radically the process of creating and delivering it changes. The services that (for example) lawyers deliver are more abstract, and technological innovation (whether in the form of typewriters, copy machines, word processors, document retrieval, or document-understanding systems) is likely to lead to changes in the product as well as changes in the way the product is created and delivered."

I agree:
Capturing and processing large amounts of natural language text is no problem now. But putting such collections of data to use is another beast entirely. Many times the types of data storage one uses will eventually determine some boundary to what you can or cannot do with the data… the larger the data store the less agile you typically become in your ability to move the data (and move around within that data). There are numerous types of data storage (types of databases) and many of these have not been explored on a grand or broad scale.
But say you've figured this out, and you can move around data in an agile way (the way a human can accumulate lots of facts, and one day suddenly "see" those facts in an entirely new way—suggesting some kind of change of fundamental architecture in the way those facts are related: such as an 'aha!' moment with a poem or legal document), we still have to access that data within the limits of human cognition.
So we can mine terabytes of court logs searching for precise relations, but how exactly do we filter a trillion relational data points so that they make sense to us? The usual graphic representations of data points and metrics probably aren't gonna cut it: histograms, pie charts, etc., encode an interpretation of the data, and new visualization techniques will do the same….

Collecting and analyzing data given precise definitions may be automated, but interpreting this data is another problem, and filtering the data for consumption is yet another challenge. These latter, given a phenomenal success of the previous, could keep scores of "analysts" busy for a long time.

[(myl) Indeed. The first concordance (of the Latin Vulgate) took Hugo de Saint Cher and dozens of 13th-century Dominican monks years if not decades of painstaking labor to create; the same thing can now be done in a few seconds of computer time, with a program created as an easy homework assignment early in the first semester of a programming course. This development has increased rather than decreased the number of jobs for computational linguists.]
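To make the "easy homework assignment" concrete, here is a minimal sketch of a keyword-in-context concordance. The function name `build_concordance`, the `width` parameter, and the crude letters-only tokenizer are illustrative assumptions, not anything from the post; the sample text is the opening of Genesis in the Vulgate.

```python
import re
from collections import defaultdict

def build_concordance(text, width=3):
    """Map each lowercased word to a list of (left context, word, right context)."""
    # Assumption: a "word" is any run of ASCII letters; real Latin texts need more care.
    tokens = re.findall(r"[A-Za-z]+", text)
    concordance = defaultdict(list)
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        concordance[token.lower()].append((left, token, right))
    return concordance

sample = "In principio creavit Deus caelum et terram. Terra autem erat inanis et vacua"
conc = build_concordance(sample, width=2)

# Print every occurrence of "et" with two words of context on each side.
for left, word, right in conc["et"]:
    print(f"{left:>15} [{word}] {right}")
```

A real concordance would also need lemmatization (so that "terram" and "Terra" land under one headword) and verse references, which is where much of the monks' labor went.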
Ginger Yellow said,

March 7, 2011 @ 9:39 am

The Brynjolfsson paper/quote suffers from unfortunate timing. BLS statistics show productivity growth doubling from an average 1.4% pa in the decade leading up to 1990 to 2.8% pa in the last three years. Admittedly that only takes it back to the levels of the post-war boom era, but there are plenty of other reasons we can think of why productivity would have soared then.

[(myl) But the real point of his paper is that we don't/didn't really know how to measure productivity in the area of office work, since the nature of the work product is constantly evolving. While this is far from my area of expertise, this seems to me like a valid point that makes white-collar productivity measures especially difficult to compare across time. What people in offices produced in 1850, 1900, 1950, 2000, …, was "the same" only in some very abstract sense, whereas what people on farms produced was much easier to compare across time.

Something that's easier to quantify reliably, it seems to me, is the fraction of the population who do a given general sort of work. The proportion of the U.S. workforce engaged in agriculture changed from 41% in 1900 to 1.9% in 2000. If something like that happened to office work, we'd certainly notice it. How people in offices spend their working time has changed at least as radically since 1900 as how people on farms spend theirs — but the results in terms of labor-force size have so far not been anything like the same… ]
Ginger Yellow said,

March 7, 2011 @ 11:27 am

If something like that happened to office work, we'd certainly notice it. How people in offices spend their working time has changed at least as radically since 1900 as how people on farms spend theirs — but the results in terms of labor-force size have so far not been anything like the same… ]

True, but for a given category of "office work" the results may be the same. Secretarial work springs to mind, especially if you control for the growth in "work that would have required secretarial assistance in 1900".
John Baker said,

March 7, 2011 @ 12:05 pm

I don't think this post does justice to Krugman's argument, as expressed in today's column: Computers excel at tasks that can be accomplished by following explicit rules, so, any routine task, including many white-collar, nonmanual jobs, is in the firing line. Perhaps in part as a result, both high-wage and low-wage employment have grown rapidly, but medium-wage jobs — the kinds of jobs we count on to support a strong middle class — have lagged behind.

We've already seen that the need for clerks and secretaries has dropped substantially. To some extent, this will certainly continue: Why pay a paralegal to summarize a deposition, for example, if a computer can do it just as well and much more cheaply? However, the idea that computerized legal research can really replace lawyers, now or in the next few years, strikes me as hyperbole. Organizing legal information and simplifying legal research has received sustained effort for well over a century; I don't think modern computers are suddenly about to make this a routine process.
Dan Lufkin said,

March 7, 2011 @ 12:24 pm

There's a whole branch of the Dismal Science devoted to the economic impacts of improvements in efficiency. In the 1860s, British economists were worried that improved efficiency in the use of coal would lead to decreased demand and loss of mining jobs. Instead, coal demand boomed. This was explained by W.S. Jevons (q.G.) and is the Jevons Paradox. It shows up particularly in the demand for energy, so that improvements in energy efficiency do not automatically lead to a decrease in demand. It all depends on the shape of the price/demand curve. In energy economics, this is covered by the Khazzoom-Brookes Postulate (nominative destiny?).
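The dependence on the shape of the demand curve can be sketched with a toy model. Everything here is an assumption for illustration: a constant-elasticity demand curve for energy services, and the hypothetical names `fuel_demand`, `elasticity`, and `scale`. Doubling efficiency halves the effective price of a unit of service; whether fuel use then rises or falls depends on whether the price elasticity of demand is above or below 1.

```python
def fuel_demand(efficiency, elasticity, fuel_price=1.0, scale=100.0):
    """Fuel consumed under a constant-elasticity demand curve (toy model)."""
    # Price of one unit of useful service falls as efficiency rises.
    service_price = fuel_price / efficiency
    # Constant-elasticity demand for the service itself.
    service_demand = scale * service_price ** (-elasticity)
    # Fuel needed to deliver that much service.
    return service_demand / efficiency

for elasticity in (0.5, 1.0, 2.0):
    before = fuel_demand(efficiency=1.0, elasticity=elasticity)
    after = fuel_demand(efficiency=2.0, elasticity=elasticity)
    print(f"elasticity {elasticity}: fuel use {before:.1f} -> {after:.1f}")
```

In this model fuel use is proportional to efficiency raised to the power (elasticity - 1), so the Jevons outcome appears only when demand is elastic.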

What we're seeing is that efficient use of language corpora leads to greater demand for corpus-miners.

There's also the aspect of emergent properties. Get enough of something all in one place and it can take on properties that small quantities do not exhibit. Think of going from 10 termites to 10 million. Huge corpora can do things we haven't begun to understand.

How to display all this information so that it can be understood? Prof. Tufte, call your office.
Josh said,

March 7, 2011 @ 12:24 pm

Ginger makes a good point. Secretarial work force probably does show a similar change. A great deal of what secretaries used to do has been replaced by Microsoft and Xerox. At a typical company these days, only executive level management are likely to have personal secretaries. Everything else is handled by a couple admins for an entire office. (Think Mad Men vs. The Office)

A significant portion of my job is automating white collar work. In the time I've been at my job, the group has only gotten larger, from 6 to 15 while I've been here. In engineering, you generally automate the time-consuming and repetitive tasks. These are things that humans are very bad at. They're slow, and prone to making errors when repeating the same task over and over. Computers are built for just that. This frees up the engineer for what they are best at: creativity and problem solving. Let the designer focus on the design and let code do all of the tedious dirty work. You're then able to produce much more product for less cost, which hugely boosts profitability, which then lets you hire MORE engineers to grow even faster. I've replaced the entire job I was hired for with a handful of programs. That was 40+ hours a week of work just for me, that now only takes a couple hours a month of my time in maintenance. And instead of taking a few days to release something, it now takes a few minutes.

[(myl) As I observed in response to an earlier comment, what people do in offices has evolved dramatically, and continues to do so, at least as much as what people do on farms. But it remains true that the proportion of people employed in agriculture has declined by more than an order of magnitude over the past century, while the proportion of people employed in offices has (I think) increased. This might change at some point, but just pointing to current digital automation doesn't show us much.]
Barbara Phillips Long said,

March 7, 2011 @ 12:36 pm

My understanding of Krugman's point about the cost of education and the hollowing out of the workforce is not one of a stable workforce number with a different mix of skills, as the need for more computational linguists would suggest.

A computational linguist requires training. As I understand the trends Krugman discusses, wages for computational linguists should decrease as they become a less exotic specialty and a larger part of the job market. But because salaries for computational linguists would drop, the cost of education to become a computational linguist would lead to a decline in the supply of such workers in the United States.

The jobs would be off-shored, Krugman implies, as other middle class jobs have been eliminated or moved offshore. That's not bad for the globe, but it is bad for the U.S.

The issue is not only one of changes in the types of jobs being done or the content of those jobs or worker productivity, it is the cost of specialized education. Krugman doesn't say so, but more subsidies for training/education could be provided if the beneficiaries of such training — corporations — paid more taxes. The amount of corporate tax payments compared to GDP has fallen significantly, one study says:

http://www.offthechartsblog.org/what-should-corporate-tax-reform-look-like/

Productivity measures have always seemed to me to be dubious when more intangibles have to be measured. Some industries apparently aren't tracked, including many service industries with a big impact on our economy. I can't find a clear list of what "business" encompasses at BLS, but a report from the Dallas Fed from several years ago says this:

"The productivity index reflects the performance of the economy as a whole. The BLS also calculates annual productivity numbers for about half the industries in the business sector. For example, manufacturing, mining, wholesale and retail trade, and transportation have detailed productivity statistics. Due to lack of reliable information, the reports do not cover some of the more interesting industries such as health care, legal, financial, insurance and real estate services."

From: http://www.dallasfed.org/eyi/free/0406product.html

One would think with the emphasis on billable hours in the legal profession, some kind of productivity measure could be developed. In other professions that rely on salaried workers, I doubt the accuracy of productivity measurements. BLS says it measures output per hour of work, not output per hour paid, but in my experience there are a lot of employers with salaried workers that don't track their actual hours.

Based on my personal experience at work, new software providers are moving us toward a more "paperless" office as software is being used for expense reimbursement, performance appraisals, some employee training, and various human resources data management. Office workers have been laid off — I don't know if there has been a concurrent and equivalent increase in the number of people writing software or maintaining servers.
J. W. Brewer said,

March 7, 2011 @ 3:09 pm

To give a real-world example, this process http://en.wikipedia.org/wiki/Document_comparison was still being done by hand when I started practicing law in NYC in 1993. We had a small cadre of very detail-oriented support staff who worked the graveyard shift who could and would on request create what were variously called redlines or blacklines by physical markup of the new version of the document indicating additions and deletions as compared to the prior version. (To the extent there was software that did this on the market at that point, it seemed to produce unreliable results often enough that many lawyers were not comfortable relying on its output.) We no longer employ support staff to do that. We likewise no longer have support staff in the library who will take a list of case citations and get you copies by physically pulling the relevant volumes off the shelf and putting them page by page on the glass of a xerox machine. Rather, we have people who will (much more quickly!) obtain electronic copies of the cited cases and send them to you as email attachments.
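For readers who have never seen a redline, here is a minimal sketch of word-level document comparison using Python's standard `difflib` module. The `redline` function, the `[-…-]` / `{+…+}` markup, and the sample lease clause are all illustrative assumptions; commercial comparison tools are considerably more careful about moved text, formatting, and tables.

```python
import difflib

def redline(old, new):
    """Return a word-level markup of deletions [-...-] and additions {+...+}."""
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(" ".join(old_words[i1:i2]))
        if tag in ("delete", "replace"):
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)

prior = "The tenant shall pay rent monthly"
revised = "The tenant shall pay rent quarterly in advance"
print(redline(prior, revised))
```

The comparison works on whitespace-separated words, so a change in punctuation attached to a word shows up as a replacement of the whole word.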
Xmun said,

March 7, 2011 @ 3:14 pm

@Dan Lufkin: "This was explained by W.S. Jevons (q.G.) and is the Jevons Paradox."

"q.G.": what a pleasure! I can't be bothered looking up Jevons, but I'll certainly remember the abbreviation.
Dave Lewis said,

March 7, 2011 @ 3:16 pm

One reason not to expect the same sort of productivity improvements in knowledge work as in, say, farming, is that a lot of it is a competitive endeavor. The single page purple mimeographed advertising flier must now be a 32 page high resolution color brochure with eye popping graphics (occasionally based on actual data) and accompanied by a DVD and (if you're unlucky) 3-D glasses. The same competitive edge that would have required a handful of bits of information (about, say, demand in a particular country) in 1911 requires terabytes of information about individual customers in 2011. I suppose in some sense this means everyone has become more productive, but the connection to traditional notions of productivity is pretty loose.

[(myl) Exactly. This is an updated version of Brynjolfsson's main point, as I understand his paper anyhow.]
language hat said,

March 7, 2011 @ 5:49 pm

"q.G.": what a pleasure! I can't be bothered looking up Jevons, but I'll certainly remember the abbreviation.

I know I'm going to feel like an idiot, but what does it stand for?
kitty said,

March 7, 2011 @ 6:47 pm

hat: q.G. may relate to this blog post. It should be q.g., verbs aren't capitalized.
komfo,amonan said,

March 7, 2011 @ 6:48 pm

"q.G.": what a pleasure! I can't be bothered looking up Jevons, but I'll certainly remember the abbreviation.

I know I'm going to feel like an idiot, but what does it stand for?

Without looking it up, I'm going to assume & hope it means "quod Google".
Zora said,

March 7, 2011 @ 7:40 pm

q.G.?

Quake God?
quality grade?
Quartermaster-General?
quasigeostrophic?
Queen's Gambit?
Quinidine Gluconate?
Xmun said,

March 7, 2011 @ 11:00 pm

"quod Google": that's what I thought too.
D.O. said,

March 8, 2011 @ 2:07 am

Maybe it's just a restatement of what Dave Lewis said or even what Prof. Liberman meant by "a bushel of wheat is a bushel of wheat", but I wanted to chime in to say that the demand for (for example) legal services is not subject to the limitations that farming might be. And it is not only the way we define what good legal services might be, but the sheer bulk. I haven't sued anybody in my life and was never sued myself, but that might come to be seen as an abnormality of our days, just like the experience of somebody from centuries back who never traveled more than 10 miles from his place of birth.
Dan Lufkin said,

March 8, 2011 @ 10:03 am

Yes, friends, some time ago I realized that English lacked a word and brought quod vide out of its academic hidey-hole, refurbished it and launched it onto fertile (one hopes) ground. I used it here as early as May 3, 2009, but no one noticed. Its time of greatness had not yet arrived, I guess.

Please feel free to use it as you see fit. I claim no rights of authorship.
John Cowan said,

March 13, 2011 @ 2:10 am

I think it should be q.v.G for quod vide Google, the last word being of course the ablative of means.
