Language Log

Google lawsuits settled

October 28, 2008 @ 1:05 pm · Filed by Ben Zimmer under Research tools

Rumors had been percolating for a while now, and today it was finally announced: Google has reached a settlement with U.S. authors and publishers who had filed lawsuits challenging the massive digitization project of Google Book Search. According to Google's press release, the settlement resolves lawsuits from the Authors Guild and five major publishers (McGraw-Hill, Pearson Education, Penguin, Wiley, and Simon & Schuster). Google will shell out $125 million, much of which will be used to establish the Book Rights Registry, a system for locating and representing copyright holders (a way of dealing with so-called "orphan works").

One key aspect of the settlement, discussed on the Google Book Search blog, is that GBS will now be able to display millions of books that are in copyright but out of print:

With this agreement, in-copyright, out-of-print books will now be available for readers in the U.S. to search, preview and buy online — something that was simply unavailable to date. Most of these books are difficult, if not impossible, to find. They are not sold through bookstores or held on most library shelves, yet they make up the vast majority of books in existence. Today, Google only shows snippets of text from the books where we don't have copyright holder permission. This agreement enables people to preview up to 20% of the book.

That's extremely welcome news for researchers who have been frustrated by the limitations of Google's "snippet view." Now (at least in the U.S.) we'll get far more books in "limited preview," where you can actually see full page images. Even if that viewing option is restricted to 20 percent of a book's page extent, it's still a vast improvement over the godforsaken snippets. Even better, the settlement provides for "free, full-text, online viewing of millions of out-of-print books at designated computers in U.S. public and university libraries," so if you need a book in full view you can just head over to a library — or at least one that carries Google's new institutional subscription. Combined with the establishment of the Hathi Trust for access to public-domain materials (before the 1923 copyright cut-off for U.S. titles), things are definitely looking up for the future of digitized research.

[Read more about the settlement agreement here and here. Neal Goldfarb notes below that the agreement may provide some new opportunities for corpus analysis and other linguistic research.]


10 Comments

Chris Waigl said,

October 28, 2008 @ 2:33 pm

Ben – However welcome this is, I am sad to see that, once again, the content will only be available for people accessing the site from the US.
Joe said,

October 28, 2008 @ 2:59 pm

I think that publishers are cutting off their nose to spite their face. Had they forgone all these crazy fees and allowed Google to set up some on-demand publishing or the like (with royalties going back to them), they could be making a lot more money.

But no. Market control is more valuable to them than allowing people to access information. And people wonder why some have precious little respect for copyrights (even if they have plenty of respect for authors) …
Kathy J said,

October 28, 2008 @ 3:21 pm

"…in-copyright, out-of-print books will now be available for readers in the U.S. to search, preview and buy online… "

Who will they be able to buy online from? Google? So maybe they will have on-demand publishing…
Rich B. said,

October 28, 2008 @ 4:16 pm

Can someone explain how (or if) this applies to periodicals?

Is a 1936 National Geographic "out of print"? How about a Batman comic book from 1950? Both are still publishing something today, but I can't get that 50+ year old story. Or what about Life Magazine, which is no longer being published?
Giles said,

October 28, 2008 @ 4:47 pm

Kathy, I suspect that "buy" in this context will mean "rent a DRM-encumbered ebook", not "purchase a physical object". Still, that will be better than nothing for those who can't easily travel to a library.

The US-only nature of it all is very depressing, though. Once again, the majority of the world's people — including those in developing nations for whom the internet is the only real hope of ever having access to a large body of knowledge, given the general scarcity of libraries and poor postal/transport infrastructure — are being deprived of an opportunity that is being given away for free to Americans. How is this just? How is it even logical? What possible argument is there to support this, given that it is not even protecting any revenues?!
Neal Goldfarb said,

October 28, 2008 @ 6:48 pm

@Giles: The US-only nature of it all is very depressing, though. Once again, the majority of the world's people — including those in developing nations for whom the internet is the only real hope of ever having access to a large body of knowledge, given the general scarcity of libraries and poor postal/transport infrastructure — are being deprived of an opportunity that is being given away for free to Americans. How is this just? How is it even logical? What possible argument is there to support this, given that it is not even protecting any revenues?!

Presumably the lawsuit was limited to claims under US copyright law, so it's not surprising that the settlement is limited to the US. I'm sure that for many works, the plaintiffs don't even own the foreign rights.
Neal Goldfarb said,

October 28, 2008 @ 7:20 pm

I just took a quick look at the settlement agreement (available here). One feature that will undoubtedly be of interest to linguists is that the agreement will provide for access to almost the entire Google Books database for corpus analysis and related purposes:

(b) Textual Analysis and Information Extraction – Automated techniques designed to extract information to understand or develop Relationships among or within Books or, more generally, in the body of literature contained within the Research Corpus. This category includes tasks such as concordance development, collocation extraction, citation extraction, automated classification, entity extraction, and natural language processing.

(c) Linguistic Analysis – Research that performs linguistic analysis over the Research Corpus to understand language, linguistic use, semantics and syntax as they evolve over time and across different genres or other classifications of Books.

(d) Automated Translation – Research on techniques for translating works from one language to another.

(e) Indexing and Search – Research on different techniques for indexing and search of textual content.

It's not immediately clear to me whether this includes commercial uses (e.g., by dictionary-makers).
Forrest said,

October 28, 2008 @ 7:20 pm

This is … a little bizarre. I understand the need for copyright laws; I produce photography, and very much appreciate the legal right to decide the fate of my work. But as far as I can tell, this goes far and away beyond what most people would call reasonable. A guild settling for $125 million to represent people who (presumably?) aren't members of that guild, in order to protect copyright holders who seem to have walked away from their work … is just odd.
Stephen Jones said,

October 29, 2008 @ 2:10 pm

Can somebody explain to me why I have to suffer snippet view on a scan of a book in the New York Public Library system, published in 1878, and written by an author who died in 1918 (William Digby, to be precise)?
Jorge said,

November 1, 2008 @ 2:58 am

trackback: "[…]Benjamin Zimmer, at Language Log. On another of the advantages: the possibility of now accessing books that are out of print but still under copyright[…]"
