TREC: 1992-2025 and onwards


The 11 tracks of TREC 2025 are underway, collectively constituting the 2025 edition of the "Text Retrieval Conference" organized by the National Institute of Standards and Technology. See the call for details and links, and this site for a few words about its history going back to 1992.

Wikipedia has more historical information, although the article's section on "Current tracks" is from 2018, which is not exactly "current".

And the Wikipedia article also doesn't give a clear picture of what TREC accomplished in its early years. Here's what it says about TREC-1:

In 1992 TREC-1 was held at NIST. The first conference attracted 28 groups of researchers from academia and industry. It demonstrated a wide range of different approaches to the retrieval of text from large document collections. Finally TREC1 revealed the facts that automatic construction of queries from natural language query statements seems to work. Techniques based on natural language processing were no better no worse than those based on vector or probabilistic approach.

There's a whole book of published reports from The First Text Retrieval Conference (TREC-1), and it's all free to read. But you may find its 518 pages a little daunting, so you could start with the 20 pages of Donna Harman's clear and compelling Introduction. Or maybe just this brief passage from that source:

There is a long history of experimentation in information retrieval. […]

In the 30 or so years of experimentation there have been two missing elements. First, although some research groups have used the same collections, there has been no concerted effort by groups to work with the same data, use the same evaluation techniques, and generally compare results across systems. The importance of this is not to show any system to be superior, but to allow comparison across a very wide variety of techniques, much wider than only one research group would tackle. Karen Sparck Jones in 1981 commented that:

Yet the most striking feature of the test history of the past two decades is its lack of consolidation. It is true that some very broad generalizations have been endorsed by successive tests: for example…but there has been a real failure at the detailed level to build one test on another. As a result there are no explanations for these generalizations, and hence no means of knowing whether improved systems could be designed (p. 245).

This consolidation is more likely if groups can compare results across the same data, using the same evaluation method, and then meet to discuss openly how methods differ.

The second missing element, which has become critical in the last 10 years, is the lack of a realistically sized test collection. Evaluation using the small collections currently available may not reflect performance of systems in large full-text searching, and certainly does not demonstrate any proven abilities of these systems to operate in real-world information retrieval environments. This is a major barrier to the transfer of these laboratory systems into the commercial world. Additionally some techniques such as the use of phrases and the construction of automatic thesauri seem intuitively workable, but have repeatedly failed to show improvement in performance using the small collections. Larger collections might demonstrate the effectiveness of these procedures.

The overall goal of the Text Retrieval Conference (TREC) was to address these two missing elements. It is hoped that by providing a very large test collection, and encouraging interaction with other groups in a friendly evaluation forum, a new thrust in information retrieval will occur. There is also an increased interest in this field within the DARPA community, and TREC is designed to be a showcase of the state-of-the-art in retrieval research. NIST's goal as co-sponsor of TREC is to encourage communication and technology transfer among academia, industry, and government.
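An aside of my own, not something from Harman's introduction: "the same data, using the same evaluation method" in practice means scoring each system's ranked output against one shared set of relevance judgments with one agreed-on metric. Here is a minimal sketch in Python, with invented document IDs and a simple average-precision calculation:

```python
# Illustrative only: score ranked retrieval runs against shared relevance
# judgments, the way a common evaluation lets different systems be compared.
# Document IDs and judgments here are invented for the example.

def average_precision(ranked_docs, relevant):
    """Average precision for one query: mean of precision@k at each relevant hit."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# One query's relevance judgments, shared by all participating systems.
qrels = {"AP890101-0001", "WSJ870324-0051"}

# Two hypothetical systems' ranked results for that query.
system_a = ["AP890101-0001", "FR88710-0123", "WSJ870324-0051", "ZF07-999-001"]
system_b = ["FR88710-0123", "ZF07-999-001", "AP890101-0001", "WSJ870324-0051"]

print(f"System A AP: {average_precision(system_a, qrels):.3f}")  # 0.833
print(f"System B AP: {average_precision(system_b, qrels):.3f}")  # 0.417
```

Because the relevance judgments are shared, the two numbers can be compared directly across systems, which is exactly the kind of consolidation Sparck Jones was asking for.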

The "very large text collection" that she references was assembled at LDC, and was published in 1993 as Harman & Liberman, TIPSTER. That dataset included 1,077,909 documents from seven sources: the AP Newswire, the Federal Register, U.S. Patents, Department of Energy reports, the Wall Street Journal, the San Jose Mercury News, and Ziff Davis magazine articles. [I believe that the Patents and the San Jose Mercury News documents may not have been used in the TREC-1 evaluation, though I'm not certain of this.]

Most previous R&D in digital document retrieval and information extraction had worked with hundreds or thousands of documents, generally all of one kind.  In the preparations for TREC-1, Donna Harman explained to me that she wanted to show that such retrieval and extraction problems could be solved at a commercially-relevant scale, and that collaborative research would iteratively improve performance. She set a target of million documents of half a dozen different types — which was not an easy ask at that time, seven to eight years before Google was founded, when the World Wide Web was not very wide or very deep.

I won't bore you today with the painful details of how we managed it. There were a few things already lying around — see "Thanks, Bill Dunn!" (8/6/2009) for one set of memories — but it was a scramble to find archives of documents, mostly in the form of truckloads of old-school 9-track tapes, to decrypt and standardize their mutually-incompatible and sometimes nearly-impenetrable formats, to get legal distribution rights, and to send the results around to the conference participants.
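For the curious, the standardized target was (roughly) a lightweight SGML-style markup, with each document wrapped in <DOC> tags and carrying a <DOCNO> identifier and a <TEXT> body. Here's a rough sketch of reading documents in that form (my own reconstruction for illustration, not the original conversion tooling; the filename in the example is hypothetical):

```python
# Rough sketch: iterate over documents in a TREC/TIPSTER-style file, where
# each document is wrapped in <DOC>...</DOC> with <DOCNO> and <TEXT> fields.
import re

DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)

def parse_trec_file(path):
    """Yield (docno, text) pairs from one concatenated TREC-style file."""
    with open(path, encoding="latin-1") as f:  # the old collections predate UTF-8
        data = f.read()
    for doc in DOC_RE.finditer(data):
        body = doc.group(1)
        docno = DOCNO_RE.search(body)
        text = TEXT_RE.search(body)
        if docno and text:
            yield docno.group(1), text.group(1).strip()

# Example use (filename is hypothetical):
# for docno, text in parse_trec_file("wsj_880101"):
#     print(docno, len(text))
```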

The spectacular success of the TREC conferences is worth emphasizing, given the damage recently done to government funding of (American) research and development in pre-commercial areas. There's been a fair amount of documentation and coverage of this issue — Phil Rubin, formerly the Principal Assistant Director for Science at the Office of Science and Technology Policy (OSTP) in the Executive Office of the President of the United States, has assembled what he calls a "running diary of ignominy".

TREC is of course an acronym for "Text REtrieval Conference", but it's also a pun on the word trek, which the OED glosses as

South African. In travelling by ox-wagon: a stage of a journey between one stopping-place and the next; hence, a journey or expedition made in this way; (also) journeying or travel by ox-wagon.

Now in general use elsewhere: a long journey or expedition, esp. one overland involving considerable physical effort.

…with the etymology

< Cape Dutch trek = Dutch trek draw, pull, tug, march, < trekken, trek v.

 


