A Case Study: Technology-Assisted Review Versus Search Terms

In a recent multibillion-dollar fraud case, a global financial services company had three months to identify, collect, process, review, and produce responsive documents from a collection of 30 million documents. Given the tight deadline and huge volume of information, the client had two options: it could complete the project internally and absorb the staggering discovery costs, or it could engage a service provider and leverage its advanced technology to more quickly find the subset of relevant documents.

After reviewing various options, the company chose CategoriX, Conduent’s patented technology-assisted review (TAR) technology. However, before the company could proceed with discovery, it had to fend off attacks on the technology from the opposing parties, who argued that the application of search terms—rather than TAR—was the appropriate method for culling the review population.

The judge, relying on U.S. Magistrate Judge Andrew Peck’s watershed decision in Da Silva Moore v. Publicis Groupe, established daily Rule 26(f) meet-and-confer sessions to address the use of TAR. The opposing parties retained an expert who attempted to attack the technology, but the evidence that Conduent and its client presented convinced the judge that CategoriX was the better approach.

Given the collection’s size, Conduent’s experts suggested a segmented approach and divided the collection into multiple subpopulations. They then drew a statistically valid seed set of 9,000 to 12,000 documents from each subpopulation. In all, 42,000 documents, or about 0.14 percent of the 30 million-document collection, were used to train and test the TAR algorithm.

After applying the TAR process, CategoriX reduced the total collection by 86 percent, dropping the number of documents that had to be manually reviewed from 30 million to 4.1 million. Conduent’s expert statisticians presented compelling evidence to the court that the smaller population contained at least 90 percent of the collection’s responsive documents. Moreover, Conduent found that had search terms been used to cull the collection, its client would have had to review 2.6 million more documents, or about 63 percent more than the TAR review pool. A comparison of the TAR and search-term review pools showed that both populations contained similar numbers of responsive documents, meaning that the additional 2.6 million documents were largely nonresponsive.
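The culling figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are assumptions made for this example, the inputs are the rounded figures quoted in this case study rather than the underlying case data, and the 63 percent figure is read here as the extra search-term volume relative to the TAR review pool.

```python
# Rounded document counts quoted in the case study (not the underlying case data).
TOTAL_COLLECTION = 30_000_000
TAR_REVIEW_POOL = 4_100_000
EXTRA_WITH_SEARCH_TERMS = 2_600_000


def percent(part: float, whole: float) -> float:
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


def culling_summary() -> dict:
    """Derive the headline percentages from the rounded counts above."""
    search_term_pool = TAR_REVIEW_POOL + EXTRA_WITH_SEARCH_TERMS
    return {
        # Share of the collection that TAR removed from manual review.
        "tar_reduction_pct": percent(
            TOTAL_COLLECTION - TAR_REVIEW_POOL, TOTAL_COLLECTION
        ),
        # Documents a search-term cull would have sent to manual review.
        "search_term_pool": search_term_pool,
        # Extra search-term volume relative to the TAR review pool (assumed baseline).
        "extra_review_pct": percent(EXTRA_WITH_SEARCH_TERMS, TAR_REVIEW_POOL),
    }


if __name__ == "__main__":
    for name, value in culling_summary().items():
        print(f"{name}: {value:,.1f}")
```

With these rounded inputs, the reduction works out to roughly 86 percent, the search-term review pool to about 6.7 million documents, and the additional review burden to roughly 63 percent of the TAR pool.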

The case study illustrates that CategoriX was superior to traditional search-based methodologies alone in winnowing a colossal document population to its responsive core. Moreover, it demonstrates that with sound TAR methodology and expert technical and statistical guidance, clients no longer have to choose between review speed and accuracy.
