Sampling Enters the Mainstream Lexicon of E-Discovery Practitioners

August 27, 2012 Sheila Mackay

In the latest case leveraging technology-assisted review, also sometimes called “predictive coding,” District Judge Rebecca Doherty issued instructions to the parties regarding its use, stating that the parties “agree to meet and confer regarding the use of advanced analytics…as a technique to ensure appropriate responses to discovery requests” and “as a document identification mechanism for the review and production of…data.” In re Actos (Pioglitazone) Products Liability Litigation, No. 6:11-md-2299 (W.D. La. July 27, 2012).

Judge Doherty’s Case Management Order set out how the parties intend to conduct sampling, going so far as to specify that they follow these guidelines during the process:

  • A control set of 500 documents that will yield a desired confidence level of 95%, though “for lower levels of richness, the error margin will be lower” (the control set serves as a basis for calculating precision and recall, which are in turn used to monitor training progress and measure performance);
  • During the training phase, a random sample of 40 attorney-assessed documents, with new 40-document sets continually reviewed until “stable training status” is reached; and,
  • A quality-control check “by random sample of irrelevant documents” (those with relevance scores below the cut-off score) to verify that these documents contain “a low prevalence of relevant documents and that the proportionality assumptions underlying the cut-off decision are valid.”
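To make these measurements concrete: the statistics the order references can be computed with a few lines of arithmetic. The sketch below uses the standard normal-approximation formula for a sample proportion at 95% confidence (z = 1.96); the review counts at the bottom are hypothetical illustrations, not figures from the Actos matter.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of error for a sampled proportion at 95% confidence (z = 1.96)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

def precision_recall(true_pos, false_pos, false_neg):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# A 500-document control set at 50% richness gives roughly a +/-4.4% margin:
print(round(margin_of_error(500), 3))                   # 0.044

# At lower richness (say 10% relevant), the margin narrows, as the order notes:
print(round(margin_of_error(500, proportion=0.10), 3))  # 0.026

# Hypothetical counts: 80 true positives, 20 false positives, 40 missed documents.
p, r = precision_recall(80, 20, 40)
print(p, round(r, 3))                                   # 0.8 0.667
```

Monitoring these two numbers over successive 40-document training rounds is what lets the parties decide when “stable training status” has been reached.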

Are times a-changin’? Are attorneys really now required to be experts in sampling methodology? Is the court shifting away from the sentiment Magistrate Judge Facciola expressed in his landmark U.S. v. O’Keefe opinion just a few years ago, which described attorney forays into highly technical methodologies as territory “where angels fear to tread”?

Surely, sampling has entered the mainstream lexicon of e-discovery practitioners, and there is no greater illustration of that fact than Judge Doherty’s Case Management Order and other recent technology-assisted review cases (Da Silva Moore v. Publicis Groupe and Global Aerospace Inc. v. Landow Aviation LP).

But on deeper analysis, no. Attorneys should not be expected to excel in sampling and statistically sound measurement. As Conduent senior search consultant Amanda Jones co-wrote with Ben Kerschberg earlier this year:

If 2011 was the year of technology-assisted document review, 2012 will be the year of re-humanizing technology-assisted review…Going forward, the focus will be not only on the foundational role humans play in guiding document assessment, but also on the role human expertise can play during the earliest stages of case strategy development and later during optimization of the review process. During these phases, experts from various fields may serve as a vital extension of the legal team, providing critical perspectives that legal subject matter experts alone may not possess…. Relying upon technology and legal and subject matter knowledge alone—without the support of any additional expertise—will rarely allow attorneys to achieve the best possible results, and it may weaken the overall defensibility of the approach. Given that most technology-assisted review is founded on statistical algorithms and linguistic pattern detection, empowering these systems with the expertise of…statisticians results in much greater flexibility and often higher quality and more readily defensible results in less time…With each team member playing to his or her talents and training, the review effort realizes greater efficiency, higher quality results, and reduced production time and costs. 

However, attorneys should be expected to understand the basics of sampling and know what questions to ask of their systems and/or processes (even if relying on experts in the meet-and-confer and other steps in the process to do a lot of the heavy lifting). These include the following:

  • Should seeding take place, and if so, under what circumstances? (In this Case Management Order, the parties decided that there would be no seeding.)
  • Should random sampling always be used?
  • How should the sample size vary with the size of the document collection?
  • What is a reasonable margin of error?
  • What can sampling “predict”?
  • How does sampling methodology affect performance quality?
  • How can strategic keyword search improve performance? (This is a topic we have covered in earlier posts.)
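One of those questions, how sample size should vary with collection size, has a perhaps counterintuitive answer: for large collections, it barely varies at all. The sketch below (the function name is illustrative, and it assumes the standard normal-approximation formula with a finite-population correction) shows that the sample needed for a given margin of error is essentially flat once the collection is large.

```python
import math

def required_sample_size(margin, proportion=0.5, z=1.96, population=None):
    """Sample size needed for a desired margin of error at 95% confidence,
    with an optional finite-population correction for smaller collections."""
    n = (z ** 2) * proportion * (1 - proportion) / (margin ** 2)
    if population is not None:
        n = n / (1 + (n - 1) / population)  # finite-population correction
    return math.ceil(n)

# For a +/-5% margin, roughly 385 documents suffice whether the collection
# holds 100,000 documents or 10 million:
print(required_sample_size(0.05))                        # 385
print(required_sample_size(0.05, population=100_000))    # 383
print(required_sample_size(0.05, population=10_000_000)) # 385
```

This is why a fixed control set, like the 500 documents in the Actos order, can remain statistically meaningful even against a very large collection.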

As the Case Management Order suggests, sampling is a powerful and flexible tool that can improve the efficiency and accuracy of e-discovery while reducing its cost. Parties that invest the effort required to understand the principles and parameters of sampling will realize great dividends (read: real return on investment) as they proceed through discovery.

We urge attorneys to understand the key concepts of sampling before forging ahead with technology-assisted review projects. By understanding the fundamentals of sampling best practices in technology-assisted review – and leveraging those experienced in statistics to conduct the nitty-gritty, day-to-day work required to maximize its benefits – attorneys will be prepared when representing the performance of their technology-assisted review systems to opposing parties and the court.

We will be answering some of the questions posed above and more in future posts. Stay tuned.

Sheila Mackay is Senior Director, E-Discovery Consulting at Conduent. She regularly advises in-house legal teams and outside counsel on best practices for utilizing CategoriX, Conduent’s technology-assisted review offering. She can be reached at
