Know Your Data

February 25, 2013 conduentblogs

In litigation, early knowledge has always been powerful. In February 2006, Elpidio (“PD”) Villarreal, then Vice President of Litigation and Conflicts Management at Schering-Plough, recognized the importance of gaining early insight into a matter. At that time, the company’s outside counsel was required to provide an early assessment of a case within 60 days:

“Our view is that if you had two years to litigate a case you might know more than after the 60 day process. However, in 60 days, for a lot less money, you will know 80 percent of what you will ever know about a case. Most of the time, that knowledge is enough to make rational and intelligent decisions about resolving a case.”

Today, outside counsel are expected to offer an assessment within a fraction of that time. But when litigation and investigations involve millions of documents, the sheer volume of information compromises even the most seasoned lawyer’s ability to make informed judgments about a case. The cost of review complicates matters further: lawyers cannot afford to build review strategies on erroneous assumptions about their data population drawn from an early snapshot of the case.

Given the quantity of data, most parties no longer focus on holistic case assessment. Instead, they mine the data for insights through early data assessment (EDA): slicing and dicing the data into critical and non-critical groupings, narrowing the number of key players, and testing and validating key search terms. EDA allows case teams to analyze data populations at the outset of a case, including volume, custodians, file types, sources, pertinent dates, key terms and concepts, and timelines.

EDA is an essential e-discovery preparedness tool. It provides insights into document collections that can help legal teams make more informed decisions about filtering strategies; determine case scope and timelines; develop or confirm case strategy earlier in the case; estimate document review cost and time; and determine the population of documents that may warrant full review.

There is a range of methodologies that can be leveraged to meet EDA goals before the review begins, used singly or in conjunction with one another. I have highlighted a few of the important ones below.

Corpus analysis tools provide a breakdown of the documents contained in a corpus, or document collection, by file type, share of the collection, and similar attributes. Are video files likely to be irrelevant to the matter? What about other types of documents? Corpus analysis methods enable users to quickly visualize the make-up of a set of documents and decide how best to approach the review.
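As a rough illustration of the kind of breakdown described above, here is a minimal Python sketch that tallies a collection by file extension and reports each type's share. The function name `file_type_breakdown` and the sample file names are hypothetical, and a real corpus analysis tool would look at far more than the extension.

```python
from collections import Counter
from pathlib import PurePath


def file_type_breakdown(filenames):
    """Return (extension, count, percent) rows, most common type first."""
    # Files with no extension are grouped under the label "(none)".
    counts = Counter(
        PurePath(name).suffix.lower() or "(none)" for name in filenames
    )
    total = sum(counts.values())
    return [
        (ext, n, round(100 * n / total, 1))
        for ext, n in counts.most_common()
    ]


# Hypothetical collection, for illustration only.
sample = ["memo.docx", "budget.xlsx", "Q3.XLSX", "call.mp4", "notes.docx", "README"]

for ext, n, pct in file_type_breakdown(sample):
    print(f"{ext:8} {n:3} {pct:5.1f}%")
```

A table like this makes it easy to see at a glance whether, say, video files are a large enough share of the collection to deserve a decision of their own.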

E-mail analytics can identify patterns of communication among custodians. Were four people in the e-mail chain and then one person suddenly dropped off? While EDA can’t answer the “why,” it can flag this information for further investigation.
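The drop-off question can be sketched in a few lines of Python. This is only an assumption about how such a check might work, not how any particular e-mail analytics product does it: the function `find_dropoffs`, the message IDs, and the participant names are all made up for illustration.

```python
def find_dropoffs(thread):
    """Flag messages where someone on the previous message is no longer included.

    `thread` is a chronological list of (message_id, participants) pairs.
    """
    events = []
    # Compare each message with the one immediately before it.
    for (_, prev), (msg_id, curr) in zip(thread, thread[1:]):
        dropped = set(prev) - set(curr)
        if dropped:
            events.append((msg_id, sorted(dropped)))
    return events


# Hypothetical thread: four participants, then one disappears.
thread = [
    ("msg-1", {"ana", "ben", "cho", "dev"}),
    ("msg-2", {"ana", "ben", "cho", "dev"}),
    ("msg-3", {"ana", "ben", "cho"}),
]

print(find_dropoffs(thread))  # [('msg-3', ['dev'])]
```

The output says nothing about why "dev" was removed; it simply marks the message where a reviewer might want to start looking.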

A clustering tool groups documents so that those in the same cluster are more similar to each other than to those in other clusters, based on relationships in the text and concepts they contain. By providing a snapshot of the keywords associated with each cluster in the population, the tool gives clients an understanding of the breadth and diversity of the topics represented therein. The client may then be able to identify rich sources of relevant material to prioritize for review and irrelevant segments of the population to deprioritize or eliminate from the review.
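To make the idea concrete, here is a deliberately simple Python sketch. It is a toy stand-in, not the method commercial clustering tools use: it groups documents greedily by word overlap (Jaccard similarity) and keeps the words of each group as its keyword snapshot. The function names, the 0.3 threshold, and the sample documents are all assumptions for illustration.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(docs, threshold=0.3):
    """Assign each document to the first cluster it resembles, else start a new one."""
    clusters = []  # each cluster: {"terms": set of words, "members": list of doc indexes}
    for index, text in enumerate(docs):
        tokens = set(text.lower().split())
        for cluster in clusters:
            if jaccard(tokens, cluster["terms"]) >= threshold:
                cluster["members"].append(index)
                cluster["terms"] |= tokens
                break
        else:
            # No existing cluster was similar enough.
            clusters.append({"terms": set(tokens), "members": [index]})
    return clusters


# Hypothetical documents, for illustration only.
docs = [
    "merger pricing agreement draft",
    "pricing agreement merger terms",
    "lunch order friday pizza",
    "pizza lunch friday menu",
]

for cluster in greedy_cluster(docs):
    print(cluster["members"], sorted(cluster["terms"]))
```

Here the two deal-related documents land in one group and the two lunch-related ones in another, which is the kind of split that lets a team prioritize the first group and set the second aside.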

Similarly, concept search helps users identify potentially important key terms and phrases in a document collection based on their correlation to a starting term or set of terms.

Our old friend keyword search is also a critical EDA tool when applied judiciously. That is, the application of search involves sampling, iterative development, and testing and measurement techniques that draw on statistical and linguistic expertise, discussed in “Keywords, Done Right, Do Well.” And search analytics can provide deeper insight into search results by providing options for executing more targeted searches and specifying the type of return desired. For example, do you want variants of a root word? Misspelled versions of a word? Synonyms? Terms related to a given search term? Search analytics can provide this information.
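The sampling and measurement step can be illustrated with a short Python sketch: draw a random sample of the documents a keyword hits, have a reviewer code them, and estimate how precise the term is. Everything here is an assumption made for illustration, including the function name `estimate_precision`, the `is_responsive` callback standing in for human review, and the use of a simple normal-approximation interval at roughly 95 percent confidence.

```python
import math
import random


def estimate_precision(hit_ids, is_responsive, sample_size, seed=0, z=1.96):
    """Estimate the share of keyword hits that are responsive, with a rough interval.

    Returns (estimate, lower_bound, upper_bound).
    """
    if not hit_ids:
        raise ValueError("no hits to sample from")
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(list(hit_ids), min(sample_size, len(hit_ids)))
    responsive = sum(1 for doc_id in sample if is_responsive(doc_id))
    p = responsive / len(sample)
    # Normal-approximation margin of error; crude for very small samples.
    margin = z * math.sqrt(p * (1 - p) / len(sample))
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical example: 1,000 hits, and a stand-in for reviewer coding.
hits = list(range(1000))
estimate, low, high = estimate_precision(hits, lambda doc_id: doc_id % 4 == 0, 200)
print(f"estimated precision {estimate:.0%} (about {low:.0%} to {high:.0%})")
```

A term whose sampled precision comes back low is a candidate for refinement before anyone commits to reviewing everything it returns.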

EDA methods also provide deeper insights into the data population that add value in preparing for the Rule 26(f) conference. They can, for example, help isolate pertinent dates, key custodians or concepts, and useful search terms. You can then test keywords ahead of the meet-and-confer. That way, if opposing counsel proposes keywords that are likely to be over-inclusive or under-inclusive, or to yield false positives, you can present the preliminary EDA results to refine the search strategy and narrow the scope of review.

The earlier you slice and dice your data to gather valuable information, the better prepared you will be to engage in strategic discussions about e-discovery with opposing counsel and for your overall review and case strategy.

Sheila Mackay is Senior Director with Conduent.
