It’s right in the name: technology-assisted review. Yet some people seem to be under the impression (or, if they are salespeople, would like to create the impression) that TAR is a push-button solution that can magically whisk away the need for human expertise altogether, save for a handful of experienced attorneys who know the matter best. Raw documents in, reviewed documents out. Ta-da!? Nuh-uh.
Machines are good at certain e-discovery tasks that humans struggle with, such as processing huge amounts of data and applying coding consistently and comprehensively. But a machine can’t decide on its own what is relevant to a particular matter and what is not, which is why human training is a crucial component of any defensible TAR application. Only after that training can a TAR model be turned loose on the larger universe of documents to generate useful results. Human reviewers are also needed to assess TAR’s output and to make informed decisions about what to do with documents once they have been prioritized into buckets ranging from most to least likely to be responsive.
The role of knowledgeable human reviewers and case team members is one thing. But what about the other e-discovery experts who play crucial roles in ensuring that TAR produces useful and defensible output? Statisticians, linguists, and technologists play the starring roles in this chapter of the TAR story.
The Role of Statisticians
Your statistician will use sampling to estimate the rate of responsiveness in your review population. In other words, what share of the documents in the population as a whole can you reasonably expect to be responsive? 1%? 15%? This is a key piece of the defensibility puzzle: it tells you when your review is effectively done, and it gives you a reference point for decisions about subsets of documents that are either especially poor or especially rich in relevant material. For example, if your overall responsiveness rate is 20% and sampling shows that a large subset of TAR-deprioritized documents has a responsiveness rate of just 1%, you may be able to justify reviewing only those documents in the subset that hit certain keywords, or even to defensibly exclude the subset from review entirely. Similarly, if sampling shows that a subset of TAR-prioritized documents has a responsiveness rate of 80% or more, you may be able to defensibly produce that subset wholesale after privilege review, without spending further resources on content review. Such decisions can save a great deal of time and money, but they are obviously not the kind of decisions that should be made on hunches or impressionistic evaluations of the data.
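The arithmetic behind a responsiveness estimate is simple enough to sketch. The following is a minimal Python (3.9+) sketch, assuming a simple random sample and using the Wilson score interval, one of several standard choices, for a roughly 95% confidence range. The function names and the sample counts are hypothetical illustrations, not drawn from any real matter or any particular TAR product.

```python
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a sample proportion; z=1.96 gives roughly 95% confidence."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)

def estimate_prevalence(sample_labels: list[bool], population_size: int, z: float = 1.96) -> dict:
    """Turn a human-coded random sample into a responsiveness estimate for the population."""
    n = len(sample_labels)
    hits = sum(sample_labels)  # True counts as 1
    low, high = wilson_interval(hits, n, z)
    return {
        "rate": hits / n,
        "rate_ci": (low, high),
        "est_responsive": round(population_size * hits / n),
        "est_responsive_ci": (math.floor(population_size * low),
                              math.ceil(population_size * high)),
    }

# Illustrative numbers only: 300 of 1,500 sampled documents coded responsive.
overall = estimate_prevalence([True] * 300 + [False] * 1200, population_size=500_000)
print(overall["rate"], overall["est_responsive"])  # 0.2 100000

# A hypothetical deprioritized subset: 10 responsive in a 1,000-document sample.
low, high = wilson_interval(10, 1000)
print(f"{low:.4f} to {high:.4f}")  # roughly 0.0054 to 0.0183
```

Against a 20% overall rate, a subset whose upper bound stays under 2% is the kind of sampled evidence described for a deprioritized subset; whether it is enough to justify excluding that subset remains a judgment for the case team.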
Your statistician may also work behind the scenes during the TAR training phase: iteratively tuning and testing the developing TAR model, experimenting with data mining tactics beyond text modeling, and identifying documents for QC to make sure the coding applied to the training documents is as accurate as possible. A statistician can also make the review more efficient by calculating the minimum number of documents that must be coded to train the TAR system sufficiently. And your statistician is the one who can generate solid, defensible metrics for precision (the share of documents identified as responsive that actually are responsive) and recall (the share of all responsive documents that the review actually found). These metrics are increasingly recognized as the crucial indicators of a review’s success.
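Two of the calculations mentioned here can be sketched in Python. This is a minimal sketch under stated assumptions, with hypothetical function names: `sample_size` uses the standard formula for estimating a proportion within a margin of error (with an optional finite population correction), which sizes an estimation or validation sample. The number of training documents a particular TAR tool needs is tool-specific and is not what this computes. `precision_recall` assumes a simple random validation sample coded by humans and ignores any stratification weights a real validation protocol might apply.

```python
import math
from typing import Optional

def sample_size(margin: float, z: float = 1.96, p: float = 0.5,
                population: Optional[int] = None) -> int:
    """Documents to sample to estimate a proportion within +/- margin.

    p=0.5 is the conservative (largest-sample) choice when the true rate is unknown.
    """
    n = z * z * p * (1 - p) / margin ** 2
    if population:
        # Finite population correction: smaller populations need smaller samples.
        n = n / (1 + (n - 1) / population)
    # Round first so floating-point noise (e.g. 2401.0000000004) doesn't add a document.
    return math.ceil(round(n, 6))

def precision_recall(pairs):
    """pairs: iterable of (predicted_responsive, coded_responsive) booleans."""
    tp = fp = fn = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1          # flagged and truly responsive
        elif predicted:
            fp += 1          # flagged but not responsive
        elif actual:
            fn += 1          # responsive but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative validation sample: 80 hits, 20 false alarms, 10 misses, 890 true negatives.
sample = ([(True, True)] * 80 + [(True, False)] * 20
          + [(False, True)] * 10 + [(False, False)] * 890)
print(sample_size(0.05), precision_recall(sample))
```

For instance, a ±5% margin at roughly 95% confidence (`z = 1.96`) calls for 385 documents from a very large population, or 370 from a population of 10,000; the illustrative sample yields a precision of 0.8 and a recall of about 0.89.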
The Role of Linguists
Linguists contribute to the efficiency and accuracy of the TAR process by helping to evaluate the nature of the corpus as a whole (such as the presence and prevalence of foreign-language documents) and advising accordingly. For example, a corpus rich in financial spreadsheets might benefit from adding a separate TAR model devoted exclusively to that kind of document. Likewise, documents whose foreign-language content exceeds a certain threshold may be best segregated for a more traditional review.
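The routing a linguist might advise on can be expressed as a simple rule. This is a hypothetical Python sketch: it assumes an upstream language-identification step (not shown) has already recorded what fraction of each document’s text is non-English, and the 30% threshold, the spreadsheet extension list, and the queue names are illustrative placeholders rather than recommended values.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; a real threshold would come from corpus analysis.
FOREIGN_THRESHOLD = 0.30
SPREADSHEET_EXTENSIONS = {".xls", ".xlsx", ".xlsm", ".csv"}

@dataclass
class DocProfile:
    doc_id: str
    extension: str           # lowercase file extension, e.g. ".xlsx"
    foreign_fraction: float  # share of text identified as non-English, 0.0 to 1.0

def route(doc: DocProfile) -> str:
    """Pick a review track; the foreign-language check wins over the file-type check."""
    if doc.foreign_fraction > FOREIGN_THRESHOLD:
        return "foreign-language review"
    if doc.extension in SPREADSHEET_EXTENSIONS:
        return "spreadsheet model"
    return "main TAR model"

for d in [DocProfile("A", ".docx", 0.05),
          DocProfile("B", ".xlsx", 0.00),
          DocProfile("C", ".msg", 0.60)]:
    print(d.doc_id, route(d))
```

Checking the foreign-language share first means a mostly non-English spreadsheet goes to the traditional review track; the opposite ordering is an equally reasonable design choice.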
But perhaps most importantly, linguistic search consultants can significantly enhance TAR results by generating search terms (or fine-tuning existing ones) to be used in parallel with the TAR modeling process. There will always be at least some keywords or phrases that are strongly correlated with responsiveness, and it would be foolish not to use such targeted search terms to supplement your results. For example, your highest-ranked tier of potentially relevant documents might combine the top TAR-ranked documents with any documents that hit a certain set of keywords, regardless of those documents’ TAR scores. (Why you want a linguist on your side when using search terms is discussed in a separate article.)
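The combined top tier described here amounts to a union of two sets: documents above a TAR score cutoff and documents with a keyword hit. This is a minimal Python sketch, assuming each document carries a hypothetical 0 to 1 TAR score and that search terms are plain words or phrases; the `Doc` type, the cutoff, and the terms are illustrative, not any product’s API.

```python
import re
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    text: str
    tar_score: float  # hypothetical 0-1 score from the TAR model; higher = more likely responsive

def keyword_pattern(terms):
    """Compile plain search terms into one case-insensitive, whole-word regex (None if no terms)."""
    if not terms:
        return None  # an empty alternation would match every document
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def top_tier(docs, score_cutoff, terms):
    """Top tier = TAR score at or above the cutoff, OR any keyword hit regardless of score."""
    pattern = keyword_pattern(terms)
    return [d.doc_id for d in docs
            if d.tar_score >= score_cutoff
            or (pattern is not None and pattern.search(d.text))]

docs = [
    Doc("D1", "Q3 revenue forecast attached.", 0.92),
    Doc("D2", "Per the side letter we discussed...", 0.15),
    Doc("D3", "Lunch on Friday?", 0.40),
]
print(top_tier(docs, score_cutoff=0.80, terms=["side letter", "kickback"]))  # ['D1', 'D2']
```

D2 makes the tier only because of its keyword hit, which is exactly the case the targeted search terms are meant to catch.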
The Role of Technologists
Finally, an experienced technologist will manage the flow of documents, make sure that the review pool subjected to TAR contains only documents appropriate for that kind of evaluation (image and sound files, for example, should be excluded from text-based classifiers), and ensure that the right documents reach the right reviewers during both the training and regular review phases of the TAR process. He or she can also manage the documentation of the process itself, ensuring the transparency and auditability that are additional keys to a defensible result.
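A first pass at keeping unsuitable files out of a text-based TAR pool might look like the following. This is a minimal Python sketch under two stated assumptions: file type is guessed from the filename with the standard `mimetypes` module (production tools typically inspect file content instead), and a hypothetical 50-character minimum of extracted text flags items, such as image-only PDFs, that need OCR or another workflow before a text classifier can score them. The `Item` type and `split_review_pool` are illustrative names.

```python
import mimetypes
from dataclasses import dataclass

# Assumption: media types a text-based classifier cannot meaningfully score.
NON_TEXT_PREFIXES = ("image/", "audio/", "video/")
MIN_TEXT_CHARS = 50  # hypothetical cutoff for "enough extracted text to classify"

@dataclass
class Item:
    doc_id: str
    filename: str
    extracted_text: str

def split_review_pool(items):
    """Return (tar_pool, set_aside); set-aside items are routed to another workflow."""
    tar_pool, set_aside = [], []
    for item in items:
        mime, _ = mimetypes.guess_type(item.filename)
        is_media = mime is not None and mime.startswith(NON_TEXT_PREFIXES)
        too_little_text = len(item.extracted_text.strip()) < MIN_TEXT_CHARS
        (set_aside if is_media or too_little_text else tar_pool).append(item)
    return tar_pool, set_aside

items = [
    Item("1", "memo.txt", "Board minutes discussing the pricing agreement and next steps for Q3."),
    Item("2", "photo.jpg", ""),
    Item("3", "call.mp3", ""),
    Item("4", "scan.pdf", ""),  # likely an image-only PDF awaiting OCR
]
pool, aside = split_review_pool(items)
print([i.doc_id for i in pool], [i.doc_id for i in aside])
```

Setting items aside rather than deleting them keeps them visible for separate handling, which also supports the documentation and auditability the technologist is responsible for.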
A basic TAR system should produce a prioritization scheme for your review population that delivers at least some gains in review efficiency. But a more robust TAR approach, one that uses e-discovery experts who can tune the process for the best results in your specific case while also ensuring defensibility, is usually the better bet. It pays off in increased efficiency, reduced review costs, and the peace of mind of knowing that the review was completed with acceptable and demonstrable performance metrics.
About the Author
Karen Baumer is a Senior Search Consultant at Conduent. She can be reached at firstname.lastname@example.org.