Culling and Keywords: Going Broad Without Going Astray

December 17, 2012 conduentblogs

Picture this: in a matter involving undisclosed side effects of a popular prescription drug called Zonovir, you’re negotiating a set of keywords to be applied to the pharmaceutical company’s documents. The result of that initial cull will be the set of documents produced after privilege and confidentiality review. You want to come up with a set of words and phrases most likely to be found in relevant documents, right?

Not exactly, though even the most experienced attorneys, their e-discovery and litigation support staff, and many others involved in conducting keyword searches would be tempted to quickly home in on keywords that feel “especially relevant” to the matter.

Why is it suboptimal to focus on relevance to your specific matter too early, when you have no direct insight into the document population and keywords are your only tool? At this stage, a narrow focus on specific subtopics of interest frequently backfires, bringing in large volumes of non-relevant documents. For example, while some documents discussing particular drug side effects will surely be relevant to the case, generically targeting discussions of adverse effects within a pharmaceutical corpus will inevitably bring back far more than you want or need. We’ve all been there: more documents culled in = more money spent downstream, not to mention all the other challenges of having a high volume of non-relevant documents that someone has to sift through. At the culling stage it is more prudent to target the documents that are most likely to be on topic at a higher level. Such an approach will help you cull in the set of documents most likely to encompass the truly responsive ones, without culling in too much chaff.

So if “discussions about side effects” isn’t a good way to conceptualize the topic at hand, what is? Think of it this way: people can’t talk about the adverse effects of Zonovir without talking about Zonovir. So, craft keywords that are likely to find documents discussing that drug per se. That will naturally require keywords other than just “Zonovir,” though “Zonovir” is an obvious choice that is likely to be both effective and precise. But does Zonovir have a generic name, or a nickname like “Z-vir”? Are there any known code names? A unique internal identification number? Any unusual key ingredients? Any non-standard packaging? What condition is it commonly prescribed to alleviate, and does the company make any other drugs to treat that condition (if not, the condition itself could be a useful keyword)? Is there anything unique about its manufacturing process, like the use of an uncommon machine? These kinds of questions should be part of the negotiations. Explore the realm of words and ideas that are strongly correlated with the creation, composition, use, and discussion of Zonovir and include those in your keyword set. Meanwhile, resist the urge to include keywords like “adverse event,” “nausea,” “headache,” etc. Applied without additional context to keep them constrained to the discussions you are actually seeking, those terms will do far more harm than good.
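To make the contrast concrete, here is a minimal Python sketch of a product-anchored cull next to a generic side-effect cull. Everything in it is hypothetical: the nickname and code name in the term list, the five toy documents, and the helper names `build_pattern` and `cull` are illustrations only, and real review platforms have their own search syntax.

```python
import re

# Hypothetical product-anchored terms: the drug name plus an assumed
# nickname and an assumed internal code name.
ANCHORED_TERMS = ["zonovir", "z-vir", "project bluebird"]

# Generic side-effect terms of the kind the article advises against
# using on their own.
GENERIC_TERMS = ["adverse event", "nausea", "headache"]


def build_pattern(terms):
    """Compile one case-insensitive pattern matching any term as a whole word or phrase."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def cull(documents, terms):
    """Return the ids of documents that contain at least one of the terms."""
    pattern = build_pattern(terms)
    return [doc_id for doc_id, text in documents.items() if pattern.search(text)]


# Toy corpus, invented purely for illustration.
DOCUMENTS = {
    "doc1": "Zonovir trial patients reported nausea in week two.",
    "doc2": "Draft packaging copy for Z-vir attached for review.",
    "doc3": "I have a headache, leaving early today.",
    "doc4": "Adverse event log for our allergy product line.",
    "doc5": "Project Bluebird marketing budget for Q3.",
}

anchored_hits = cull(DOCUMENTS, ANCHORED_TERMS)
generic_hits = cull(DOCUMENTS, GENERIC_TERMS)

print("anchored:", anchored_hits)
print("generic: ", generic_hits)
```

In this toy corpus the anchored terms pick up the side-effect, packaging, and marketing documents about the drug, while the generic terms pull in an office headache and another product's adverse event log and miss the packaging and marketing documents entirely.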

Is the marketing of Zonovir also at issue? Congratulations! People can’t discuss the marketing of Zonovir without talking about Zonovir, so you’ve already got that covered. Are there allegations that Zonovir’s packaging misleadingly suggests that it is a totally safe product? Congratulations! People can’t talk about Zonovir’s packaging without talking about Zonovir, so you’ve got that covered too. And so forth.

A keyword culling strategy should be broad by design, because that will provide a kind of safety net around the set of truly interesting documents. But there’s good broad and there’s bad broad. Broadness doesn’t help you if you’re sweeping in tens or hundreds of thousands of documents that mention topics you’re interested in (like certain side effects), but in contexts that are not relevant to the matter. And here’s another thing—this approach is in the best interest of both sides. The goal is to exchange the documents that are proper to exchange and that the parties are obligated to exchange, and to avoid involving documents that have nothing to do with the topic at hand and may contain information a corporation doesn’t need or want to reveal. The time and costs of production and review are minimized when the set of documents in play is as accurate as possible.
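One way to put numbers on "good broad" versus "bad broad" is to compare a culled set against a hand-reviewed sample using precision (the share of culled documents that are relevant) and recall (the share of relevant documents that were culled in). The sketch below assumes such a sample exists; the document ids and set sizes are invented for illustration, and `precision_recall` is a hypothetical helper, not part of any review tool.

```python
def precision_recall(culled_ids, relevant_ids):
    """Return (precision, recall) of a culled set against known-relevant ids."""
    culled, relevant = set(culled_ids), set(relevant_ids)
    true_hits = culled & relevant
    precision = len(true_hits) / len(culled) if culled else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical hand-reviewed sample: four documents judged relevant.
RELEVANT = {"d1", "d2", "d3", "d4"}

# "Good broad": every relevant document plus one extra.
GOOD_BROAD = {"d1", "d2", "d3", "d4", "d5"}

# "Bad broad": every relevant document plus twelve off-topic ones (d1..d16).
BAD_BROAD = {f"d{i}" for i in range(1, 17)}

good = precision_recall(GOOD_BROAD, RELEVANT)
bad = precision_recall(BAD_BROAD, RELEVANT)

print("good broad (precision, recall):", good)
print("bad broad  (precision, recall):", bad)
```

Both sets catch every relevant document in the sample, so recall is identical; the difference is how much chaff comes along, which is what drives downstream review cost.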

It’s never a bad idea to engage a search consultant, typically a linguist with e-discovery experience, when negotiating keywords. An objective linguistic expert will be able to spot potentially troublesome keywords before they have a chance to go rogue. A consultant can also help to judiciously expand the breadth of concepts targeted by your terms in a principled fashion, so that the cull captures the on-topic documents most likely to contain the responsive ones, and can provide clear, documented justification for any suggested revisions. Whether you engage a consultant to work with you periodically throughout the keyword negotiation process, or just for a few hours to do a sanity check on the proposed final keyword list, you will certainly improve your results in a way that will make the rest of the case preparation process more efficient and effective.

Karen Baumer is senior search consultant at Conduent. She can be reached at
