Reduce Those Terabytes

And while you’re at it, increase your legal team’s document review efficiency.

It goes without saying that reducing costs starts with the efficient and defensible reduction of document populations subject to review. It’s a simple equation: culling results in fewer documents, which means fewer review hours, which reduces costs.

In most legal matters, traditional filters are applied to reduce large initial data sets. Basic approaches cull the initial document set with filters such as date restrictions, search terms, deNISTing and deduplication. Techniques such as email thread suppression, exact-duplicate suppression and near-duplicate suppression can further reduce a review population while ensuring that all unique content is still reviewed.
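As a minimal sketch of the basic culling described above, the following Python applies a date restriction, a search-term filter and exact-duplicate suppression in sequence. The `Document` type, the `cull` function and the choice of SHA-256 over the document text are assumptions made for illustration; deNISTing, thread suppression and near-duplicate suppression are left out.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class Document:
    """Hypothetical record for one item in the initial data set."""
    doc_id: str
    sent: date
    text: str


def cull(documents: Iterable[Document], start: date, end: date,
         terms: List[str]) -> List[Document]:
    """Keep documents inside the date range that hit at least one search
    term, suppressing exact duplicates (first occurrence wins)."""
    lowered_terms = [term.lower() for term in terms]
    seen_hashes = set()
    kept = []
    for doc in documents:
        # Date restriction.
        if not (start <= doc.sent <= end):
            continue
        # Search terms: simple case-insensitive substring match.
        text = doc.text.lower()
        if not any(term in text for term in lowered_terms):
            continue
        # Exact-duplicate suppression on a hash of the text.
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept.append(doc)
    return kept
```

Each filter only ever removes documents, so the order affects speed but not the final set, except that the first copy of a duplicate is the one retained.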

While this approach is effective, most resulting review populations still contain subsets of data that can be classified as objectively non-responsive (ONR). These are documents that exhibit features of potentially relevant content (they may contain search terms, for example) but will never be responsive, and should be culled from review populations at the outset of the eDiscovery process. NIST filtering, commonly called deNISTing, is the most widely used approach to proactively remove documents without evidentiary value from potential review populations. It is effective at removing non-user-generated content, including system and junk files, but its focus is limited to file type and does not account for content. New techniques based on heuristic analysis and backed by machine learning aim to do for the identification of ONR content what deNISTing and file type filtering have done for the proactive removal of system and junk files.

Knock Out All that Junk

The implementation of enhanced analytics, in conjunction with traditional techniques, offers more options for organizing and prioritizing reviews. The identification of graymail, for example, isolates emails to which a custodian has “opted-in” or “subscribed.” High-frequency document (HFD) detection identifies items like signature blocks, recurring memos, logos and weekly newsletters. Additionally, heuristic analysis can be used to implement rules-based tagging that associates specific document properties (such as number of recipients, time sent and subject) with likely review calls based on historical review patterns. The proactive identification of ONR categories enables review teams to make informed decisions around the organization and prioritization of review populations.

Here’s a closer look:

Graymail is email that an individual has opted in or subscribed to receive, such as messages from retailers announcing deals, sales or specials. Unless a company has strong spam or graymail filters, graymail represents up to 10 percent of most review populations, even after the application of date and term culling. A multi-pronged approach in this new workflow uses both text and metadata to identify graymail within a review population. Many forms of graymail include specific text indicative of email that an individual has chosen to receive (e.g., “click this link to unsubscribe”). Additionally, metadata characteristics such as headers, senders, recipients and other properties are leveraged to distinguish graymail from other potentially relevant email, such as emails that discuss business-related topics. Used together, and in combination with other custom classification techniques, these text and metadata signals quickly identify and remove these categories of ONR data from a review population.
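To make the text-plus-metadata idea concrete, here is a hedged Python sketch. The phrase list, the sender prefixes, the `Email` type and the two-signal threshold are all hypothetical choices for illustration, not the workflow's actual rules; the one real-world detail relied on is the standard `List-Unsubscribe` mail header that many bulk senders include.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical opt-in phrases; a real deployment would tune this list.
UNSUBSCRIBE_PATTERN = re.compile(
    r"\b(unsubscribe|manage your (email )?preferences|opt[- ]out)\b",
    re.IGNORECASE,
)

# Hypothetical sender prefixes typical of bulk mail.
BULK_SENDER_PREFIXES = ("noreply@", "no-reply@", "newsletter@", "deals@")


@dataclass
class Email:
    """Hypothetical record holding the fields the heuristics inspect."""
    sender: str
    subject: str
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


def graymail_signals(email: Email) -> List[str]:
    """Return the names of the graymail indicators found in one email."""
    signals = []
    # Text signal: opt-in language in the body.
    if UNSUBSCRIBE_PATTERN.search(email.body):
        signals.append("unsubscribe_text")
    # Metadata signal: a List-Unsubscribe header is present.
    header_names = {name.lower() for name in email.headers}
    if "list-unsubscribe" in header_names:
        signals.append("list_unsubscribe_header")
    # Metadata signal: sender address looks like a bulk mailer.
    if email.sender.lower().startswith(BULK_SENDER_PREFIXES):
        signals.append("bulk_sender")
    return signals


def is_graymail(email: Email, min_signals: int = 2) -> bool:
    """Treat an email as graymail when enough independent signals agree."""
    return len(graymail_signals(email)) >= min_signals
```

Requiring two signals rather than one is a design choice in this sketch: a business email that merely mentions the word "unsubscribe" would not be flagged on text alone.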

Identifying high-frequency document (HFD) populations is another advanced ONR filtering technique that goes beyond traditional approaches. HFD items include signature blocks, recurring memos, logos and weekly newsletters. They are difficult to remove through deNISTing but should be culled from review sets because they are highly unlikely to contain relevant content.
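One simple way to approximate high-frequency detection is to fingerprint each item and count how often each fingerprint recurs across the population. The Python below is a sketch under that assumption: the normalization (collapse whitespace, lowercase), the SHA-256 fingerprint and the `min_count` threshold are illustrative choices, and a production system would likely use fuzzier matching than exact fingerprints.

```python
import hashlib
import re
from collections import Counter
from typing import Dict, Set


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text: str) -> str:
    """Hash of the normalized text, used as the recurrence key."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def high_frequency_ids(documents: Dict[str, str],
                       min_count: int = 3) -> Set[str]:
    """Return IDs of documents whose normalized content appears at
    least `min_count` times in the population (hypothetical threshold)."""
    prints = {doc_id: fingerprint(text) for doc_id, text in documents.items()}
    counts = Counter(prints.values())
    return {doc_id for doc_id, fp in prints.items() if counts[fp] >= min_count}
```

Unlike deduplication, which keeps one copy of each item for review, this flags every occurrence so the whole group can be de-prioritized or sampled.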

As with the other filters, rules-based tagging yields a more accurate and focused review set. It combines big data analytics with heuristic analysis to identify specific document properties that predict likely review calls. One rule might key on sender and subject information known to be associated with privileged information. Another might flag documents from a specific sender domain whose messages are frequently sent to more than 100 recipients, establishing a high confidence threshold that these messages are non-responsive.
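The sender-domain rule mentioned above can be sketched as follows. The 100-recipient cutoff comes from the example in the text; the `Message` type, the tag names, and the `min_share` and `min_messages` parameters that define "frequently" are assumptions introduced for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Message:
    """Hypothetical record with the two properties the rule inspects."""
    sender: str
    recipient_count: int


def sender_domain(address: str) -> str:
    """Return the lowercased part of the address after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def broadcast_domains(messages: Iterable[Message],
                      recipient_threshold: int = 100,
                      min_share: float = 0.8,
                      min_messages: int = 5) -> Set[str]:
    """Find domains where at least `min_share` of messages went to more
    than `recipient_threshold` recipients, given enough messages."""
    totals = defaultdict(int)
    broad = defaultdict(int)
    for message in messages:
        domain = sender_domain(message.sender)
        totals[domain] += 1
        if message.recipient_count > recipient_threshold:
            broad[domain] += 1
    return {
        domain for domain, total in totals.items()
        if total >= min_messages and broad[domain] / total >= min_share
    }


def tag_messages(messages: List[Message], domains: Set[str]) -> List[str]:
    """Tag each message according to whether its domain was flagged."""
    return [
        "likely_non_responsive" if sender_domain(m.sender) in domains
        else "needs_review"
        for m in messages
    ]
```

Requiring a minimum number of messages per domain keeps a single mass mailing from condemning a domain that otherwise sends ordinary business email.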

By accurately identifying ONR documents, you enable a more efficient review: groups of documents identified as ONR can be de-prioritized, sampled against or sent through low-cost review streams, saving additional time and money.

About the Author

Nick Schreiner is Director, Solutions Architecture at Conduent. To learn more about Conduent’s analytics-based privilege review process, please feel free to contact him at