Introductory Guide to eDiscovery
101 Guide: eDiscovery
Organizations produce a deluge of electronic information. Email inboxes are overflowing, hard drives and cloud storage repositories are packed, and social media notifications come at breakneck speed.
For legal professionals who must locate key documents and information for use as possible evidence in litigation and investigations, the explosion of such data has rendered the process of collecting, culling, and reviewing digital information—known as eDiscovery—incredibly costly and time-consuming.
Fortunately, advances in eDiscovery technology are reducing the cost and burden for legal teams to narrow the universe of potentially relevant data. This eDiscovery 101 guide provides an introduction to the fundamentals of eDiscovery and explores challenges and best practices, including considerations for multilingual and cross-border eDiscovery.
What Is eDiscovery?
eDiscovery is the process of identifying, collecting, filtering, and reviewing electronically stored information (ESI) that constitutes potential evidence in legal proceedings and investigations. During eDiscovery, legal professionals, IT personnel, and forensics teams work together to narrow very large datasets into more manageable volumes.
Prior to the digital age, “discovery” in business disputes meant combing through boxes and drawers full of paper documents to locate key information and evidence. Today, because most documents needed for business litigation, arbitration, and investigations are stored electronically, discovery largely takes the form of eDiscovery and focuses on ESI. In the early days of eDiscovery, legal professionals could manually review ESI to determine its relevance to a matter. This was feasible for 25 or 2,500 files, but not so much for 25,000, let alone 250,000 or 2.5 million!
Today, eDiscovery tools powered by analytics and artificial intelligence (AI) significantly streamline the eDiscovery process. But even with modern technology, given the ever-increasing volumes and diversity of ESI, eDiscovery remains an intensive endeavor that requires human expertise, effective communication, and agile decision-making at every step of the way.
What Is ESI?
ESI stands for electronically stored information. It covers a wide variety of digital assets, including emails; e-documents (Word, PowerPoint, and Excel files); image, audio, and video files; mobile device data (e.g., chat and text messages); data held in cloud-based applications; website content; and social media postings.
What Is the EDRM Framework for eDiscovery?
Introduced in 2005, the Electronic Discovery Reference Model (EDRM) is a visual representation of the complete eDiscovery lifecycle that legal teams around the world use as a common reference. The EDRM breaks down the eDiscovery process into nine steps, although not every step is relevant to every matter. The EDRM isn’t always followed sequentially, and steps may be repeated depending on the project scope and cadence.
What Are the Steps in the EDRM?
The nine stages of the EDRM are:
- Information governance
- Identification
- Preservation
- Collection
- Processing
- Review
- Analysis
- Production
- Presentation
What Is Early Case Assessment?
Early case assessment (ECA) is typically employed during the “processing” stage of the EDRM to identify key documents and information that the legal team uses to advise its client on case strategy and merits. ECA can help predict the cost of a case as well as likely exposure, which helps teams create realistic case strategies and budgets for the full eDiscovery process.
ECA also uses advanced methods to filter out non-relevant data early in the eDiscovery process, thereby narrowing large datasets. Beyond the standard culling methods of filtering by date and search terms, and deduplicating and de-NISTing files (i.e., removing system files and other non-user-generated files), ECA leverages the following tools to further cull data during processing:
- Filetype filters (e.g., removing calendar entries or video files when not relevant to the matter)
- Domain/email handle filters (e.g., removing spam and industry newsletters that hit on search terms but are not relevant to the underlying dispute)
- Concept clustering (i.e., organizing the files by content to identify and remove groups of search term false positives)
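The date, filetype, domain, and duplicate filters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how Digital Reef or any other platform works: the `Doc` record, its field names, and the sample filter values are all hypothetical. De-NISTing is not shown; it works like the duplicate check, except that file hashes are compared against a published list of known system files.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    """Hypothetical record for one collected file or email."""
    name: str
    sender: str      # sender's email address, "" for loose files
    sent: date
    content: bytes


def cull(docs, start, end, excluded_extensions, excluded_domains):
    """Apply date, filetype, domain, and exact-duplicate filters in order."""
    seen_hashes = set()
    kept = []
    for doc in docs:
        if not (start <= doc.sent <= end):
            continue  # outside the relevant date range
        extension = doc.name.rsplit(".", 1)[-1].lower() if "." in doc.name else ""
        if extension in excluded_extensions:
            continue  # e.g. calendar entries or video files
        domain = doc.sender.rsplit("@", 1)[-1].lower()
        if domain in excluded_domains:
            continue  # e.g. newsletters and spam senders
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append(doc)
    return kept


docs = [
    Doc("bid.docx", "pm@builder.example", date(2021, 3, 1), b"Revised bid"),
    Doc("bid-copy.docx", "pm@builder.example", date(2021, 3, 2), b"Revised bid"),
    Doc("weekly.eml", "news@tradepress.example", date(2021, 3, 3), b"Newsletter"),
    Doc("standup.ics", "pm@builder.example", date(2021, 3, 4), b"Calendar"),
    Doc("old.docx", "pm@builder.example", date(2015, 1, 1), b"Old memo"),
]
kept = cull(
    docs,
    start=date(2020, 1, 1),
    end=date(2022, 12, 31),
    excluded_extensions={"ics", "mp4"},
    excluded_domains={"tradepress.example"},
)
print([doc.name for doc in kept])  # ['bid.docx']
```

Only one of the five sample documents survives: the others are dropped as a duplicate, a newsletter, a calendar entry, and an out-of-range file.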
Generally speaking, standard ESI filtering reduces the volume of ESI by about 80%. With ECA advanced filtering, the reduction can top 90%, so only 10% of your ESI moves to the document review phase rather than 20%, which halves the review set.
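The difference between the two percentages is easier to see with numbers. The sketch below uses a hypothetical collection of one million documents; the function name and the collection size are assumptions made for illustration.

```python
def documents_for_review(collected: int, reduction_pct: int) -> int:
    """Documents left after culling away reduction_pct percent of the set."""
    return collected * (100 - reduction_pct) // 100


collected = 1_000_000  # hypothetical collection size
standard = documents_for_review(collected, 80)
advanced = documents_for_review(collected, 90)
print(standard, advanced)  # 200000 100000
```

Moving from an 80% to a 90% reduction sounds like a small step, but it cuts the number of documents sent to review in half.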
Digital Reef is an ECA and eDiscovery platform from TransPerfect Legal. Digital Reef ingests, culls, analyzes, and exports datasets. It can be installed securely onsite or hosted in a TransPerfect Legal data center. Capable of processing up to 17 terabytes of data in 24 hours and managing multi-language data, Digital Reef can enable you to process, investigate, and preview your data 40% faster while reducing datasets by more than 90%. Attorneys and legal processing service providers use Digital Reef’s built-in document viewer and coding interface to conduct investigations.
For larger matters that require a linear or technology-assisted review (TAR)-based approach, data is culled in Digital Reef and then exported in the required output formats for loading into your preferred review solution. Digital Reef integrates with numerous technologies, including the market-leading Relativity document hosting and review platform. The team at TransPerfect Legal builds proprietary add-ons for Relativity.
ECA in Action
In a construction matter, a case-relevant search term entered into an ECA tool returned 300,000 emails. However, many of these emails concerned construction projects that were not relevant to the matter under consideration. Excluding the emails related to these extraneous projects halved the number of search hits. Further analysis removed irrelevant email domains, for an overall reduction of 200,000 documents, leaving only 100,000 documents to review. A few hours spent on this ECA exercise avoided more than 4,000 hours of document review and saved hundreds of thousands of dollars. While the technology allowed for rapid data exclusion, it took the experienced, skilled eyes of the human user to unlock the efficiencies of the ECA tool.
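The arithmetic behind this example can be checked directly. The sketch below assumes a review pace of 50 documents per hour and a blended cost of $75 per review hour. Neither figure comes from the example; both are assumptions chosen only because they are consistent with the hours and savings it reports.

```python
initial_hits = 300_000                      # emails returned by the search term
after_project_filter = initial_hits // 2    # extraneous projects excluded (hits halved)
after_domain_filter = 100_000               # documents left to review, per the example
documents_removed = initial_hits - after_domain_filter

ASSUMED_DOCS_PER_HOUR = 50   # assumption: review pace, not stated in the example
ASSUMED_COST_PER_HOUR = 75   # assumption: blended reviewer cost in dollars

hours_saved = documents_removed // ASSUMED_DOCS_PER_HOUR
cost_saved = hours_saved * ASSUMED_COST_PER_HOUR
print(after_project_filter, documents_removed, hours_saved, cost_saved)
# 150000 200000 4000 300000
```

A slower review pace or a higher hourly cost would push both savings figures up.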
What Happens During Document Review and Analysis?
Document review and analysis is the most time-consuming and costly part of the eDiscovery process. It involves large teams of lawyers examining the documents that have survived the culling process to determine their relevance to the case. During document review, data is evaluated for relevance, privilege, confidentiality, and privacy. Technology-assisted review (TAR) typically happens during document review.
Before undergoing document review, it’s important to:
As with early case assessment, technology can streamline document hosting, review, and analysis. Examples include Relativity and TransPerfect Legal’s newly released Reef Review. Reef Review offers a robust suite of analytics and AI features including continuous active learning, redaction, near-duplicate analysis, and daily review reporting.
How Is AI Streamlining eDiscovery?
AI has become an increasingly integral part of eDiscovery over the last decade. AI uses data-mining techniques that can narrow the set of documents sent for review, saving both time and money. AI-powered eDiscovery software leverages technologies such as machine learning (ML), natural language processing (NLP), and, more recently, generative AI. Here are some of the ways AI can automate and streamline aspects of eDiscovery:
Technology-Assisted Review (TAR)
TAR (sometimes called predictive coding) helps prioritize and identify potentially relevant documents for review by learning from human reviewers' feedback. Over the past 10 years, TAR has become the most popular and powerful AI tool in the eDiscovery toolbox.
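The idea of learning from reviewer feedback can be illustrated with a deliberately simple model. The sketch below is not how any commercial TAR product works; it swaps in naive-Bayes-style term weights (with add-one smoothing) learned from a handful of coded documents, then ranks unreviewed documents by score. All function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """Learn per-term log-odds weights from (text, is_relevant) reviewer decisions."""
    relevant, not_relevant = Counter(), Counter()
    for text, is_relevant in labeled_docs:
        (relevant if is_relevant else not_relevant).update(tokenize(text))
    vocabulary = set(relevant) | set(not_relevant)
    total_rel = sum(relevant.values()) + len(vocabulary)
    total_not = sum(not_relevant.values()) + len(vocabulary)
    return {
        term: math.log((relevant[term] + 1) / total_rel)
        - math.log((not_relevant[term] + 1) / total_not)
        for term in vocabulary
    }


def score(weights, text):
    """Sum the weights of known terms; higher means more likely relevant."""
    return sum(weights.get(term, 0.0) for term in tokenize(text))


reviewed = [
    ("change order delay on the bridge project", True),
    ("bridge project cost overrun dispute", True),
    ("company picnic signup sheet", False),
    ("lunch menu for friday picnic", False),
]
unreviewed = [
    "friday picnic parking",
    "dispute over bridge delay",
    "quarterly newsletter",
]
weights = train(reviewed)
ranked = sorted(unreviewed, key=lambda text: score(weights, text), reverse=True)
print(ranked)
```

The document about the bridge dispute rises to the top of the queue and the picnic email sinks to the bottom. In a continuous workflow, each new reviewer decision would be added to `reviewed` and the ranking recomputed.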
Concept Clustering
After files have been processed, concept clustering groups the remaining data based on concepts, topics, or ideas. This allows reviewers to examine documents based on similarity, remove groups of search-term “mishits,” and generally focus the review on more relevant content first.
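A toy version of clustering shows how search-term mishits end up grouped together. The sketch below uses plain word overlap (Jaccard similarity) and a greedy single pass, which is a much cruder technique than the text analytics built into review platforms; the threshold and the sample documents are assumptions.

```python
def jaccard(a, b):
    """Share of distinct words two documents have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_word_set, [member texts])
    for text in texts:
        words = set(text.lower().split())
        for seed, members in clusters:
            if jaccard(words, seed) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((words, [text]))
    return [members for _, members in clusters]


texts = [
    "tower crane rental invoice",
    "crane rental invoice overdue",
    "origami crane folding class",
    "folding class for origami crane",
]
groups = cluster(texts)
for group in groups:
    print(group)
```

All four documents hit on the search term "crane," but they fall into two groups. A reviewer handling a construction matter could set aside the origami group in one decision instead of opening each document.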
Conceptual Search
Beyond just keyword searches, conceptual searching enables searching based on the context and meaning of the content, thereby improving the accuracy of search results.
Generative AI
Looking ahead, generative AI (e.g., ChatGPT) is poised to take on a more prominent role in eDiscovery. Take, for example, the “document dump” that sometimes happens the night before a deposition. Generative AI may be used to quickly summarize documents or create a chronology, making a previously untenable eleventh-hour review possible. Likewise, generative AI can provide useful features, such as “ask my documents a question,” and could eventually replace TAR as the key driver of document review itself.
Language Identification and Machine Translation
Some AI-powered eDiscovery platforms have language identification capabilities that can automatically detect and tag the language of documents. Likewise, advanced machine translation engines are often built directly into eDiscovery platforms for multilingual datasets.
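Language tagging can be illustrated with a very small heuristic. The sketch below counts common function words per language, which is a stand-in for the statistical models real platforms rely on; the word lists cover only three languages and are assumptions made for the example.

```python
# Tiny, hypothetical stopword lists; real detectors use far richer models.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in"},
    "es": {"el", "la", "de", "que", "y", "en"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}


def detect_language(text):
    """Return the language whose stopwords appear most often, or 'unknown'."""
    words = text.lower().split()
    counts = {
        lang: sum(word in stopwords for word in words)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


for sample in (
    "The contract is in the folder",
    "El contrato está en la carpeta",
    "Der Vertrag ist nicht in der Mappe",
):
    print(detect_language(sample), "-", sample)
```

Once each document carries a language tag, it can be routed to reviewers who read that language or sent to machine translation first.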
Named Entity Recognition (NER)
AI-based NER can identify and extract entities such as names, locations, and dates from documents.
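To show the shape of NER output, the sketch below uses fixed lookup lists and a date pattern. This is a rule-based stand-in, not AI-based NER: a trained model recognizes entities it has never seen from context, while this example finds only the hypothetical names and places listed in it.

```python
import re

# Hypothetical lookup lists; a trained NER model would not need these.
KNOWN_PEOPLE = {"Maria Lopez", "John Smith"}
KNOWN_LOCATIONS = {"Chicago", "Madrid"}
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
)


def extract_entities(text):
    """Return the people, locations, and dates found in the text."""
    return {
        "people": sorted(name for name in KNOWN_PEOPLE if name in text),
        "locations": sorted(place for place in KNOWN_LOCATIONS if place in text),
        "dates": DATE_PATTERN.findall(text),
    }


text = "Maria Lopez met the supplier in Chicago on March 3, 2021."
entities = extract_entities(text)
print(entities)
```

Extracted entities can then be used as filters, for example to pull every document that mentions a given person within a date range.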
Sentiment Analysis
Sentiment analysis tools can help identify the tone and emotional context of communications.
AI in Action
Let’s say you want to find all documents that include references to “X” and any documents that include negative sentiments around “X.” Suppose you have 75,000 possible documents to review. Using an AI tool, you create your instructions, submit them, and then wait for documents to be reviewed. Each document gets classified in one of four ways: relevant to the issue, not relevant to the issue, needs further review, or has a technical issue. The tool will tell you which issue an item is relevant for (reference to X and/or negative sentiment about X). Now you have a much smaller pool of documents to review.
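The four-way outcome described above can be modeled as a small data structure. In the sketch below, simple keyword rules stand in for the AI model, "widget" stands in for "X," and the word lists are hypothetical. The point is the shape of the result, not the quality of the classification.

```python
from enum import Enum


class Outcome(Enum):
    RELEVANT = "relevant"
    NOT_RELEVANT = "not relevant"
    NEEDS_REVIEW = "needs further review"
    TECHNICAL_ISSUE = "technical issue"


NEGATIVE_WORDS = {"defective", "unsafe", "complaint", "failed"}  # hypothetical lexicon
HEDGING_WORDS = {"maybe", "unclear", "possibly"}                  # hypothetical lexicon


def classify(text, topic):
    """Return (outcome, issues) for one document."""
    if not text or not text.strip():
        return Outcome.TECHNICAL_ISSUE, []   # e.g. empty or unreadable extraction
    words = {word.strip(".,;:!?\"'") for word in text.lower().split()}
    if topic.lower() not in words:
        return Outcome.NOT_RELEVANT, []
    if words & HEDGING_WORDS:
        return Outcome.NEEDS_REVIEW, ["reference"]  # ambiguous, send to a human
    issues = ["reference"]
    if words & NEGATIVE_WORDS:
        issues.append("negative sentiment")
    return Outcome.RELEVANT, issues


for doc in (
    "The widget failed inspection again.",
    "Widget shipment arrives Monday.",
    "Unclear whether the widget is affected.",
    "Lunch on Friday?",
    "",
):
    outcome, issues = classify(doc, "widget")
    print(outcome.value, issues)
```

Only the documents marked relevant or needing further review go to human reviewers, and the `issues` list records which instruction each one matched.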
What Are Some Considerations for Multilanguage and Cross-Border eDiscovery?
The complexities of language and translation can make it difficult to understand and analyze electronic data in foreign languages. Cultural differences can also impact the interpretation and analysis of data, which can affect the accuracy and effectiveness of an investigation.
Multilanguage eDiscovery tools play an important role when electronic data is in more than one language. They use natural language processing (NLP) and machine learning algorithms to help investigators track, understand, translate, and analyze electronic data in multiple languages.
Cross-border investigations can be tricky, as laws and regulations regarding the collection, analysis, and transfer of electronic data vary by country. Investigators must have an awareness of data protection laws in each jurisdiction to ensure proper legal compliance. Multilingual eDiscovery tools offer legal and regulatory guidance, helping to ensure that no laws are broken within the jurisdictions in which the investigation is being conducted.
Data Privacy Compliance During eDiscovery
Data privacy and protection are paramount to maintaining the integrity of case materials. Companies need to keep data privacy laws and regulations front and center during eDiscovery. This means adhering to rules under laws such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the European Union Privacy Directives (EUPD) in every jurisdiction in which they do business.
Companies need to be mindful of the way they identify and handle protected health information (PHI) and personally identifiable information (PII). Identifying where PHI and PII appear in datasets helps you address data privacy concerns before they arise. Advanced data mining technology can look through data, find people, places, and organizations, and then appropriately redact that data. Note that the GDPR defines personal data broadly and applies stricter rules to special categories such as sexual orientation and political opinions.
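Pattern-based redaction is the simplest form of this kind of detection. The sketch below covers only three formats (email addresses, US Social Security numbers, and US-style phone numbers) using regular expressions; the patterns and labels are assumptions for illustration and would miss many real-world variants that production tools handle.

```python
import re

# Illustrative patterns only; real PII detection covers many more formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text):
    """Replace each match with a labeled placeholder and count the hits."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, counts[label] = pattern.subn(f"[{label} REDACTED]", text)
    return text, counts


sample = "Contact jane.doe@corp.example or 555-867-5309. SSN on file: 123-45-6789."
redacted, counts = redact(sample)
print(redacted)
print(counts)
```

The per-label counts give a quick view of where sensitive data is concentrated in a dataset, which helps decide which documents need closer privacy review.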
Best Practices in Data Privacy
What Are Some Challenges Associated with eDiscovery?