Analytics is a dynamic tool that can dramatically enhance workflow in Relativity and contribute to substantial time and cost savings. This article aims to outline tactics that save time but do not require significant time or resource investments.
The Analytics platform can greatly improve workflow within Relativity. It can be used to increase review efficiency, quickly isolate highly responsive or unresponsive documents and prioritize the review of particularly relevant documents.
The underlying technology behind Relativity Analytics is LSI (Latent Semantic Indexing). This proprietary technology was originally developed for the U.S. Intelligence Community by the Content Analyst Company to offer conceptual analysis and organization for large repositories of unstructured data. In general terms, LSI is a math-based approach to text analytics that uses algorithms to organize text into a multi-dimensional vector space. The proximity of the text in this space is used to identify conceptual relationships among the indexed terms and documents. It does not rely on external sources to classify the text; instead, it relies solely on the patterns and relationships identified when the data is indexed.
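To make the idea of proximity in a vector space more concrete, here is a minimal sketch of the general LSI technique using NumPy's singular value decomposition. It is not Relativity's or Content Analyst's implementation; the toy corpus, the choice of two concept dimensions, and the function names (build_term_doc_matrix, lsi_fit, concept_query) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus: three legal documents and two about lunch.
DOCS = [
    "contract breach damages",
    "agreement breach damages",
    "contract agreement terms",
    "lunch menu pizza",
    "pizza lunch order",
]

def build_term_doc_matrix(docs):
    """Return the vocabulary and a term-by-document count matrix."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    row = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for col, doc in enumerate(docs):
        for word in doc.split():
            matrix[row[word], col] += 1.0
    return vocab, matrix

def lsi_fit(matrix, k):
    """Truncated SVD: keep the k strongest concept dimensions for terms."""
    u, _s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k]

def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

vocab, A = build_term_doc_matrix(DOCS)
U_k = lsi_fit(A, k=2)        # assumption: two concepts suffice for this corpus
doc_vectors = A.T @ U_k      # each row places one document in concept space

def concept_query(text):
    """Project a query into concept space and score every document."""
    q = np.zeros(len(vocab))
    for word in text.split():
        if word in vocab:
            q[vocab.index(word)] += 1.0
    q_k = q @ U_k
    return [cosine(q_k, dv) for dv in doc_vectors]

for doc, score in zip(DOCS, concept_query("agreement")):
    print(f"{score:6.3f}  {doc}")
```

In this toy index, the first document never contains the word "agreement", yet it scores close to 1 because it shares "breach" and "damages" with a document that does. The two lunch documents score near zero. No external thesaurus is involved; the relationships come only from co-occurrence patterns in the indexed text.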
Conceptual Searching (CA Search)
Unlike traditional keyword searching, CA search results will yield conceptually similar documents based on the conceptual correlation of search terms to other indexed terms. CA search will find documents that would not have otherwise been identified using traditional keyword searching. Simply put, concept searching can be used to find documents related to a known term or phrase that do not necessarily contain the exact term or phrase. We have found this type of searching to be a tremendous benefit to our clients, aiding in identifying responsive or privileged documents that would not have been found with keyword searching.
We have used CA search to identify top priority documents to be batched for immediate review. This is particularly useful when dealing with very large data sets. For example, we recently had a project that consisted of over 11 million records with very tight discovery deadlines. Traditional linear document review simply was not an option for this team. With CA search, we were able to target the most conceptually relevant documents in the database and create concept-focused priority review batches within several hours of the data being loaded into Relativity.
Finding Similar Documents
The “Find Similar Documents” feature can easily be used on-the-fly in Relativity from both the viewer and text modes. This feature returns conceptually correlated documents based on the full text of an entire document. It helps users quickly return a set of documents that are conceptually very similar to the key responsive or non-responsive documents at hand. We have successfully used this feature to locate groups of non-responsive, potentially privileged and extremely relevant documents, facilitating a more targeted approach to review.
In one of our recent projects, we successfully used the “Find Similar Documents” feature to quickly identify a large number of spam emails prior to batching the documents for review. This process resulted in our client reviewing 30 percent fewer documents and contributed to great time and cost savings.
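As a rough illustration of how a "find similar" lookup can work in principle, the sketch below ranks documents by cosine similarity to a seed document's concept vector. The document IDs, the three-dimensional vectors, the find_similar function and the 0.8 cut-off are all hypothetical; a real index would hold vectors produced by the Analytics engine, and nothing here reflects Relativity's actual API.

```python
import math

# Hypothetical concept-space vectors keyed by document ID.
INDEX = {
    "DOC-001": [0.90, 0.10, 0.05],  # promotional newsletter (spam-like)
    "DOC-002": [0.88, 0.12, 0.02],  # another promotional newsletter
    "DOC-003": [0.05, 0.95, 0.10],  # contract negotiation email
    "DOC-004": [0.10, 0.20, 0.90],  # HR memo
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_similar(doc_id, index, min_score=0.8):
    """Return (doc_id, score) pairs at or above min_score, best match first."""
    seed = index[doc_id]
    hits = [
        (other, cosine(seed, vector))
        for other, vector in index.items()
        if other != doc_id
    ]
    hits = [hit for hit in hits if hit[1] >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

for other, score in find_similar("DOC-001", INDEX):
    print(other, round(score, 3))
```

Starting from one known spam email (DOC-001 here), only the other newsletter clears the cut-off, which mirrors the way a single exemplar can be used to sweep up a group of similar documents before batching.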
Conceptual Near-Duplicate Detection
The ability to quickly identify conceptual near-duplicates is now common practice in Relativity databases when Analytics is enabled. Near-duplicate detection is based on conceptual similarity rather than relying on exact text and metadata matches. Near-duplicate groupings can be integrated with advanced searching and automated batching in Relativity, as needed.
In practice, we have found that the identification of near-duplicates is particularly useful when MD5 values are not available to identify exact duplicates. We were able to apply this technology in a recent project on a set of newly loaded third-party data. After identifying the conceptual near-duplicates, we found that nearly 40 percent of the records had near-duplicates already coded in the database. The client was then able to leverage their prior coding to more efficiently code the new data, resulting in improved efficiency and significant cost savings.
Even in cases where MD5 hash duplicates are available, the addition of conceptual near-duplicates can improve review workflow. Near-duplicates can aid in identifying potentially privileged documents to be flagged for a second-level privilege review. Additionally, they can be useful when spot-checking coding consistency across documents.
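The difference between exact hash matching and similarity-based grouping can be sketched as follows. Note the substitution: this example uses word-set Jaccard overlap as a simple stand-in for the conceptual similarity score, since the conceptual engine itself is not reproduced here. The sample texts, the 0.8 threshold and the group_near_duplicates function are hypothetical.

```python
import hashlib

def md5_of(text):
    """Exact-duplicate fingerprint: any change in the text changes the hash."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def jaccard(a, b):
    """Word-set overlap, used here as a stand-in for conceptual similarity."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0

def group_near_duplicates(docs, threshold=0.8):
    """Greedy grouping: a document joins the first group whose principal
    (first) document it resembles; otherwise it starts a new group."""
    groups = []
    for doc_id, text in docs.items():
        for group in groups:
            if jaccard(docs[group[0]], text) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups

# Hypothetical sample documents.
DOCS = {
    "A": "Please review the attached supply agreement before Friday",
    "B": "Please review the attached supply agreement before Friday thanks",
    "C": "Quarterly sales figures are attached for the board meeting",
}

print(md5_of(DOCS["A"]) == md5_of(DOCS["B"]))  # hashes differ despite one extra word
print(group_near_duplicates(DOCS))
```

Documents A and B differ by a single trailing word, so their MD5 values do not match, but they still land in the same near-duplicate group. Coding already applied to A could then be surfaced to the reviewer handling B.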
Clustering
Clustering is a mass operation that automatically groups conceptually correlated documents into virtual folders displayed by topic. Users are not required to define a set of exemplar documents upfront. We frequently use clustering in conjunction with batching to generate conceptually similar review batches, aiding in review efficiency.
In a recent project, clustering was applied to the full database consisting of around 80,000 records. It took less than one hour for clustering to complete in Relativity. The results allowed our client to quickly determine that around 45 percent of the documents were not relevant or eligible for review. The non-relevant documents were then moved to a secure folder, allowing our client to focus on only the potentially relevant documents. This example demonstrates the cost- and time-saving benefits associated with clustering.
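The sketch below shows how documents can be grouped without exemplars and how the groups can then feed review batches. It uses plain k-means as a stand-in, because the clustering algorithm Relativity uses is not described here. The two-dimensional concept vectors, the document IDs and the naive seeding from the first k vectors are assumptions made for illustration.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(component) / len(points) for component in zip(*points)]

def kmeans(vectors, k, iterations=20):
    """Plain k-means with naive deterministic seeding (the first k vectors)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = []
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:
                centroids[i] = mean(members)
    return labels

def batches_by_cluster(doc_ids, labels):
    """Turn cluster labels into review batches keyed by cluster number."""
    batches = {}
    for doc_id, label in zip(doc_ids, labels):
        batches.setdefault(label, []).append(doc_id)
    return batches

# Hypothetical documents with two-dimensional concept vectors.
DOC_IDS = ["D1", "D2", "D3", "D4", "D5"]
VECTORS = [(0.90, 0.10), (0.10, 0.90), (0.85, 0.20), (0.20, 0.80), (0.95, 0.05)]

labels = kmeans(VECTORS, k=2)
print(batches_by_cluster(DOC_IDS, labels))
```

No exemplar documents are supplied; the groups emerge from the vectors alone. Each resulting batch holds conceptually similar documents, so a reviewer can assess a whole cluster at once, including deciding that it is not relevant.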
Analytics is a versatile tool that can enhance workflow in Relativity and contribute to substantial time and cost savings. Furthermore, the features outlined above do not require significant time or resource investments. Our clients have had notable success using Analytics to isolate priority documents for immediate review, locate highly responsive or non-responsive data, and improve overall coding efficiency with the use of clustering and near-duplicate identification.