Law School Today – Not for the Faint of Heart…or Wallet

Written by Howard Reissner, CEO at Planet Data

A recent article in the New York Times addressed Brooklyn Law School’s decision to substantially reduce the tuition for its next entering class. Brooklyn Law acknowledged that it was responding to the market pressures of a continuing decline in the number of applicants to law school in the United States. This law school is one of the first to take such a radical step in an effort to maintain its academic standards and to enable its graduates to face a somewhat reduced level of potential debt in an extremely challenging legal job market.  Quite possibly the next course of action by law schools will be reductions in class sizes. Clearly, some of the best and brightest are less enamored of a legal career than just a few years ago, and are looking elsewhere for financial and spiritual fulfillment.

These dynamics have left all but the top fifteen law schools scrambling for the shrinking pool of highly qualified applicants. As such, in a reversal of fortune, the schools are now courting potential students with financial aid packages approaching the signing bonuses for star athletes. However, most schools do not have the ability to exist without high tuition fees. Alumni contributions and endowments cannot sustain this model long term. A contributing factor to this situation is that too many new law schools have opened their doors over the past thirty years, significantly increasing the job-seeking pool. Clearly, the law schools that thrive over the next decade will either have to be in the top fifteen, or re-think how they train their students and prepare them for this altered legal environment.

Until recently it was generally believed that the enormous rise in law school tuition over the past decades would be accompanied by high-paying “big law” jobs, enabling student debt to be eventually repaid. However, since 2008, the number of such positions has declined dramatically (though tuition and debt have not), and it is unlikely that we will return to that economic model. Why? Because the recession of 2008 forced attorneys to become more efficient, driven by clients’ smaller budgets and the transparency that technology enhancements have brought to the legal process. As such, large-firm clients would no longer pay for young attorneys to be trained on “their dime”. The result has been a dramatic reduction in the number of “big law” positions available to graduates that would allow them to pay off high loan balances. Big law firm partners can no longer leverage that economic model with impunity; witness the number of firms that have failed or are in economic duress.

So, faced with these macro and micro issues what actions should law firms take to remain competitive and what should law students and junior attorneys do to enhance their career prospects?

Law firm clients are demanding more services for less billing, are better able to audit the work product and time required to complete the services, and are aware that technology is driving down the necessity to create legal documents from scratch for many “commoditized” types of assignments. Law firms need to implement technology and develop work flows that are sensitive to the present economic environment. Even basic document review, which has been a reliable generator of revenue for many firms, is subject to technology enhancements that are sharply reducing the number (and hourly rates) of attorneys required for a case.

Junior attorneys will have better job prospects if they enter the work force with a basic level of technological competence and obtain practical skills through prior full-time work experience, internships, clinics, and summer employment.

While clients will no longer pay for associates to learn to practice law, domain expertise will still be important in the future. If possible, junior attorneys should utilize knowledge from other domains (and prior careers) to create a niche in their skill sets.  A few examples would include technology innovation (business and patent), entrepreneurship, healthcare, biotech, information management, communications, government contracts, privacy, security, and transnational issues. These types of substantive domains are less likely to be impacted by software that can replace lawyers performing more basic legal functions. I have seen demonstrations of contract document building software that can substantially reduce the time required to draft, review and revise both simple and more complex agreements. This is a good thing for clients, not so much for attorney billings.

Not to be under-appreciated are the practical and ethical requirements for attorneys to have basic skills in technology and communication. These core competencies will allow them to be efficient and more valuable to their firms and clients. Pursuant to the ABA’s Model Rules of Professional Conduct, a lawyer should keep abreast of changes in the law and its practice, “including the benefits and risks associated with relevant technology”. Attorneys are daily presented with issues pertaining to social media, communications, security and privacy. Knowledge of how technology interacts with these areas is critical for future success in the legal profession.

To summarize, the practice of law will not disappear anytime soon. However, it is likely that the unbridled growth in the number of new attorneys will rapidly taper off, and the promise of the golden “big law firm” ticket to prestige and riches will be less of a reality for many. The days of just “studying hard,” “showing up,” and “doing well” are over. Nonetheless, there will still be many opportunities for professional growth and achievement for those who grasp the new legal paradigm and align their skills and expectations accordingly.


Things We Can’t Wait to do at Legal Tech 2014

Legal Tech is always a great time to see clients that don’t get to the city much, and friends you haven’t spoken to in a while. At Planet Data we also look forward to showing off our newest Exego features, which this year include a review module. We’ll be located in Booth 2123 and are ready to show you a 10-minute demo of Exego Early Case Assessment and Review. Please stop by to see one and we’ll give you a $10 Starbucks Card*. We also have a private suite if you prefer a more in-depth demonstration. Please contact Laura Marques to set up a time. We look forward to meeting you.

There is, however, one other thing that we are particularly looking forward to – and that’s the invite-only Cowen Group luncheon on Project Management that we are proudly sponsoring: The Evolving Role of the Legal and eDiscovery Project Manager.

Planet Data COO David Cochran and The Cowen Group Managing Partner David Cowen got together to chat about some of the hottest topics that will be covered. The core group of this panel has had several sessions together since the August 2013 ILTA conference to develop a practical end result. A strong mixture of large and small law firms, corporations and vendors discussed every job description under the sun to come up with the key line items for the ideal project manager candidate. This luncheon is not only to show gratitude for all of the participants’ time, but will end with an incredible action item – a real interview guide for hiring the kind of project manager your firm needs.

What does a Project Manager actually need to know? 

Cochran:  2013 was definitely the year of the Project Manager discussion.  This position continues to be the critical foundation for any firm, corporate legal department or supplier and identifying qualified and experienced Project Managers is a challenge.  The panel identified several key attributes and collaborated on what is important for a successful Project Manager.

Cowen:  The role of the eDiscovery Project Manager is rapidly evolving and becoming more advisory and consultative at many law firms, corporations and vendors across the country.  It’s not enough just to know the law, the technology or the workflow.  Today – top talent will be judged by their Business, Social and Political IQ along with the expected legal and tech knowledge.

Challenge: Finding the right candidate.

Cochran:  With complex “Big Data” requirements (including the success or failure of e-discovery collection, review and production; information governance implications; managed review requirements; focused data collections; and more) now being the “norm”, how do you qualify candidates?

Cowen:  Career opportunities are always a hot topic at LegalTech and I expect this year will be no exception.  Law firms, vendors, corporations and consulting firms each continue to invest in critical talent and the demand for experienced eDiscovery Project Managers, consultants and attorneys has accelerated in recent months as the vendor landscape gets more competitive and corporate clients continue to raise expectations.

Hiring and Training

Cochran:  The skill set requirements continue to grow year after year, and without sharp individuals in place to understand, facilitate and execute on these ever-changing requirements, the lawyer would be in dire straits.  We consider proper training, re-training and support mission critical, and do not take it lightly.

Cowen:  In 2014, the ability to hire, train and retain this unique talent will prove more critical to success than ever before.  The difference between those that will lead the pack and all others over the next two years will be more about the talent than about the tools and technology.  The biggest difference will be made by the talent that innovates, disrupts, and creates better ways to use those tools, thereby creating productivity and efficiency, which leads to increased market share, revenue and profit.

In Closing

Cochran:  Whether it’s consulting on a project strategy with a client, working with a client on collecting data, ensuring that data is processed correctly, working with experienced suppliers, or providing their overall expertise to the lawyers, these individuals are professionals and the bedrock of any project.

Cowen:  I predict the winners will be those organizations with the best hiring plan, career path, retention strategy and leadership. We hope this skills assessment checklist helps you in evaluating your current talent strategy and in making better hiring decisions in order to become that organization.

*While supplies last.  Distribution at the discretion of Planet Data.

The Technology Bar Has Risen and Attorney Competency Must Move to Meet It

The Essentials of Understanding Technology on a Practical and Ethical Level


What is one of the most important issues to affect the legal industry over the past decade? Yes, changes to the FRCP and the 2008 recession would certainly rank highly, but more fundamentally, it is the explosion of technologically enabled methods of data creation and communication, and the practical implications of these developments for the practice of law.  Over the past year or two there has been a groundswell of attention pointed directly at attorney technology knowledge and skills.  More specifically, the focus is on the higher standards now required for professional competency. Or to put it another way, there is an increasing focus on just how lacking many attorneys are in deploying technology skills.  Not just in the more specialized technical practice areas (such as IP or litigation), but even in the fundamental skills required for day-to-day efficient practice, such as basic competency in the use of the Microsoft Office suite of software.

To put this in proper perspective: this applies to all of us. With few exceptions, lawyers at all levels of experience will to some degree be affected by their ability to demonstrate basic technology skills. For example, an attorney often needs to request and receive information from a client. Since data and information are now created and stored in so many different forms and places, a basic understanding of these tools is required just to pose the questions correctly. If you are involved in litigation, due diligence, investigatory or compliance matters, there are fundamental skills required just to understand the key issues and facts of the matter, and to properly utilize this information for your client’s benefit. To clarify, this does not mean that you need to become an expert in computer programming and IT systems. What it does mean is that you need to develop a fundamental understanding of how to use and implement these technologies and systems so that you will be able to identify the key issues and appropriate experts to support your efforts.  The incentives? In addition to fulfilling your ethical requirements, going forward, technology competency may distinguish between those who prosper, and those who do not.

The legal services industry has fundamentally shifted from an era of minimal client knowledge about legal work requirements, quality, or billing standards, to one of increasing transparency, even prior to project engagement.  The industry has become the most competitive we have ever seen, and that trend is likely to continue. There are several major factors causing this shift, of which the most important may be that for many types of non-complex issues legal services are becoming commoditized. Nontraditional service providers are offering legal services at lower costs, and the billable hour is no longer routinely accepted, with clients increasingly demanding predictability in legal fees. Let’s face it, one of the reasons that routine legal matters are becoming viewed as commodities is directly due to the advances in technology that allow ideas and documents to be efficiently repurposed, thereby reducing the need for each document to be an original piece of legal work.

All of us are painfully aware of the impact that technology is having on our daily existence. We have moved in a very brief historical span from writing on yellow pads, sending snail mail, and land line telephones to an ever changing environment of personal computers, laptops, email (both business and personal), smartphones, tablets, iPads, texting, Twitter, Facebook, Vine and others that haven’t even debuted yet.  It seems as if there is a daily creation of new social media methods of communication, and the manner in which the information is stored. Proliferation of technology, data creation and information transfer is accelerating at ever increasing speeds.   I am fortunate enough to have a 14 year old consultant at home. She recently brought me up to speed on which social media technologies are gaining traction with her age group.  But what do you do if you don’t have a 14 year old?

Although many of these amazing innovations have been incorporated into the basic fabric of our economy, the practice of law has not changed dramatically over the past century, and continues to be a lagging sector in the implementation of technology based efficiencies.

Lawyers, out of practical and economic necessity, are being forced to rapidly adapt to this growing paradigm of massive information and data creation and the expectancies of a marketplace that places a premium on efficiency. Unless they have a unique set of highly differentiating professional skills, both senior and junior attorneys will be evaluated by clients and colleagues, in some measure, by their ability to master and effectively deploy technology skills as one of the core components in their toolbox. In the most practical terms, for an experienced lawyer, it may mean the difference between obtaining a new client, or in the retention of an existing one. For a junior attorney, it may be a factor in career advancement. For a new lawyer, it may be the differentiator in getting that first job.  Technology is changing the practice of law, and in a profession that is seeing minimal growth, it is becoming another method to screen and evaluate the competitive pool.

Some of us probably went to law school (with the exception of course, of IP attorneys) believing that the practice of law would not require a significant amount of technical knowledge or expertise. But in reality, there are now basic levels of technical knowledge required of all attorneys. While there are specific types of domain expertise demanded of litigators, for example, there are general levels of technical competency that all of you need to attain in the second decade of this century. There are few practice areas today that do NOT encompass the need for these minimum levels of knowledge. Compliance, contracts, security, privacy, due diligence, investigations, human resources, etc. now all require attorney knowledge of how their clients manage their information systems and where they store and share their data.

What has changed the most over the past few years is the expectation of clients, who are demanding that the practice of law incorporate these productivity enhancement tools. There is a new focus on “efficient = proficient”. Our difficult economic climate has led corporate clients to demand competency in practice management and, in fact, to more frequently retain consultants to analyze the comparative value obtained from their outside counsel.

The knowledge of basic technology concepts and a good facility with baseline technical skills is no longer optional; it is mandatory for attorneys that desire to prosper in the decade to come.

In 2012 the ABA made amendments (and revisions to the comments) to its Model Rules of Professional Conduct, reflecting the practical reality that technological competency is no longer just a desirable skill for attorneys, but in fact an ethical obligation to their clients. We are likely at the beginning of a period where clients may challenge negative outcomes of matters based upon their counsel not exercising the expected levels of technical competency. These types of situations will likely not be helpful for practice development.

Whether you are a senior attorney, in mid-career, or just starting out, these standards have risen and will continue to evolve.

From Music to Mars…To ILTA

From the Music Industry to e-Discovery

by Laura Marques
VP, Marketing and Communications
Planet Data

Five years ago I was in the music business, and never thought I would ever be anywhere else. However, that industry was in worse shape than most at the time and I knew I had to make a change. So I did…to the Legal Technology sector. Natural progression one would think. NOT. I honestly had never heard the word algorithm used in a meeting before.

But it’s a platform, not a punk band

So how do you market a platform instead of a person? It was easier than I thought because after all, what you really need to work with is something cool, and that I do. For one thing our platform – Exego – is a game changer. It’s a processing workhorse, with an efficient ECA solution, flexible workflows and a brand new review feature. Super smart and super simple. Exego combines the best of ECA and Review in one place, which helps our clients save money and do better work. What’s not cool about that?

A showcase is a showcase is a showcase

When you are working on something new and amazing, you can’t wait to showcase it. At LegalTech we conducted some sneak previews of Exego Review and shared some behind-the-scenes work with our clients. We took notes on what was most important to them, continued development and set our sights on ILTA.

Fast forward through the longest winter EVER, and we are now ready to officially introduce Exego Review. Planet Data representatives will be on hand to demo both Exego ECA and Exego Review for you.

Main stage, side stage, back stage

We’ll have Planet Data representatives all over Caesars – at Booth 524, in Forum Ballroom 3 and at the Relaxation Station, where you can have a massage and a few minutes with no sales pitches – unless of course you want one.

Pass by our booth 524. You’ll get to meet Lori, one of our Project Managers, or Steven, one of our Review Experts. I’ll be there too. Drop your card in the fishbowl, and you’ll be entered to win a $250 Amex Gift card.

Then visit us in Forum Ballroom 3 for a little more interaction. Meet more of our staff, like Zoltan, Planet Data President and Dave, our COO. Mike, our CTO, company wizard and Exego creator, will have a short presentation of Exego Review ready to go whenever you have time to stop by. Brad, our VP of Consulting Services and Adam, our Regional VP of Business Development will be there to help you test-drive Exego ECA so you can see how simple it is to use. And, if you sit with us for a 20-minute Exego Review demo we’ll give you a real redeemable Caesars casino chip (while supplies last and at the discretion of Planet Data). Oh, and you can enter to win a $250 Amex gift card here, too.

As you walk, walk, walk through the conference center be sure to find our Planet Data Relaxation Station near the Augustus Ballroom. After all, Planet Data’s Exego delivers stress-free eDiscovery; so why not deliver a little stress-free convention time. And, like the booth and demo room you can enter to win a $250 Amex gift card here, too.


And finally, we’ll be hosting Happy Hour in Forum Ballroom 3 on Tuesday @ 4:30, and sponsoring all the meals and breaks on Wednesday. Again, all our people will be mingling around so please introduce yourself. We all look forward to meeting you.

Thank you, good night!

Thanks for your time today. I look forward to seeing you in Vegas. In my next blog before LegalTech, I’ll tell you how the New York Hilton was once the home of a hugely popular 1980’s music convention.

Until then, yours in marketing….
Email me

Standards for Competency in eDiscovery on the Rise – What’s Your Best Defense?

By, Howard Reissner, Esq., CEO Planet Data

The recently issued opinion in Branhaven, LLC v. Beeftek, Inc. et al., 2013 WL 388429 (D. Md. Jan. 4, 2013) highlights the requirement for attorneys to continuously keep abreast of changes in professional standards of competence in their fields of practice. The bar for minimum competency is rapidly rising in the e-discovery universe. A significant percentage of federal judges have become well enough educated in this area to confidently determine whether the attorneys appearing in their courts are both complying with the FRCP and adequately investigating their clients’ data systems and infrastructure.

In Branhaven the court sanctioned both the client and counsel under FRCP 26(g) for the incorrect certification of a signed response to a request for production. In fact, as of the date of the certification, counsel had not made a reasonable effort to assure that the client had provided all of the responsive information and documents available to him, yet he represented that he had done so. The decision noted that pursuant to Rule 26(g)(3), “if a certification violates this rule without substantial justification, the court … must impose an appropriate sanction on the signer, the party on whose behalf the signer was acting, or both…”

In a second recent federal court decision, In re Delta/AirTran Baggage Fee Antitrust Litigation, 846 F. Supp. 2d 1335 (N.D. Ga. 2012), Delta Airlines was sanctioned pursuant to Rule 26(g) for failure to make sure that all relevant hard drives and other ESI were searched, after making many assurances to the court that a reasonable inquiry had been made.

“Branhaven” and “In re Delta” are another clear signal to practicing attorneys that they will be measured against a higher standard of professional competence and scrutiny of their behavior by a judiciary that has become much more educated about technology and e-discovery over the past few years.

Along the same line of reasoning, counsel may not escape potential negative consequences by having relied upon an outside vendor to manage part of the discovery process. In Brookfield Asset Management, Inc. v. AIG Products Corp., 2013 U.S. Dist. LEXIS 29543 (S.D.N.Y. Jan. 7, 2013), the defendant was allowed to claw back documents that had been inadvertently produced because an FRE 502(d) agreement was in place. However, due to vendor error, the damage was done: the redacted text was visible to the plaintiff when viewing the metadata. I believe the lesson here is that attorneys should be confident that they have the knowledge to retain vendors that have significant professional expertise, utilize high-quality software, and have developed work-flows and quality controls to minimize these types of painful errors. See also: Peerless Industries, Inc. v. Crimson AV, LLC, 2013 U.S. Dist. LEXIS 2985 (N.D. Ill. Jan. 8, 2013), where counsel was held responsible for the incomplete collection of data by a vendor.

As a reminder to in-house counsel that they are responsible for monitoring the actions of their outside law firms, in Coquina Investments v. Rothstein, 2012 U.S. Dist. LEXIS 108712 (S.D. Fla. Aug. 3, 2012) the court imposed sanctions under Rule 37 against both the defendant and outside counsel. The findings of fact in the judge’s order will likely have substantial negative impacts for the defendant in future litigations brought by other plaintiffs.

As a participant at many legal educational forums over the past year, it has become apparent to me that the federal judiciary has significantly enhanced its expertise in many of these technical areas, perhaps well beyond that of many of the lawyers who appear before it. I believe that it is good advice to encourage litigators who are still unfamiliar with their fundamental obligations in e-discovery to quickly get themselves up to professional standards. It should be apparent today that a large percentage of litigation will include some aspect of ESI. Lack of technical knowledge, or the inability to employ others who have it, is no longer an excuse for discovery lapses. In addition to the various types of sanctions and malpractice actions that can result from these professional lapses is the real possibility of incurring disciplinary proceedings from the state or federal bar. See: In re Disciplinary Proceedings Against McGrath, 174 Wash. 2d 813, 280 P.3d 1091 (2012).

Although there has been a steady climb up the technology learning curve for many federal judges, there still is a wide disparity in expertise within the group. As such, an attorney is well advised to spend some time researching a particular jurist’s level of e-discovery knowledge and the professional standards that have been imposed in their courtroom.  A review of the judge’s prior published opinions (and other precedent from the jurisdiction) should be a mandatory requirement. Over the past two years a substantial number of opinions have addressed attorney cooperation, data preservation, litigation holds, processing, searching, technology assisted review (TAR), and production. 

So, what actions should an attorney take prior to commencing a case before a judge for the first time? At the most basic level, all of the judge’s published opinions that address discovery issues should be read. In addition, any speeches, articles or other publications authored by the judge should be reviewed. Does the judge attend CLE and other professional conferences that address e-discovery? It would be prudent to seek out other counsel who have appeared before that court and ask about their experiences with that judge. Inquire as to the level of the judge’s technological savvy. Does the judge become directly involved in discovery disputes, or does she keep a “hands off” approach and let the parties work it out between themselves? Is the judge a proponent of TAR, and has she allowed or mandated its use in prior cases?

So, what steps can an attorney take to get off on the right foot with the judge? First, cooperate with opposing counsel from the outset as much as is practicable. Recently, the judiciary has taken a more active role in encouraging cooperation between counsel; see: Carrillo v. Schneider Logistics, Inc., 2012 WL 4791614 (C.D. Cal. Oct. 5, 2012), where the court awarded monetary sanctions for the defendant’s repeated failures to cooperate in the discovery process. Also see: Easley v. Lennar Corp., 2012 WL 2244206 (D. Nev. June 15, 2012), where the court urged direct personal contact between counsel prior to filing motions to compel discovery. Finally, see: Kleen Products LLC v. Packaging Corp. of Am., 2012 WL 449865 (N.D. Ill. Sept. 28, 2012), where the judge commended the lawyers and their clients for conducting discovery in a collaborative manner.

Judges have made it clear that they do not want to be involved in “ministerial” discovery disputes. Attorneys who appear to be taking the extra steps to avoid these types of conflicts will have elevated themselves in the mind of the judge.

Secondly, take the effort to carefully consider your discovery requests, both as to scope and form of production. As the raw size of data continues to accelerate, the issue of proportionality has taken a more central role; see: Boeynaems v. LA Fitness Int’l, 2012 U.S. Dist. LEXIS 115272 (E.D. Pa. Aug. 16, 2012), ordering plaintiffs to pay for additional discovery costs prior to class certification, and Juster Acquisition Co. v. North Hudson Sewerage Authority, 2013 U.S. Dist. LEXIS 18372 (D.N.J. Feb. 11, 2013), where the court granted plaintiff’s discovery request as being reasonable and not creating a cost burden that outweighed the benefits of the defendant’s compliance within the scope of the case. These decisions emphasize that judges want cases to be decided on the merits and that discovery requests should take into consideration the value of the cases and issues under dispute.

Finally, if the judge is not as sophisticated in the technology issues as you would prefer, then provide educational resources and professional support that will validate your positions.

Technology Assisted Review is NOT New … Just Improved


by Kevin Leser, VP, Project Management, Planet Data

Though the terminology is perceived as new – “Technology Assisted Review” or TAR – there’s really little or nothing new about it. Anyone who has been using such search tools as Concordance and Summation since the 1990’s can attest that it’s just new verbiage wrapped around recent legal work-flow enhancements. Lawyers have been applying keyword searching to screen discovery files since the 1970’s when three companies, Aspen Systems, Informatics and Control Data (later rebranded as Quorum), were the first to point search engines at huge volumes of discovery files in an effort to whittle the mass down to a manageable pile of business records potentially relevant to a litigation.

If you’re under forty-five you probably won’t recognize those three names or, at least, remember much about their role as the founding elements of the litigation support industry. Their operations were all housed in the Maryland suburbs of DC. They all had mainframes running inverted file text search engines. Aspen used AspenSearch, its own creation, and likely the first of these unique tools. Informatics used Inquire, and Quorum used Basis. These all worked in a similar manner. The text of a document was broken down into unique words, which were assigned word IDs, which were, in turn, strung together in huge blocks of bits-and-bytes to make the contents yield to a search. If the search was for “Man bites dog”, the results would return exactly that group of words, and none other. It had to be a man not a woman, boy or girl; had to be a dog, not a poodle or a mutt; had to be a bite not a nip, gnaw or nibble.
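The exact-match behavior described above is easy to see in miniature. Here is a hedged sketch (plain Python, not a reconstruction of AspenSearch, Inquire or Basis) of an inverted-file index and the literal AND-search it supports:

```python
from collections import defaultdict

# Toy corpus standing in for a discovery collection.
docs = {
    1: "man bites dog",
    2: "woman bitten by poodle",
    3: "the dog that bit the man",
}

# Inverted file: each unique word maps to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search_all_words(query):
    """AND-search: return the docs containing every query word, literally."""
    words = query.lower().split()
    if not words:
        return set()
    results = index[words[0]].copy()
    for w in words[1:]:
        results &= index[w]
    return results

# Only the literal tokens match: "bites" finds neither "bitten" nor "bit".
print(search_all_words("man bites dog"))  # {1}
```

The point the paragraph makes falls straight out of the data structure: the index knows nothing about morphology or synonyms, so document 3 ("the dog that bit the man") is invisible to the query even though any human reader would call it responsive.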

These tools were all fed by armies of document coders and information analysts who sat arranged in rows of tables with blank document control forms to their right and stacks of Bates-numbered paper to their left. In the days prior to electronic discovery (or even basic scanning and OCR), litigation support databases were built one hand-printed document control form at a time. College students looked at each document and painstakingly recorded the author, addressee, copyee, date, document type, and document title. They also noted specific conditions such as the presence of marginal notes, illegible scrawl, and even ink blots caused when some scribe tipped over his ink well while penning a document. Yes, this last bit is a bit of an exaggeration, but the point is that today’s catchphrase TAR describes a process that, at a minimum, dates back to the Gerald Ford era.

So, if TAR is not actually a contemporary notion, what’s new enough in this realm to compel me to pen this article?
What’s new is that concept engines are now reshaping the completeness and accuracy curve of a search result. Basic keyword search tools are notoriously ineffective. The methodologies deployed in the 1970’s, and still largely active today, remain at the core of Concordance, Summation and dtSearch. The efficacy of the searches relies almost totally on the quality of the keyword list. However, the English language, like most any language, doesn’t innately lend itself to being probed with precision by simple keyword lists, even those that smart attorneys and litigation support professionals labor over for hours. There’s a famous (and still relevant) study done by David Blair and M. E. Maron in the early 1980’s and published in the March 1985 issue of Communications of the ACM. Although the article contains myriad charts and equations as support for its conclusions, the methodology of the study and its results were elegantly simple. They took a ton of paper documents, had them accurately keyed into machine-readable form and comprehensively indexed in IBM’s text search tool, STAIRS, which was considered relatively powerful at the time.

The Blair and Maron team (including lawyers and paralegals) took this database and a set of keyword searches they had spent an inordinate amount of time perfecting, and ran the searches several times, tweaking the results with each iteration until they were fairly confident they had found the vast majority of the relevant documents. Sort of sounds familiar, huh? It sounds much like any one of the keyword-search-driven reviews we have supported over the last couple of months.

The team was then charged with manually reviewing all the original paper documents and flagging those they felt were relevant. When they compared the respective stacks, the keyword searches had identified less than 20 percent of the documents determined to be relevant during the ‘Big Read’ of the roughly 350,000 pages that made up the original input to the database. For those of us who had been building litigation support databases since the 1970’s, there were no surprises here, only validation of what we had learned in the early years of using those first-generation tools. Paraphrasing a current political truism: it’s the language, stupid! There are many, many ways to say the same thing, particularly in English, and having a Roget’s Thesaurus at your side when you’re probing a database doesn’t help much.

Back then we got around these limitations by developing retrieval thesauri and taxonomies for specific litigations, which enabled document analysts to apply codes that reflected a document’s content. The distinction here is that legal issues can change over the course of a matter, but document content does not.

These content categorization aids were elegantly designed tools that let an analyst objectively index each document so an attorney could then plug in a search code and more readily locate the documents that might be relevant to a production request. The only problem with something being elegant is that it’s usually also really, really expensive to develop and then apply: tens of thousands of dollars to design, and $15 to $20 per document to apply, all in 1980’s dollars. But if you needed your retrieval to be comprehensive and accurate, you paid the freight.

These tools were often applied in asbestos and other mass tort product liability cases in the ’80s, where the accuracy and completeness of the retrieval and production process were critical to a solid defense. Long before the content indexing technology that has emerged over the last few years, the issue of accessing content was as pressing for paper documents as it is for today’s electronic documents. Keyword search tools simply find documents, with an emphasis on simply. They don’t hunt them down, sift them and selectively offer morsels up for qualified review. Content engines do that.

I’m not going to drill down into the pros and cons of competing content indexing technologies. My company uses one content engine, as do other litigation support firms, while still others use different, but not altogether dissimilar, tools. The same can arguably be said of the similar-but-different keyword search tools. That’s a sidebar we can leave to the sales guys and the information scientists. But, in a nutshell, content engines create intricate matrices of, well, content. You know, the stuff you might have said or written yourself, but maybe a little differently or a lot differently, using different language, maybe even a different premise or set of facts, but which, when you read it, you recognize, and say, “hey, that’s what I’m looking for”. Or, in our business, maybe it’s “yikes, wish I hadn’t found that”.

The reality is that we think in concepts, not in keywords or phrases. We’re assaulted daily by politicians and advertisers who would like to think we function in a world dominated by buzzwords. Simply put, we’re conceptual beings and constantly filter input to get at the core of what we’re looking for or need at any given moment. This axiom holds true whether it’s where you can find the best ribeye steak or where that pesky little document that can help you or hurt you is hiding. That’s why the participants in the Blair and Maron study found the missing 80 percent of the relevant information in the test population when they serially reviewed all the 350,000 pieces of paper, page by page by page.

In the late 1980’s, Bellcore applied something called Singular Value Decomposition (SVD) to textual material in an effort to replicate this core human capability inside the circuits of one of their neat Unix-driven boxes. SVD was, and is, used widely in statistical applications. Basically, it’s a way of decomposing a two-dimensional matrix of almost anything into its most significant underlying patterns. In the early to mid-1990’s, related techniques were deployed in text search applications like Excalibur to look for patterns in documents and, in effect, pump up the volume of hits returned by a search. More is usually better, but for those of us who were forced to mull over the results, more was often just more. It’s all about the “accuracy versus completeness” curve, known today as “precision and recall”. Pattern recognition techniques pushed up the number of documents retrieved, but their relevance was often disproportionate to that extra volume. You looked at more documents and found more that were relevant, but only marginally more. The added cost of reviewing those additional documents was often disproportionate to their value.

Then Latent Semantic Indexing (LSI) arrived during the first decade of this century, out of that sylvan-looking office park in Langley or those ten-story buildings at Ft. Meade topped by arrays of satellite dishes. LSI was derived from the SVD model and is used widely by intelligence agencies to constantly screen the terabytes of data they grab hourly from “The Cloud”. Forgive the pun: the results are spooky. You can input a chunk of verbatim text from one document and get back lots of verbatim text from other documents that align conceptually without any readily apparent shared text.

For example, consider the following paragraph:

‘We would like to actively promote people into positions of power and influence to effect change in the legislative and regulatory process. This involves using lobbyists and personal contacts to move Congressional and Senatorial committees to change the regulations and laws to benefit Enron. In particular our connections with the George Bush Administration, office of the President and Vice President as well as congressman, senators and agency heads should be used to get policies changed on our behalf.’

When this exemplar text is used to probe a LSI-enabled database containing the EDRM sample set of Enron files, the content engine hits on the following paragraph in an internal Enron memo:

‘This memo is a follow up to your phone conversation with Roger Enrico regarding Enron contributing $250,000 to The President’s Dinner. The President’s Dinner is a joint fundraising effort by the National Republican Congressional Committee (NRCC) and the National Republican Senatorial Committee (NRSC). We contacted both Congressman Tom DeLay and the House Senate Dinner committee to ensure that Enron could fully participate in The President’s Dinner and receive credit for money we have already committed to give to the Committees earlier this year.’

Pretty amazing. LSI, and its derivations, is at the heart of conceptual search eDiscovery applications. Unlike keyword tools, LSI-based engines convert the textual content of documents into vector mathematics, creating multidimensional models that can be used to identify how the documents relate to each other based on the co-occurrence and frequency of all the words in all the documents, rather than on just shared keywords or vague patterns. The search results depend on how and where ideas and concepts converge across documents.
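For readers who want to see the nuts and bolts, here is a deliberately tiny sketch of the idea (toy data and a single latent dimension, nothing like a production engine): SVD factors a term-document matrix so that documents sharing no keywords at all can still land close together in the reduced “concept” space.

```python
# Toy LSI sketch: factor a term-document matrix with SVD, then compare
# documents in a reduced "concept" space instead of by shared keywords.
import numpy as np

# Rows are terms, columns are documents.
# Doc 0 ("lobbyist congress") and doc 2 ("fundraiser dinner") share no
# terms, but are linked through doc 1's vocabulary ("congress fundraiser").
A = np.array([
    [1, 0, 0],  # lobbyist
    [1, 1, 0],  # congress
    [0, 1, 1],  # fundraiser
    [0, 0, 1],  # dinner
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 1  # absurdly small, but it makes the effect easy to see
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dim vector per document

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

raw = cosine(A[:, 0], A[:, 2])                   # 0.0 -- no keyword overlap
latent = cosine(doc_vectors[0], doc_vectors[2])  # ~1.0 in concept space
print(raw, latent)
```

Document 0 and document 2 have zero keyword overlap, yet in the one-dimensional concept space they sit nearly on top of each other, because both co-occur heavily with document 1’s vocabulary. Real engines keep dozens or hundreds of latent dimensions rather than one.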

So, how does LSI factor into the realities of today’s text-rich litigation environment? Content Analyst is a leading LSI-based tool that has been integrated into Relativity, among other review tools. We use Relativity at Planet Data, and we have also integrated Content Analyst into our early case assessment platform, Exego. Exego is utilized early in the ESI food chain, and it is the perfect spot to inject LSI capabilities since this is the juncture in our workflow where a comprehensive pool of document text is first available to end users. All the original email and edocs have been ingested and deduped, and have had their metadata and body text extracted. The process also covers files for which no text exists, such as image-only PDFs, which we detect, render as TIFFs and OCR to maximize the depth of the searchable text pool. From here, our current best practice unfolds along these lines:

  • We run the agreed-upon keywords using our dtSearch integration to identify potentially relevant documents. Our sampling tool then carves out a statistically defensible subset of documents that can be reviewed directly in Exego or pushed to Relativity. Either way, our clients put these documents in front of a review team, who then separates the chaff from the wheat by flagging documents as “responsive”, “responsive but potentially privileged”, or “non-responsive”.
  • Next, we take the responsive but potentially privileged documents and feed their content into the Content Analyst engine deployed in Exego. This step finds conceptually similar documents within the remaining document population. We then repeat this work-flow for the responsive documents. These two distinct sets are then exported, loaded to Relativity and batched for full-up review. Depending on the matter and the results of the initial sampling, a second sampling pass is sometimes applied between the initial sampling and the broad export to Relativity for pre-production review.
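The sampling step above refers to a “statistically defensible subset.” Our actual methodology is matter-specific, but as a generic illustration (this is the textbook formula, not Exego’s internal logic), the classic sample size for estimating a proportion at a given confidence level and margin of error works out like this:

```python
# Illustrative sample-size calculation (not Planet Data's actual method):
# classic formula for a proportion at a given confidence level and margin
# of error, with a finite-population correction.
import math

def sample_size(population, confidence_z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion.

    confidence_z: z-score (1.96 corresponds to 95% confidence)
    margin:       acceptable margin of error (0.05 = +/-5 points)
    p:            assumed prevalence; 0.5 is the most conservative choice
    """
    n0 = (confidence_z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)  # finite-population correction
    return math.ceil(n)

# For a 136,000-document collection, a 95%/+-5% sample stays small:
print(sample_size(136_000))  # 384
```

The striking property is how slowly the number grows: a 95 percent, plus-or-minus-five-point sample of 136,000 documents is only 384 documents, which is why sampling is such a cheap reality check on a keyword list.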

If this sounds simplistic, that’s because it sort of is. The steps are well defined and require little in the way of manual effort, with the exception of the actual nose-to-the-grindstone document review step. In terms of the processing, Content Analyst does all the heavy lifting, sifting the pool of documents down far more accurately than can be achieved (or even vaguely approached) by a mere keyword search.

And the results our clients are seeing have been promising. On one recent project, a law firm with a thriving document review practice applied this scenario to a construction matter. The collection was the usual mix of 40 GB of email and edocs: just under 148,000 documents that de-duped down to just over 136,000. We created three random sample sets based on keyword hits; combined, these sets ran to 4,600 documents. They were reviewed, and the files determined to be responsive were pushed back against the balance of the files containing a keyword hit. The net result was that Content Analyst identified just 5,355 additional documents as potentially responsive. These formed the primary review set. Following review, slightly fewer than 1,500 documents of that set were determined to be responsive.

Both to validate these results and formulate a supplemental production if warranted, our client then reviewed the balance of the documents that hit on the keyword searches but were not tagged by the concept engine. That population ran to just over 50,000 documents. Following a review of the residual keyword hits, another small subset of approximately 1,500 documents was found to be relevant (out of 50,000 that hit on the keywords alone). It is important to note that the bulk of those documents would likely have been identified by Content Analyst had a follow-on pass been applied.

As I stressed earlier in this article, it’s all about the accuracy-versus-completeness model of the search. Or, more simply put, the point on the curve where the number of documents you actually reviewed comes closest to the number of documents ultimately determined to be relevant. Is it arguable that the most successful document retrieval-review-production cycle is the one where the least amount of time is spent identifying and reviewing the most relevant and potentially responsive documents? Probably not. The alternative is to continue to use basic keyword searching to over-retrieve irrelevant documents and, perhaps worse, overlook what may well be the majority of responsive documents.
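Putting rough numbers on that curve with the figures from the construction matter above (precision here is simply the fraction of reviewed documents that turned out to be responsive):

```python
# Recall and precision -- the modern names for the "completeness" and
# "accuracy" axes -- using the round numbers from the construction matter.
def precision(relevant_found, docs_reviewed):
    return relevant_found / docs_reviewed

# Concept-engine pass: ~1,500 responsive documents in 5,355 reviewed.
concept_precision = precision(1_500, 5_355)
# Residual keyword-only pass: ~1,500 responsive in just over 50,000 reviewed.
keyword_precision = precision(1_500, 50_000)

print(f"concept pass precision: {concept_precision:.0%}")   # 28%
print(f"keyword-only precision: {keyword_precision:.0%}")   # 3%
```

More than one in four documents in the concept-engine set was responsive, versus roughly one in thirty-three in the keyword-only residue.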

To end where we started, keep in mind the recurring unmentionable in our little world: Keyword searches are inaccurate and incomplete … but equally so on both sides of a matter. We’re at a point in time where all parties need to move back to the future.

Reflections on the LegalTech Panel


Judicial, Industry, Legal, Media Perspectives on Where Legal Technology is Taking Litigation and How It Affects You

By Howard Reissner

This year at the LegalTech New York conference, Planet Data hosted a panel featuring the Hon. Michael Baylson, U.S. District Judge, together with an eDiscovery analyst, an attorney and a journalist.

The session filled up early, and eager attendees lined the walls to hear about the most hotly debated current issues in e-discovery. The hypothetical scenario allowed the panel and Judge Baylson to explore “attorney-client privilege” and “attorney work product protection”, cost shifting, TAR protocols, vendor selection, and the extent of the role of the judiciary in the discovery aspects of a case.

By design, the hypothetical situation was intended to generate debate between counsel on the appropriateness of their actions during the outset of discovery in a complex case involving multiple parties, numerous potential custodians, and the efficacy and completeness of data collection, processing and searching, review and production.

Over the past year a number of actual cases (including, of course, Judge Baylson’s “LA Fitness” decision) have addressed many of these newly emerging issues. Some of the most pressing current concerns have revolved around the efforts to implement TAR on a wider basis. The complexities, strengths and limitations of these technologies have led to procedural challenges to their utilization. The hypo created a situation where the defendants implemented a TAR process and produced far more documents than the plaintiffs. Nonetheless, the plaintiffs’ counsel inquired as to how the “seed sets” were developed, and how the methodology for review and production was established. Along similar lines, the defense counsel demanded to know how the plaintiffs identified and collected their documents in light of the relatively small number of documents produced.

“With his decisions in Rhoads Industries and LA Fitness having helped shape the current state of the law of electronic discovery, it was great having Judge Baylson with us live on the panel,” said attorney David Horrigan, e-discovery and information governance analyst at 451 Research, who served as moderator and hypothetical defense counsel on the panel. “Adding David Brown’s perspective from The National Law Journal and Ann Kershaw’s experience as a practicing e-discovery attorney helped us cover all the issues—with Judge Baylson keeping us all in line from the bench.”

As in the real world, these issues were then put before the judge, who was reluctant to be drawn into the underbelly of discovery work-flow and technology. It appeared evident from this exercise that the judge favors litigants resolving these issues between themselves before they reach his courthouse. Our scenario highlighted the real concern that, in these very early days of TAR adoption, it is important to slow down a bit.

Areas of High Risk for Counsel when Producing ESI

By Steven Bailey, Senior Case Manager, Planet Data

In the pre-e-discovery age, there was only one aspect to a document production: the Bates-stamped paper documents. Today, the majority of all discoverable information is created and stored electronically, and document productions often include a variety of electronically stored information (“ESI”). This piece addresses the areas of high risk for counsel and aims to improve awareness of the issues when a matter involves the production of ESI. Understanding the benefits and challenges of available production formats will allow counsel to create an e-discovery plan that best supports case strategy and effectively manages costs.

The Federal rules require lawyers on both sides to address all discovery issues at the outset of the litigation. Given the many variables and factors involved in producing ESI, counsel should involve technical team members as early as possible to consider what formats of production are available and what can be generated. It is also important to understand the capacity of the litigation support department or vendor who will process and/or produce the data. Knowing “turn-around” times will go a long way toward ensuring deadlines are met and enough lead time is factored in for proper quality control checks and to accommodate any last-minute changes to the production set.

If ESI is requested in a different format than what the party expected, the parties should discuss what is feasible in terms of expense and logistics and who will bear the extra costs. Where the ESI discovery is unknown or to be produced on a rolling basis, an on-going discussion between the parties will be necessary. Also, logistically speaking, the determination about production formats should be considered up front, before the data is processed, as some processing methods may preclude certain production outputs. Having to search through boxes of media to re-process certain file types at production time, for example, results in added time and increased costs.

The production of ESI in the format in which it was originally created is referred to as native production. Native format is commonly used for files not meant to be printed, such as PowerPoints, spreadsheets, small databases, and audio and video files. Data contained in these applications works properly when produced natively, and it may be the only way to produce the files for the other side to review. Some attorneys prefer to produce in native format to save the time and expense of converting to static image files like TIFF or PDF. In other cases, tight discovery deadlines leave attorneys no choice but to produce the documents natively.

While reviewing and producing documents in native format can save time and money, native productions often present case management challenges and risks that may outweigh the benefits. Some key risk factors counsel should consider: Native productions typically do not include metadata or extracted text and as a result cannot be searched or indexed by the receiving party. Additionally, redacting sensitive or privileged information is not possible on native files. The producing party also cannot control or restrict the metadata produced, such as hidden comments, track changes or speaker notes. Lawyers don’t always realize that they are granting full access to all of the document metadata when they produce ESI in native format.

Native format productions can also adversely affect case management and make it difficult to manage evidence during discovery and at trial. Native files cannot be endorsed with a Bates number or confidentiality designation. Consequently, documents used at depositions will not have a shared, page-level Bates number and highly sensitive materials could lack the necessary confidentiality designations. Also, data produced with non-standard or proprietary software may not be able to be opened and viewed at all.

Certain types of files like most e-mail and databases cannot be reviewed or produced in true native format without first being converted. The process of converting ESI to a non-editable digital file is known as rendering. Rendering of the ESI is necessary in order for the parties to redact privileged information. Also, if stamping or designations are required the native files have to first be converted to electronic image format.

Counsel should be aware of the common issues and risks that exist with ESI converted to image format. Most important is the risk of altering or losing data during the conversion process. For certain types of ESI, the images generated may not accurately represent the native file. Excel files, for example, often contain hidden cells, rows, worksheets, columns, and formulas that are not displayed on the image. Similarly, Word documents often do not display comments and track changes. PowerPoint files generally do not print speaker notes by default, and animations do not display properly. For e-mail, blind copies and the date read are not available by default. Embedded data not appearing in TIFF view is likely to be less guarded, and therefore, more revealing and potentially harmful. A sound workflow plan will ensure that these types of ESI are also reviewed in native format to avoid producing embedded data that was never reviewed.

In determining the form of production, parties should also consider whether they want to request the production of searchable metadata and, if so, what fields. There can be hundreds of metadata fields associated with a single file. The parties should clearly state in writing the metadata requested and any known problems or gaps in the metadata received from third parties. Aside from searchable text, metadata should include information about relationships between documents, e.g., parent-child relationships. Most typically, metadata is produced in a standard delimited load file for loading into most litigation support software platforms. Clear and concise communication regarding the load file format will save time and money for each party producing and receiving data. The more common load files include .dii (Summation), .lfp (IPRO), and .opt (Concordance/Opticon).
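As a purely illustrative example (field layouts are dictated by the production specification and vary by platform), an Opticon-style .opt cross-reference is just a comma-delimited list of image key, volume label, image path, document-break flag, box break, folder break and page count:

```
ABC0000001,VOL001,IMAGES\001\ABC0000001.TIF,Y,,,2
ABC0000002,VOL001,IMAGES\001\ABC0000002.TIF,,,,
```

Here the Y flags the first page of a two-page document; the second row is page two of the same document.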

Given the pros and cons of each production format, different forms are often necessary to accommodate different types of ESI. In practice, it is common for a production to involve a combination of images, native files, extracted text, OCR text and metadata. Parties often agree to produce certain ESI in native format along with image files such as TIFFs or PDFs. Word documents are often produced as TIFF images, and Excels and PowerPoints as natives. Files requiring redaction are produced as images, while similar non-redacted file types are often produced as natives.

It is important to understand how redacted information is impacted by production format. Special attention should be paid to redacted materials when producing. Redacted images will require extra time to process. The images are OCR’d after the redactions are burned and the re-OCR’d text is substituted for the original text. When producing extracted text and metadata for redacted documents it is necessary to remove the original information from all parts of the production. Quality control will verify that redacted information is properly withheld on the image and from the extracted text and fielded metadata.

A proper document management plan will also assist in mitigating the risks associated with producing ESI. Thorough documentation of the process of review and conversion of the native files to images format should be in place. Documentation defining the review team’s redaction process is also key to ensure that everything produced was properly reviewed. Counsel should also document its privilege searches and verify the accuracy at the beginning and at the end of the production. Attorneys sometimes make coding changes after the documents have been added to the production queue.

Quality control review of the results will also help reduce the potential risks substantially. Each production should be thoroughly checked for quality assurance by the producing party prior to release. The scope and specifications of the production should be reviewed for both technical and legal conformity.

Relativity Analytics – Key Features Used to Improve Review Efficiency and Cut Costs


By Denise Atesoglu

Analytics is a dynamic tool that can dramatically enhance workflow in Relativity and contribute to substantial time and cost savings. This article outlines tactics that deliver those savings without requiring significant time or resource investments.

The Analytics platform can greatly improve workflow within Relativity.  It can be used to increase review efficiency, quickly isolate highly responsive or unresponsive documents and prioritize the review of particularly relevant documents.

The underlying technology behind Relativity Analytics is LSI (Latent Semantic Indexing).  This implementation was originally developed for the U.S. Intelligence Community by the Content Analyst Company to offer conceptual analysis and organization for large repositories of unstructured data.  In general terms, LSI is a math-based approach to text analytics that uses algorithms to organize text into a multidimensional vector space.  The proximity of the text in this space is used to identify conceptual relationships among the indexed terms and documents.  It does not rely on external sources to classify the text; instead, it relies solely on the patterns and relationships identified when the data is indexed.

Conceptual Searching (CA Search)

Unlike traditional keyword searching, CA search results will yield conceptually similar documents based on the conceptual correlation of search terms to other indexed terms.  CA search will find documents that would not have otherwise been identified using traditional keyword searching.  Simply put, concept searching can be used to find documents related to a known term or phrase that do not necessarily contain the exact term or phrase.  We have found this type of searching to be a tremendous benefit to our clients, aiding in identifying responsive or privileged documents that would not have been found with keyword searching.

We have used CA search to identify top priority documents to be batched for immediate review.  This is particularly useful when dealing with very large data sets. For example, we recently had a project that consisted of over 11 million records with very tight discovery deadlines.  Traditional linear document review simply was not an option for this team.  With CA search, we were able to target the most conceptually relevant documents in the database and create concept-focused priority review batches within several hours of the data being loaded into Relativity.

Finding Similar Documents

The “Find Similar Documents” feature can easily be used on-the-fly in Relativity from both the viewer and text modes.  This feature is used to return conceptually correlated documents based on the full text of an entire document.  It helps users quickly return a set of highly conceptually similar documents to the key responsive and/or non-responsive documents at hand.  We have successfully used this feature to locate groups of non-responsive, potentially privileged and extremely relevant documents, facilitating a more targeted approach to review.

In one of our recent projects, we successfully used the “Find Similar Documents” feature to quickly identify a large number of spam emails prior to batching the documents for review.  This process resulted in our client reviewing 30 percent fewer documents and contributed to great time and cost savings.

Conceptual Near-Duplicate Detection

The ability to quickly identify conceptual near-duplicates is now common practice in Relativity databases when Analytics is enabled. Near-duplicate detection is based on conceptual similarity rather than relying on exact text and metadata matches.  Near-duplicate groupings can be integrated with advanced searching and automated batching in Relativity, as needed.
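Content Analyst’s actual near-duplicate algorithm is proprietary; purely as a sketch of the general idea, grouping documents whose bag-of-words cosine similarity crosses a threshold can be done greedily:

```python
# Illustrative near-duplicate grouping (not Analytics' actual algorithm):
# bag-of-words cosine similarity with a threshold, greedy single-pass grouping.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def group_near_duplicates(texts, threshold=0.8):
    """Assign each document to the first earlier group it resembles."""
    bags = [Counter(t.lower().split()) for t in texts]
    groups = []  # list of lists of document indices
    for i, bag in enumerate(bags):
        for group in groups:
            if cosine(bag, bags[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

docs = [
    "please review the attached draft agreement by friday",
    "please review the attached revised draft agreement by monday",
    "lunch order for the team meeting",
]
print(group_near_duplicates(docs))  # [[0, 1], [2]]
```

A real engine works in the conceptual space described earlier rather than on raw word counts, but the grouping logic is the same in spirit.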

In practice, we have found that the identification of near-duplicates is particularly useful when MD5 values are not available to identify exact duplicates.  We were able to apply this technology in a recent project on a set of newly loaded third party data.  After identifying the conceptual near-duplicates we found that nearly 40 percent of the records had near-duplicates already coded in the database.  The client was then able to leverage their prior coding to more efficiently code the new data, resulting in improved efficiency and significant cost savings.

Even in cases where MD5 hash duplicates are available, the addition of conceptual near-duplicates can improve review workflow.  Near-duplicates can aid in identifying potentially privileged documents to be flagged for a second-level privileged review.  Additionally, they can be useful when spot-checking coding consistency across documents.


Clustering

Clustering is a mass operation that automatically groups conceptually correlated documents into virtual folders displayed by topic.  Users are not required to define a set of exemplar documents upfront.  We frequently use clustering in conjunction with batching to generate conceptually similar review batches, aiding in review efficiency.

In a recent project, clustering was applied to the full database of around 80,000 records.  It took less than one hour for clustering to complete in Relativity.  The results allowed our client to quickly determine that around 45 percent of the documents were not relevant and did not require review.  The non-relevant documents were then moved to a secure folder, allowing our client to focus on only the potentially relevant documents. This example clearly demonstrates the substantial cost- and time-saving benefits associated with clustering.

Analytics is a versatile tool that can enhance workflow in Relativity and contribute to substantial time and cost savings.  Furthermore, the features outlined above do not require significant time or resource investments. Our clients have had noted success using Analytics to isolate priority documents for immediate review, locate highly responsive or unresponsive data, and improve overall coding efficiency with the use of clustering and near-duplicate identification.