Traditional eDiscovery Workflows Can Be an Information Governance Nightmare

Written by Doug Austin, Editor of eDiscovery Today

As I mentioned last week, ILTACON 2021 returned this week as a hybrid conference after being conducted completely virtually last year, with the in-person component held in Las Vegas. I attended in person and covered the event as press, so I was able to attend several of the sessions. They are always high quality, and this year was no exception.

One of those sessions was titled “Discovery, Information Governance and Retention: How Long Should This Go On?” The panelists (Richard Brooman of Saul Ewing Arnstein & Lehr, Amanda Cook of Acorn Legal Solutions and Stephen Dempsey of The Chemours Company) did a great job discussing the handling of data during the discovery and review process, and what organizations need to retain from these processes during the matter lifecycle.

One of the considerations that they mentioned is one that I don’t think is discussed enough within our industry: how much data is generated during traditional eDiscovery workflows. The amount of data (and especially redundant data from all of the copies of ESI generated during discovery) can be an information governance nightmare!

How ESI Flows Through the EDRM Phases

Traditionally, a lot of data flows through the EDRM lifecycle, and the original model was designed for that. That is why the bottom of the model shows a high volume of data at the left side, which gradually declines as you move right while the percentage of relevant ESI rises. But a lot of people in our industry don’t really think about how much data is created during the EDRM lifecycle.

Here is a breakdown, phase by phase:

  • Identification/Preservation: Data generated here can include custodian questionnaires and interview notes, metadata needed to track preserved data, legal hold notices sent and responses received, tracking of hold status, etc.
  • Collection: Any data being collected is copied in a traditional eDiscovery workflow, so if you are collecting entire custodian data stores, you are duplicating all that data for further action downstream. In addition, you have chain of custody logs associated with collection.
  • Processing: Processing of the collected data typically yields searchable text files, metadata and possibly images of the processed files (depending on whether you image all files during processing or do so for a subset of the data downstream). It can also include additional near-native forms of any data stored within container files (e.g., Outlook PST files, ZIP files, etc.). You will also likely have logs for tracking processing jobs, etc.
  • Review/Analysis: During review, a lot of user metadata is generated about the documents, such as responsiveness and privilege determinations, issue coding, redaction information, etc. Sometimes, this is when images of selected documents are generated “on the fly” to support the need to redact. A lot of metadata is also generated in analysis of the document set, from early case assessment to business intelligence and project tracking. Conceptual clustering, email threading and predictive coding also generate considerable metadata about the document set.
  • Production: The documents that are responsive and not privileged are then produced – creating yet another copy of them – and the form of production can include any or all of the following: native files, metadata, text files and image files. You’re also generating production and privilege logs that typically accompany the production.
  • Presentation: Presentation can include yet another copy of ESI being used as evidence in hearings, depos and trial, and you may have additional data loaded into a trial database, which can include exhibit tracking, etc.


When you’ve completed the entire EDRM discovery lifecycle, you may have generated as many as six to seven additional copies of ESI for at least a portion of the document collection, plus considerable other data along the way to manage, track and report on the project.
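To get a feel for how those copies add up, here is a rough back-of-the-envelope sketch in Python. The phase names follow the breakdown above, but the fractions and the 100 GB starting volume are purely hypothetical placeholders, not figures from the session or from any real matter. Swap in your own estimates.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    # Assumed share of the collected volume that this phase copies again.
    # These fractions are illustrative guesses only.
    copied_fraction: float


def estimate_footprint(collected_gb, phases):
    """Return (GB of new copies per phase, total GB of additional copies)."""
    per_phase = {p.name: collected_gb * p.copied_fraction for p in phases}
    return per_phase, sum(per_phase.values())


# Hypothetical workflow: every value below is an assumption for illustration.
PHASES = [
    Phase("Collection", 1.0),       # full copy of the custodian data stores
    Phase("Processing", 0.5),       # extracted text and near-native files
    Phase("Review", 0.25),          # images generated for redaction
    Phase("Production", 0.25),      # produced natives, text and images
    Phase("Presentation", 0.125),   # exhibits loaded into a trial database
]

per_phase, total = estimate_footprint(100.0, PHASES)
for name, gb in per_phase.items():
    print(f"{name}: {gb:.1f} GB")
print(f"Additional copies overall: {total:.1f} GB")
```

Even with these modest made-up fractions, a 100 GB collection produces more than twice that volume in additional copies, and that is before counting the logs and tracking data each phase generates.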

This is another reason why so many organizations are putting more emphasis upstream, on the left side of the EDRM model and the information governance phase that starts it all. The more efficient you can be at the beginning of the EDRM lifecycle, the better you will be able to minimize the information governance nightmare that traditional eDiscovery workflows can create. Sleep well!

Learn more about how IPRO solutions allow more efficiency in Information Governance on the left side of the EDRM model.

And for more educational topics from me related to eDiscovery, cybersecurity and data privacy, feel free to follow my blog, eDiscovery Today!