Case Study on Cost Drivers

In response to my post titled “Custom Data Mining”, some readers asked me if I would share case studies that highlight the effective use of data mining. This post is a case study about using data mining to optimize costs.

Not long ago, a large corporate client of ours asked us to perform data mining on a growing set of data that they managed. This case study discusses the business problem that they faced, the nature of their data, and our solution to their problem.

The Business Problem

The client is the intellectual property ("IP") group within the legal team of a large company. Among its many responsibilities, this group is charged with conducting due diligence on target acquisitions, managing the acquisition agreements, and overseeing all of the other legal actions related to the deals.

However, because the IP group is a relatively small team and the number of acquisitions is large, they use outside legal counsel to do a lot of the busy work. Over the years, across many deals, the team has used seven different outside law firms.

Each of these firms bills the company by submitting invoices via an online SaaS provider. Our client noticed that over the dozens of deals, total billing by outside counsel, per deal, ranged from a low of $15K to a high of $375K, with most of the billings falling around $150K or lower.

The IP group asked: Why such a huge range on basically similar work?

Company management also wanted to know the answer. Our client needed to justify the differential billings on the various deals. So, using their intimate experience with this work, they prepared a thorough report on what they considered to be the "drivers" of the legal deal costs.

Their report proposed 27 different possible explanations. Unfortunately, 27 drivers is just not a workable answer. Having too many explanations is not that different from having no answer at all.

Surely, they thought, there must be a way to get to the truth. So, they brought us in. Our task: apply our data mining expertise to answer the same question, "Why the huge difference in billings?"

The Data

The data comprised Legal Tracker invoices. Each invoice contained one or more "line item" rows about the deal work.

These rows contained different kinds of data related to the legal work. This included date of the work, law firm, biller identity and role, billing rate, hours billed, total billings, and a written description of the work.

There were over 6K such line-item rows in the data, totaling 5.3MB.
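To make the shape of the data concrete, here is a minimal sketch of how line-item rows like these could be rolled up into a total cost per deal. The field names, the `deal` key, the rule that a row's amount equals rate times hours, and the three sample rows are all assumptions made for illustration; they are not the client's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class LineItem:
    """One invoice row. Field names are hypothetical, not the real schema."""
    deal: str          # assumed: each row can be tied to a deal
    work_date: date
    firm: str
    biller_role: str
    rate: float        # USD per hour
    hours: float
    description: str

    @property
    def amount(self) -> float:
        # Assumed: the row total is simply rate times hours.
        return self.rate * self.hours


def total_by_deal(items):
    """Sum line-item amounts into one total per deal."""
    totals = defaultdict(float)
    for item in items:
        totals[item.deal] += item.amount
    return dict(totals)


# Made-up rows, for illustration only.
items = [
    LineItem("Deal A", date(2020, 1, 6), "Firm 1", "Partner", 800.0, 2.5,
             "Review disclosure schedules"),
    LineItem("Deal A", date(2020, 1, 7), "Firm 1", "Associate", 400.0, 6.0,
             "Data room diligence"),
    LineItem("Deal B", date(2020, 2, 3), "Firm 2", "Associate", 350.0, 4.0,
             "Draft IP assignment"),
]

deal_totals = total_by_deal(items)
print(deal_totals)
```

The per-deal totals produced this way are the kind of figure the rest of the analysis tries to explain.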

Our Solution

Our job was to explain the wide variance in legal billings. We started by defining total deal cost as the "dependent variable", and then tested possible "independent variables" for correlation. In particular, we tested as many of the IP Team's 27 explanations as the data could allow.
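As a rough illustration of this screening step, the sketch below ranks candidate drivers by the strength of their Pearson correlation with deal cost. The candidate names (`emails_logged`, `firm_id`) and all of the numbers are invented for the example; the actual variables and values from the project are not shown here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear signal
    return sxy / sqrt(sxx * syy)


def rank_drivers(cost, candidates):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    scores = {name: pearson(values, cost) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Illustrative values only: one row per deal, cost in $K.
deal_cost = [15.0, 60.0, 150.0, 375.0]
candidates = {
    "emails_logged": [10, 40, 100, 250],  # hypothetical driver
    "firm_id": [3, 1, 3, 1],              # hypothetical non-driver
}

ranked = rank_drivers(deal_cost, candidates)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

A screen like this only flags candidates worth a closer look. It does not replace fitting a regression and checking its assumptions.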

In the end, we identified 5 independent variables that correlated strongly with deal cost, and whose model passed the standard diagnostic checks for linear regression (checks for non-normal distribution of residuals, omitted higher-order variables, multicollinearity, and heteroscedasticity). And although correlation is not causation, the nature of the data in this case supported stronger conclusions.
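One of the checks named above, multicollinearity, can be illustrated with the variance inflation factor (VIF). The sketch below covers only the simplest case of two predictors, where the VIF reduces to 1 / (1 - r²) with r the correlation between them. The predictor names and values are invented, and this is not the procedure or tooling we used on the project.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant predictor has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r2 = r_squared(x1, x2)
    if r2 >= 1.0:
        return float("inf")  # perfectly collinear predictors
    return 1.0 / (1.0 - r2)


# Illustrative values only: one entry per deal.
hours_per_item = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical predictor
emails_logged = [2.0, 1.0, 4.0, 3.0, 5.0]    # hypothetical predictor

vif = vif_two_predictors(hours_per_item, emails_logged)
print(f"VIF = {vif:.2f}")
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign that two predictors overlap too much to separate their effects. With more than two predictors, each VIF comes from regressing one predictor on all the others, which is usually left to a statistics library.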

We translated these 5 independent variables into plain English as follows:

[Legal fees] appear to be driven up on deals having complications with disclosure schedules and the data room. On such deals, the dog days of “all hands on deck, working around the clock,” increase. Lawyers log heavier line items, and greater communication transpires among and between [the company] and the billers.

We showed that, of these 5 explanatory variables, three appear among the IP Team's 27 explanations (bingo!). The fourth was partially discussed in their report. However, the fifth independent variable was completely missed. Moreover, the data either categorically rejected the remaining 23 explanations in the report or was silent on them.

Ultimately, this project showed that, as powerful as human intuition is, it typically cannot identify the key information in a massive trove of data.

By mining their own data, the client learned precisely where to focus in managing legal costs on the deals. In particular, they now knew that outside counsel needs closer attention when a deal becomes complex and "hot".

The client learned that mining data effectively can provide real business value.