A New Gold Standard for Digital Ad Measurement?

Ever since Neil Borden coined the term “marketing mix” in 1949, companies have searched for ways to analyze and refine how they market and promote their products. For a long time, the leading analytic approach to this problem was “marketing mix modeling,” which uses aggregate sales and marketing data to suggest strategic adjustments to a firm’s marketing efforts. But in the realm of digital ad measurement, this approach was largely dismissed as an outdated behemoth, easily outmaneuvered by the immediate, precise, and deterministic attribution that new technology enabled.

Now, however, marketing mix modeling is making a comeback.

Why? For one, fundamental changes to the digital ads ecosystem, such as Apple’s new limits on what advertisers are able to track, mean that deterministic user-level measurement of digital advertising effects is only going to get more challenging. As this data dries up, companies that don’t adapt run the risk of suddenly finding themselves in the dark. In this new landscape, marketing mix models (MMMs) have a specific advantage: they can produce dependable measurements and insight purely from natural variation in aggregate data, without requiring user-level data.

Making MMMs part of your marketing analytics toolkit isn’t as easy as flipping a switch, however. Under the wrong conditions and without careful guidance they can be imprecise and can misinform a company’s marketing decisions.

Companies that want to start, or restart, using MMMs need to use ad experiments to dial in their digital marketing approach. A set of field studies that we conducted with digital advertisers suggests that using experiments to calibrate models is needed to alleviate potential imprecision in MMM estimates. In this article, we dive into why you should, and how you can, do just that, and thrive in the new digital ad measurement landscape.

Why Experiments Are Important

MMMs are great because they work with aggregate data. But they can struggle when your ad strategies and related attentional and competitive dynamics vary a lot across ad channels. Highly personalized ad campaigns, which are common on digital channels, make this problem particularly pronounced. There’s a way to account for this, however: by refining your MMM through experimental calibration, guided by a well-understood measurement plan, you can feel more confident in the information it’s giving you.

How do we know this? Over the last two years, we conducted 18 case studies with app advertisers in North America and Europe, comparing MMM-based measurements with experiment-based ones. Two insights stood out.

First, calibration via ad experiments pays off. In our case studies, calibration on average corrected MMM-based return-on-ad-spend estimates by 15%. Other reports have found an average calibration correction of 25% across a multitude of verticals, including fast-moving consumer goods, home appliances, telecommunications, real estate, and automotive, and across a multitude of regions, including APAC, the U.S., Brazil, Russia, and South Africa.

Second, more narrowly targeted digital ads appear to require more calibration. Custom audience ads in the U.S. required the highest overall calibration adjustment, at 56%. This suggests that companies that rely on just a few channels and smaller brands with niche market segments may want to run experiments to refine their models more frequently.

Ad Experiments You Can Expect to Run in the Future

Precise user-level ad experiments are coming under siege the same way that user-level ad measurement is. As the ability to deterministically observe user behavior across websites and apps decreases, ad experiments will either need to focus on on-site outcomes (such as views, clicks, and other on-site metrics), rely on differential privacy to match off-site outcomes with on-site behavior, or make use of so-called clustered randomization. With clustered randomization, assignment of the experimental ads is no longer controlled at the user level, but at less granular scales, such as geographic regions.

For example, with geo ad experiments, consumers in certain ZIP codes, designated market areas, states, or even countries will see experimental ad campaigns, and consumers in others will not. Differences in sales and brand recognition between exposed and non-exposed geo units are used to measure the incremental impact of the experimental ads. Geo ad experiments can provide a ground truth to calibrate the MMM against. This approach is offered in Google’s and Meta’s measurement suites, has long been used in TV advertising, and has been adopted by leading digital advertisers such as Asos.
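To make the geo comparison concrete, here is a minimal Python sketch. It uses a difference-in-differences comparison, which is one common way to read such an experiment and an assumption on our part, not a description of any vendor's method. The function names, the sales figures, and the ad spend are all hypothetical and chosen only for illustration.

```python
from statistics import mean

def geo_lift(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate of incremental sales per exposed geo unit.

    Each argument is a list of sales per geo unit (hypothetical data).
    The control group's change approximates what would have happened
    in the exposed geos without the experimental ads.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

def incremental_roas(lift_per_geo, n_treated_geos, ad_spend):
    """Incremental sales across all exposed geos divided by experimental ad spend."""
    return lift_per_geo * n_treated_geos / ad_spend

# Hypothetical sales per geo unit before and during the campaign.
treated_pre = [100.0, 120.0, 110.0]
treated_post = [130.0, 150.0, 140.0]
control_pre = [90.0, 105.0, 105.0]
control_post = [100.0, 115.0, 115.0]

lift = geo_lift(treated_pre, treated_post, control_pre, control_post)
roas = incremental_roas(lift, len(treated_post), ad_spend=40.0)
print(lift, roas)  # 20.0 1.5
```

In this toy data, the resulting return-on-ad-spend of 1.5 (150%) is the kind of ground-truth figure an MMM estimate for the same channel could be compared against. A real analysis would also need confidence intervals and checks that exposed and non-exposed geos are comparable.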

Other avenues for ad experimentation in a more data-constrained digital advertising environment may come via technologies such as differential privacy. Differential privacy allows for matching of information between different datasets (observed on different apps and websites) without revealing information about individuals. Randomization induced on one app/website (in one dataset) could then be matched to outcomes such as purchases observed on another app/website (in another dataset).

Calibrating an MMM

So how can you use ad experiments to calibrate your MMM? We would like to highlight three approaches to calibration that differ in rigor and ease of implementation:

  1. Compare the results of MMM and ad experiments to ensure that they are “similar.” This approach is qualitative and easy to implement. Similar can mean that, at a minimum, both approaches pick the same winning ad variant/strategy or that the two directionally agree. Should results be dissimilar, tweak and tune the MMM until agreement is achieved.
  2. Use experiment results to choose between models. As a more rigorous extension to the qualitative approach, the marketing analytics team can build an ensemble of different models, then decision-makers can pick the one that agrees most closely with the ad experiment results for the key outcome of interest (e.g., cost per incremental conversion).
  3. Incorporate experiment results into the MMM. Here, the experiment results are used directly in the estimation of the MMM and not just to compare with the MMM output (#1 above) or to help with model selection (#2 above). Doing so requires a deeper understanding of statistical modeling. The experiment results can either enter your MMM as a prior (e.g., if you use a Bayesian model), or they can be used to impose a permissible range on the model’s coefficients. For example, say your ad experiment on a specific channel shows a 150% return-on-ad-spend with a 120% lower and 180% upper confidence bound; you can “force” your MMM coefficient estimate for that channel to be within that range.
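The second and third approaches can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a full MMM: the candidate model estimates and the MMM standard error are hypothetical, the experiment interval is assumed to be a 95% interval from an approximately normal estimate, and the "prior" variant uses a simple precision-weighted (normal-normal) update for a single coefficient.

```python
import math

# Experiment result from the example above: 150% ROAS, 120% to 180% bounds.
EXPERIMENT_ROAS = 1.5
LOWER, UPPER = 1.2, 1.8

def pick_closest_model(candidate_roas, experiment_roas):
    """Approach 2: pick the candidate whose ROAS is closest to the experiment."""
    return min(candidate_roas,
               key=lambda name: abs(candidate_roas[name] - experiment_roas))

def clamp_to_range(estimate, lower, upper):
    """Approach 3, range variant: keep a coefficient inside the experiment's bounds.

    For a single coefficient this mirrors a bounded fit; with several
    coefficients, the bounds belong inside the estimation itself.
    """
    return max(lower, min(upper, estimate))

def combine_with_prior(mmm_est, mmm_se, prior_mean, prior_se):
    """Approach 3, prior variant: precision-weighted normal-normal update."""
    w_mmm = 1.0 / mmm_se ** 2
    w_prior = 1.0 / prior_se ** 2
    post_mean = (w_mmm * mmm_est + w_prior * prior_mean) / (w_mmm + w_prior)
    post_se = math.sqrt(1.0 / (w_mmm + w_prior))
    return post_mean, post_se

# Hypothetical ROAS estimates for one channel from three candidate MMMs.
candidates = {"model_a": 2.1, "model_b": 1.6, "model_c": 0.9}

best = pick_closest_model(candidates, EXPERIMENT_ROAS)
clamped = clamp_to_range(candidates["model_a"], LOWER, UPPER)

# Assumption: the bounds are a 95% interval, so half-width = 1.96 standard errors.
prior_se = (UPPER - LOWER) / (2 * 1.96)
post_mean, post_se = combine_with_prior(2.1, 0.3, EXPERIMENT_ROAS, prior_se)

print(best, clamped, round(post_mean, 3), round(post_se, 3))
```

Here the closest candidate is model_b, the out-of-range estimate of 2.1 is pulled down to the upper bound of 1.8, and the prior-based update lands between the MMM estimate and the experiment result, closer to whichever is more precise.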

The third approach is the most rigorous, but it’s also the most difficult strategy to implement. If you choose to adopt it, we recommend doing so in conjunction with the second approach. In other words, 1) identify a set of candidate models that produce reasonable estimates vis-à-vis the experiment output; 2) incorporate the experiment results in MMM estimation; and 3) pick the model that produces the most balanced results against other experiment results and expert assessments.

When calibrating your MMM, also be mindful that MMM and experiment runs can differ in scope (for instance, all advertising vs. online only) and that there can be interaction effects (for instance, between online ads and offline sales, and vice versa). Also, be aware of dynamic effects such as ad stock. (Explaining all aspects of quantitative MMM calibration in detail is beyond the scope of this article, but interested readers can find excellent and detailed case studies here, here, and here.)
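To illustrate what a dynamic effect such as ad stock looks like in practice, here is a minimal sketch of the widely used geometric adstock transformation, in which part of each period's advertising effect carries over into the next. The decay rate of 0.5 and the spend figures are hypothetical; in a real MMM the decay is typically estimated or tuned per channel.

```python
def geometric_adstock(spend, decay):
    """Return the adstocked series a_t = spend_t + decay * a_(t-1).

    `decay` is the share of the previous period's ad stock that carries over
    (a hypothetical value here; it must be in [0, 1)).
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    carried = 0.0
    adstocked = []
    for amount in spend:
        carried = amount + decay * carried
        adstocked.append(carried)
    return adstocked

# Hypothetical weekly spend: a burst, two dark weeks, then a smaller burst.
weekly_spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(weekly_spend, decay=0.5)
print(adstocked)  # [100.0, 50.0, 25.0, 62.5]
```

The dark weeks still carry some advertising effect, which matters when comparing an MMM with a short experiment: sales measured just after a campaign ends may still reflect earlier spend.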

How Frequently Should You Calibrate?

This is an important, but tricky and multifaceted question. Advertisers who deeply embrace incrementality measurement may choose an “always-on” solution where advertising is consistently experimentally validated. This approach can work well for large international companies that can afford to “go dark” in select geographies at any given time. Based on what we’ve seen over the last few years working with digital advertisers, we’ve put together a rough and simple matrix to inform decisions on calibration frequency.

The table aims to provide a rough guide to marketers new to experimental calibration of MMMs and MMM-based incrementality measurement — take it with a grain of salt. In our experience, and based on the case studies we’ve run, the more targeted your ads and the more niche your ad strategy, the more you want to make sure to experimentally calibrate the MMM supporting your marketing decisions on a channel. Further, the more you spend on a channel, the more money you put at risk, and hence, for channels with higher ad spend, you will want to make sure to calibrate your MMM more frequently.

Companies should scrutinize, adapt, and enrich this guidance based on their institutional knowledge and ongoing operational insights and priorities. In any case, it can make sense to run experiments during “less important” times (so, not during peak sales seasons, new product launches, or big external events such as the Super Bowl) and in locations that are less central to a brand’s advertising strategy.

• • •

As privacy advances fundamentally change the digital ad measurement landscape, we recommend embracing MMM as a key part of the marketing analytics toolbox. There are good vendors selling more or less plug-and-play solutions. Additionally, if you lack in-house MMM expertise, an experienced consultant can help you integrate with a vendor and set up an internal baseline model. Especially if you rely heavily on online advertising, regularly calibrate your MMM using ad experiments to make sure your measurements are accurate and your digital marketing decisions are well-informed.

The combination of MMM and experimental calibration as described above may well become a “new gold standard” for ad measurement in data-constrained online environments. At a minimum, it provides reliable and effective measurement until nascent technologies such as differential privacy and interoperable private attribution gain a true foothold in digital ad measurement.
