A few weeks back, this paper got some attention in the marketing measurement corners of the interwebz.

I think it’s an excellent read for analysts creating marketing mix models because:

1. The introductory and literature review sections are solid reviews of the process (I can’t remember seeing so many citations of previous MMM specifications in one place!) *and*

2. The authors very precisely examine a specific example of MMM’s biggest tension, which is that complexity might better capture reality but it ruins our ability to estimate models.

The central point of the paper is that with the real-life marketing datasets used in MMM, and with lag-related effects (such as adstock) present in a model, it is impossible to simultaneously identify saturation effects and time varying effectiveness. What makes the paper very convincing on this point, beyond the mathematical reasoning in Section 3, is the demonstration that a time varying parameter model can accurately predict synthetic data generated with only saturation effects, and that a model with non-linear saturation effects can (sometimes) accurately predict synthetic data generated with time varying effects. If a model with either kind of effect can fit a data-generating process with the other kind of effect, then there is no way to use model fit on the training data to determine which kind of effects a model needs to have!

All I want to do with this quick blog post is walk through how I think about this ability of time varying effects to stand in for saturation behavior. Let’s start by revisiting some concepts central to this point.

First, we have adstock. Introduced by Simon Broadbent in the 1970s, adstock is now a general term for a time-series transformation of marketing effort that lets past marketing efforts influence the current period – it’s sorta like the reverse of Wimpy and the hamburgers: the advertiser pays now, but the effect comes later.

The chart here shows how a single spike of advertising effort gets spread ‘into the future’ by a common adstock formulation:
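In code, a minimal geometric (Broadbent-style) adstock looks something like the sketch below. The decay rate of 0.5 is purely illustrative – real models estimate it:

```python
def geometric_adstock(spend, decay=0.5):
    """Each period keeps a fraction `decay` of the previous period's adstock."""
    out, carry = [], 0.0
    for x in spend:
        carry = x + decay * carry
        out.append(carry)
    return out

# a single spike of 100 units of effort in week 2...
print(geometric_adstock([0, 0, 100, 0, 0, 0]))
# ...gets spread 'into the future': [0.0, 0.0, 100.0, 50.0, 25.0, 12.5]
```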

The next component to understand is saturation behavior. In general, there is an expectation that each increment of advertising effort will have a smaller impact on a business outcome than the previous increment. I think of it this way: advertising on TV works on people who see my ad. If I spend twice as much on TV ads, fewer than twice as many people will see my ad – and certainly if I spend 100,000,000 times as much on TV, I won’t get 100,000,000 times as many people to see my ad, because there aren’t that many people. So, at some point, TV ad buying shows diminishing marginal returns. In MMM we use an assumed functional form whose ‘amount’ of diminishment is parameterized, so we can estimate the saturation behavior of each driver. The AdBudg (a.k.a. Hill) function is typical:
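One common way to write a Hill-style saturation curve in code is below – the parameter names and the half-saturation point of 50 are my own illustrative choices, and AdBudg formulations vary in their exact parameterization:

```python
def hill(x, half_sat=50.0, shape=1.0):
    """Hill / AdBudg-style saturation: maps effort to a response in [0, 1)."""
    return x**shape / (x**shape + half_sat**shape)

# doubling effort from 50 to 100 does NOT double the response:
print(hill(50))    # 0.5 (the half-saturation point)
print(hill(100))   # ~0.667 – diminishing marginal returns
```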

Now, the key issue is that MMM practitioners almost always compose these two effects – that is, we assume there is both a lagged impact of marketing represented by the adstock transformation AND we have diminishing returns of that adstock represented by (e.g.) the AdBudg function. So, when it comes time to run a regression to estimate the effect of a marketing driver on an outcome, we end up selecting parameters for adstock and parameters for AdBudg and creating a transformed marketing effort, like the green line in the next two charts.

This chart shows what a single spike of activity looks like when transformed:

Now, let’s assume a coefficient of 1 in our regression model – so each increase in the green line (the final modeled variable in the regression) is worth 1 unit of outcome (e.g. units sold). What is our weekly effectiveness?

I guess we better define effectiveness! Effectiveness = outcome / unit of ad effort, and just reading the numbers off the chart above we can get a table like this:
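Here’s a sketch of that table in code, composing a simple geometric adstock with a Hill-type saturation and a coefficient of 1. Every parameter value here is illustrative, not taken from the paper:

```python
def geometric_adstock(spend, decay=0.5):
    out, carry = [], 0.0
    for x in spend:
        carry = x + decay * carry
        out.append(carry)
    return out

def hill(x, half_sat=50.0, shape=1.0):
    return x**shape / (x**shape + half_sat**shape)

spend = [0, 0, 100, 0, 0, 0]             # a single spike in week 2
adstocked = geometric_adstock(spend)
outcome = [hill(a) for a in adstocked]   # regression coefficient of 1

print("week  adstock  outcome  effectiveness")
for week, (a, o) in enumerate(zip(adstocked, outcome)):
    if a > 0:
        print(f"{week:4d}  {a:7.1f}  {o:7.3f}  {o / a:10.5f}")
```

Even though the saturation curve itself never changes, the measured effectiveness (outcome per unit of adstocked effort) climbs week by week as the adstock decays back down the curve.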

Would you look at that! The effectiveness *varies over time!*

Ok, so what?

Well, this is exactly why a model that is trying to estimate time varying coefficients and not saturation effects can do a nearly perfect job predicting a data generating process with saturation behavior. Instead of estimating a falloff in effect due to the adstock level, the time varying parameters model will estimate a falloff in effect as a change in effectiveness over time.
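To make that concrete, here’s a toy sketch (illustrative numbers, not the paper’s experiment): a purely linear model with a different coefficient each week reproduces a static-saturation data-generating process perfectly in-sample, with no saturation term anywhere:

```python
def hill(x, half_sat=50.0, shape=1.0):
    return x**shape / (x**shape + half_sat**shape)

adstocked = [100.0, 50.0, 25.0, 12.5]      # decaying adstock after a spike
outcome = [hill(a) for a in adstocked]     # true DGP: fixed saturation curve

# a linear model with a time varying coefficient and NO saturation term:
betas = [o / a for o, a in zip(outcome, adstocked)]
fitted = [b * a for b, a in zip(betas, adstocked)]

print(max(abs(f - o) for f, o in zip(fitted, outcome)))  # effectively 0.0
```

In-sample, nothing distinguishes the two specifications – only outside assumptions (or behavior at spend levels never observed) can.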

And this is where the authors see a difficulty for marketing mix models. I think most marketing folks would agree that marketing effectiveness can vary over time. I think most marketing folks also agree that we expect diminishing marginal returns from marketing (because the addressable market is finite, and can eventually be saturated). So, obviously, a model that includes both effects is the best model, right? The paper is much more thorough than my quick example above in showing that because either feature can capture the dynamics of the other, including both leaves us with a poorly identified model.

And not only in a theoretical sense, but practically speaking, too. Whenever a parameter estimation process has two different ways to produce the exact same prediction, the parameters become highly unstable: small differences in the data can shift the best-fitting parameters from one way to the other. Many practitioners know this from including near-collinear marketing variables in a model – if Radio tracks TV almost perfectly, then 2*TV + 3*Radio and 4*TV + 1*Radio make nearly identical predictions, and the variance of the estimated coefficients explodes.
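The coefficient-variance explosion is easy to see from the classical OLS standard-error formula. Here’s a small numpy sketch on synthetic data – the noise levels and sample size are made up for illustration, not from any real MMM dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
tv = rng.uniform(0, 1, n)
radio_clone = tv + rng.normal(0, 0.01, n)   # nearly a copy of TV
radio_indep = rng.uniform(0, 1, n)          # a genuinely different channel

def coef_std_errors(x1, x2, sigma=0.1):
    """OLS coefficient standard errors: sigma * sqrt(diag((X'X)^-1))."""
    X = np.column_stack([x1, x2])
    return sigma * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

print(coef_std_errors(tv, radio_clone))   # huge – near-collinear design
print(coef_std_errors(tv, radio_indep))   # modest – well-separated design
```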

This analogy to collinearity also points a way to a possible solution if an analyst is convinced they must have both saturating behavior and time varying coefficients – some kind of regularization is required. That regularization could be from penalties on the size of the time varying steps or it could be from using Bayesian methods to estimate the model and specifying informative priors.
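As a sketch of the first option – a quadratic penalty on the week-to-week steps of a time varying coefficient (a random-walk-style shrinkage; the penalty form and weights are my own illustrative choices, not any specific vendor’s or the paper’s method):

```python
import numpy as np

def fit_tv_coefficients(x, y, lam=10.0):
    """Minimize sum (y_t - beta_t*x_t)^2 + lam * sum (beta_t - beta_{t-1})^2."""
    T = len(x)
    X = np.diag(np.asarray(x, dtype=float))
    D = np.diff(np.eye(T), axis=0)          # first-difference operator, (T-1, T)
    A = X.T @ X + lam * D.T @ D
    return np.linalg.solve(A, X.T @ np.asarray(y, dtype=float))

x = np.ones(8)
y = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)
print(fit_tv_coefficients(x, y, lam=0.01))  # nearly tracks y: big weekly swings
print(fit_tv_coefficients(x, y, lam=1e6))   # shrunk to a near-constant coefficient
```

A Bayesian random-walk prior on the coefficients plays the same role: the prior scale on the steps does the job of `lam`.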

ScanmarQED has a different solution for this, arrived at more or less by accident. As part of our ongoing quest to make marketing mix modeling more scalable without sacrificing transparency, we’ve developed a two-stage approach to modeling. In the first stage, we estimate a model using average effectiveness over two-plus years of history, which identifies the saturation behavior. In the second stage, as more data is collected, we update the coefficients with sequential learning as each time period is added. Those updating coefficients become time varying parameters that are estimated while holding the adstock and saturation parameters fixed.

Much like regularization, our update approach cannot completely remove the difficulty of identifying both kinds of effects. But by accepting a time-averaged effectiveness in the main estimation stage, any time varying effectiveness that occurs during the update periods is cleanly estimated – provided we believe the lag and saturation parameters from the first stage are correct (and constant). Which is why I’m comfortable saying “Our MMM is NOT broken!”