One of the ways I can tell who is an experienced Marketing Mix Model analyst vs a data person who is about to try to build a Marketing Mix Model for the first time is by how much they flinch when the topics of variable selection and model validity come up.
Analysts with more than 2 or 3 model building efforts in their history will immediately flash back to a model (or two, or 50) that just didn’t behave well. Maybe marketing stakeholders were absolutely certain that a particular campaign was a blockbuster, but the model couldn’t find any relationship to sales. Or perhaps no model ever got more than 80% of the marketing budget covered by coefficients, and which marketing spend dropped out depended strongly on which base drivers were included. Honestly, there are lots of ways regression models can get squirrelly, and if you build marketing mix models for a living you quickly rack up more regression models ‘in production’ than in any other commercial field I know of, so you end up having seen a lot of ‘interesting’ models.
Which is why it can be helpful to the MMM-er to consider what sorts of biases are known and discussed in the applied regression modeling literature. At a minimum, knowing other people have had this problem before can be comforting. On occasion, it can help us build a better model.
This multi-part series of blog posts will illustrate the most commonly discussed biases in regression modeling.
Selection Bias and Simpson’s Paradox
Selection Bias is the bugaboo of many a marketing measurement process; Simpson’s Paradox is a well-known problem in causal inference. For the MMM-er’s purposes, they are related, especially in digital marketing, where the bidding systems commonly used for ad placement are built around delivering ads to the audience most likely to click on them.
The data and the analyses
Our illustrative dataset is synthetic data for 4 markets, with different levels of TV advertising in each market. The true effect is the same in all markets (2 units of sales per adstocked unit of TV advertising), and I have added Gaussian noise with a standard deviation of 3.5% of the average sales. The resulting data is charted in Figure 1.
Figure 1
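If you want to follow along in code, here is a minimal sketch of how data like this could be simulated. The variable names, spend ranges, and per-market base sales here are my own assumptions for illustration, not the exact values behind these charts:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

TRUE_TV_EFFECT = 2.0   # units of sales per adstocked unit of TV
N_WEEKS = 52

frames = []
for market in [1, 2, 3, 4]:
    base_sales = 100 * market   # market effect is a multiple of the market number
    # Higher-numbered markets run less TV -- the confounding that drives everything below.
    tv = rng.uniform(0, 60 - 12 * market, size=N_WEEKS)
    sales = base_sales + TRUE_TV_EFFECT * tv
    sales = sales + rng.normal(0, 0.035 * sales.mean(), size=N_WEEKS)  # sd = 3.5% of average sales
    frames.append(pd.DataFrame({"market": market, "tv": tv, "sales": sales}))

df = pd.concat(frames, ignore_index=True)
```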
It might be immediately clear to you that the effect of TV does not look positive in this chart. Let’s check how a model would see it:
Figure 2
The regression is estimating an effect of -0.73 units of sales per unit of TV. This just might get our marketing stakeholders fired and lose us our marketing measurement role too! So, as all good researchers have done since the widespread adoption of digital computers made regression models easy to run, let us check our subgroups to see if we can find any sensible effect of TV on sales.
Figure 3
Well, now that’s _much_ better. Recall that our true coefficient is 2. The estimate in market 4 is a bit low, but the others are spot on, and even in market 4 the estimate at least has the correct sign. No one gets fired for these results!
This sign flip between the pooled analysis and the sub-group analyses is Simpson’s Paradox.
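Both analyses take just a few lines on the simulated frame above. This is a sketch using plain numpy least squares; your exact estimates will vary with the simulation seed:

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple OLS fit of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Pooled analysis: one regression across all markets.
print("pooled TV effect:", ols_slope(df["tv"].values, df["sales"].values))

# Sub-group analysis: one regression per market.
for market, grp in df.groupby("market"):
    print(f"market {market} TV effect:", ols_slope(grp["tv"].values, grp["sales"].values))
```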
Which analysis is right?
As an analyst applying advanced statistical methods to provide quantitative insights, you undoubtedly feel it’s your job to provide ‘the truth’ to your stakeholders. So, in a real-life analysis where the truest truth is unknowable, you could be in quite a pickle if your subgroup analysis gave opposite-sign effect estimates from your pooled analysis.
The field of causal inference gives us one answer to the ‘which is right?’ question here. And that answer is ‘it depends on the causal model’ (that’s why they call it causal inference, after all). See, e.g., this blog post about Simpson's paradox in CDC data. Another answer to ‘which is right?’ is expert knowledge. We can comfortably say that normal marketing does not reduce sales and use that fact to determine which analysis is ‘more’ correct.
And, of course, we can use both. The assumed causal model will help inform an analyst about which observed (or possibly latent) variables should be accounted for in an analysis while prior information (even if not a true prior distribution of effect like we’d use in a Bayesian analysis) can help with expected effect direction and size.
Would Selection Bias by any other name still ruin my data-informed decisions?
Both having a causal model and having clear prior expectations of an effect are useful to an MMM-er, but the causal graph point of view helps us see that Simpson’s Paradox is an extreme and specialized form of Selection Bias.
Figure 4
When we try to measure the impact of a first cause, e.g., TV, on sales and the assignment of that treatment is affected by a second cause of sales, e.g., Market, then any analysis that doesn’t account for that second cause will have a biased estimate of the first cause.
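If you like to keep your causal models in code, the structure in Figure 4 is just three edges. This sketch assumes the networkx library; the node names are mine:

```python
import networkx as nx

# The structure behind Figure 4: Market drives both the treatment (TV)
# and the outcome (Sales), while TV also drives Sales.
dag = nx.DiGraph()
dag.add_edges_from([("Market", "TV"), ("Market", "Sales"), ("TV", "Sales")])

# A confounder is a common cause of treatment and outcome.
confounders = [n for n in dag.nodes if dag.has_edge(n, "TV") and dag.has_edge(n, "Sales")]
print(confounders)  # ['Market']
```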
This is pretty intuitive – if you consider the relationship between performance on a standardized test and a prep program for that test that is only open to poor performers, we might still expect the attendees’ post-program scores to be lower than average. And so, if we try to judge the program by comparing the average scores of attendees vs non-attendees, we might see a negative effect, yes?
That’s Selection Bias – and when the effects on treatment assignment (e.g. being sent to the prep program, or running TV ads) and the effects on outcome run in opposite directions (e.g. poor past performance makes future test performance worse, but makes attendance in the prep program more likely), that Selection Bias can generate Simpson’s Paradox.
All marketing has many variables that affect both ad exposure (our treatment) and conversion (our outcome). In modern digital marketing, ad exposure is predicated on winning bids, and the bidding process explicitly aims to target likely converters. That’s great for performance marketers chasing cost-per-acquired-customer goals, but a more perfect process for creating Selection Bias could not have been designed. As such, Selection Bias is possibly the single largest source of error in marketing measurement; Simpson’s Paradox serves as a nice, eye-grabbing example of extreme Selection Bias.
That feeling when you’ve been running MMM for years and haven’t heard of Selection Bias or Simpson’s Paradox
Ok! Maybe not you, dear reader, but me. I had been running MMM for years before I heard of Selection Bias – and it was another 10 years before I read about Simpson’s Paradox (in a review of The Book of Why, since I know you were wondering).
Did that mean all of my models were misguided and wrong?
I came to MMM by way of a Mathematics education with a smattering of design of experiments statistics. Experimental design exists to eliminate systematic Selection Bias, but the phrase wasn’t used in the graduate Statistics classes I took. In the Mathematics world we treat regression as a linear algebra or optimization problem, with interpretation and hypothesis tests bolted on in the next department. Even my Economics classes had missed this, because causal inference was not the hottest topic in Econometrics until long after I was out of school.
But fret not! I was aware we needed more variables rather than fewer in our models; not because of Selection Bias, but because of Omitted Variables Bias (OVB).
Omitted Variables Bias (OVB)
Omitted Variables Bias (OVB) occurs when an independent variable that is correlated with both another independent variable and the regression model’s dependent variable is left out of the model. We can re-purpose our data example above into an Omitted Variables Bias example if we just replace the by-market analysis with a regression model that includes market as an independent variable. In real data, market would be categorical (i.e. non-ordinal, and certainly not on a ratio scale) and we would use dummy variables to estimate an effect for each market. In this example, I numbered the markets 1-4 and made the market effect a multiple of the market number, so market enters as a continuous numeric variable. Since we’ve already charted the data, I will just show the table of coefficient statistics for the reduced and full models.
Figure 5
We see that by itself, TV nets us an effect of -0.73. When we include the Market variable, TV’s effect becomes remarkably close to the true value of 2.0.
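Reproducing the reduced and full models on the simulated data is one line each. This sketch uses statsmodels’ formula API, which is my assumed tooling, not necessarily what produced the table above:

```python
import statsmodels.formula.api as smf

# Reduced model: omit Market, and the TV coefficient absorbs its effect.
reduced = smf.ols("sales ~ tv", data=df).fit()

# Full model: include Market as a numeric variable, matching this example.
# (With real categorical markets you would write C(market) to get dummies.)
full = smf.ols("sales ~ tv + market", data=df).fit()

print(reduced.params["tv"], full.params["tv"])
```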
Note that both estimates are highly statistically significant per the usual thresholds, and that both effect sizes are large enough to be of practical significance (a coefficient of -0.73 would have TV reducing sales by 33% in most weeks in the data; a coefficient of 1.9 has TV doubling sales in most weeks). There really is no way to know from the statistics alone which of these models is better. But we can check the Pearson correlation between TV and Market and see that it is fairly high (-0.66), and since there is a formula for OVB in OLS regression models, we _could_ have predicted the difference in the TV coefficient from the Full Model alone:
Figure 6
In the modern world it’s easier to run a reduced model than it is to compute OVB by formula, but the fact that this formula exists points to just how well studied a phenomenon OVB is.
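The formula is also easy to check numerically. For OLS, the reduced-model TV coefficient equals the full-model TV coefficient plus the full-model Market coefficient times the slope from an auxiliary regression of Market on TV. A sketch continuing from the statsmodels fits above:

```python
import statsmodels.formula.api as smf

# Auxiliary regression: how the omitted variable moves with the included one.
aux = smf.ols("market ~ tv", data=df).fit()
delta = aux.params["tv"]

# OVB identity for OLS: reduced TV coef = full TV coef + (Market coef) * delta.
predicted_reduced = full.params["tv"] + full.params["market"] * delta
print(predicted_reduced, reduced.params["tv"])  # identical up to floating point
```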
And that is why, even though I wasn’t exposed to Selection Bias when I was first making models for a living, I was often already adjusting for it as best as our datasets allowed. Let’s look back at our Selection Bias description:
When we try to measure the impact of a first cause, e.g., TV, on sales and the assignment of that treatment is affected by a second cause of sales, e.g., Market, then any analysis that does not account for that second cause will have a biased estimate of the first cause.
Causes in a causal model become independent variables in the regression model; outcomes in a causal model are dependent variables in a regression model. Correlation will show up whenever the relationship between cause and effect is approximately linear. Yep, for a modeler using regression who is focused on good effect estimation, Selection Bias and OVB are just two separate ways to write the same thing.
So what?
The first takeaway for an MMM-er is that good variable selection does often require including variables that are correlated with each other, no matter what the Stats 101 textbook says about it. The second takeaway is that Simpson’s Paradox is a fun trick, but Selection Bias is definitely something to worry about. The third takeaway is that if you are using a regression model, Selection Bias will show up as Omitted Variables Bias.
There is also, maybe, a higher-level point to know here – these biases exist in ‘statistically valid’ models. It truly does take domain knowledge to build a model that accounts for Selection Bias (whether or not that’s done with a causal DAG), and it truly does take MMM specific experience to build the best possible model for marketing measurement and decision making. Perhaps, dear reader, you are asking “Why is that? It sounds like we can just put all the variables in to avoid OVB, which will handle our Selection Bias, and we’ll be done.” In which case, you really need to read the next blog post in this series.
Some further reading