“There is a lot of art in Marketing Mix Modeling” say some vendors. Others say: “That phrase is used to cover up the weaknesses of the approach, which if correctly applied is rigorous science!”
I say: “It’s complicated, and the issues are not solvable by any software package or method because there are a series of tradeoffs to be carefully considered – so keep reading my blog articles!”
Previous articles in this series covered Omitted Variable Bias, Selection Bias, Simpson's Paradox, and Suppression Bias. The avid reader will recall that, from the perspective of an analyst fitting an MMM, those fall into two kinds of biases: the 'part 1' biases are caused by omitting variables from the model specification, and the 'part 2' biases are caused by adding variables into the model specification.
In this installment, we will consider a bias caused by another choice made in model building, before any variables are specified:
Aggregation Bias
Aggregation bias is the name given to a systematic error in effects estimated on data summarized to a coarser grain than the one at which the causal effect actually occurs.
That’s quite a mouthful, so let’s step back and discuss data granularity with examples drawn from marketing and sales data.
How fine is your data?
‘Granularity’ is jargon for ‘how big are your chunks?’
This is not an attempt to sell you nutritional advice and a workout plan.
Consider a 10-K filing, required by the SEC for publicly traded companies in the USA. In it you will find annual revenues or net sales for the company that filed it (among other things). If you pulled the last 10 years of 10-K filings for Apple Inc. and extracted the net sales number, you would have annual revenue numbers.
But companies also file a 10-Q, the quarterly report for investors. If you pulled the last several 10-Q filings for Apple Inc., you'd have quarterly net sales numbers. The time grain of the data in the 10-K filings is annual; the time grain of the 10-Q filings is quarterly.
Now I know what you’re thinking: that in time series analysis we call that the ‘frequency’ of the time series and I’m just making up new words for the same idea. Oh, and you’re also wondering whether you should skip to the next LinkedIn post instead of reading the rest of this . . .
Yes, for time series the frequency is the same as the granularity of the series.
But go back to the Apple 10-K. For specificity, let's look at this one. If you look at the Consolidated Statements of Operations (near the bottom), you can see Net Sales shown for the previous 3 years. The time grain is annual (it's an annual report!), but there is a products vs. services split. We might call this the grain of the product dimension (Total, Goods vs. Services). But within the same file, nearer the top, you can find a table titled "Segment Operating Performance" that splits the total net sales into different regions; we might say the geographical grain of that table is regional, whereas the Consolidated Statements of Operations have a global geographical grain.
So, granularity is a general concept whereas time series frequency is specific to the time dimension. And that’s how jargon proliferates. Granularity is a very useful concept because it helps us clarify how data is stored and reported and emphasizes hierarchical dimensions – e.g. day is within week, within month, within quarter, within year, or country within region, within planet.
How does this qualify as a Bias?
Please imagine we're building a regression model of Apple's Net Sales. We know from our explorations above that this data is available at several granularities: global / all-product / annual, global / all-product / quarterly, regional / all-product / annual, regional / all-product / quarterly, global / per-product / annual, and global / per-product / quarterly.
Which granularity of data should we use to build our regression model? You can Google it; I'll wait.
…
…
Back so soon?
I bet that was an unsatisfying experience. You might have found some results about “aligning the model with the requested analysis” (translation: if you are asked for quarterly impact, estimate on quarterly data) and you might have found “different aggregations might reveal different relationships, so try several” (translation: if you need a p-value of 0.05 to publish, don’t forget to try different data granularities).
Maybe you have a lot of causal inference in your search history and you found a link discussing how the causal DAG will inform the level of analysis (because Simpson’s Paradox is so intriguing we can talk about it in this context too!).
My answer (which means it’s the best answer, of course) is that ideally a model will be estimated on data at or below the level of exposure to the effects of interest when those data are available.
But basically, the answer is that there is no pre-defined answer. As far as the math of regression is concerned, you can estimate a model on quarterly data, or annual data. The math maths just mathily.
But if you do happen to estimate a model on quarterly and on annual data, you might get very different results – one might show a statistically significant effect (for your personal cutoff of significance – I like p<=0.42) and the other not. One model might have a positive effect and the other a negative effect (!)
And that difference is called Aggregation Bias.
An example using Price Elasticity
To illustrate this, I’ve created 365 days of unit sales and prices. For this product, the price is 88 monies. Except on Wednesday and Thursday, when it is 80 monies. The product is a normal good, economically speaking, and has a perfectly constant price elasticity of -2. There is some noise in the buying, of course, caused by the usual assortment of unobserved human things (traffic lights, obstinate adolescents, the urge to sneeze . . .)
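To make this concrete, here is a minimal R sketch of how such a series might be simulated. The seed, the noise level, and the exact intercept are my assumptions (the original code isn't shown); the intercept is simply chosen so that regular-price demand lands around 4 units a day, matching the table below.

```r
set.seed(42)                               # assumption: any seed works

dates <- seq(as.Date("2025-01-01"), by = "day", length.out = 365)
wd    <- as.POSIXlt(dates)$wday            # 0 = Sunday ... 6 = Saturday
price <- ifelse(wd %in% c(3, 4), 80, 88)   # discounted on Wed (3) and Thu (4)

elasticity <- -2
intercept  <- log(4) + 2 * log(88)         # regular-price demand of ~4 units/day

# 'true value' in log space, plus a little human noise (sd = 0.05 is an
# assumption), then rounded to whole units
log_q <- intercept + elasticity * log(price) + rnorm(365, sd = 0.05)

daily <- data.frame(Date = dates, Price = price,
                    UnitSales = round(exp(log_q)))
daily$Revenue <- daily$Price * daily$UnitSales
```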
Table 1 shows two weeks of data:
| Day | Date | Week Ending | Price | Unit Sales | Revenue |
|-----|------------|------------|-------|------------|---------|
| Wed | 2025-01-01 | 2025-01-04 | 80 | 5 | 400 |
| Thu | 2025-01-02 | 2025-01-04 | 80 | 4 | 320 |
| Fri | 2025-01-03 | 2025-01-04 | 88 | 4 | 352 |
| Sat | 2025-01-04 | 2025-01-04 | 88 | 4 | 352 |
| Sun | 2025-01-05 | 2025-01-11 | 88 | 4 | 352 |
| Mon | 2025-01-06 | 2025-01-11 | 88 | 4 | 352 |
| Tue | 2025-01-07 | 2025-01-11 | 88 | 4 | 352 |
| Wed | 2025-01-08 | 2025-01-11 | 80 | 5 | 400 |
| Thu | 2025-01-09 | 2025-01-11 | 80 | 4 | 320 |
| Fri | 2025-01-10 | 2025-01-11 | 88 | 4 | 352 |
| Sat | 2025-01-11 | 2025-01-11 | 88 | 4 | 352 |
| Sun | 2025-01-12 | 2025-01-18 | 88 | 4 | 352 |
| Mon | 2025-01-13 | 2025-01-18 | 88 | 4 | 352 |
| Tue | 2025-01-14 | 2025-01-18 | 88 | 4 | 352 |

Table 1
Since we have built this to have constant elasticity, we fit log(Unit Sales) ~ log(Price) and expect to recover -2 as the slope. Table 2 shows the coefficients and p-values from a quick lm() call in R:
| Term | Estimate | Std. Error | Statistic | P-value |
|------|----------|------------|-----------|---------|
| (Intercept) | 9.825074733 | 0.298723261 | 32.89022316 | 0.000 |
| log(Price) | -1.88551814 | 0.067126892 | -28.08886382 | 0.000 |

Table 2
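For reference, the fit behind Table 2 is presumably something like this one-liner (using the hypothetical `daily` data frame from the simulation sketch above):

```r
daily_fit <- lm(log(UnitSales) ~ log(Price), data = daily)
summary(daily_fit)   # slope lands very close to the true elasticity of -2
```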
And here is a chart of that (built in Excel). Note that there are 365 points in this chart!
This is a strong result that almost perfectly recovers the true elasticity of -2; the p-value is stated in terms of E-111 – that should match anyone’s idea of significance!
Table 3 shows the head of the weekly data, matching the 14 daily records shown above.
| Week Ending | Unit Sales | Revenue | Price |
|-------------|------------|---------|-------|
| 2025-01-04 | 28 | 2,480 | 88.57 |
| 2025-01-11 | 29 | 2,480 | 85.52 |

Table 3
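The exact weekly roll-up isn't shown, but given that Table 3's Price equals Revenue divided by Unit Sales, one plausible reconstruction in R is:

```r
# roll each day up to the Saturday that ends its week (an assumption that
# matches the week-ending dates in Table 1)
daily$Week <- daily$Date + (6 - as.POSIXlt(daily$Date)$wday) %% 7

weekly <- aggregate(cbind(UnitSales, Revenue) ~ Week, data = daily, FUN = sum)
weekly$Price <- weekly$Revenue / weekly$UnitSales   # unit-weighted average price

weekly_fit <- lm(log(UnitSales) ~ log(Price), data = weekly)
summary(weekly_fit)   # the slope comes out wildly different from -2
```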
And, of course, we have the parameter table from a model on this data:
| Term | Estimate | Std. Error | Statistic | P-value |
|------|----------|------------|-----------|---------|
| (Intercept) | 193.9256468 | 23.22853286 | 8.348596444 | 0.000 |
| log(Sum.of.AvgPrice) | -42.8446055 | 5.222820031 | -8.203347091 | 0.000 |

Table 4
With the plot of all 52 weeks:
So maybe we expect the elasticity to be a lot bigger because the units are up by a factor of 7-ish (7 days in a week, right?) . . . except, of course, that that logic applies to the intercept; elasticity is a % change over a % change, so the magnitude of the units sold is already accounted for.
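A one-line sanity check: if a week were exactly seven identical days at the same price, then

$$\log(7Q) = \log 7 + \log Q \approx 1.95 + \log Q$$

so in a log-log regression the factor of 7 moves the intercept up by about 1.95 and leaves the slope (the elasticity) untouched. The factor of 7 cannot explain a slope of -42.8.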
For Intuition, We Plot
The charts really tell the story. In this (synthetic) data, the random noise (introduced by rounding to an integer after creating the 'true value' in log space) is small relative to the price-change effect.
When we aggregate the daily sales to weekly, however, the price blurs into 6 different values, as the random noise on unit sales creates different weekly average prices from our two daily prices. And, with only two days a week at the discounted price but 7 days a week of random noise, if any given week happens to have more noise in the same direction as the price effect, the noise can amplify the apparent relationship (as seen here). The opposite could, of course, be true, and random chance could assign noise in such a way as to counterbalance the price effect.
In a study we could ‘increase sample size’ to ameliorate the risk of random noise causing poor effect estimates. But there only are 7 days in a week . . .
Walking us back to Marketing Mix Modeling
Because syndicated data was historically bought and sold at a weekly time grain in the USA and at a monthly time grain in Europe, MMM analysts often default to one of those time grains, and most analyst granularity choices are made on the product and geography dimensions. But what applies to price and daily data in the example above applies equally to Meta impressions over time, and to Meta impressions across cities.
How can we justify this? Shouldn’t we be using the data at the grain of the data generating process?
Which is . . . the individual buyer level. So, yes, if you have access to a complete dataset of hourly marketing exposures and purchase opportunities for a substantial sample of shoppers of a product, you should definitely run your models on it. And tell me all about it – I've never had such a dataset, and I strongly suspect you won't have one either (as long as the Truman Show remains a movie and not reality!).
And so, the first practical truth about aggregation bias is that every MMM will have some, because the data isn’t available to avoid it.
The second practical truth about aggregation bias in MMM is that if no one can get the individual data to estimate an MMM on, then we will never be able to quantify the resulting amount of bias due to aggregation.
So, dear reader, does this mean all hope is lost? Of course not. Primarily because I selected an alarming example, with an extreme amount of bias. Partly because many marketing tactics are always applied in aggregate and can only be measured in aggregate (for now – maybe someday our AI smart clothes will capture all the ads we are exposed to and all the purchase opportunities we come across, and then send that information to data scientists for marketing measurement purposes ... or maybe not).
And mostly because aggregation bias occurs when 'measurement units' with heterogeneous exposure to a treatment are combined in an analysis. In this example, analysis on the daily data works well but analysis on the weekly data fails, because some days have a discount and others don't. In a Marketing Mix Model, we can try to keep that in mind and find data at a granularity that doesn't regress a KPI on weighted averages of exposures where the weight is the KPI itself.
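Spelled out, the weekly price in Table 3 is

$$\bar{P}_w = \frac{\text{Revenue}_w}{\text{Units}_w} = \frac{\sum_{d \in w} P_d\, Q_d}{\sum_{d \in w} Q_d}$$

a unit-weighted average of the daily prices, where the weights (the daily unit sales) are the very KPI we put on the left-hand side of the regression.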
To wrap up
I find aggregation bias interesting primarily because it reminds us that ALL the choices we make in building a Marketing Mix Model can have a substantial impact on the measurement. It is very, very common in MMM to run 'national-weekly' or 'national-monthly' models, because weekly or monthly is the time grain of syndicated data, and "our media is bought nationally anyway, so why bring in all the noise of lower-level data?"
And, of course, that might be exactly the right way to do an MMM for an e-commerce brand that has no substantial promotional strategy and runs marketing at a single target audience (or, perhaps more commonly, at a bunch of target audiences that are indistinguishable from each other at the time of purchase).
But for products sold in retail outlets, where merchandising and the need to stop at the store for other products are key volume drivers, maybe moving away from those choices is a good idea. But it does have a cost! And we need to consider that as well.