Minimizing the pain of data collection

Written by Kenneth Wailes | Dec 6, 2023 11:35:53 AM

Data collection is one of the most important preparatory stages in the Marketing Mix Modeling (MMM) process. Collecting the right types of data starts with carefully planning your objectives, then choosing the KPIs most relevant to those objectives. We’ve put together some key pointers for a sensible and pain-free data collection process.

First, consider some data types/sets:

Sales (or other KPI) – Value/Volume of sales of the product, subscription, or lead. Volume charts by week focusing on category brands, key brands by variant or key brands by pack size. Any new launches or growth in a particular pack size or variant.
Mass media (TV, Press, Out of Home, Radio) – spend by media and campaign, ratings (GRPs, TRPs), impressions, and competitor activity (when available).
Digital media (search, display, video, social) – impressions, clicks, and spend by channel or campaign, or by strategy (e.g., branded vs. non-branded paid search).
Direct Marketing (mailings, online TM) – spend, volume, and competitor activity.
Promotion – promotional calendars, promotional costs, depth of discount, reach of promotion, competitor activity (when available).
Brand Financials – Revenue per unit of your KPI, Profit per unit of your KPI.
External Market Factors – GDP, weather, key holidays, interest rates, Consumer Price Index or Retail Price Index. The relevant factors will vary by industry.
Contextual – Brand and Consumer tracking and research.
Distribution and Price – line charts by week focusing on category brands, key brands by variant and key brands by pack size. Note any changes in distribution or price.

Next, consider which frequency to use for your data analysis

Analysis of data at different frequencies can bring different perspectives, so it’s important to choose the one that works best for you. Think of how you can zoom-in on a map for a more detailed street view or zoom-out for a more global view. Choosing your data frequency has a similar effect.

For modeling we use three types of data frequency:

Monthly data is slower moving and captures “macro” trends. Often brand health measures like brand awareness will only be available monthly.

Daily data is faster and useful for things like monitoring website traffic volumes.

Weekly data is by far the most common frequency used by our clients. It has the advantage of removing “day of the week” effects, and it often smooths out weather effects as well. In daily data, rainfall and similar events can cause very short-term dips in retail sales and boosts in online sales.

Third, consider the layout or schema of your data

At some point you’ll need to bring all this data together for modeling. You should first look at and prepare the KPI data you intend to model against (the dependent variable). The goal is to create what is known as a “flat” file or table, which allows for quick checks of your data, either via code (Python, R, etc.) or pivot tables.

Once you’ve decided on the format of your KPI data, you should then align all your other data sources to this same structure. The data should match the same frequency and any other differentiating features such as applicable product or market location/geography.

If data is of a lower frequency (e.g., monthly data in a weekly model), it should be disaggregated to the higher frequency. This can mean simply repeating the same values or calculating rolling averages/sums, etc.
If data is of a higher frequency (e.g., daily data in a weekly model), it should be aggregated to the lower frequency. This will often be sums but can also be averages or weighted averages.
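
The two frequency adjustments above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the function names, the Monday-based week definition, and the sample figures are all assumptions made for the example. It sums daily spend into weeks and repeats a monthly value across the weeks that start in that month.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (an assumed week definition)."""
    return d - timedelta(days=d.weekday())


def aggregate_daily_to_weekly(daily: dict) -> dict:
    """Sum daily values into weeks keyed by their Monday."""
    weekly = defaultdict(float)
    for day, value in daily.items():
        weekly[week_start(day)] += value
    return dict(sorted(weekly.items()))


def disaggregate_monthly_to_weekly(monthly: dict, weeks: list) -> dict:
    """Give each week the value of the month in which its Monday falls."""
    return {w: monthly[(w.year, w.month)] for w in weeks}


# Hypothetical daily spend: 10 per day for 14 days, starting Monday 2023-01-02.
daily_spend = {date(2023, 1, 2) + timedelta(days=i): 10.0 for i in range(14)}
weekly_spend = aggregate_daily_to_weekly(daily_spend)

# Hypothetical monthly brand awareness (percent), keyed by (year, month).
monthly_awareness = {(2023, 1): 42.0}
weekly_awareness = disaggregate_monthly_to_weekly(
    monthly_awareness, list(weekly_spend)
)
```

Repeating the same value suits level-type measures such as awareness. For totals such as monthly spend, you would more likely split the value across the weeks so that the weekly figures still add up to the monthly one.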

By having all data in the same format and aligned, you can quickly compare independent variables against the dependent variable to come up with hypotheses when you build your models.
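
As a rough sketch of what an aligned flat table might look like in code, the example below joins a KPI series and one media series on a shared (week, region) key. The data, the column names, and the `build_flat_table` helper are hypothetical. Filling gaps with 0.0 is also an assumption that may not suit every variable, so the sketch reports gaps as well, letting you review them rather than lose them silently.

```python
from datetime import date

# Hypothetical weekly KPI and media data, keyed by (week start, region).
sales = {
    (date(2023, 1, 2), "North"): 120.0,
    (date(2023, 1, 9), "North"): 135.0,
    (date(2023, 1, 16), "North"): 128.0,
}
tv_spend = {
    (date(2023, 1, 2), "North"): 1000.0,
    (date(2023, 1, 9), "North"): 1500.0,
}


def build_flat_table(kpi, sources):
    """Return one row per KPI key, with a column for each named source.

    Keys missing from a source are filled with 0.0 (an assumption) and
    collected in a separate list so the gaps can be reviewed.
    """
    rows, missing = [], []
    for key in sorted(kpi):
        week, region = key
        row = {"week": week, "region": region, "sales": kpi[key]}
        for name, series in sources.items():
            if key not in series:
                missing.append((name, key))
            row[name] = series.get(key, 0.0)
        rows.append(row)
    return rows, missing


flat, gaps = build_flat_table(sales, {"tv_spend": tv_spend})
```

Each row in `flat` now holds the dependent variable alongside the independent variables for the same week and region, which is the shape that makes side-by-side checks straightforward.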

Lastly, you should think about any alternate aggregations or groupings of data

This is important both for reporting results at a high level and for building an actionable model.

As an example, say you’re building a model that must include insights on most of the social media budget. You have 4 different campaigns; however, you’re only able to get 1 of the campaigns into the model. This won’t work well for providing action items on what to do for social media in the future. But if you create an aggregated or grouped variable that combines all 4 campaigns, you can test this in your model. If it comes in successfully, it is more useful for planning and decision making than having just that 1 campaign.

By preparing these groupings during data collection, you can quickly pivot to them during modeling if needed rather than having to rethink the data once modeling has begun.
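
One way to prepare such a grouping ahead of time is sketched below. The campaign names, the impression figures, and the `build_grouped_variables` helper are hypothetical, and the sketch assumes every campaign series covers the same weeks in the same order.

```python
# Hypothetical weekly impressions for four social campaigns (same weeks, same order).
campaigns = {
    "social_campaign_a": [100, 0, 250, 0],
    "social_campaign_b": [0, 300, 0, 0],
    "social_campaign_c": [50, 50, 50, 50],
    "social_campaign_d": [0, 0, 0, 400],
}

# Hypothetical grouping definition, prepared during data collection.
groupings = {
    "social_total": [
        "social_campaign_a",
        "social_campaign_b",
        "social_campaign_c",
        "social_campaign_d",
    ],
}


def build_grouped_variables(series, groups):
    """Sum the member series of each group, week by week."""
    return {
        group: [sum(values) for values in zip(*(series[m] for m in members))]
        for group, members in groups.items()
    }


grouped = build_grouped_variables(campaigns, groupings)
```

Keeping the grouping as a small lookup like `groupings` means that adding another grouping later is a one-line change, not a rework of the data.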

Data collection can be challenging, but it doesn’t have to be. With proper planning and understanding of the data, you can minimize the challenges of getting the right data for your model and head off any problems that may arise while modeling that stem from data collection.
