The Art & Science of Variable Selection in MMM
Variable selection in Marketing Mix Modeling (MMM) is far from a mere technical detail. It underpins accurate and actionable marketing intelligence. Omitting crucial variables or including the wrong ones can introduce significant bias, leading to incorrect causal estimates for marketing channels.
“When marketing decisions are based on these flawed models, the outcomes can be as arbitrary as a roll of the dice.”
Different Types of Variables in MMM
Broadly, MMM variables fall into two categories: marketing variables and non-marketing (contextual) variables. Together they cover everything from media spend and promotions to seasonality and economic factors. By identifying and categorizing these variables, marketers can isolate the true drivers of performance, optimize budgets, and make more informed strategic decisions.
Let’s take a look:
1. Marketing Variables
- Paid Media: These variables are associated with a clear marketing spend, such as TV and digital advertising.
- Organic Media: This category includes marketing activities that do not involve a direct, clear marketing spend, such as newsletters, social media posts or push notifications.
- Non-Media Marketing: This covers promotional efforts, discounts, and product pricing. Relevant inputs often include time-series pricing data or dummy variables that indicate the presence and specific type of a promotion.
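As a minimal sketch of the dummy-variable encoding mentioned above, the snippet below turns a promotion calendar into one 0/1 column per promotion type. The weeks, promotion names, and column names are hypothetical and only illustrate the shape of the data.

```python
# Hypothetical weekly calendar and promotion schedule (illustrative values only).
weeks = ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"]
promo_calendar = {"2024-01-08": "discount", "2024-01-22": "bundle"}

# One 0/1 column per promotion type; weeks without a promotion get all zeros.
promo_types = sorted(set(promo_calendar.values()))

promo_rows = []
for week in weeks:
    row = {"week": week}
    for promo_type in promo_types:
        # 1 if this week ran this type of promotion, otherwise 0.
        row[f"promo_{promo_type}"] = int(promo_calendar.get(week) == promo_type)
    promo_rows.append(row)

for row in promo_rows:
    print(row)
```

Keeping a separate column per promotion type lets the model estimate a different effect for each type instead of a single blended promotion effect.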
2. Non-Marketing or Contextual Variables
- Seasonality & Holidays: These variables capture natural sales patterns influenced by different times of the year, major events, festive seasons or even weather patterns.
- Macroeconomic Factors: Broader economic conditions, such as GDP growth, unemployment rates, and inflation, can significantly affect consumer spending and overall business outcomes.
- Competition: Monitoring competitors' actions, including their pricing strategies and promotional activities, is crucial, as these directly influence market dynamics and a brand's performance.
- Brand Factors: These encompass long-term drivers such as inherent brand awareness and brand loyalty.
“The true challenge in MMM is not simply predicting sales, but understanding them.”
The Challenges of Variable Selection
Even with a clear understanding of variable types, practitioners face several significant challenges when applying variable selection to real-world Marketing Mix Modeling. The most common ones are described below.
1. Data Quality
The accuracy and reliability of MMM depend fundamentally on the quality of its data inputs: garbage in, garbage out. Incomplete, noisy, outdated, or inconsistently formatted data leads to misleading insights and suboptimal business decisions.
2. Multicollinearity
Multicollinearity arises when two or more independent variables in a regression model exhibit a high degree of correlation with each other. This is a common occurrence in marketing, as different channels often work synergistically or are influenced by the same underlying factors. The presence of multicollinearity makes it exceedingly difficult for the model to isolate the individual contributions of these correlated variables, leading to unstable regression coefficients and unreliable statistical inferences.
3. Overfitting & Underfitting
- Overfitting: This occurs when a model fits the training data too closely, learning its inherent noise and random fluctuations along with the real signal, so it generalizes poorly to new data.
- Underfitting: Conversely, underfitting occurs when the model is too simplistic to capture the underlying relationships and patterns within the data.
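One common way to spot these two failure modes is to compare the error on the data used for fitting with the error on held-out weeks. The sketch below uses MAPE (mean absolute percentage error) for this comparison. The sales figures and predictions are made up for illustration, and the helper name `mape` is an assumption rather than a reference to any particular library.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Made-up weekly sales and model predictions, for illustration only.
train_actual = [100.0, 120.0, 80.0, 110.0]
train_predicted = [101.0, 119.0, 80.0, 110.0]
holdout_actual = [100.0, 125.0]
holdout_predicted = [130.0, 100.0]

train_error = mape(train_actual, train_predicted)
holdout_error = mape(holdout_actual, holdout_predicted)

# A small training error next to a much larger holdout error is a typical
# sign of overfitting. Large errors on both sets would point to underfitting.
print(f"train MAPE:   {train_error:.2f}%")
print(f"holdout MAPE: {holdout_error:.2f}%")
```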
4. Data Leakage
Data leakage is an issue where the model inadvertently uses information during training that would not be available at the time of real-world prediction. This leads to artificially inflated performance metrics during testing and validation, but ultimately results in inaccurate and unreliable predictions once deployed in a live environment.
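In time-series settings such as MMM, one common safeguard is to split the data chronologically, so that every validation week comes after every training week, and to derive any scaling statistics from the training window only. The following is a minimal sketch under those assumptions. The function name `chronological_split`, the field names, and the numbers are hypothetical.

```python
def chronological_split(rows, holdout_size):
    """Split rows so that every holdout week is later than every training week."""
    if not 0 < holdout_size < len(rows):
        raise ValueError("holdout_size must be between 1 and len(rows) - 1")
    # ISO-formatted dates sort correctly as plain strings.
    ordered = sorted(rows, key=lambda row: row["week"])
    return ordered[:-holdout_size], ordered[-holdout_size:]


# Hypothetical weekly records, deliberately out of order.
weekly_rows = [
    {"week": "2024-01-22", "tv_spend": 40.0, "sales": 150.0},
    {"week": "2024-01-01", "tv_spend": 10.0, "sales": 100.0},
    {"week": "2024-01-15", "tv_spend": 30.0, "sales": 130.0},
    {"week": "2024-01-08", "tv_spend": 20.0, "sales": 120.0},
]

train_rows, holdout_rows = chronological_split(weekly_rows, holdout_size=1)

# The scaling statistic comes from the training window only and is then
# applied to both sets, so no information from the holdout week leaks back.
train_max_spend = max(row["tv_spend"] for row in train_rows)
for row in train_rows + holdout_rows:
    row["tv_spend_scaled"] = row["tv_spend"] / train_max_spend

print([row["week"] for row in train_rows])
print([row["week"] for row in holdout_rows])
```

A random shuffle before splitting would let the model see weeks that come after the ones it is evaluated on, which is one of the ways leakage tends to creep into time-series work.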
Techniques for Variable Selection
1. Test for Multicollinearity
Multicollinearity can inflate uncertainty in coefficient estimates and reduce interpretability. Diagnostics such as the Variance Inflation Factor (VIF) can help identify overlapping variables. Consider aggregating highly correlated channels or using dimensionality reduction techniques such as principal component analysis (PCA).
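To illustrate the VIF check, here is a minimal sketch in plain Python. It relies on the identity that the VIF of each predictor equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The channel names and spend values are hypothetical, and in practice a statistics library with a ready-made VIF routine would usually be the more convenient choice.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        col_mean = sum(col) / n
        centered.append([v - col_mean for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    size = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(size)]
        for i in range(size)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def variance_inflation_factors(named_columns):
    """Return {name: VIF} using the diagonal of the inverse correlation matrix."""
    names = list(named_columns)
    inverse = invert(correlation_matrix([named_columns[name] for name in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical weekly spend: social closely tracks TV, email moves independently.
spend = {
    "tv": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "social": [6.0, 6.0, 9.0, 11.0, 12.0, 16.0],
    "email": [4.0, 4.0, 1.0, 1.0, 4.0, 4.0],
}

vifs = variance_inflation_factors(spend)
for name, value in vifs.items():
    print(f"{name}: VIF = {value:.2f}")
```

In this toy data, TV and social share a high VIF while email stays at 1, so TV and social would be the candidates for aggregation. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a warning sign, although the exact cutoff is a judgment call.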
2. Apply Domain Knowledge for Pre-Selection
Before running the model, apply business logic and marketing context to add relevant variables and eliminate irrelevant ones. For example, weather data can help capture sales effects for a client that sells plants online.
- Identifying Relevant Features: Subject matter experts (SMEs) possess a nuanced understanding of the business and market dynamics, enabling them to pinpoint the most important features and help filter out noisy or irrelevant ones. This directly enhances model performance by keeping the focus on the true key drivers of outcomes. They can identify critical variables that a data scientist, lacking specific industry knowledge, might easily overlook.
- Validating Assumptions and Inputs: SMEs play a crucial role in validating the underlying assumptions of the model and scrutinizing the quality and representativeness of the data inputs. Their insights confirm whether the gathered data accurately reflects real-world conditions and business processes.
3. Machine Learning Approaches
- Feature Importance: These techniques assign scores to input features based on their estimated contribution to the predictive model's performance, aiding in the identification of the most influential features.
- Penalized Regression (Regularization): These methods introduce a penalty term to the traditional regression equation, which helps in both variable selection and mitigating overfitting.
- Model Diagnostics Check: Favor variables that increase adjusted R² and reduce MAPE (mean absolute percentage error).
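To make the variable-selection effect of penalized regression concrete, here is a minimal sketch of lasso (L1-penalized) regression fitted by coordinate descent. Lasso is chosen here as an assumption, since the text does not name a specific regularization method. The channel names, data, and penalty value are hypothetical, and the inputs are assumed to be centered and standardized. For real work, a tested library implementation would normally be preferable.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty, returning exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_passes=50):
    """Minimize (1 / (2n)) * ||y - X @ beta||^2 + penalty * ||beta||_1.

    Assumes y is centered and the columns of X are standardized (no intercept).
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_passes):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (z / n)
    return beta


# Hypothetical standardized inputs: columns are tv, search, display.
channel_names = ["tv", "search", "display"]
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Centered sales built as 2.0 * tv + 0.05 * search; display has no effect.
y = [2.05, 1.95, -1.95, -2.05]

coefficients = lasso_coordinate_descent(X, y, penalty=0.1)
for name, coef in zip(channel_names, coefficients):
    print(f"{name}: {coef:.3f}")
```

With this toy data, the weak channel and the irrelevant channel receive coefficients of exactly zero, which is how the penalty performs selection, while the strong channel is kept but shrunk slightly. The penalty strength is a tuning choice and is often set by validation on held-out data.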
Variable selection in Marketing Mix Modeling is not merely a technical step. It is a strategic imperative that dictates the accuracy, reliability, and ultimate value of marketing insights. The precision with which variables are chosen and managed directly impacts a company's ability to understand the true causal impact of its marketing efforts, optimize budget allocation, and gain a competitive edge.