
This guest post is written by Dr. Julian Runge, an Assistant Professor in Integrated Marketing Communications at Northwestern University, and William Grosso, the CEO of Game Data Pros.
Observational Causal Inference (OCI) seeks to identify causal relationships from observational data, that is, when no experimental variation or randomization is present. OCI is used in digital product and marketing analytics to deduce the impact of different strategies on outcomes like sales, customer engagement, and product adoption. OCI commonly models the relationship between variables observed in real-world data.
In marketing, one of the most common applications of OCI is in Media and Marketing Mix Modeling (m/MMM). m/MMM leverages historical sales and marketing data to estimate the effect of various actions across the marketing mix, such as TV, digital ads, promotions, pricing, or product changes, on business outcomes. In principle, m/MMM enables companies to allocate budgets, optimize campaigns, and predict future marketing and product performance. m/MMM typically uses regression-based models to estimate these impacts, assuming that other relevant factors are either controlled for or can be accounted for through statistical methods.
However, MMM and similar observational approaches often fall into the trap of correlating inputs and outputs without guaranteeing that the relationship is truly causal. For instance, if advertising spend spikes during a particular holiday season and sales also rise, an MMM might attribute this increase to advertising, even if it was primarily driven by seasonality or other external factors.
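To make this trap concrete, here is a minimal simulation of our own (all numbers made up): ad spend and sales are both driven by seasonality, and a regression-based MMM that omits the seasonal term attributes most of the seasonal lift to advertising.

```python
# Minimal illustration of the seasonality trap: the true ad effect is small,
# but a naive regression that omits seasonality attributes the seasonal
# sales lift to advertising. All numbers are invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
weeks = 156

# Seasonality drives both ad spend (budgets rise before holidays) and sales.
season = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(weeks) / 52)
ad_spend = 10 * season + rng.normal(0, 2, weeks)          # confounded input
true_ad_effect = 0.3
sales = 100 * season + true_ad_effect * ad_spend + rng.normal(0, 2, weeks)

# Naive MMM-style regression: sales ~ ad_spend, seasonality omitted.
naive = sm.OLS(sales, sm.add_constant(ad_spend)).fit()

# Regression that controls for the confounder.
controlled = sm.OLS(
    sales, sm.add_constant(np.column_stack([ad_spend, season]))
).fit()

print(f"True ad effect:                    {true_ad_effect:.2f}")
print(f"Naive estimate (confounded):       {naive.params[1]:.2f}")
print(f"Estimate controlling for season:   {controlled.params[1]:.2f}")
```

The naive coefficient comes out many times larger than the true effect, purely because budgets and demand move together over the year.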

Observational Causal Inference Regularly Fails to Identify True Effects
Despite its widespread use, a growing body of evidence indicates that OCI techniques often stray from correctly identifying true causal effects. This is a critical issue because incorrect inferences can lead to misguided business decisions, resulting in financial losses, inefficient marketing strategies, or misaligned product development efforts.
Gordon et al. (2019) provide a comprehensive critique of marketing measurement models in digital advertising. They highlight that most OCI models are vulnerable to endogeneity (where causality flows in both directions between variables) and omitted variable bias (where missing variables distort the estimated effect of a treatment). These issues are not just theoretical: the study finds that models frequently misattribute causality and draw incorrect conclusions about the effectiveness of marketing interventions, underscoring the need to run experiments instead.
A more recent study by Gordon, Moakler, and Zettelmeyer (2023) goes a step further, demonstrating that even sophisticated causal inference methods often fail to replicate true treatment effects when compared to results from randomized controlled trials. Their findings call into question the validity of many commonly used business analytics techniques. These methods, despite their complexity, often yield biased estimates when the assumptions underpinning them (e.g., no unobserved confounders) are violated—a common occurrence in business settings.
Beyond the context of digital advertising, a recent working paper by Bray, Sanders and Stamatopoulos (2024) notes that “observational price variation […] cannot reproduce experimental price elasticities.” To contextualize the severity of this problem, consider the context of clinical trials in medicine. When a new drug is tested, RCTs are the gold standard because they eliminate bias and confounding, ensuring that any observed effect is truly caused by the treatment. No one would trust observational data alone to conclude that a new medication is safe and effective. So why should businesses trust OCI techniques when millions of dollars are at stake in digital marketing or product design?
Indeed, OCI approaches in business often rely on assumptions that are easily violated. For instance, when modeling the effect of a price change on sales, an analyst must assume that no unobserved factors are influencing both the price and sales simultaneously. If a competitor launches a similar product during a promotion period, failing to account for this will likely lead to overestimating the promotion’s effectiveness. Such flawed insights can prompt marketers to double down on a strategy that’s ineffective or even detrimental in reality.
Prescriptive Recommendations from Observational Causal Inference May Be Misinformed
If OCI techniques fail to identify treatment effects correctly, the situation may be even worse when it comes to the policies these models inform and recommend. Business and marketing analytics are not just descriptive; they are often used prescriptively. Managers use them to decide how to allocate millions in ad spend, how to design and when to run promotions, or how to personalize product experiences for users. When these decisions are based on flawed causal inferences, the business consequences could be severe.
A prime example of this issue is in m/MMM, where marketing measurement not only estimates past performance but also directly informs a company’s actions for the next period. Suppose an m/MMM incorrectly estimates that increasing spend on display ads drives sales significantly. The firm may decide to shift more budget to display ads, potentially diverting funds from channels like search or TV, which may actually have a stronger (but underestimated) causal impact. Over time, such misguided actions can lead to suboptimal marketing performance, deteriorating return on investment, and distorted assessments of channel effectiveness. What’s more, as the models fail to accurately inform business strategy, executive confidence in m/MMM techniques can be significantly eroded.
Another context where flawed OCI insights can backfire is in personalized UX design for digital products like apps, games, and social media. Companies often use data-driven models to determine what type of content or features to present to users, aiming to maximize engagement, retention, or conversion. If these models incorrectly infer that a certain feature causes users to stay longer, the company might overinvest in enhancing that feature while neglecting others that have a true impact. Worse, they may even make changes that reduce user satisfaction and drive churn.
The Problem Is Serious – And Its Extent Is Currently Not Fully Appreciated
Nascent large-scale real-world evidence suggests that, even when OCI is implemented on vast, rich, and granular datasets, the core issue of incorrect estimates remains. Contrary to popular belief, having more data does not solve the fundamental issues of confounding and bias. Gordon et al. (2023) show that increasing the volume of data without experimental validation does not necessarily improve the accuracy of OCI techniques. It may even amplify biases, making analysts more confident in flawed results.
The key point to restate is this: Without experimental validation, OCI is at risk of being incorrect, either in magnitude or in sign. That is, the model may not just fail to measure the size of the effect correctly—it may even get the direction of the effect wrong. A company could end up cutting a channel that is actually highly profitable or investing heavily in a strategy that has a negative impact. Ultimately, this is the worst-case scenario for a company deeply embracing data-driven decision-making.
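A small, hypothetical simulation shows how such a sign flip can arise: suppose an ad genuinely lifts conversion by five percentage points but is targeted at users with a low baseline propensity to buy (a common retargeting pattern). The naive exposed-versus-unexposed comparison then reports a negative effect.

```python
# Hypothetical illustration of a sign flip: the ad truly helps
# (+5 percentage points), but it is targeted at low-propensity users,
# so the naive exposed-vs-unexposed comparison says it hurts.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Baseline purchase propensity differs across users.
baseline = rng.choice([0.02, 0.20], size=n)           # low vs. high propensity
# Targeting rule: low-propensity users are far more likely to see the ad.
p_exposed = np.where(baseline == 0.02, 0.8, 0.2)
exposed = rng.random(n) < p_exposed

true_lift = 0.05
purchase = rng.random(n) < baseline + true_lift * exposed

naive_estimate = purchase[exposed].mean() - purchase[~exposed].mean()
print(f"True lift:                +{true_lift:.3f}")
print(f"Naive observational lift: {naive_estimate:+.3f}")
```

Without randomization (or an explicit correction for the targeting rule), the data alone would tell you to kill a campaign that is actually working.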

Mitigation Strategies
Given the limitations and risks associated with OCI, what can companies do to ensure they make decisions informed by sound causal insights? There are several remedial strategies.
The most straightforward solution is to conduct experiments wherever possible. A/B tests, geo-based experiments, and incrementality tests can all help establish causality with high confidence. (For a decision tree guiding your choice of method, please see Figure 1 here.) For digital products, RCTs are often feasible: for example, testing different versions of a web page or varying the targeting criteria for ads. Running experiments, even on a small scale, can provide ground truth for causal effects, which can then be used to validate or calibrate observational models.
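For illustration, here is a minimal sketch, with made-up numbers, of reading out a simple A/B test: the estimated lift and its confidence interval from a randomized split can later serve as a benchmark for observational models.

```python
# Simple A/B readout on hypothetical data: incremental conversion lift
# and a 95% confidence interval from a two-proportion comparison.
import numpy as np
from scipy import stats

# Hypothetical results: (conversions, users) per arm.
control_conv, control_n = 1_840, 100_000
treat_conv, treat_n = 2_050, 100_000

p_c, p_t = control_conv / control_n, treat_conv / treat_n
lift = p_t - p_c
se = np.sqrt(p_c * (1 - p_c) / control_n + p_t * (1 - p_t) / treat_n)
z = stats.norm.ppf(0.975)

print(f"Absolute lift: {lift:.4f} "
      f"(95% CI: [{lift - z * se:.4f}, {lift + z * se:.4f}])")
```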
Another approach is bandit algorithms, which conduct randomized trials in conjunction with policy learning and execution. Their ability to learn policies “on the go” is the key advantage they bring. However, leveraging them successfully requires considerable premeditation and careful planning. We mention them here for completeness, but advise starting with simpler approaches when getting started with experimentation.
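As a rough illustration of the idea, a bare-bones Thompson sampling loop on made-up conversion rates might look like the sketch below; real deployments need far more around logging, guardrails, and non-stationarity.

```python
# Bare-bones Thompson sampling sketch (Bernoulli rewards, e.g. clicks):
# the bandit keeps randomizing while shifting traffic toward the
# better-performing variant. Conversion rates are invented.
import numpy as np

rng = np.random.default_rng(7)
true_rates = [0.04, 0.05, 0.06]          # unknown in practice
successes = np.ones(len(true_rates))     # Beta(1, 1) priors
failures = np.ones(len(true_rates))

for _ in range(10_000):                  # one impression per step
    sampled = rng.beta(successes, failures)
    arm = int(np.argmax(sampled))        # sample, then act on the sample
    reward = rng.random() < true_rates[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

traffic = successes + failures - 2       # subtract prior pseudo-counts
print("Traffic share per variant:", np.round(traffic / traffic.sum(), 2))
print("Estimated rates:", np.round(successes / (successes + failures), 3))
```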
In reality, running experiments (or bandits) across all business areas is not always practical or possible. To help ensure that OCI models produce accurate estimates in these situations, you can calibrate observational models using experimental results. For example, if a firm has run an A/B test to measure the effect of a discount campaign, the results can be used to validate an m/MMM’s estimates of the same campaign. This process, known as calibrating observational models with experimental benchmarks, helps to adjust for biases in the observational estimates. This article in Harvard Business Review summarizes different ways in which calibration can be implemented, emphasizing the need for continuous validation of observational models using RCTs. This iterative process ensures that the models remain grounded in accurate empirical evidence.
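One simple calibration scheme, sketched here with made-up numbers (the linked article discusses richer approaches), is to scale the model’s estimates by the ratio of the experimentally measured lift to the model’s estimate for the same campaign:

```python
# Illustrative calibration of observational estimates against an experiment.
# The correction factor is derived from one tested campaign and then applied
# to untested estimates from the same model. Numbers are hypothetical.
mmm_estimate_test_campaign = 0.120   # m/MMM-estimated lift for the tested campaign
experiment_lift = 0.080              # lift measured in the A/B or geo test

calibration_factor = experiment_lift / mmm_estimate_test_campaign

# Apply the correction to another (untested) estimate from the same model.
mmm_estimate_new_campaign = 0.150
calibrated_estimate = calibration_factor * mmm_estimate_new_campaign
print(f"Calibration factor:  {calibration_factor:.2f}")
print(f"Calibrated estimate: {calibrated_estimate:.3f}")
```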
In certain instances, you may be highly confident that the assumptions required for OCI to produce valid causal estimates are met. An example could be the results of a tried-and-tested attribution model. Calibration and validation of OCI models against such results can also be a sensible strategy.
A related approach is to develop a dedicated model that is trained on all available experimental results and provides causal assessments across other business analytics decisions and use cases. In a way, such a model can be framed as a “causal attribution model.”
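As a purely hypothetical sketch of what such a model could look like, one might pool past experiment results and fit a regression that predicts incremental lift from campaign features, giving new, untested campaigns a causal prior. Feature names and numbers below are illustrative only.

```python
# Hypothetical "causal attribution model": learn a mapping from campaign
# features to experimentally measured incremental lift, then predict lift
# for a campaign that has not (yet) been tested. Data are invented.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row: features of a past experiment; target: measured incremental lift.
X = np.array([
    # [channel_id, spend_k, discount_pct, new_customers_share]
    [0, 50, 10, 0.30],
    [1, 80,  0, 0.60],
    [0, 20, 25, 0.20],
    [2, 60,  5, 0.50],
    [1, 40, 15, 0.40],
])
y = np.array([0.021, 0.034, 0.012, 0.027, 0.025])   # experimental lifts

model = GradientBoostingRegressor(random_state=0).fit(X, y)

new_campaign = np.array([[2, 70, 10, 0.45]])
print(f"Predicted incremental lift: {model.predict(new_campaign)[0]:.3f}")
```

In practice such a model needs many more experiments than this toy example, plus careful handling of how those experiments were selected.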
In some situations, experiments and calibrations may not be feasible due to budget constraints, time limitations, or operational challenges. In such cases, we recommend using well-established business strategies to cross-check and validate policy recommendations derived from OCI. If the models’ inferences are not aligned with these strategies, double- and triple-check them. Examples of such strategies are:
- Pricing: Purchase history, geo-location, or value-based pricing models that have been extensively validated in the academic literature
- Advertising Strategies: Focus on smart creative strategies that align with your brand values rather than blindly following model outputs
- Product Development: Prioritize features and functionalities based on proven theories of consumer behavior rather than purely data-driven inferences
By leaning into time-tested strategies, businesses can minimize the risk of adopting flawed policies suggested by potentially biased models.
If in doubt, err on the side of caution and stick with a currently successful strategy rather than implementing ineffective or harmful changes. For recent computational advances in this regard, take a look at the m/MMM package Robyn. It makes it possible to formalize a preference for non-extreme results, alongside experiment calibration, in a multi-objective optimization framework.
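To convey the idea without reproducing Robyn’s actual interface, here is a conceptual sketch of such a multi-objective loss: it combines fit to the observational data, agreement with experimental benchmarks, and a penalty that discourages coefficients far from a conservative prior. Everything here (data, weights, prior) is invented for illustration.

```python
# Conceptual sketch only (not Robyn's API): the loss trades off
# (1) fit to observational data, (2) agreement with experimental benchmarks,
# and (3) a penalty on coefficients that stray far from a conservative prior,
# discouraging extreme budget reallocations.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(size=(104, 3))                 # weekly spend on 3 channels (illustrative)
y = X @ np.array([0.2, 0.5, 0.1]) + rng.normal(0, 0.5, 104)

experiment_benchmark = {1: 0.5}               # channel 1 lift measured via an RCT
prior = np.array([0.3, 0.3, 0.3])             # conservative, non-extreme prior

def loss(beta, lam_cal=10.0, lam_prior=1.0):
    fit = np.mean((y - X @ beta) ** 2)                          # data fit
    cal = sum((beta[k] - v) ** 2 for k, v in experiment_benchmark.items())
    prior_pen = np.sum((beta - prior) ** 2)                     # non-extremeness
    return fit + lam_cal * cal + lam_prior * prior_pen

result = minimize(loss, x0=prior)
print("Calibrated, regularized coefficients:", np.round(result.x, 2))
```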

A Call to Action: Experiment, Calibrate, Validate
In conclusion, while OCI techniques are valuable for exploratory analysis and generating hypotheses, current evidence suggests that relying on them without further validation is risky. In marketing and business analytics, where decisions directly impact revenue, brand equity, and customer experiences, businesses cannot afford to act on misleading insights.
“Combating Misinformation” may be a strong frame for our call to action. However, even misinformation on social media is sometimes shared without the originator knowing the information is false. Similarly, a data scientist who invested weeks of work into OCI-based modeling may deeply believe in the accuracy of their results. These results would nonetheless still misinform business decisions, with the potential to negatively impact shareholders and other stakeholders.
To avoid costly mistakes, companies should treat OCI as a starting point, not the final word. Wherever possible, run experiments to validate your models and calibrate your estimates. If experimentation is not feasible, be critical of your models’ outputs and always cross-check with established business strategies and internal expertise. Without such safeguards, your business strategy could be built on misinformation, leading to misguided decisions and wasted resources.
And what better time to issue this call than now, with the Conference on Digital Experimentation (CODE) at MIT happening later this week? CODE gathers both the applied and academic analytics communities to dive deep into experimentation as a pillar of business and marketing analytics. We hope to see you there.
About Julian and Bill
Julian Runge is a behavioral economist and data scientist. He is currently an Assistant Professor of Marketing at Northwestern University. Previously, Julian worked as a researcher on game data science and marketing analytics at Northeastern, Duke and Stanford University, and at Facebook. Julian has published extensively on these topics in the proceedings of premier machine learning conferences such as IEEE COG and AAAI AIIDE, and in leading journals such as Information Systems Research, Quantitative Marketing and Economics and Harvard Business Review.
William Grosso is an entrepreneur and investor based in San Mateo, California. Over his career, Grosso has worked for a variety of technology companies and is the founder of multiple startups, including Scientific Revenue, which pioneered dynamic pricing in mobile games, and Game Data Pros which focuses on revenue optimization in digital entertainment. Grosso is known for his expertise in distributed systems, revenue optimization, and data science, and has given talks on these topics at conferences around the world. He holds a master’s degree in mathematics from UC Berkeley and has worked as a research scientist in Artificial Intelligence at Stanford University. He is the author or co-author of three books on software development and over 50 scientific papers.