Marketing advisory, strategy and analytics company Forethought has released a new report addressing the current role and limitations of synthetic data in commercial decision-making.
Led by Professor Ujwal Kayande of Melbourne Business School, the study, The Capability of GPT-4o in Predicting Consumer Choice, tested 1,000 synthetic respondents against a human sample of 204 across several choice models covering aviation and mobile plans.
The findings show that synthetic data can support certain types of management analysis, but warn that without sufficient human guidance, GPT-4o currently produces nonsensical synthetic data that is unsuitable for complex decision-making.
“There’s immense potential in synthetic data, but it’s essential to understand where it aligns – and doesn’t – with human data,” Professor Kayande said.
“With advances in AI and technology, there is an enormous temptation for brands to lean on synthetic data for fast and seemingly practical insights, but without fully understanding its capabilities, brands risk basing critical decisions on incomplete or overly simplified data.
“This study was designed to address that knowledge gap, offering guidance on where synthetic data can drive value and where it may fall short. Our goal is to equip brands with the knowledge to use synthetic data effectively, ensuring it enhances rather than compromises their decision-making.”
Presented during Forethought’s recent Gen AI in marketing webinar, the white paper reported several key findings.
Both synthetic and human datasets achieved over 70 per cent predictive accuracy in replicating choice preferences across test models.
Synthetic data showed only half the variability of human data in attribute impact scores, limiting its ability to capture consumer nuances.
Testing showed that synthetic and human data led to different pricing recommendations in 45 per cent of simulations.
“Synthetic data holds enormous potential for scaling insights, but we have proven that it must be applied in a highly discerning manner,” Forethought executive chairman and founder Ken Roberts said.
“We’re committed to helping brands understand both the advantages and limitations of synthetic data, namely that it must serve as a complement to human insights rather than a substitute. That is, at least in the technology’s current form. As Gen AI continues to evolve, our goal is to guide brands toward making choices that drive genuine value and meaningful outcomes,” added Roberts.
Human and synthetic data agreed on the top-ranked attribute in three out of four choice models, but deviations appeared in detailed preference patterns.
Synthetic data was shown to be effective in tasks such as cross-tabulation and augmentation, providing efficient data solutions in scenarios where nuanced consumer variability is less critical.
For high-stakes tasks like consumer segmentation and behavioural prediction, synthetic data may fall short, often failing to replicate the diversity and variability that real human data provides.
Synthetic data can inherit and amplify biases from the original datasets used to train models, which can undermine decision-making in applications requiring unbiased insights.