AI Models Compete in Predicting World Cup Outcomes

Researchers at Ludwig-Maximilians-Universität München (LMU) are using the LLM SoccerArena project to evaluate how well large language models can predict the outcomes of the 2026 FIFA World Cup. The initiative aims to measure how accurately AI forecasts real-world events.

13 June 2026


Ludwig-Maximilians-Universität München (LMU) has launched a new project, LLM SoccerArena, to assess the predictive capabilities of large language models (LLMs) for the 2026 FIFA World Cup. Run in collaboration with researchers from the University of Cologne and Paderborn University, the initiative pits AI systems such as GPT, Claude, and Mistral against each other in predicting match and tournament outcomes.

The project's results are published on a live leaderboard that is updated daily, offering a transparent benchmark of the models' forecasting performance. Professor Stefan Feuerriegel of the LMU Munich School of Management, who leads the project, notes that disagreements between models, such as diverging forecasts on whether Spain or France will win, are scientifically interesting. Such discrepancies can reveal which information sources the models rely on and expose potential biases stemming from training data or linguistic patterns.

The World Cup serves as a realistic and verifiable benchmark for scientific evaluation. Unlike abstract test tasks, World Cup predictions can be checked definitively against actual results. Making such predictions requires a model to interpret and weigh complex, uncertain information, including team form, player injuries, coaching decisions, and historical match data.
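The source does not say which scoring rule the LLM SoccerArena leaderboard uses. As a hypothetical illustration of how probabilistic match forecasts can be checked against actual results, the sketch below assumes each model outputs probabilities for a home win, draw, and away win, and scores them with the multi-class Brier score (lower is better). The function names, outcome labels, and model names are illustrative assumptions, not the project's implementation.

```python
# Hypothetical sketch: the actual LLM SoccerArena metric is not stated in the source.
# Assumes each forecast is a probability distribution over three match outcomes.
OUTCOMES = ("home_win", "draw", "away_win")


def brier_score(forecast: dict[str, float], actual: str) -> float:
    """Multi-class Brier score for one match: the sum of squared differences
    between forecast probabilities and the one-hot actual outcome (0 = perfect)."""
    if actual not in OUTCOMES:
        raise ValueError(f"unknown outcome: {actual}")
    if abs(sum(forecast.values()) - 1.0) > 1e-9:
        raise ValueError("forecast probabilities must sum to 1")
    return sum(
        (forecast.get(o, 0.0) - (1.0 if o == actual else 0.0)) ** 2
        for o in OUTCOMES
    )


def mean_brier(predictions: list[tuple[dict[str, float], str]]) -> float:
    """Average Brier score over a list of (forecast, actual_outcome) pairs."""
    scores = [brier_score(f, a) for f, a in predictions]
    return sum(scores) / len(scores)


def leaderboard(results: dict[str, list[tuple[dict[str, float], str]]]) -> list[tuple[str, float]]:
    """Rank hypothetical models by mean Brier score, best (lowest) first."""
    return sorted(
        ((model, mean_brier(preds)) for model, preds in results.items()),
        key=lambda item: item[1],
    )
```

For example, a forecast of 0.5 / 0.3 / 0.2 for a match the home side wins scores (0.5 − 1)² + 0.3² + 0.2² = 0.38. A proper scoring rule like this rewards well-calibrated probabilities rather than just picking the right winner, which suits a benchmark where models must weigh uncertain information.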

The insights from LLM SoccerArena are also relevant to management research. Executives increasingly use LLMs to structure market information, evaluate scenarios, and prepare forecasts. Feuerriegel emphasizes the need for benchmarks that test how AI handles dynamic information and uncertainty in real decision-making situations.

The project compares two AI approaches: models that generate predictions from their internal knowledge alone, and models that can retrieve and process information from the web. A key question under investigation is how effectively models weigh current data, such as injury reports or betting odds, which remains a significant challenge.

Original source: lmu.de