Automated Failure Attribution in Multi-Agent LLM Systems: A New Benchmark and Methods


The collaborative power of LLM-based multi-agent systems is promising, but debugging their failures remains a major hurdle. Researchers from Penn State and Duke University, with collaborators from Google DeepMind, the University of Washington, and others, have introduced a formal approach called Automated Failure Attribution. Their work pinpoints which agent caused a failure and at which step it occurred, backed by a new benchmark dataset, Who&When, and several attribution methods. Accepted as a Spotlight at ICML 2025, this research paves the way for more reliable multi-agent systems. This Q&A explores the key aspects of the work.

What is the core problem this research tackles?

LLM-based multi-agent systems often fail despite intense activity, and developers then face a critical question: which agent, at what point, was responsible? Manually sifting through enormous interaction logs to answer it is time-consuming and labor-intensive, like finding a needle in a haystack. This common frustration slows system iteration and optimization: without quick failure identification, improvements grind to a halt. The researchers formalize this challenge as a new research problem, Automated Failure Attribution, which aims to automatically detect the root-cause agent and the specific step where a failure originates.


What exactly is Automated Failure Attribution?

Automated Failure Attribution is the task of pinpointing the specific agent and the exact step in a multi-agent interaction that led to a task failure. Instead of relying on manual log archaeology or deep expertise, it applies systematic methods to trace failures back to their source. The researchers propose several attribution techniques, from simple baselines to more elaborate strategies, all evaluated on their new benchmark, Who&When. This formalization lets developers quickly identify bottlenecks and errors, enabling faster debugging and more efficient system improvement.
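Viewed as an input/output contract, the task maps a failed run's query and interaction log to a responsible agent and a decisive step. Here is a minimal Python sketch of that contract; the class and field names are illustrative, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One turn in a multi-agent interaction log."""
    index: int    # position of the turn in the trajectory
    agent: str    # name of the agent that produced this turn
    content: str  # the agent's message, tool call, or output

@dataclass
class Attribution:
    """The output of failure attribution: who failed, and when."""
    agent: str    # the agent deemed responsible for the failure
    step: int     # the index of the decisive error step

def attribute_failure(query: str, log: list[Step]) -> Attribution:
    """Map a failed task's query and interaction log to (agent, step).

    Any concrete attribution strategy implements this contract.
    """
    raise NotImplementedError
```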

Can you describe the Who&When dataset?

Who&When is the first benchmark dataset designed specifically for automated failure attribution in multi-agent systems. It contains diverse tasks and failure scenarios in which the responsible agent and the timing of the failure are annotated: logs of multi-agent LLM interactions paired with ground-truth labels indicating which agent failed and at which step. The researchers built the resource to standardize evaluation and spur further research, and it is publicly available on Hugging Face at the link provided in the paper. Its release supports reproducibility and gives the community a common yardstick for progress in this new area.
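For illustration, records can be pulled with the Hugging Face datasets library. The repository id below is a placeholder (the real one is linked from the paper), so treat this as a sketch of the access pattern rather than exact usage:

```python
from datasets import load_dataset

# Placeholder repository id; the actual id is linked from the paper.
ds = load_dataset("org-name/Who_and_When", split="train")

# Each record pairs a multi-agent interaction log with ground-truth
# labels: the responsible agent ("who") and the decisive step ("when").
example = ds[0]
print(example.keys())
```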


How did the researchers develop attribution methods?

The team developed and evaluated several automated attribution methods on the Who&When benchmark. These include:

  1. All-at-once: the judge LLM reads the user query and the entire interaction log in a single pass and names the responsible agent and the decisive error step.
  2. Step-by-step: the judge walks through the log turn by turn, deciding at each step whether the decisive error has just occurred.
  3. Binary search: the judge repeatedly halves the log, asking which half contains the decisive error until a single step remains (sketched below).

They compared the methods on accuracy and efficiency, revealing the complexity of the task: the best-performing methods show substantial promise but also leave clear room for improvement. All code and implementations are open-sourced on GitHub to encourage community collaboration.
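As one illustration, the binary-search strategy can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: `llm_judge` is a hypothetical callable standing in for an LLM query that answers whether the decisive error lies inside a given slice of the log, and the log format is invented for the example.

```python
from typing import Callable

def binary_search_attribution(
    query: str,
    log: list[dict],                       # each entry: {"agent": str, "content": str}
    llm_judge: Callable[[str, list[dict]], bool],
) -> tuple[str, int]:
    """Halve the log until the decisive error step is isolated.

    `llm_judge(query, segment)` is a hypothetical LLM call returning True
    if it believes the decisive error occurs within `segment`.
    """
    lo, hi = 0, len(log)                   # candidate step window [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if llm_judge(query, log[lo:mid]):  # is the error in the first half?
            hi = mid
        else:                              # otherwise narrow to the second half
            lo = mid
    return log[lo]["agent"], lo            # (who, when)
```

Roughly speaking, this style trades a logarithmic number of judge calls against shorter per-call context, sitting between all-at-once (one call over the whole log) and step-by-step (one call per turn).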

What are the key contributions of this work?

The research offers three major contributions:

  1. Formalization of Automated Failure Attribution as a distinct research problem in multi-agent LLM systems.
  2. Who&When dataset, the first benchmark for this task, enabling standardized evaluation.
  3. Attribution methods that demonstrate baseline and advanced techniques, showing both progress and gaps.

The paper was accepted as a Spotlight presentation at ICML 2025, highlighting its significance. All resources—paper, code, and dataset—are publicly available, fostering further advances.

How can this research impact real-world multi-agent systems?

By automating failure attribution, developers can drastically reduce debugging time, moving from manual log inspection to swift, systematic identification of failure sources. This accelerates iteration cycles, making multi-agent systems more robust and easier to optimize. Practical applications include improving reliability in agent-based customer service, collaborative coding, and multi-step reasoning tasks. The open-source release ensures the community can build on this foundation, ultimately leading to more trustworthy AI systems.
