REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems


2025-03-12

This benchmark suite provides a comprehensive evaluation framework for assessing both individual LLMs and multi-agent systems in real-world planning scenarios. The suite encompasses eleven designed problems that progress from basic to highly complex, incorporating key aspects such as multi-agent coordination, inter-agent dependencies, and dynamic environmental disruptions. Each problem can be scaled along three dimensions: the number of parallel planning threads, the complexity of inter-dependencies, and the frequency of unexpected disruptions requiring real-time adaptation. The benchmark includes detailed specifications, evaluation metrics, and baseline implementations using contemporary frameworks like LangGraph, enabling rigorous testing of both single-agent and multi-agent planning capabilities. Through standardized evaluation criteria and scalable complexity, this benchmark aims to drive progress in developing more robust and adaptable AI planning systems for real-world applications.
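The three scaling dimensions can be thought of as knobs on a problem generator. The sketch below is a hypothetical illustration only, not REALM-Bench's actual code or API. The class `ScaledProblem`, its field names, the random acyclic dependency graph, and the per-step Bernoulli disruption model are all assumptions chosen to make the idea concrete.

```python
import random
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ScaledProblem:
    """Hypothetical problem instance parameterized by the three scaling knobs."""

    n_threads: int           # number of parallel planning threads
    dependency_prob: float   # assumed: chance that a thread waits on an earlier one
    disruption_prob: float   # assumed: chance of an unexpected disruption per time step
    seed: int = 0
    # deps[t] holds the threads that must finish before thread t can proceed.
    deps: dict[int, set[int]] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        rng = random.Random(self.seed)
        # Edges only point from lower- to higher-numbered threads, so the
        # inter-dependency graph is always acyclic.
        self.deps = {
            t: {u for u in range(t) if rng.random() < self.dependency_prob}
            for t in range(self.n_threads)
        }

    def execution_order(self) -> list[int]:
        """Return one thread ordering that respects every dependency."""
        return list(TopologicalSorter(self.deps).static_order())

    def sample_disruptions(self, horizon: int) -> list[int]:
        """Return the time steps (0..horizon-1) at which a disruption fires."""
        rng = random.Random(self.seed + 1)
        p = min(max(self.disruption_prob, 0.0), 1.0)
        return [step for step in range(horizon) if rng.random() < p]


if __name__ == "__main__":
    problem = ScaledProblem(n_threads=4, dependency_prob=0.5, disruption_prob=0.2)
    print("dependencies:", problem.deps)
    print("order:", problem.execution_order())
    print("disruptions:", problem.sample_disruptions(horizon=10))
```

Raising `n_threads` widens the plan, raising `dependency_prob` tightens coupling between threads, and raising `disruption_prob` forces more frequent replanning, which mirrors, under these assumptions, how the abstract describes scaling each problem.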

