Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval


2024-11-19

This study explores the potential of off-the-shelf Vision-Language Models (VLMs) for high-level robot planning in the context of autonomous navigation. While most existing learning-based approaches to path planning require extensive task-specific training or fine-tuning, we demonstrate how such training can be avoided in most practical cases. To do this, we introduce Select2Plan (S2P), a novel training-free framework for high-level robot planning that completely eliminates the need for fine-tuning or specialised training. By leveraging structured Visual Question-Answering (VQA) and In-Context Learning (ICL), our approach drastically reduces the need for data collection, requiring a fraction of the task-specific data typically used by trained models, or even relying only on online data. Our method enables the effective use of a generally trained VLM in a flexible and cost-efficient way, and requires no sensing beyond a simple monocular camera. We demonstrate its adaptability across various scene types, context sources, and sensing setups. We evaluate our approach in two distinct scenarios: traditional First-Person View (FPV) navigation and infrastructure-driven Third-Person View (TPV) navigation, demonstrating the flexibility and simplicity of our method. Our technique enhances the navigational capabilities of a baseline VLM by approximately 50% in the TPV scenario, and is comparable to trained models in the FPV scenario, with as few as 20 demonstrations.
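The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: retrieving a few stored demonstrations from a memory and placing them in a structured multiple-choice prompt for a VLM. Everything here is an assumption made for illustration, not the authors' design: the `Demo` record, the toy two-dimensional embeddings, cosine-similarity retrieval, the option labels, and the prompt wording are all hypothetical, and the call to the VLM itself is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Demo:
    """One stored demonstration (hypothetical structure, not from the paper)."""
    embedding: list[float]  # assumed scene descriptor used for retrieval
    candidates: list[str]   # labelled options shown in the demonstration
    answer: str             # option selected in the demonstration


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(memory: list[Demo], query: list[float], k: int = 2) -> list[Demo]:
    """Return the k demonstrations most similar to the query embedding."""
    ranked = sorted(memory, key=lambda d: cosine(d.embedding, query), reverse=True)
    return ranked[:k]


def build_prompt(demos: list[Demo], candidates: list[str]) -> str:
    """Assemble an in-context, multiple-choice prompt (assumed wording)."""
    lines = []
    for i, demo in enumerate(demos, start=1):
        options = ", ".join(demo.candidates)
        lines.append(f"Example {i}: options {options} -> {demo.answer}")
    lines.append(
        "Question: which option should the robot move to next? "
        f"Options: {', '.join(candidates)}"
    )
    return "\n".join(lines)


# Toy memory with made-up embeddings, purely for illustration.
MEMORY = [
    Demo([1.0, 0.0], ["A", "B", "C"], "A"),
    Demo([0.0, 1.0], ["A", "B", "C"], "C"),
    Demo([0.7, 0.7], ["A", "B"], "B"),
]

if __name__ == "__main__":
    top = retrieve(MEMORY, [0.9, 0.1], k=2)
    print(build_prompt(top, ["A", "B", "C"]))
```

In a sketch like this, the resulting prompt would be sent to the VLM together with the current camera image, and the model's chosen option would be read back as the high-level plan step. Because only the memory contents change between deployments, no model weights are updated, which is the sense in which such a pipeline is training-free.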

