CATCH:ComplementaryAdaptiveToken-levelContrastiveDecodingtoMitigateHallucinationsinLVLMs

首页 > 资源库 > 研究论文 > CATCH:ComplementaryAdaptiveToken-levelContrastiveDecodingtoMitigateHallucinationsinLVLMs

CATCH:ComplementaryAdaptiveToken-levelContrastiveDecodingtoMitigateHallucinationsinLVLMs

2024-12-17

Large Vision-Language Model (LVLM) systems have demonstrated impressive vision-language reasoning capabilities but suffer from pervasive and severe hallucination issues, posing significant risks in critical domains such as healthcare and autonomous systems. Despite previous efforts to mitigate hallucinations, a persistent issue remains: visual defect from vision-language misalignment, creating a bottleneck in visual processing capacity. To address this challenge, we develop Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs (CATCH), based on the Information Bottleneck theory. CATCH introduces Complementary Visual Decoupling (CVD) for visual information separation, Non-Visual Screening (NVS) for hallucination detection, and Adaptive Token-level Contrastive Decoding (ATCD) for hallucination mitigation. CATCH addresses issues related to visual defects that cause diminished fine-grained feature perception and cumulative hallucinations in open-ended scenarios. It is applicable to various visual question-answering tasks without requiring any specific data or prior knowledge, and generalizes robustly to new tasks without additional training, opening new possibilities for advancing LVLM in various challenging applications.

Tags:

CATCH:ComplementaryAdaptiveToken-levelContrastiveDecodingtoMitigateHallucinationsinLVLMs

UnsupervisedDomainAdaptationViaDataPruning

InterpolatingVideo-LLMs:TowardLonger-sequenceLMMsinaTraining-freeManner

MMSearch:BenchmarkingthePotentialofLargeModelsasMulti-modalSearchEngines

MURI:High-QualityInstructionTuningDatasetsforLow-ResourceLanguagesviaReverseInstructions

JourneyBench:AChallengingOne-StopVision-LanguageUnderstandingBenchmarkofGeneratedImages

热门文章

MEGL:MultimodalExplanation-GuidedLearning

GaussianProcessesforProbabilisticEstimatesofEarthquakeGroundShaking:A1-DProof-of-Concept

Americans'SupportforAIDevelopment--MeasuredDailywithOpenDataandMethods

ACING:Actor-CriticforInstructionLearninginBlack-BoxLargeLanguageModels

VariableRateNeuralCompressionforSparseDetectorData

Action-AttentiveDeepReinforcementLearningforAutonomousAlignmentofBeamlines

EmpiricalMeasurementsofAITrainingPowerDemandonaGPU-AcceleratedNode

ChatBankman-Fried:anExplorationofLLMAlignmentinFinance

CliffordPerturbationApproximationforQuantumErrorMitigation