Label-Looping:HighlyEfficientDecodingforTransducers

首页 > 资源库 > 研究论文 > Label-Looping:HighlyEfficientDecodingforTransducers

Label-Looping:HighlyEfficientDecodingforTransducers

2024-06-11

This paper introduces a highly efficient greedy decoding algorithm for Transducer inference. We propose a novel data structure using CUDA tensors to represent partial hypotheses in a batch that supports parallelized hypothesis manipulations. During decoding, our algorithm maximizes GPU parallelism by adopting a nested-loop design, where the inner loop consumes all blank predictions, while non-blank predictions are handled in the outer loop. Our algorithm is general-purpose and can work with both conventional Transducers and Token-and-Duration Transducers. Experiments show that the label-looping algorithm can bring a speedup up to 2.0X compared to conventional batched decoding algorithms when using batch size 32, and can be combined with other compiler or GPU call-related techniques to bring more speedup. We will open-source our implementation to benefit the research community.

Tags:

Label-Looping:HighlyEfficientDecodingforTransducers

UnsupervisedDomainAdaptationViaDataPruning

InterpolatingVideo-LLMs:TowardLonger-sequenceLMMsinaTraining-freeManner

MMSearch:BenchmarkingthePotentialofLargeModelsasMulti-modalSearchEngines

MURI:High-QualityInstructionTuningDatasetsforLow-ResourceLanguagesviaReverseInstructions

JourneyBench:AChallengingOne-StopVision-LanguageUnderstandingBenchmarkofGeneratedImages

热门文章