
AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification

2023-09-20
Key information extraction involves recognizing and extracting text from scanned receipts, retrieving the essential content, and organizing it into structured documents. This paper presents a novel multilingual dataset for receipt extraction that addresses key challenges in information extraction and item classification. The dataset comprises 47,720 samples, with annotations for item names, attributes such as price and brand, and a classification into 44 product categories. We introduce the InstructLLaMA approach, which achieves an F1 score of 0.76 for key information extraction and an accuracy of 0.68 for item classification. We release the code, datasets, and checkpoints (this https URL).
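To illustrate what an F1 score for key information extraction can measure, here is a minimal sketch that assumes each receipt item is represented as a dictionary of field names to string values and counts only exact matches of (field, value) pairs. The function `field_f1` and the field names `item`, `price`, and `brand` are hypothetical, and this is not necessarily the evaluation protocol used in the paper.

```python
def field_f1(pred: list[dict[str, str]], gold: list[dict[str, str]]) -> float:
    """Micro-averaged F1 over exact-match (field, value) pairs.

    Hypothetical sketch: each list element is one receipt item, and
    predictions are aligned with gold annotations by position.
    """
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have the same number of items")
    tp = n_pred = n_gold = 0
    for p, g in zip(pred, gold):
        p_pairs = set(p.items())  # (field, value) pairs predicted for this item
        g_pairs = set(g.items())  # (field, value) pairs annotated for this item
        tp += len(p_pairs & g_pairs)  # pairs that match exactly
        n_pred += len(p_pairs)
        n_gold += len(g_pairs)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Usage with made-up values: the price string differs, so 2 of 3 pairs match.
gold = [{"item": "Milk 1L", "price": "12.50", "brand": "Brand A"}]
pred = [{"item": "Milk 1L", "price": "12.5", "brand": "Brand A"}]
print(field_f1(pred, gold))
```

Exact string matching is strict (for example, "12.5" and "12.50" count as different values); a real evaluation might normalize values before comparing them.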