view article Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • 28 days ago • 72
view article Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community By Leyo and 2 others • Apr 15, 2024 • 186
PubTables-1M: Towards comprehensive table extraction from unstructured documents Paper • 2110.00061 • Published Sep 30, 2021 • 2
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking Paper • 2204.08387 • Published Apr 18, 2022 • 4
DiT: Self-supervised Pre-training for Document Image Transformer Paper • 2203.02378 • Published Mar 4, 2022 • 2
LayoutLM: Pre-training of Text and Layout for Document Image Understanding Paper • 1912.13318 • Published Dec 31, 2019 • 4
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models Paper • 2109.10282 • Published Sep 21, 2021 • 7
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models Paper • 2210.08933 • Published Oct 17, 2022 • 6
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 879
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors Paper • 2501.08225 • Published Jan 14 • 19