Commit ec4d2f4 (verified) by jakep-allenai · 1 parent: d040979

Update README.md

Files changed (1): README.md +36 -1
README.md CHANGED
@@ -1,6 +1,41 @@
  ---
  license: apache-2.0
  ---
 
 
- If you found this page, we are soft-launching a new version of olmOCR! Please wait while we finish uploading everything!
 
  ---
+ language:
+ - en
  license: apache-2.0
+ datasets:
+ - allenai/olmOCR-mix-0225
+ base_model:
+ - Qwen/Qwen2.5-VL-7B-Instruct
+ library_name: transformers
  ---
 
+ <img alt="olmOCR Logo" src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/olmocr/olmocr.png" width="242px" style="margin-left:'auto' margin-right:'auto' display:'block'">
 
+ # olmOCR-7B-0725
+
+ This is a release of the olmOCR model that's fine tuned from Qwen2.5-VL-7B-Instruct using the
+ [olmOCR-mix-0225](https://huggingface.co/datasets/allenai/olmOCR-mix-0225) dataset.
+
+ Quick links:
+ - 📃 [Paper](https://olmocr.allenai.org/papers/olmocr.pdf)
+ - 🤗 [Dataset](https://huggingface.co/datasets/allenai/olmOCR-mix-0225)
+ - 🛠️ [Code](https://github.com/allenai/olmocr)
+ - 🎮 [Demo](https://olmocr.allenai.org/)
+
+ The best way to use this model is via the [olmOCR toolkit](https://github.com/allenai/olmocr).
+ The toolkit comes with an efficient inference setup via sglang that can handle millions of documents
+ at scale.
+
+ ## Usage
+
+ This model expects as input a single document image, rendered such that the longest dimension is 1024 pixels.
+
+ The prompt must then contain the additional metadata from the document, and the easiest way to generate this
+ is to use the methods provided by the [olmOCR toolkit](https://github.com/allenai/olmocr).
+
+
+ ## License and use
+
+ olmOCR is licensed under the Apache 2.0 license.
+ olmOCR is intended for research and educational use.
+ For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).
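
The Usage text added in this diff says the model expects a page image whose longest dimension is 1024 pixels. As a minimal sketch of that sizing arithmetic only, the target dimensions for a rendered page could be computed as below. The helper name `fit_longest_side` is hypothetical and is not part of the olmOCR toolkit, which handles rendering and prompt construction itself. The sketch also assumes that smaller pages are scaled up to 1024, which the README does not state.

```python
def fit_longest_side(width: int, height: int, target: int = 1024) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side equals `target`.

    Hypothetical helper for illustration; aspect ratio is preserved.
    Assumption: images smaller than `target` are scaled up as well.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")
    longest = max(width, height)
    # Multiply before dividing so the longer side lands exactly on `target`;
    # round the other side to a whole pixel and never let it drop below 1.
    new_width = max(1, round(width * target / longest))
    new_height = max(1, round(height * target / longest))
    return new_width, new_height


if __name__ == "__main__":
    # A US Letter page rendered at 300 DPI is 2550 x 3300 pixels.
    print(fit_longest_side(2550, 3300))
```

For a 2550 x 3300 pixel page this gives 791 x 1024. The resulting size would then be passed to whatever imaging library performs the actual resize.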