redmoe-ai-v1 committed · verified
Commit aaccbca · 1 Parent(s): e1cb4af

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +22 -22

README.md CHANGED
@@ -12,7 +12,7 @@ language:
 <div align="center">
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/rednote-hilab/dots_ocr/main/assets/logo.png" width="300"/>
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/logo.png" width="300"/>
 <p>
 
 <h1 align="center">
@@ -25,7 +25,7 @@ dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
 
 <div align="center">
 <a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
-<a href="https://raw.githubusercontent.com/rednote-hilab/dots_ocr/main/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
+<a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
 <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a>
 </div>
 
@@ -44,14 +44,14 @@ dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
 
 
 ### Performance Comparison: dots.ocr vs. Competing Models
-<img src="assets/chart.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/chart.png" border="0" />
 
 > **Notes:**
 > - The EN and ZH metrics are the end-to-end evaluation results on [OmniDocBench](https://github.com/opendatalab/OmniDocBench), and the Multilingual metric is the end-to-end evaluation result on dots.ocr-bench.
 
 
 ## News
-* ```2025.07.30``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots_ocr), a multilingual document parsing model built on a 1.7B LLM, with SOTA performance.
+* ```2025.07.30``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr), a multilingual document parsing model built on a 1.7B LLM, with SOTA performance.
 
 
 
@@ -787,7 +787,7 @@ This is an in-house benchmark which contains 1493 pdf images with 100 languages.
 </table>
 
 > **Notes:**
-> - Use prompt_layout_all_en for **parse all** and prompt_layout_only_en for **detection only**; please refer to [prompts](https://github.com/rednote-hilab/dots_ocr/blob/main/dots_ocr/utils/prompts.py)
+> - Use prompt_layout_all_en for **parse all** and prompt_layout_only_en for **detection only**; please refer to [prompts](https://github.com/rednote-hilab/dots.ocr/blob/master/dots_ocr/utils/prompts.py)
 
 
 ### 3. olmOCR-bench.
@@ -965,7 +965,7 @@ This is an in-house benchmark which contains 1493 pdf images with 100 languages.
 <td>75.5 ± 1.0</td>
 </tr>
 <tr>
-<td>MonkeyOCR-pro-3B <a href="http://vlrlabmonkey.xyz:7685/">[Demo]</a></td>
+<td>MonkeyOCR-pro-3B</td>
 <td><strong>83.8</strong></td>
 <td>68.8</td>
 <td>74.6</td>
@@ -1032,11 +1032,11 @@ python tools/download_model.py
 ## 2. Deployment
 ### vLLM inference
 We highly recommend using vLLM for deployment and inference. All of our evaluation results are based on vLLM version 0.9.1.
-The [Docker image](https://hub.docker.com/r/rednotehilab/dots.ocr) is based on the official vLLM image. You can also follow the [Dockerfile](https://github.com/rednote-hilab/dots_ocr/blob/main/docker/Dockerfile) to build the deployment environment yourself.
+The [Docker image](https://hub.docker.com/r/rednotehilab/dots.ocr) is based on the official vLLM image. You can also follow the [Dockerfile](https://github.com/rednote-hilab/dots.ocr/blob/master/docker/Dockerfile) to build the deployment environment yourself.
 
 ```shell
 # You need to register the model with vLLM first
-hf_model_path=./weights/DotsOCR # Path to your downloaded model weights
+export hf_model_path=./weights/DotsOCR # Path to your downloaded model weights
 export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH
 sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\
 from DotsOCR import modeling_dots_ocr_vllm' `which vllm`
@@ -1180,27 +1180,27 @@ python demo/demo_gradio_annotion.py
 
 
 ### Example for formula document
-<img src="assets/showcase/formula1.png" alt="formula1.png" border="0" />
-<img src="assets/showcase/formula2.png" alt="formula2.png" border="0" />
-<img src="assets/showcase/formula3.png" alt="formula3.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/formula1.png" alt="formula1.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/formula2.png" alt="formula2.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/formula3.png" alt="formula3.png" border="0" />
 
 ### Example for table document
-<img src="assets/showcase/table1.png" alt="table1.png" border="0" />
-<img src="assets/showcase/table2.png" alt="table2.png" border="0" />
-<img src="assets/showcase/table3.png" alt="table3.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/table1.png" alt="table1.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/table2.png" alt="table2.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/table3.png" alt="table3.png" border="0" />
 
 ### Example for multilingual document
-<img src="assets/showcase/Tibetan.png" alt="Tibetan.png" border="0" />
-<img src="assets/showcase/tradition_zh.png" alt="tradition_zh.png" border="0" />
-<img src="assets/showcase/nl.png" alt="nl.png" border="0" />
-<img src="assets/showcase/kannada.png" alt="kannada.png" border="0" />
-<img src="assets/showcase/russian.png" alt="russian.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/Tibetan.png" alt="Tibetan.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/tradition_zh.png" alt="tradition_zh.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/nl.png" alt="nl.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/kannada.png" alt="kannada.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/russian.png" alt="russian.png" border="0" />
 
 ### Example for reading order
-<img src="assets/showcase/reading_order.png" alt="reading_order.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/reading_order.png" alt="reading_order.png" border="0" />
 
 ### Example for grounding ocr
-<img src="assets/showcase/grounding.png" alt="grounding.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/grounding.png" alt="grounding.png" border="0" />
 
 
 ## Acknowledgments
@@ -1217,7 +1217,7 @@ We also thank [DocLayNet](https://github.com/DS4SD/DocLayNet), [M6Doc](https://g
 
 - **Parsing Failures:** The model may fail to parse under certain conditions:
   - When the character-to-pixel ratio is excessively high. Try enlarging the image or increasing the PDF parsing DPI (a setting of 200 is recommended). However, please note that the model performs optimally on images with a resolution under 11289600 pixels.
-  - Continuous special characters, such as ellipses (`...`) and underscores (`_`), may cause the prediction output to repeat endlessly. In such scenarios, consider using alternative prompts like `prompt_layout_only_en`, `prompt_ocr`, or `prompt_grounding_ocr` ([details here](https://github.com/rednote-hilab/dots_ocr/blob/main/dots_ocr/utils/prompts.py)).
+  - Continuous special characters, such as ellipses (`...`) and underscores (`_`), may cause the prediction output to repeat endlessly. In such scenarios, consider using alternative prompts like `prompt_layout_only_en`, `prompt_ocr`, or `prompt_grounding_ocr` ([details here](https://github.com/rednote-hilab/dots.ocr/blob/master/dots_ocr/utils/prompts.py)).
 
 - **Performance Bottleneck:** Despite its 1.7B-parameter LLM foundation, **dots.ocr** is not yet optimized for high-throughput processing of large PDF volumes.
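The resolution guidance in the limitations above (PDF DPI around 200, images kept under 11289600 pixels) can be enforced before sending pages to the model. A minimal pure-Python sketch; `fit_under_pixel_cap` is a hypothetical helper name, not part of the dots.ocr codebase:

```python
import math

# From the limitations above: the model performs optimally on images
# with a resolution under 11,289,600 pixels, so downscale larger pages.
PIXEL_CAP = 11_289_600

def fit_under_pixel_cap(width: int, height: int, cap: int = PIXEL_CAP) -> tuple[int, int]:
    """Return (width, height) scaled to stay at or under `cap` total pixels,
    preserving aspect ratio; inputs already under the cap pass through unchanged."""
    pixels = width * height
    if pixels <= cap:
        return width, height
    scale = math.sqrt(cap / pixels)  # uniform scale so area shrinks to ~cap
    return max(1, int(width * scale)), max(1, int(height * scale))

# A US-letter page rendered at the recommended 200 DPI (1700 x 2200)
# is well under the cap and passes through unchanged:
w, h = fit_under_pixel_cap(1700, 2200)
```

Resize the image to the returned dimensions with whatever imaging library the pipeline already uses before running inference.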
 
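Once the vLLM server from the deployment section is running, it can be queried through vLLM's OpenAI-compatible chat API. A minimal sketch of building such a request body; the model name `DotsOCR`, the prompt text, and the PNG content type here are assumptions about a local deployment, not values taken from this commit:

```python
import base64
import json

def build_parse_request(image_bytes: bytes, prompt: str, model: str = "DotsOCR") -> dict:
    """Build an OpenAI-compatible /v1/chat/completions body that sends one
    page image (as a base64 data URL) plus a parsing prompt to the server."""
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": prompt},
            ],
        }],
        "temperature": 0.0,  # parsing output should be deterministic
    }

# Serialize and POST this to the server, e.g.
# http://localhost:8000/v1/chat/completions (vLLM's default address).
payload = build_parse_request(b"\x89PNG...", "Please output the layout as JSON.")
body = json.dumps(payload)
```

The parsed layout comes back in `choices[0].message.content` of the JSON response; the actual prompt strings for each mode live in `dots_ocr/utils/prompts.py`.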