---
license: apache-2.0
language:
- en
base_model:
- openmmlab/cascade-rcnn
- microsoft/swin-base-patch4-window7-224-in22k
pipeline_tag: object-detection
---

# Model Card for ChartElementNet-MultiClass

ChartElementNet-MultiClass is a deep learning model for multi-class scientific chart element detection. It detects axes, legends, labels, titles, tick marks, data lines, bars, and more in scientific figures. The model is powered by Cascade R-CNN with a Swin Transformer backbone and is trained on enhanced COCO-style datasets with rich chart element annotations.

## Model Details

### Model Description

ChartElementNet-MultiClass automates the detection and localization of a wide range of chart elements in scientific figures. It uses a Cascade R-CNN architecture with a Swin Transformer backbone for robust multi-class detection, especially of small and densely packed elements. The model is intended for document image understanding, chart parsing, and scientific figure mining.

- **Developed by:** Hansheng Zhu
- **Model type:** Object Detection (multi-class)
- **License:** Apache-2.0
- **Finetuned from model:** openmmlab/cascade-rcnn

### Model Sources

- **Repository:** [https://github.com/hanszhu/ChartSense](https://github.com/hanszhu/ChartSense)
- **Paper:** https://arxiv.org/abs/2106.01841

## Uses

### Direct Use

- Detection and localization of chart elements in scientific figures
- Preprocessing for downstream chart understanding and data extraction
- Automated annotation and analysis of scientific figures

### Downstream Use

- As a preprocessing step for chart structure parsing or data extraction
- Integration into document parsing, digital library, or accessibility systems

### Out-of-Scope Use

- Detection of non-scientific or artistic elements
- Use on figures outside the supported element classes
- Medical or legal decision making

## Bias, Risks, and Limitations

- The model is limited to the chart element classes present in the training data (see below).
- It may not generalize to figures with highly unusual styles or poor image quality.
- Potential dataset bias: training data is sourced from scientific literature.

### Recommendations

Users should verify predictions on out-of-domain data and be aware of the model's limitations regarding chart style and domain.

## How to Get Started with the Model

```python
import torch
from mmdet.apis import inference_detector, init_detector

config_file = 'legend_match_swin/cascade_rcnn_r50_fpn_meta.py'
checkpoint_file = 'chart_label+.pth'

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'example_chart.png')
# result: list with one (N, 5) array of [x1, y1, x2, y2, score] detections per class
```
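The snippet below is a minimal post-processing sketch (not part of the original repository) that continues from the example above and converts the raw output into a flat list of labeled detections. It assumes the standard MMDetection 2.x bbox-only result format, where `result[i]` is an `(N, 5)` array of `[x1, y1, x2, y2, score]` rows for class `i` and `model.CLASSES` holds the class names; the 0.3 score threshold is only an example value.

```python
# Post-processing sketch, continuing from the example above.
# Assumes MMDetection 2.x bbox-only output: one (N, 5) array per class.
score_thr = 0.3  # example threshold; tune for your use case

detections = []
for class_name, class_boxes in zip(model.CLASSES, result):
    for x1, y1, x2, y2, score in class_boxes:
        if score >= score_thr:
            detections.append({
                'label': class_name,
                'bbox': [float(x1), float(y1), float(x2), float(y2)],
                'score': float(score),
            })

print(f'{len(detections)} chart elements detected above the threshold')
```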
## Training Details

### Training Data

- **Dataset:** Enhanced COCO-style scientific chart dataset
- 21+ chart element classes, including axes, legends, titles, tick labels, data lines, bars, etc.
- Rich metadata and bounding box annotations

### Training Procedure

- Images resized to 1120x672
- Cascade R-CNN with Swin Transformer backbone
- **Training regime:** fp32
- **Optimizer:** AdamW
- **Batch size:** 8
- **Epochs:** 36
- **Learning rate:** 1e-4

## Evaluation

### Testing Data, Factors & Metrics

- **Testing Data:** Held-out split from the enhanced COCO-style dataset
- **Factors:** Element class, image quality
- **Metrics:** mAP (mean Average Precision), AP50, AP75, per-class AP

### Results

| Category        | mAP   | mAP_50 | mAP_75 | mAP_s | mAP_m | mAP_l |
|-----------------|-------|--------|--------|-------|-------|-------|
| title           | 0.837 | 0.988  | 0.957  | 0.283 | 0.775 | 0.897 |
| x-axis          | 0.382 | 0.860  | 0.261  | 0.382 | nan   | nan   |
| y-axis          | 0.475 | 0.949  | 0.404  | 0.475 | nan   | nan   |
| x-tick-label    | 0.807 | 0.975  | 0.891  | 0.796 | 0.835 | 0.830 |
| y-tick-label    | 0.785 | 0.976  | 0.893  | 0.786 | 0.632 | nan   |
| data-line       | 0.759 | 0.986  | 0.916  | nan   | 0.492 | 0.760 |
| data-bar        | 0.080 | 0.206  | 0.049  | 0.080 | nan   | nan   |
| axis-title      | 0.818 | 0.988  | 0.935  | 0.826 | 0.811 | 0.492 |
| plot-area       | 0.976 | 0.996  | 0.993  | nan   | nan   | 0.976 |

#### Summary

The model achieves high mAP for titles, tick labels, axis titles, data lines, and plot areas, moderate mAP for the x- and y-axis classes, and low mAP for bars, likely due to the scarcity of bar annotations in the training data. Overall, it performs strongly on most chart element classes in scientific figures.

## Environmental Impact

- **Hardware Type:** NVIDIA V100 GPU
- **Hours used:** 12
- **Cloud Provider:** Google Cloud
- **Compute Region:** us-central1
- **Carbon Emitted:** ~18 kg CO2eq (estimated)

## Technical Specifications

### Model Architecture and Objective

- Cascade R-CNN with Swin Transformer backbone
- Multi-class object detection head for 21+ chart element classes

### Compute Infrastructure

- **Hardware:** NVIDIA V100 GPU
- **Software:** PyTorch 1.13, MMDetection 2.x, Python 3.9

## Citation

**BibTeX:**

```bibtex
@article{DocFigure2021,
  title={DocFigure: A Dataset for Scientific Figure Classification},
  author={S. Afzal, et al.},
  journal={arXiv preprint arXiv:2106.01841},
  year={2021}
}
```

**APA:**

Afzal, S., et al. (2021). DocFigure: A Dataset for Scientific Figure Classification. arXiv preprint arXiv:2106.01841.

## Glossary

- **Chart Element:** Any visual component of a scientific figure (e.g., axis, legend, tick label, data line)

## More Information

- [DocFigure Paper](https://arxiv.org/abs/2106.01841)

## Model Card Authors

Hansheng Zhu

## Model Card Contact

hanszhu05@gmail.com