Model Card for ChartElementNet-MultiClass
ChartElementNet-MultiClass is a deep learning model for multi-class scientific chart element detection. It detects axes, legends, labels, titles, tick marks, data lines, bars, and other elements in scientific figures. The model uses Cascade R-CNN with a Swin Transformer backbone and is trained on enhanced COCO-style datasets with rich chart element annotations.
Model Details
Model Description
ChartElementNet-MultiClass automates the detection and localization of a wide range of chart elements in scientific figures. It leverages a Cascade R-CNN architecture with a Swin Transformer backbone for robust multi-class detection, especially for small and densely packed elements. The model is intended for use in document image understanding, chart parsing, and scientific figure mining.
- Developed by: Hansheng Zhu
- Model type: Object Detection (multi-class)
- License: Apache-2.0
- Finetuned from model: openmmlab/cascade-rcnn
Model Sources
- Repository: https://github.com/hanszhu/ChartSense
- Paper: https://arxiv.org/abs/2106.01841
Uses
Direct Use
- Detection and localization of chart elements in scientific figures
- Preprocessing for downstream chart understanding and data extraction
- Automated annotation and analysis of scientific figures
Downstream Use
- As a preprocessing step for chart structure parsing or data extraction
- Integration into document parsing, digital library, or accessibility systems
Out-of-Scope Use
- Detection of non-scientific or artistic elements
- Detection of chart elements outside the supported classes
- Medical or legal decision making
Bias, Risks, and Limitations
- The model is limited to the chart element classes present in the training data (see below).
- May not generalize to figures with highly unusual styles or poor image quality.
- Potential dataset bias: Training data is sourced from scientific literature, so charts from other sources may be handled less reliably.
Recommendations
Users should verify predictions on out-of-domain data and be aware of the model’s limitations regarding chart style and domain.
How to Get Started with the Model
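from mmdet.apis import inference_detector, init_detector
# Paths to the model config and the trained checkpoint
config_file = 'legend_match_swin/cascade_rcnn_r50_fpn_meta.py'
checkpoint_file = 'chart_label+.pth'
# Build the detector and load the weights onto the first GPU
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# Run inference on a single image
result = inference_detector(model, 'example_chart.png')
# With MMDetection 2.x, result is a list with one array per class;
# each row of an array is [x1, y1, x2, y2, score]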
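The raw result is grouped by class index, which is awkward for downstream parsing. The following is a minimal sketch of flattening it into a score-sorted list of labelled boxes, assuming the MMDetection 2.x box-only output layout (one sequence of `[x1, y1, x2, y2, score]` rows per class). The helper `flatten_detections`, the `CLASS_NAMES` list, the 0.5 threshold, and the sample data are all illustrative and not part of this repository; in practice the class names would likely come from `model.CLASSES`.

```python
def flatten_detections(result, class_names, score_thr=0.5):
    """Turn a per-class box result into a flat, score-sorted list.

    `result` is expected to hold one sequence of rows per class, each row
    being [x1, y1, x2, y2, score]. NumPy arrays and plain lists both work.
    """
    detections = []
    for class_id, rows in enumerate(result):
        for row in rows:
            x1, y1, x2, y2, score = (float(v) for v in row)
            if score >= score_thr:
                detections.append({
                    "label": class_names[class_id],
                    "score": score,
                    "bbox": (x1, y1, x2, y2),
                })
    # Highest-confidence detections first
    detections.sort(key=lambda d: d["score"], reverse=True)
    return detections


# Hypothetical stand-ins for model.CLASSES and a real inference result.
CLASS_NAMES = ["title", "x-axis", "plot-area"]
fake_result = [
    [[40, 5, 300, 30, 0.97]],                           # title
    [[50, 400, 600, 404, 0.35]],                        # x-axis (below threshold)
    [[50, 40, 600, 400, 0.99], [0, 0, 10, 10, 0.10]],   # plot-area
]
detections = flatten_detections(fake_result, CLASS_NAMES)
for det in detections:
    print(det["label"], det["score"], det["bbox"])
```

A lower `score_thr` may be worth trying for classes with weak reported AP, at the cost of more false positives.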
Training Details
Training Data
- Dataset: Enhanced COCO-style scientific chart dataset
- 21+ chart element classes, including axes, legends, titles, tick labels, data lines, bars, etc.
- Rich metadata and bounding box annotations
Training Procedure
- Images resized to 1120x672
- Cascade R-CNN with Swin Transformer backbone
- Training regime: fp32
- Optimizer: AdamW
- Batch size: 8
- Epochs: 36
- Learning rate: 1e-4
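The settings above could be expressed in an MMDetection 2.x style config roughly as follows. This is a sketch under stated assumptions, not the actual training config: the card does not state weight decay, the learning-rate schedule, whether the batch size of 8 is per GPU or total, or whether aspect ratio is preserved on resize, so those parts are either omitted or marked as assumptions.

```python
# Optimizer and learning rate as listed in the card.
# Weight decay is not stated, so it is left at the optimizer default.
optimizer = dict(type="AdamW", lr=1e-4)

# 36 training epochs.
runner = dict(type="EpochBasedRunner", max_epochs=36)

# Batch size 8. ASSUMPTION: a single GPU, so samples_per_gpu equals
# the total batch size.
data = dict(samples_per_gpu=8)

# Resize step of the training pipeline.
# ASSUMPTION: keep_ratio=True; the card only gives the target size.
resize_step = dict(type="Resize", img_scale=(1120, 672), keep_ratio=True)

# fp32 training: no `fp16 = dict(...)` entry is added to the config.
```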
Evaluation
Testing Data, Factors & Metrics
- Testing Data: Held-out split from enhanced COCO-style dataset
- Factors: Element class, image quality
- Metrics: mAP (mean Average Precision), AP50, AP75, per-class AP
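In COCO-style evaluation, AP50 and AP75 count a prediction as correct when its intersection over union (IoU) with a ground-truth box reaches 0.50 or 0.75, and mAP averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows how a thin element can pass the 0.50 threshold but fail the 0.75 one after a one-pixel shift. The box coordinates are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 600x4 pixel axis line and a prediction shifted down by 1 pixel.
ground_truth = (50, 400, 650, 404)
prediction = (50, 401, 650, 405)

overlap = iou(ground_truth, prediction)
counts_for_ap50 = overlap >= 0.50
counts_for_ap75 = overlap >= 0.75
print(overlap, counts_for_ap50, counts_for_ap75)
```

This sensitivity is one plausible reason why very thin classes can show a large gap between AP50 and AP75.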
Results
Category | mAP | mAP_50 | mAP_75 | mAP_s | mAP_m | mAP_l |
---|---|---|---|---|---|---|
title | 0.837 | 0.988 | 0.957 | 0.283 | 0.775 | 0.897 |
x-axis | 0.382 | 0.860 | 0.261 | 0.382 | nan | nan |
y-axis | 0.475 | 0.949 | 0.404 | 0.475 | nan | nan |
x-tick-label | 0.807 | 0.975 | 0.891 | 0.796 | 0.835 | 0.830 |
y-tick-label | 0.785 | 0.976 | 0.893 | 0.786 | 0.632 | nan |
data-line | 0.759 | 0.986 | 0.916 | nan | 0.492 | 0.760 |
data-bar | 0.080 | 0.206 | 0.049 | 0.080 | nan | nan |
axis-title | 0.818 | 0.988 | 0.935 | 0.826 | 0.811 | 0.492 |
plot-area | 0.976 | 0.996 | 0.993 | nan | nan | 0.976 |
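The mAP_s, mAP_m, and mAP_l columns follow the COCO size buckets: objects with area below 32² pixels are small, those between 32² and 96² are medium, and larger ones are large. A `nan` entry most likely means the test split contains no ground-truth instances of that class in that bucket. The sketch below illustrates the bucketing, assuming area is taken as bounding-box width times height; the function name and the example sizes are illustrative.

```python
def coco_size_bucket(width, height):
    """Assign a box to the COCO small / medium / large area bucket."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


# Hypothetical element sizes in pixels (width, height).
examples = {
    "axis line": (200, 4),
    "title": (250, 25),
    "plot area": (550, 360),
}
buckets = {name: coco_size_bucket(w, h) for name, (w, h) in examples.items()}
print(buckets)
```

Under this reading, a thin axis line falls into the small bucket regardless of its length, which would be consistent with the axis rows reporting values only in mAP_s.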
Summary
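The model achieves high mAP for text elements (title 0.837, x-tick-label 0.807, y-tick-label 0.785, axis-title 0.818) and for the plot area (0.976), and good results for data lines (0.759). Axes score lower on mAP (x-axis 0.382, y-axis 0.475) despite high mAP_50 (0.860 and 0.949), which suggests that axes are usually found but not tightly localized. Bars score lowest (0.080), which is attributed to data scarcity. Overall, six of the nine reported classes reach an mAP above 0.75.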
Environmental Impact
- Hardware Type: NVIDIA V100 GPU
- Hours used: 12
- Cloud Provider: Google Cloud
- Compute Region: us-central1
- Carbon Emitted: ~18 kg CO2eq (estimated)
Technical Specifications
Model Architecture and Objective
- Cascade R-CNN with Swin Transformer backbone
- Multi-class object detection head for 21+ chart element classes
Compute Infrastructure
- Hardware: NVIDIA V100 GPU
- Software: PyTorch 1.13, MMDetection 2.x, Python 3.9
Citation
BibTeX:
@article{DocFigure2021,
title={DocFigure: A Dataset for Scientific Figure Classification},
author={Afzal, S. and others},
journal={arXiv preprint arXiv:2106.01841},
year={2021}
}
APA:
Afzal, S., et al. (2021). DocFigure: A Dataset for Scientific Figure Classification. arXiv preprint arXiv:2106.01841.
Glossary
- Chart Element: Any visual component of a scientific figure (e.g., axis, legend, tick label, data line, etc.)
More Information
Model Card Authors
Hansheng Zhu
Model Card Contact
Model tree for hanszhu/ChartElementNet-MultiClass
Base model
microsoft/swin-base-patch4-window7-224-in22k