Model Card for ChartElementNet-MultiClass

ChartElementNet-MultiClass is a deep learning model for multi-class detection of chart elements in scientific figures, including axes, legends, labels, titles, tick marks, data lines, and bars. The model uses Cascade R-CNN with a Swin Transformer backbone and is trained on an enhanced COCO-style dataset with rich chart element annotations.

Model Details

Model Description

ChartElementNet-MultiClass automates the detection and localization of chart elements in scientific figures. The Cascade R-CNN architecture with a Swin Transformer backbone was chosen for multi-class detection of small and densely packed elements. The model is intended for document image understanding, chart parsing, and scientific figure mining.

  • Developed by: Hansheng Zhu
  • Model type: Object Detection (multi-class)
  • License: Apache-2.0
  • Finetuned from model: openmmlab/cascade-rcnn

Model Sources

Uses

Direct Use

  • Detection and localization of chart elements in scientific figures
  • Preprocessing for downstream chart understanding and data extraction
  • Automated annotation and analysis of scientific figures

Downstream Use

  • As a preprocessing step for chart structure parsing or data extraction
  • Integration into document parsing, digital library, or accessibility systems

Out-of-Scope Use

  • Detection of non-scientific or artistic elements
  • Detection of element classes that are not among the supported classes
  • Medical or legal decision making

Bias, Risks, and Limitations

  • The model is limited to the chart element classes present in the training data (see below).
  • May not generalize to figures with highly unusual styles or poor image quality.
  • Potential dataset bias: Training data is sourced from scientific literature.

Recommendations

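Users should verify predictions on out-of-domain data and be aware of the model’s limitations regarding chart style and domain. Bar detections in particular should be treated with caution, given the low reported AP for the data-bar class (0.080).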

How to Get Started with the Model

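import torch
from mmdet.apis import inference_detector, init_detector

# Paths as given by the author; adjust them to your local config and checkpoint.
config_file = 'legend_match_swin/cascade_rcnn_r50_fpn_meta.py'
checkpoint_file = 'chart_label+.pth'

# Fall back to CPU when no GPU is available.
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
model = init_detector(config_file, checkpoint_file, device=device)

result = inference_detector(model, 'example_chart.png')
# With MMDetection 2.x and a box-only detector, result is a list with one
# (N, 5) array per class; each row is [x1, y1, x2, y2, score].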
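The per-class output can be awkward to consume directly. The following is a minimal sketch of flattening it into labeled detections, assuming the MMDetection 2.x list-of-arrays format. The class names, the `flatten_detections` helper, and the synthetic `fake_result` are illustrative assumptions, not part of the released model; the real class order comes from the model config.

```python
# Hypothetical subset of class names; the real list and its order
# come from the config the checkpoint was trained with.
CLASS_NAMES = ["title", "x-axis", "y-axis"]


def flatten_detections(result, class_names, score_thr=0.5):
    """Convert a per-class list of [x1, y1, x2, y2, score] rows into
    a flat list of dicts, sorted by descending score."""
    detections = []
    for class_id, boxes in enumerate(result):
        for x1, y1, x2, y2, score in boxes:
            if score >= score_thr:
                detections.append({
                    "label": class_names[class_id],
                    "bbox": [float(x1), float(y1), float(x2), float(y2)],
                    "score": float(score),
                })
    return sorted(detections, key=lambda d: d["score"], reverse=True)


# Synthetic stand-in for the output of inference_detector.
# Plain lists are used here; NumPy arrays iterate row by row the same way.
fake_result = [
    [[10, 5, 200, 30, 0.98]],                       # title
    [],                                             # x-axis: nothing found
    [[0, 0, 8, 300, 0.91], [2, 2, 9, 290, 0.20]],   # y-axis
]

for det in flatten_detections(fake_result, CLASS_NAMES):
    print(det["label"], det["bbox"], det["score"])
```

The 0.5 score threshold is an arbitrary default; a lower value may be worth trying for classes with low reported AP.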

Training Details

Training Data

  • Dataset: Enhanced COCO-style scientific chart dataset
  • 21+ chart element classes, including axes, legends, titles, tick labels, data lines, bars, etc.
  • Rich metadata and bounding box annotations

Training Procedure

  • Images resized to 1120x672
  • Cascade R-CNN with Swin Transformer backbone
  • Training regime: fp32
  • Optimizer: AdamW
  • Batch size: 8
  • Epochs: 36
  • Learning rate: 1e-4
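As a rough illustration, the hyperparameters above could map onto MMDetection 2.x config fields as sketched below. This is not the author's config: weight decay and the learning-rate schedule are not stated in this card and are left out, and `samples_per_gpu=8` and `keep_ratio=True` are assumptions (the card does not say how many GPUs the batch of 8 was spread over, or whether aspect ratio was preserved).

```python
# Sketch only: fields stated in the card, expressed in MMDetection 2.x style.
optimizer = dict(type="AdamW", lr=1e-4)

# 36 training epochs.
runner = dict(type="EpochBasedRunner", max_epochs=36)

# Assumption: the whole batch of 8 sits on a single GPU.
data = dict(samples_per_gpu=8)

# Resize step of the training pipeline.
# Assumption: keep_ratio=True; the card only gives the target size.
resize_step = dict(type="Resize", img_scale=(1120, 672), keep_ratio=True)

# fp32 training: no `fp16` entry is defined in the config.
```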

Evaluation

Testing Data, Factors & Metrics

  • Testing Data: Held-out split from enhanced COCO-style dataset
  • Factors: Element class, image quality
  • Metrics: mAP (mean Average Precision), AP50, AP75, per-class AP
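AP50 and AP75 count a prediction as correct when its intersection over union (IoU) with a ground-truth box reaches 0.50 or 0.75, while COCO-style mAP averages AP over IoU thresholds from 0.50 to 0.95. The sketch below is not the evaluation code used for this model; it only illustrates how a small offset on a thin element passes the 0.50 threshold but fails the 0.75 one. The boxes are made-up values.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical thin, axis-like ground truth and a prediction shifted by 2 px.
gt = (0, 0, 100, 10)
pred = (0, 2, 100, 12)

iou = box_iou(gt, pred)
print(f"IoU = {iou:.3f}")
print("counts at AP50:", iou >= 0.50)
print("counts at AP75:", iou >= 0.75)
```

For thin boxes, a shift of a couple of pixels removes a large share of the overlap, so strict thresholds penalize them more than large elements.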

Results

Category      mAP    mAP_50  mAP_75  mAP_s  mAP_m  mAP_l
title         0.837  0.988   0.957   0.283  0.775  0.897
x-axis        0.382  0.860   0.261   0.382  nan    nan
y-axis        0.475  0.949   0.404   0.475  nan    nan
x-tick-label  0.807  0.975   0.891   0.796  0.835  0.830
y-tick-label  0.785  0.976   0.893   0.786  0.632  nan
data-line     0.759  0.986   0.916   nan    0.492  0.760
data-bar      0.080  0.206   0.049   0.080  nan    nan
axis-title    0.818  0.988   0.935   0.826  0.811  0.492
plot-area     0.976  0.996   0.993   nan    nan    0.976
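One way to read the table is to compare mAP_50 with mAP_75 for each class: a large drop suggests the element is found but its box is not tight. The sketch below copies the two columns from the table and ranks the classes by that drop; the interpretation is a heuristic, not a claim made by the author.

```python
# (mAP_50, mAP_75) per class, copied from the results table.
results = {
    "title": (0.988, 0.957),
    "x-axis": (0.860, 0.261),
    "y-axis": (0.949, 0.404),
    "x-tick-label": (0.975, 0.891),
    "y-tick-label": (0.976, 0.893),
    "data-line": (0.986, 0.916),
    "data-bar": (0.206, 0.049),
    "axis-title": (0.988, 0.935),
    "plot-area": (0.996, 0.993),
}

# Drop from the loose (0.50) to the strict (0.75) IoU threshold.
drops = {name: ap50 - ap75 for name, (ap50, ap75) in results.items()}

# Classes with the largest drop first.
ranked = sorted(drops, key=drops.get, reverse=True)
for name in ranked:
    print(f"{name:<13} {drops[name]:.3f}")
```

The two axis classes show by far the largest drop, while data-bar is low at both thresholds, which points to missed detections rather than loose boxes.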

Summary

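The model achieves high mAP for text elements (title, tick labels, and axis titles: 0.785 to 0.837) and for the plot area (0.976), and good mAP for data lines (0.759). Axes score well at AP50 (0.860 and 0.949) but low on overall mAP (0.382 and 0.475), which indicates that they are found but localized loosely. Bars perform poorly (0.080 mAP), which is attributed to data scarcity. Results are reported for 9 element classes.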

Environmental Impact

  • Hardware Type: NVIDIA V100 GPU
  • Hours used: 12
  • Cloud Provider: Google Cloud
  • Compute Region: us-central1
  • Carbon Emitted: ~18 kg CO2eq (estimated)

Technical Specifications

Model Architecture and Objective

  • Cascade R-CNN with Swin Transformer backbone
  • Multi-class object detection head for 21+ chart element classes

Compute Infrastructure

  • Hardware: NVIDIA V100 GPU
  • Software: PyTorch 1.13, MMDetection 2.x, Python 3.9

Citation

BibTeX:

@article{DocFigure2021,
  title={DocFigure: A Dataset for Scientific Figure Classification},
  author={Afzal, S. and others},
  journal={arXiv preprint arXiv:2106.01841},
  year={2021}
}

APA:

Afzal, S., et al. (2021). DocFigure: A Dataset for Scientific Figure Classification. arXiv preprint arXiv:2106.01841.

Glossary

  • Chart Element: Any visual component of a scientific figure (e.g., axis, legend, tick label, data line, etc.)

More Information

Model Card Authors

Hansheng Zhu

Model Card Contact

hanszhu05@gmail.com
