Model Card for Glossa-BART

This model is a fine-tuned version of facebook/bart-base, trained to convert American Sign Language (ASL) gloss sequences into fluent English sentences. It is designed to assist in research, education, and accessibility applications involving gloss-based ASL interpretation. The model was trained using high-quality aligned pairs of gloss annotations and English translations, and evaluated using BERTScore.

Model Details

Model Description

This is the model card for a Transformers model that has been pushed to the Hugging Face Hub.

  • Developed by: Dongjun Kim
  • Model type: Text2Text Generation, Gloss2Eng
  • Language(s) (NLP): English
  • Finetuned from model: facebook/bart-base
  • Model size: ~139M parameters

Intended Uses

This model is fine-tuned for translating American Sign Language (ASL) gloss input sequences into natural, grammatically correct English sentences. It can be used for:

  • Building real-time sign language interpretation systems
  • Research in sign language understanding and low-resource language translation
  • Educational tools for ASL learners to see gloss-to-English transformation
  • Data augmentation for multimodal ASL translation tasks

Out-of-Scope Uses

The model is not suitable for:

  • Translating from ASL videos or images directly (no visual input is processed)
  • Formal legal or medical translation without human validation
  • General-purpose translation outside ASL gloss context
  • Languages other than English

How to Use

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("rrrr66254/bart-gloss-finetuned")
model = AutoModelForSeq2SeqLM.from_pretrained("rrrr66254/bart-gloss-finetuned")

gloss_input = "YOU GO STORE TOMORROW?"
inputs = tokenizer(gloss_input, return_tensors="pt")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Expected output is "Are you going to the store tomorrow?"
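
For longer or more ambiguous glosses, explicit decoding settings can help. The values below are illustrative defaults, not the settings used to produce the reported results:

# Illustrative decoding settings (assumed, not taken from this card)
output = model.generate(
    **inputs,
    num_beams=4,          # beam search tends to yield more fluent phrasing
    max_length=64,        # cap output length for short sentences
    early_stopping=True,  # stop beams once they emit end-of-sequence
)
print(tokenizer.decode(output[0], skip_special_tokens=True))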

Bias, Risks, and Limitations

This model is trained on American Sign Language (ASL) glosses mapped to natural English sentences. As such, it may inherit several limitations:

  • Data bias: If the training data overrepresents certain sentence structures, cultural expressions, or gloss forms, the model may produce outputs that lack variety or inclusivity.
  • Limited linguistic scope: The model only understands ASL gloss as input and English as output. It does not cover other sign languages or spoken/written languages.
  • Context loss: ASL gloss does not encode facial expressions, spatial grammar, or non-manual signals, which are essential in ASL. The model may misrepresent meaning as a result.
  • Generalization risk: The model may not generalize well to gloss styles or sentence structures it wasn’t trained on.

Outputs should not be used in critical settings (e.g., legal, medical, or emergency interpreting) without human review.

Recommendations

  • Human-in-the-loop: Always have a fluent signer or linguist verify model outputs in any production or educational setting.
  • Data expansion: Consider fine-tuning with more diverse gloss datasets that include different dialects or informal structures.
  • Downstream use: If used as part of a larger translation or accessibility pipeline, include disclaimers about potential misinterpretation due to a lack of non-manual signals.

Training Details

Training Data

The model was fine-tuned on a custom dataset of 1:1 pairs of ASL gloss and fluent English sentences. The glosses are structured representations of ASL without punctuation, articles, or verb conjugation. Each gloss sentence is paired with a corresponding English sentence that captures its intended meaning. The dataset was cleaned to remove non-English outputs, duplicates, and ill-formed pairs using custom filters.
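
The exact cleaning filters are not published. A minimal sketch of the kind of filtering described above (empty or ill-formed pairs, crude non-English detection, exact duplicates) might look like the following; the (gloss, english) pair layout and the ASCII-based language check are assumptions:

def clean_pairs(pairs):
    """Keep only well-formed (gloss, english) pairs; all heuristics are illustrative."""
    seen = set()
    cleaned = []
    for gloss, english in pairs:
        gloss, english = gloss.strip(), english.strip()
        if not gloss or not english:
            continue                      # drop empty or one-sided pairs
        if not english.isascii():
            continue                      # crude stand-in for a non-English filter
        key = (gloss.lower(), english.lower())
        if key in seen:
            continue                      # drop exact duplicates
        seen.add(key)
        cleaned.append((gloss, english))
    return cleaned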

Training Procedure

Training used the Hugging Face Trainer API with a sequence-to-sequence objective, fine-tuning a BART-based architecture (facebook/bart-base) to learn a mapping from ASL gloss to fluent English sentences.

Preprocessing

  • Input text was trimmed and normalized
  • Tokenizer: Pretrained BART tokenizer
  • Special tokens: [INST] and [/INST] were used to delimit gloss input and output reference
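
The exact preprocessing code is not included in the card. A minimal sketch consistent with the bullets above is shown below; the dataset field names ("gloss", "english"), the maximum length, and the decision to register [INST] / [/INST] as additional special tokens are assumptions:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
# Register the delimiters as special tokens (assumption); if done, the model's
# token embeddings must be resized with model.resize_token_embeddings(len(tokenizer)).
tokenizer.add_special_tokens({"additional_special_tokens": ["[INST]", "[/INST]"]})

def preprocess(example, max_length=128):
    # Normalize and wrap the gloss in the delimiter tokens.
    source = f"[INST] {example['gloss'].strip()} [/INST]"
    model_inputs = tokenizer(source, max_length=max_length, truncation=True)
    # Tokenize the reference English sentence as labels.
    labels = tokenizer(text_target=example["english"].strip(),
                       max_length=max_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs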

Training Hyperparameters

  • Base model: facebook/bart-base
  • Epochs: 3
  • Learning rate: 5e-5
  • Batch size: 4 per device (both train and eval)
  • Gradient accumulation: Not used
  • Weight decay: 0.01
  • Learning rate scheduler: Linear (default in Trainer)
  • Precision: Mixed precision (fp16=True)
  • Evaluation strategy: Per epoch
  • Save strategy: Per epoch (with save_total_limit=2)
  • Logging frequency: Every 50 steps
  • Early stopping: Custom callback based on BERTScore with patience = 2
  • Evaluation metric: BERTScore (F1), computed with microsoft/deberta-xlarge-mnli
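
Expressed as Hugging Face training arguments, the configuration above would look roughly as follows; the output directory is a placeholder, the early-stopping callback is omitted because its implementation is not published, and the metric key passed to metric_for_best_model is assumed:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-gloss-finetuned",      # placeholder
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    fp16=True,
    eval_strategy="epoch",                  # "evaluation_strategy" in older transformers releases
    save_strategy="epoch",
    save_total_limit=2,
    logging_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="bertscore_f1",   # assumed key exposed by the custom BERTScore callback
)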

Factors

This model does not explicitly disaggregate results by demographic group, signer identity, or domain. However, the training data may implicitly reflect distributional biases present in publicly available gloss datasets.

Metrics

  • Metrics: BERTScore (F1) as the primary metric, with BLEU and ROUGE as secondary metrics
  • Model selection: Best checkpoint based on highest validation BERTScore-F1
  • BERTScore is used to evaluate semantic alignment, while BLEU and ROUGE provide additional insight into surface-level n-gram overlap. All metrics were evaluated using the same held-out set of 500 gloss-reference pairs.
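
A sketch of how these metrics can be computed with the evaluate library is shown below. The BERTScore backbone matches the one named above; averaging the per-sentence F1 scores and the metric key names are illustrative choices:

import evaluate

bertscore = evaluate.load("bertscore")
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

def score(predictions, references):
    # Semantic similarity with the DeBERTa backbone reported in the card.
    bs = bertscore.compute(predictions=predictions, references=references,
                           model_type="microsoft/deberta-xlarge-mnli")
    results = {"bertscore_f1": sum(bs["f1"]) / len(bs["f1"])}
    # Surface-level n-gram overlap.
    results.update(bleu.compute(predictions=predictions, references=references))
    results.update(rouge.compute(predictions=predictions, references=references))
    return results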

Results

After 2 epochs of training, the model achieved the following on the 500-pair evaluation set:

Metric         Score
BERTScore-F1   0.7191
BERTScore-P    0.7399
BERTScore-R    0.6983
BLEU-1         0.7063
BLEU-2         0.6175
BLEU-3         0.5479
BLEU-4         0.4821
ROUGE-1        0.7587
ROUGE-2        0.5874
ROUGE-L        0.7312

Qualitative inspection shows that most model outputs are fluent and contextually accurate. Common errors include omission of function words and minor verb tense mismatches.

Summary

This model demonstrates strong potential for gloss-to-English translation, with near-human fluency in many cases. However, further work is needed to improve generalization to informal gloss styles and integrate non-manual features.

Model Card Authors

  • Dongjun Kim
