Model Card for TR-OCR Large AR/EN Handwritten

This is a finetuned version of TrOCR Large for handwritten text recognition in Arabic and English.

Model Details

Model Description

This model is Microsoft's TrOCR Large, finetuned for handwritten text recognition in Arabic and English on the KHATT (Arabic) and IAM Handwriting (English) datasets.

Developed by: the model card author and colleague Ahmed Wahdan
Model type: OCR (Optical Character Recognition)
Language(s) (NLP): Arabic, English
Finetuned from model: Microsoft TrOCR Large

Model Sources

Repository: Kaggle notebook (not yet published)
Original Model Paper: TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

Uses

Direct Use

This model is intended for handwritten text recognition in Arabic and English documents.

Out-of-Scope Use

The model should not be used for:

Languages other than Arabic and English
Printed text recognition
Non-text image analysis

Bias, Risks, and Limitations

Limitations

Only supports Arabic and English languages
Performance may vary with different handwriting styles
Not evaluated across the full range of handwriting variations

Recommendations

Users should be aware that the model is specifically trained for Arabic and English handwritten text and may not perform well on other languages or printed text.

How to Get Started with the Model

```python
# Load the processor (image preprocessing + tokenizer) and the model from the Hugging Face Hub
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("David-Magdy/TR_OCR_LARGE")
model = VisionEncoderDecoderModel.from_pretrained("David-Magdy/TR_OCR_LARGE")
```
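The loading snippet stops before inference. Below is a minimal end-to-end sketch built on a few assumptions. First, the repository ships both processor and model files. Second, each input image contains a single line of handwriting, which is how TrOCR is typically used, although this card does not say so. Third, the checkpoint's generation config defines the decoder start token, as TrOCR checkpoints usually do. The function name `recognize_lines`, the `max_new_tokens=128` default, and the command-line usage are illustrative choices and are not part of the original card.

```python
# Minimal inference sketch (assumptions: one text line per image; the
# repo provides processor + model files). recognize_lines is a hypothetical helper.
from typing import List, Sequence


def recognize_lines(images: Sequence, processor, model, max_new_tokens: int = 128) -> List[str]:
    """Transcribe a batch of handwritten text-line images (PIL images in RGB)."""
    # The processor resizes and normalizes the images into a pixel_values tensor.
    pixel_values = processor(images=list(images), return_tensors="pt").pixel_values
    # Autoregressively decode token ids from the vision encoder's features.
    generated_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
    # Map token ids back to text, dropping special tokens such as </s>.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)


if __name__ == "__main__":
    import sys

    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    paths = sys.argv[1:]
    if not paths:
        sys.exit("usage: python recognize.py line1.png [line2.png ...]")

    processor = TrOCRProcessor.from_pretrained("David-Magdy/TR_OCR_LARGE")
    model = VisionEncoderDecoderModel.from_pretrained("David-Magdy/TR_OCR_LARGE")
    model.eval()

    # Convert to RGB so grayscale or RGBA scans match the expected 3 channels.
    images = [Image.open(path).convert("RGB") for path in paths]
    for path, text in zip(paths, recognize_lines(images, processor, model)):
        print(f"{path}: {text}")
```

Batching several line images into one `generate` call is usually faster than looping over them one at a time. Full-page scans would likely need to be segmented into individual lines first.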
