🥤 SODA-BERT

SODA-BERT is a fine-tuned Arabic language model based on UBC-NLP/MARBERTv2, trained on OmanSent, the first dataset produced with the SODA data collection framework. It targets sentiment analysis and text classification in Arabic, with particular emphasis on Omani and Gulf dialects.

📊 Model Details

  • Base model: UBC-NLP/MARBERTv2
  • Fine-tuning dataset: OmanSent (Omani dialect sentiment dataset, collected using the SODA framework; not yet publicly released)
  • Languages: Arabic (Modern Standard Arabic + Gulf/Omani dialects)
  • Task: Sentiment Analysis / Text Classification
  • Model size: 163M parameters (F32, Safetensors)

๐Ÿ› ๏ธ How to Use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mktr/SODA-BERT")
model = AutoModelForSequenceClassification.from_pretrained("mktr/SODA-BERT")
model.eval()

# Omani Arabic, roughly: "Whoever says Omanis are no good at work, spit in his face."
text = "الي يقول العماني ما مال شغل تفل في وجهه"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=-1)

# Map prediction to sentiment label
label_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
predicted_label = label_map[predictions.item()]

print(f"Predicted Sentiment: {predicted_label}")
```
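For downstream use you may want class probabilities rather than only the argmax label. The sketch below shows the logits → softmax → label step in plain Python; the `logits` values are hypothetical stand-ins for what `outputs.logits` would return for one input, and the class order follows the `label_map` above.

```python
import math

# Hypothetical logits for one input, in the same class order as label_map
# above: 0 = Negative, 1 = Positive, 2 = Neutral. In practice these come
# from outputs.logits for a real input sentence.
logits = [2.1, -0.3, 0.4]

# Softmax turns raw logits into a probability distribution over classes.
exps = [math.exp(x - max(logits)) for x in logits]  # subtract max for numerical stability
probs = [e / sum(exps) for e in exps]

label_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
best = max(range(len(probs)), key=probs.__getitem__)
print(label_map[best], round(probs[best], 3))  # → Negative 0.785
```

The same conversion on real model output is `outputs.logits.softmax(dim=-1)`, which is useful when you want a confidence score alongside the predicted label.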