🥤 SODA-BERT
Fine-tuned Arabic language model based on UBC-NLP/MARBERTv2, trained on the OmanSent dataset, the first dataset produced using the SODA data collection framework. This model focuses on sentiment analysis and text classification tasks in Arabic, with a particular emphasis on Omani and Gulf dialects.
📋 Model Details
- Base model: UBC-NLP/MARBERTv2
- Fine-tuning dataset: OmanSent (Omani-dialect sentiment dataset, collected with the SODA framework; not yet publicly released)
- Languages: Arabic (Modern Standard Arabic + Gulf/Omani dialects)
- Task: Sentiment Analysis / Text Classification
🛠️ How to Use
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mktr/SODA-BERT")
model = AutoModelForSequenceClassification.from_pretrained("mktr/SODA-BERT")

# Illustrative Gulf-dialect example: "The work is good and the team is cooperative."
text = "الشغل زين والفريق متعاون"

# Tokenize and run a forward pass
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=-1)

# Map the predicted class index to a sentiment label
label_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
predicted_label = label_map[predictions.item()]
print(f"Predicted Sentiment: {predicted_label}")
```
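Beyond the predicted label, it is often useful to report a confidence score by converting the logits to probabilities with a softmax. A minimal standard-library sketch of that step (the logit values below are illustrative placeholders, not real model outputs):

```python
import math

def softmax(logits):
    """Convert a list of raw logits to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Same label scheme as the example above; logits are illustrative only.
label_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
logits = [-1.2, 3.4, 0.1]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"Predicted Sentiment: {label_map[best]} (confidence: {probs[best]:.2%})")
```

In practice the same conversion can be applied to `outputs.logits` with `torch.softmax(outputs.logits, dim=-1)`.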