MulderFinders
The truth is out there... and this model is here to help you find it.
MulderFinders is a fine-tuned version of EuroBERT/EuroBERT-210m, trained on MorcuendeA/ConspiraText-ES, a dataset full of Spanish-language conspiratorial and non-conspiratorial text. Whether it's aliens, 5G towers, or secret societies, this model is ready to classify them all.
Trust no one... except maybe the F1 score.
Usage
You can use the model directly with the 🤗 Transformers library:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
model_name = "MorcuendeA/MulderFinders"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)

# Example input: "5G networks help us get better internet"
text = "las redes 5G nos ayudan a tener mejor internet"
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so gradient tracking is not needed.
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and pick the most likely label.
logits = outputs.logits
probs = torch.softmax(logits, dim=1)[0]
labels = model.config.id2label
pred = torch.argmax(probs).item()
print(f"Prediction: {labels[pred]} ({probs[pred].item():.4f})")
# Output:
# Prediction: rational (0.9989)
MulderFinders achieves the following results on the evaluation set:
- Loss: 0.0059
- Accuracy: 0.9981
- F1 Score: 0.9983
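For readers who want to reproduce metrics of this kind on their own labeled data, here is a minimal sketch of how accuracy and binary F1 are computed from predictions. The function name `accuracy_and_f1` and the toy labels are illustrative assumptions; the card does not state whether the reported F1 is binary, macro, or weighted, so this shows only the binary variant.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; F1 is computed for `positive`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


# Toy labels invented for illustration, not taken from the evaluation set.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```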
Model description
MulderFinders is a Spanish-language text classification model fine-tuned to detect conspiracy-related content. It is based on EuroBERT/EuroBERT-210m, a transformer model pre-trained on multiple European languages. MulderFinders performs binary classification, identifying whether a given piece of text expresses conspiratorial ideas or not.
Intended uses & limitations
Intended uses:
- Content moderation on social media or online forums.
- Research and analysis of conspiratorial discourse in Spanish-language texts.
- Assisting fact-checking workflows by flagging potentially conspiratorial statements.
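For moderation or fact-checking pipelines, flagging is usually done against a probability threshold rather than the raw argmax. The sketch below illustrates one way to do that, under stated assumptions: `flag_for_review`, the 0.9 threshold, and the label name `"conspiracy"` are hypothetical. Only `"rational"` appears in this card, so check `model.config.id2label` for the actual names and indices.

```python
def flag_for_review(probs, id2label, target_label, threshold=0.9):
    """Return True when the probability of `target_label` reaches `threshold`."""
    label2id = {label: idx for idx, label in id2label.items()}
    if target_label not in label2id:
        raise KeyError(f"unknown label: {target_label!r}")
    return probs[label2id[target_label]] >= threshold


# Hypothetical label mapping; read the real one from model.config.id2label.
id2label = {0: "rational", 1: "conspiracy"}

# Probabilities as plain lists, e.g. obtained with probs.tolist() on a tensor.
print(flag_for_review([0.9989, 0.0011], id2label, "conspiracy"))  # False
print(flag_for_review([0.05, 0.95], id2label, "conspiracy"))      # True
```

A higher threshold sends fewer texts to human review at the cost of missing borderline cases; the right value depends on the workflow and should be tuned on held-out data.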
Limitations:
- May not handle sarcasm, irony, or ambiguous language reliably.
- Performance may degrade on texts that differ from the training data, which is the model's original domain.
- May reflect biases present in the training data.
Training and evaluation data
The model was fine-tuned using the ConspiraText-ES dataset, which contains Spanish-language examples labeled as conspiratorial or not. The dataset includes only synthetic text samples, covering various conspiracy-related themes. During fine-tuning, regularization was applied with attention_dropout and hidden_dropout both set to 0.2.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 69
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 6
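Two of these values follow from the others, and a small sketch makes the relationship explicit. It assumes a single training device (consistent with 32 × 2 = 64) and no warmup, since no warmup setting is listed. The helper names and the `total_steps=1000` placeholder are illustrative, not values from the training run.

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device_batch * accumulation_steps * num_devices


def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# train_batch_size=32 with gradient_accumulation_steps=2 on one device.
print(effective_batch_size(32, 2))  # 64

# Placeholder schedule length, chosen only to illustrate the decay shape.
total_steps = 1000
for step in (0, 500, 1000):
    print(step, linear_lr(step, total_steps))
```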
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Score |
---|---|---|---|---|---|
0.2601 | 0.3030 | 20 | 0.0532 | 0.9848 | 0.9855 |
0.0771 | 0.6061 | 40 | 0.0197 | 0.9981 | 0.9982 |
0.0271 | 0.9091 | 60 | 0.0218 | 0.9981 | 0.9982 |
0.0189 | 1.2121 | 80 | 0.0182 | 0.9943 | 0.9945 |
0.0176 | 1.5152 | 100 | 0.0093 | 0.9962 | 0.9963 |
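The Epoch and Step columns together imply the number of optimizer steps per epoch. The following sketch recovers it from the rows above. The derived totals are estimates that assume a single device, the total batch size of 64 listed in the hyperparameters, and a run that completes all 6 epochs; the table itself only shows results up to step 100.

```python
# (epoch, step) pairs copied from the training results table.
rows = [(0.3030, 20), (0.6061, 40), (0.9091, 60), (1.2121, 80), (1.5152, 100)]

# Each row should give the same ratio once rounding in the Epoch column is undone.
steps_per_epoch_candidates = {round(step / epoch) for epoch, step in rows}
assert len(steps_per_epoch_candidates) == 1, "rows disagree on steps per epoch"
steps_per_epoch = steps_per_epoch_candidates.pop()

num_epochs = 6          # from the hyperparameter list
total_batch_size = 64   # from the hyperparameter list

total_steps = steps_per_epoch * num_epochs
max_train_examples = steps_per_epoch * total_batch_size  # upper bound per epoch

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps over {num_epochs} epochs: {total_steps}")
print(f"training examples per epoch (at most): {max_train_examples}")
```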
Framework versions
- Transformers 4.53.2
- PyTorch 2.6.0+cu124
- Datasets 2.14.4
- Tokenizers 0.21.2