MulderFinders

The truth is out there... and this model is here to help you find it.

MulderFinders is a fine-tuned version of EuroBERT/EuroBERT-210m, trained on MorcuendeA/ConspiraText-ES, a dataset full of Spanish-language conspiratorial and non-conspiratorial text. Whether it's aliens, 5G towers, or secret societies, this model is ready to classify them all.

Trust no one... except maybe the F1 score.

Usage

You can use the model directly with the 🤗 Transformers library:

  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  import torch
  
  model_name = "MorcuendeA/MulderFinders"
  
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)
  
  text = "las redes 5G nos ayudan a tener mejor internet"
  
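  # Tokenize the text and run a forward pass without tracking gradients
  inputs = tokenizer(text, return_tensors="pt")
  with torch.no_grad():
      logits = model(**inputs).logits
  
  # Convert logits to probabilities and pick the most likely label
  probs = torch.softmax(logits, dim=1)[0]
  labels = model.config.id2label
  pred = torch.argmax(probs).item()
  print(f"Prediction: {labels[pred]} ({probs[pred].item():.4f})")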
  
  # Output:
  # Prediction: rational (0.9989)

MulderFinders achieves the following results on the evaluation set:

  • Loss: 0.0059
  • Accuracy: 0.9981
  • F1 Score: 0.9983
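
For readers who want to compute comparable numbers on their own labeled data, here is a minimal sketch of accuracy and F1 for a binary classifier. The card does not say which F1 variant was reported, so this assumes plain binary F1 with the conspiratorial class as the positive class. The helper name `binary_metrics` and the 0/1 label encoding are illustrative, not part of the model.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, f1) for two equal-length sequences of labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1


# Toy labels (1 = conspiratorial, 0 = rational); not taken from ConspiraText-ES.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
accuracy, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.4f} f1={f1:.4f}")
```

On a heavily imbalanced dataset the two numbers can diverge, which is why the card reports both.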

Model description

MulderFinders is a Spanish-language text classification model fine-tuned to detect conspiracy-related content. It is based on EuroBERT/EuroBERT-210m, a transformer model pre-trained on multiple European languages. MulderFinders performs binary classification, identifying whether a given piece of text expresses conspiratorial ideas or not.

Intended uses & limitations

Intended uses:

  • Content moderation on social media or online forums.
  • Research and analysis of conspiratorial discourse in Spanish-language texts.
  • Assisting fact-checking workflows by flagging potentially conspiratorial statements.

Limitations:

  • May not handle sarcasm, irony, or ambiguous language reliably.
  • Performance may degrade outside the original domain (i.e., on texts that differ from those in the training dataset).
  • May reflect biases present in the training data.
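
As a rough illustration of the flagging use case, and of one way to soften the limitations above, the sketch below routes a text based on its predicted probability. It assumes you already have P(rational) for the text, for instance the probability of the `rational` label from the usage snippet. The function name `triage` and both thresholds are hypothetical and would need tuning on real data.

```python
def triage(p_rational, flag_at=0.90, review_at=0.50):
    """Map the model's P(rational) for one text to a moderation action.

    Returns "flag" for confident conspiratorial predictions, "review" for
    borderline ones that a human should check, and "pass" otherwise.
    """
    if not 0.0 <= p_rational <= 1.0:
        raise ValueError("p_rational must be a probability in [0, 1]")

    # Binary classifier: the two class probabilities sum to 1.
    p_conspiratorial = 1.0 - p_rational

    if p_conspiratorial >= flag_at:
        return "flag"
    if p_conspiratorial >= review_at:
        return "review"
    return "pass"


# Hypothetical probabilities for three texts.
for p in (0.9989, 0.30, 0.05):
    print(p, triage(p))
```

Sending borderline scores to a human reviewer rather than acting on them automatically is one way to handle sarcastic or ambiguous posts, where the model is least reliable.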

Training and evaluation data

The model was fine-tuned using the ConspiraText-ES dataset, which contains Spanish-language examples labeled as conspiratorial or not. The dataset includes only synthetic text samples, covering various conspiracy-related themes. During fine-tuning, regularization was applied with attention_dropout and hidden_dropout both set to 0.2.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 69
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 6
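
The batch-size and scheduler entries interact, so a small worked sketch may help. It assumes a single device (the card does not list a device count) and a linear decay without warmup (no warmup setting is listed). The value of `total_steps` below is a placeholder, not a figure from the card.

```python
train_batch_size = 32
gradient_accumulation_steps = 2
num_devices = 1  # assumption: the card does not state how many devices were used

# Gradients from several micro-batches are accumulated before each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 64


def linear_lr(step, total_steps, base_lr=2e-5):
    """Learning rate after `step` optimizer updates, decaying linearly to zero."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / total_steps)


total_steps = 400  # placeholder value for illustration only
for step in (0, 200, 400):
    print(step, linear_lr(step, total_steps))
```

With these settings the learning rate starts at 2e-05, is halved at the midpoint of training, and reaches zero at the final step.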

Training results

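Training Loss | Epoch  | Step | Validation Loss | Accuracy | F1 Score
------------- | ------ | ---- | --------------- | -------- | --------
0.2601        | 0.3030 | 20   | 0.0532          | 0.9848   | 0.9855
0.0771        | 0.6061 | 40   | 0.0197          | 0.9981   | 0.9982
0.0271        | 0.9091 | 60   | 0.0218          | 0.9981   | 0.9982
0.0189        | 1.2121 | 80   | 0.0182          | 0.9943   | 0.9945
0.0176        | 1.5152 | 100  | 0.0093          | 0.9962   | 0.9963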
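
The Epoch and Step columns also make it possible to back out the number of optimizer steps per epoch. This is an inference from the logged values rather than something the card states, and the variable names are illustrative.

```python
# (epoch, step) pairs copied from the training results table.
log = [(0.3030, 20), (0.6061, 40), (0.9091, 60), (1.2121, 80), (1.5152, 100)]

# Each row gives an independent estimate of steps per epoch; average and round.
estimates = [step / epoch for epoch, step in log]
steps_per_epoch = round(sum(estimates) / len(estimates))
print(steps_per_epoch)  # 66

# num_epochs is taken from the hyperparameter list.
num_epochs = 6
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 396
```

At 66 steps per epoch, a full 6-epoch run would take about 396 optimizer steps, whereas the rows logged here stop at step 100.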

Framework versions

  • Transformers 4.53.2
  • PyTorch 2.6.0+cu124
  • Datasets 2.14.4
  • Tokenizers 0.21.2