ClaimIQ at CheckThat! 2025: Comparing Prompted and Fine-Tuned Language Models for Verifying Numerical Claims
Abstract
The system uses zero-shot prompting and parameter-efficient fine-tuning to verify numerical and temporal claims, with findings highlighting the importance of evidence granularity and model adaptation.
This paper presents our system for Task 3 of the CLEF 2025 CheckThat! Lab, which focuses on verifying numerical and temporal claims using retrieved evidence. We explore two complementary approaches: zero-shot prompting with instruction-tuned large language models (LLMs) and supervised fine-tuning with parameter-efficient LoRA. To enhance evidence quality, we investigate several selection strategies, including full-document input and top-k sentence filtering using BM25 and MiniLM. Our best-performing model, LLaMA fine-tuned with LoRA, achieves strong performance on the English validation set, but a notable performance drop on the test set highlights a generalization challenge. These findings underscore the importance of evidence granularity and model adaptation for robust numerical fact verification.
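As a concrete illustration of the sentence-level evidence selection described in the abstract, the sketch below ranks a document's sentences against a claim with BM25 and with MiniLM embeddings and keeps the top k. It is a minimal, hypothetical example rather than the authors' released code; the rank_bm25, sentence-transformers, and nltk packages and the all-MiniLM-L6-v2 checkpoint are assumptions.

```python
# Minimal sketch of top-k evidence sentence selection (illustrative only,
# not the authors' implementation). Requires rank_bm25, sentence-transformers,
# and nltk; running nltk.download("punkt") once may be needed for sent_tokenize.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util
from nltk.tokenize import sent_tokenize

def top_k_bm25(claim: str, document: str, k: int = 5) -> list[str]:
    """Rank document sentences against the claim with BM25 and keep the top k."""
    sentences = sent_tokenize(document)
    bm25 = BM25Okapi([s.lower().split() for s in sentences])
    scores = bm25.get_scores(claim.lower().split())
    ranked = sorted(zip(scores, sentences), key=lambda x: x[0], reverse=True)
    return [s for _, s in ranked[:k]]

def top_k_minilm(claim: str, document: str, k: int = 5,
                 model_name: str = "sentence-transformers/all-MiniLM-L6-v2") -> list[str]:
    """Rank document sentences by cosine similarity of MiniLM embeddings."""
    sentences = sent_tokenize(document)
    model = SentenceTransformer(model_name)
    claim_emb = model.encode(claim, convert_to_tensor=True)
    sent_embs = model.encode(sentences, convert_to_tensor=True)
    sims = util.cos_sim(claim_emb, sent_embs)[0]
    ranked = sorted(zip(sims.tolist(), sentences), key=lambda x: x[0], reverse=True)
    return [s for _, s in ranked[:k]]
```

The selected sentences would then be concatenated into the evidence context passed to the prompted or fine-tuned verifier, in place of the full document.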
Community
This paper, ClaimIQ at CheckThat! 2025, tackles the challenge of verifying numerical claims with evidence retrieved from documents. The authors compare zero-shot prompting, fine-tuned RoBERTa, and LoRA-tuned LLaMA models under different evidence selection strategies (full docs, BM25, MiniLM).
They show that fine-tuned LLaMA with full-document evidence achieves strong validation results, especially for the difficult Conflicting class — but performance drops sharply on the test set, raising questions about generalization.
What do you think is the bigger bottleneck here: model generalization or retrieval quality? And how could we better handle “conflicting” evidence cases?
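As background for the LoRA-tuned LLaMA discussed above, here is a rough sketch of parameter-efficient adaptation with the Hugging Face peft library. The base checkpoint, target modules, and hyperparameters are assumed for illustration and are not the paper's reported configuration.

```python
# Rough sketch of LoRA adaptation with Hugging Face peft (illustrative only;
# checkpoint, hyperparameters, and target modules are assumptions, not the paper's settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # placeholder; the paper's exact checkpoint may differ
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed value)
    lora_alpha=32,                         # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted for LLaMA
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapter weights are trainable
```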
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- When Scale Meets Diversity: Evaluating Language Models on Fine-Grained Multilingual Claim Verification (2025)
- UNH at CheckThat! 2025: Fine-tuning Vs Prompting in Claim Extraction (2025)
- XplaiNLP at CheckThat! 2025: Multilingual Subjectivity Detection with Finetuned Transformers and Prompt-Based Inference with Large Language Models (2025)
- AKCIT-FN at CheckThat! 2025: Switching Fine-Tuned SLMs and LLM Prompting for Multilingual Claim Normalization (2025)
- NOWJ@COLIEE 2025: A Multi-stage Framework Integrating Embedding Models and Large Language Models for Legal Retrieval and Entailment (2025)
- ATLANTIS at SemEval-2025 Task 3: Detecting Hallucinated Text Spans in Question Answering (2025)
- HiFACTMix: A Code-Mixed Benchmark and Graph-Aware Model for Evidence-Based Political Claim Verification in Hinglish (2025)