arXiv:2509.11492

ClaimIQ at CheckThat! 2025: Comparing Prompted and Fine-Tuned Language Models for Verifying Numerical Claims

Published on Sep 15 · Submitted by AnirbanSaha on Sep 16
Abstract

The system uses zero-shot prompting and parameter-efficient fine-tuning to verify numerical and temporal claims, with findings highlighting the importance of evidence granularity and model adaptation.

AI-generated summary

This paper presents our system for Task 3 of the CLEF 2025 CheckThat! Lab, which focuses on verifying numerical and temporal claims against retrieved evidence. We explore two complementary approaches: zero-shot prompting with instruction-tuned large language models (LLMs) and supervised fine-tuning with parameter-efficient LoRA. To improve evidence quality, we investigate several selection strategies, including full-document input and top-k sentence filtering using BM25 and MiniLM. Our best-performing model, LLaMA fine-tuned with LoRA, achieves strong performance on the English validation set; however, a notable drop on the test set highlights a generalization challenge. These findings underscore the importance of evidence granularity and model adaptation for robust numerical fact verification.
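
To make the two sentence-filtering strategies named above concrete, here is a minimal sketch of top-k evidence selection, assuming the rank_bm25 and sentence-transformers packages; the function names and the value of k are illustrative, not the paper's exact pipeline.

```python
# Hedged sketch: top-k evidence sentence selection with BM25 (lexical)
# and MiniLM (semantic). k=5 and whitespace tokenization are assumptions.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

def top_k_bm25(claim: str, sentences: list[str], k: int = 5) -> list[str]:
    """Rank evidence sentences by BM25 lexical overlap with the claim."""
    bm25 = BM25Okapi([s.lower().split() for s in sentences])
    scores = bm25.get_scores(claim.lower().split())
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]

def top_k_minilm(claim: str, sentences: list[str], k: int = 5) -> list[str]:
    """Rank evidence sentences by MiniLM embedding similarity to the claim."""
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    claim_emb = model.encode(claim, convert_to_tensor=True)
    sent_emb = model.encode(sentences, convert_to_tensor=True)
    sims = util.cos_sim(claim_emb, sent_emb)[0]
    top = sims.topk(min(k, len(sentences)))
    return [sentences[i] for i in top.indices.tolist()]
```

The two rankers trade off differently: BM25 favors exact term overlap, which helps when the claim repeats the numbers verbatim, while MiniLM can match paraphrased evidence at the cost of occasionally surfacing topically similar but numerically irrelevant sentences.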

Community

Paper author · Paper submitter

This paper, ClaimIQ at CheckThat! 2025, tackles the challenge of verifying numerical claims with evidence retrieved from documents. The authors compare zero-shot prompting, fine-tuned RoBERTa, and LoRA-tuned LLaMA models under different evidence selection strategies (full docs, BM25, MiniLM).
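
For readers unfamiliar with the LoRA setup mentioned above, the following is a hedged sketch of parameter-efficient fine-tuning with the Hugging Face peft library; the checkpoint name, rank, and target modules are assumptions for illustration, not the hyperparameters reported in the paper.

```python
# Hedged sketch of LoRA adaptation with peft; model name, rank r, and
# target modules are placeholders, not the paper's reported settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

lora = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter matrices train
```

Because only the low-rank adapter matrices are updated, a 7B-scale LLaMA becomes tractable to fine-tune on modest hardware, which is the "parameter-efficient" part of the paper's setup.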

They show that fine-tuned LLaMA with full-document evidence achieves strong validation results, especially for the difficult Conflicting class, but performance drops sharply on the test set, raising questions about generalization.

What do you think is the bigger bottleneck here: model generalization or retrieval quality? And how could we better handle “conflicting” evidence cases?
