ClaimIQ at CheckThat! 2025: Comparing Prompted and Fine-Tuned Language Models for Verifying Numerical Claims
Abstract
The system uses zero-shot prompting and parameter-efficient fine-tuning to verify numerical and temporal claims, with findings highlighting the importance of evidence granularity and model adaptation.
This paper presents our system for Task 3 of the CLEF 2025 CheckThat! Lab, which focuses on verifying numerical and temporal claims using retrieved evidence. We explore two complementary approaches: zero-shot prompting with instruction-tuned large language models (LLMs) and supervised fine-tuning with parameter-efficient LoRA. To enhance evidence quality, we investigate several selection strategies, including full-document input and top-k sentence filtering using BM25 and MiniLM. Our best-performing model, LLaMA fine-tuned with LoRA, achieves strong performance on the English validation set, but a notable performance drop on the test set highlights a generalization challenge. These findings underscore the importance of evidence granularity and model adaptation for robust numerical fact verification.
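As a concrete illustration of the sentence-level evidence selection described in the abstract, the sketch below ranks a document's sentences against a claim with BM25 and with MiniLM embeddings and keeps the top k. It is a minimal, hypothetical example rather than the authors' released code; the rank_bm25, sentence-transformers, and nltk packages and the all-MiniLM-L6-v2 checkpoint are assumptions.

```python
# Minimal sketch of top-k evidence sentence selection (illustrative only,
# not the authors' implementation). Requires rank_bm25, sentence-transformers,
# and nltk; running nltk.download("punkt") once may be needed for sent_tokenize.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util
from nltk.tokenize import sent_tokenize

def top_k_bm25(claim: str, document: str, k: int = 5) -> list[str]:
    """Rank document sentences against the claim with BM25 and keep the top k."""
    sentences = sent_tokenize(document)
    bm25 = BM25Okapi([s.lower().split() for s in sentences])
    scores = bm25.get_scores(claim.lower().split())
    ranked = sorted(zip(scores, sentences), key=lambda x: x[0], reverse=True)
    return [s for _, s in ranked[:k]]

def top_k_minilm(claim: str, document: str, k: int = 5,
                 model_name: str = "sentence-transformers/all-MiniLM-L6-v2") -> list[str]:
    """Rank document sentences by cosine similarity of MiniLM embeddings."""
    sentences = sent_tokenize(document)
    model = SentenceTransformer(model_name)
    claim_emb = model.encode(claim, convert_to_tensor=True)
    sent_embs = model.encode(sentences, convert_to_tensor=True)
    sims = util.cos_sim(claim_emb, sent_embs)[0]
    ranked = sorted(zip(sims.tolist(), sentences), key=lambda x: x[0], reverse=True)
    return [s for _, s in ranked[:k]]
```

The selected sentences would then be concatenated into the evidence context passed to the prompted or fine-tuned verifier, in place of the full document.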
Community
This paper, ClaimIQ at CheckThat! 2025, tackles the challenge of verifying numerical claims with evidence retrieved from documents. The authors compare zero-shot prompting, fine-tuned RoBERTa, and LoRA-tuned LLaMA models under different evidence selection strategies (full docs, BM25, MiniLM).
They show that fine-tuned LLaMA with full-document evidence achieves strong validation results, especially for the difficult Conflicting class — but performance drops sharply on the test set, raising questions about generalization.
What do you think is the bigger bottleneck here: model generalization or retrieval quality? And how could we better handle “conflicting” evidence cases?
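As background for the LoRA-tuned LLaMA discussed above, here is a rough sketch of parameter-efficient adaptation with the Hugging Face peft library. The base checkpoint, target modules, and hyperparameters are assumed for illustration and are not the paper's reported configuration.

```python
# Rough sketch of LoRA adaptation with Hugging Face peft (illustrative only;
# checkpoint, hyperparameters, and target modules are assumptions, not the paper's settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # placeholder; the paper's exact checkpoint may differ
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed value)
    lora_alpha=32,                         # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted for LLaMA
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapter weights are trainable
```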
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- When Scale Meets Diversity: Evaluating Language Models on Fine-Grained Multilingual Claim Verification (2025)
- UNH at CheckThat! 2025: Fine-tuning Vs Prompting in Claim Extraction (2025)
- XplaiNLP at CheckThat! 2025: Multilingual Subjectivity Detection with Finetuned Transformers and Prompt-Based Inference with Large Language Models (2025)
- AKCIT-FN at CheckThat! 2025: Switching Fine-Tuned SLMs and LLM Prompting for Multilingual Claim Normalization (2025)
- NOWJ@COLIEE 2025: A Multi-stage Framework Integrating Embedding Models and Large Language Models for Legal Retrieval and Entailment (2025)
- ATLANTIS at SemEval-2025 Task 3: Detecting Hallucinated Text Spans in Question Answering (2025)
- HiFACTMix: A Code-Mixed Benchmark and Graph-Aware Model for Evidence-Based Political Claim Verification in Hinglish (2025)