--- license: mit language: - en base_model: - answerdotai/ModernBERT-base pipeline_tag: token-classification tags: - token classification - hallucination detection - transformers --- # LettuceDetect: Hallucination Detection Model **Model Name:** lettucedect-base-modernbert-en-v1 **Organization:** KRLabsOrg ## Overview LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for Retrieval-Augmented Generation (RAG) applications. This model is built on **ModernBERT**, which has been specifically chosen and trained becasue of its extended context support (up to **8192 tokens**). This long-context capability is critical for tasks where detailed and extensive documents need to be processed to accurately determine if an answer is supported by the provided context. ## Model Details - **Architecture:** ModernBERT (Base) with extended context support (up to 8192 tokens) - **Task:** Token Classification / Hallucination Detection - **Training Dataset:** RagTruth (with potential extensions to biomedical datasets) - **Language:** English ## How It Works The model is trained to identify tokens in the answer text that are not supported by the given context. During inference, the model returns token-level predictions which are then aggregated into spans. This allows users to see exactly which parts of the answer are considered hallucinated. ## Usage ### Installation Install the 'lettucedetect' repository ```bash pip install lettucedetect ``` ### Using the model ```python from lettucedetect.models.inference import HallucinationDetector # Initialize the detector with the transformer-based approach. detector = HallucinationDetector(method="transformer", model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1") context = """Briefly answer the following question: What is the capital of France? What is the population of France? Bear in mind that your response should be strictly based on the following three passages: passage 1: France is a country in Europe. The capital of France is Paris. The population of France is 67 million. output: """ answer = "The capital of France is Paris. The population of France is 69 million." # Get span-level predictions indicating which parts of the answer are considered hallucinated. predictions = detector.predict(context, answer, output_format="spans") print("Predictions:", predictions) # Predictions: [{'start': 31, 'end': 71, 'confidence': 0.9944414496421814, 'text': ' The population of France is 69 million.'}] ```