---
title: Resumescreener V2
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: AI-powered resume screener built with Streamlit
---

# 🤖 AI Resume Screener

An advanced Streamlit application that automatically ranks candidate resumes against job descriptions using a sophisticated multi-stage AI pipeline.

## 🚀 Features

### Multi-Stage AI Pipeline

1. **FAISS Recall**: Semantic similarity search using BGE embeddings (top 50 candidates)
2. **Cross-Encoder Reranking**: Deep semantic matching using the MS-Marco model (top 20 candidates)
3. **BM25 Scoring**: Traditional keyword-based relevance scoring
4. **Intent Analysis**: AI-powered candidate interest assessment using the Qwen LLM
5. **Final Ranking**: Weighted combination of all scores
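The recall stage can be sketched in plain NumPy. This is a stand-in for the app's FAISS index over BGE embeddings, not its actual code: it ranks candidates by cosine similarity to the job vector, which is the same idea at small scale.

```python
import numpy as np

def recall_top_k(job_vec, resume_vecs, k=50):
    """Rank resumes by cosine similarity to the job vector (stage 1 stand-in)."""
    job = job_vec / np.linalg.norm(job_vec)
    docs = resume_vecs / np.linalg.norm(resume_vecs, axis=1, keepdims=True)
    sims = docs @ job                    # cosine similarity per resume
    order = np.argsort(-sims)[:k]        # indices of the k best matches
    return order, sims[order]

# Toy vectors: resume 0 points the same way as the job, resume 2 is opposite.
job = np.array([1.0, 0.0])
resumes = np.array([[2.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])
idx, scores = recall_top_k(job, resumes, k=2)
print(idx)  # best-matching resume indices first
```

In the app, the surviving top-50 candidates then go to the cross-encoder for reranking.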

### Advanced AI Models

- **Embedding Model**: `BAAI/bge-large-en-v1.5` for semantic understanding
- **Cross-Encoder**: `cross-encoder/ms-marco-MiniLM-L6-v2` for precise reranking
- **LLM**: Qwen2-1.5B with 4-bit quantization for intent analysis

### Multiple Input Methods

- **File Upload**: PDF, DOCX, and TXT files
- **CSV Upload**: Bulk resume processing
- **Hugging Face Datasets**: Direct integration with HF datasets

### Comprehensive Analysis

- **Skills Extraction**: Technical skills and job-specific keywords
- **Score Breakdown**: Detailed analysis of each scoring component
- **Interactive Visualizations**: Charts and metrics for insights
- **Export Capabilities**: Download results as CSV
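Skills extraction can be illustrated as a keyword intersection against a predefined skills list, similar in spirit to the one in the `ResumeScreener` class. The skills shown here are assumptions for the sketch, not the app's actual list.

```python
import re

# Hypothetical skills list for illustration; the app defines its own.
SKILLS = {"python", "java", "javascript", "sql", "docker", "aws"}

def extract_skills(resume_text):
    """Return the known skills mentioned in the resume text."""
    tokens = set(re.findall(r"[a-z+#]+", resume_text.lower()))
    return SKILLS & tokens

found = extract_skills("Built ETL jobs in Python and SQL, deployed with Docker.")
print(sorted(found))  # ['docker', 'python', 'sql']
```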

## 📋 Requirements

### System Requirements

- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
- 8 GB+ RAM (16 GB+ recommended)
- 10 GB+ disk space for models

### Dependencies

All dependencies are listed in `requirements.txt`:

- `streamlit`
- `sentence-transformers`
- `transformers`
- `torch`
- `faiss-cpu`
- `rank-bm25`
- `nltk`
- `pdfplumber`
- `PyPDF2`
- `python-docx`
- `datasets`
- `plotly`
- `pandas`
- `numpy`

๐Ÿ› ๏ธ Installation

  1. Clone the repository:
git clone <repository-url>
cd resumescreener_v2
  1. Install dependencies:
pip install -r requirements.txt
  1. Run the application:
streamlit run src/streamlit_app.py

## 📖 Usage Guide

### Step 1: Model Loading

- Models are loaded automatically when the app starts
- The first run may take 5-10 minutes to download models
- Check the sidebar for model loading status

### Step 2: Job Description

- Enter the complete job description in the text area
- Include requirements, responsibilities, and desired skills
- More detailed descriptions yield better matching results

### Step 3: Load Resumes

Choose from three options:

#### Option A: File Upload

- Upload PDF, DOCX, or TXT files
- Supports multiple file selection
- Automatic text extraction

#### Option B: CSV Upload

- Upload a CSV containing resume texts
- Select the text and name columns
- Bulk processing capability
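Behind the scenes, Option B amounts to reading the chosen columns out of the CSV. A minimal stdlib sketch, where the column names `name` and `resume_text` are assumptions (in the app you pick them interactively):

```python
import csv
import io

# Inline sample standing in for an uploaded file.
csv_data = (
    "name,resume_text\n"
    "Alice,Python developer with 5 years experience\n"
    "Bob,Java backend engineer\n"
)

resumes = []
with io.StringIO(csv_data) as f:
    for row in csv.DictReader(f):
        # Keep only the user-selected name and text columns.
        resumes.append({"name": row["name"], "text": row["resume_text"]})

print(len(resumes))  # 2
```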

#### Option C: Hugging Face Dataset

- Load from public datasets
- Specify the dataset name and columns
- Limited to 100 resumes for performance

### Step 4: Run Pipeline

- Click **"Run Advanced Ranking Pipeline"**
- Monitor progress through the 5 stages
- Results appear in three tabs

### Step 5: Analyze Results

#### Summary Tab

- Top-ranked candidates table
- Key metrics and scores
- CSV download option

#### Detailed Analysis Tab

- Individual candidate breakdowns
- Explanation of each score component
- Skills and keywords analysis
- Resume excerpts

#### Visualizations Tab

- Score distribution charts
- Comparative analysis
- Intent distribution
- Average metrics

## 🧮 Scoring Formula

```
Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent
```

### Score Components

1. **Cross-Encoder Score (50%)**
   - Deep semantic matching between the job and the resume
   - Considers context and meaning
   - Range: 0-1 (normalized)
2. **BM25 Score (30%)**
   - Traditional keyword-based relevance
   - Based on term frequency and inverse document frequency
   - Range: 0-1 (normalized)
3. **Intent Score (20%)**
   - AI-assessed candidate interest level
   - Based on experience-job alignment
   - Categories: Yes (0.9), Maybe (0.5), No (0.1)

## 🎯 Best Practices

### For Optimal Results

1. **Detailed Job Descriptions**: Include specific requirements, technologies, and responsibilities
2. **Quality Resume Data**: Ensure resumes contain relevant information
3. **Appropriate Batch Size**: Process 20-100 resumes at a time for best performance
4. **Clear Requirements**: Specify must-have vs. nice-to-have skills

### Performance Tips

1. **GPU Usage**: Enable CUDA for faster processing
2. **Memory Management**: Use the cleanup controls for large batches
3. **Model Caching**: Models are cached after the first load
4. **Batch Processing**: Process resumes in smaller batches if memory is limited
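The batch-processing tip boils down to chunking the resume list before scoring. A small helper sketch (the batch size of 20 is an illustrative assumption, not a value from the app):

```python
def batches(items, batch_size=20):
    """Yield successive fixed-size chunks so each scoring pass fits in memory."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# 45 resumes split into chunks of at most 20.
chunks = list(batches(list(range(45)), batch_size=20))
print([len(c) for c in chunks])  # [20, 20, 5]
```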

## 🔧 Configuration

### Model Configuration

Models can be customized by modifying the `load_models()` function:

- Change model names for different embeddings
- Adjust quantization settings
- Modify device mapping

### Scoring Weights

Adjust the weights in `calculate_final_scores()`:

```python
final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
```

### Skills List

Customize the predefined skills list in the `ResumeScreener` class:

```python
self.skills_list = [
    'python', 'java', 'javascript',
    # Add your specific skills
]
```

๐Ÿ› Troubleshooting

Common Issues

  1. Model Loading Errors

    • Check internet connection for model downloads
    • Ensure sufficient disk space
    • Verify CUDA compatibility
  2. Memory Issues

    • Reduce batch size
    • Use CPU-only mode
    • Clear cache between runs
  3. File Processing Errors

    • Check file formats (PDF, DOCX, TXT)
    • Ensure files are not corrupted
    • Verify text extraction quality
  4. Performance Issues

    • Enable GPU acceleration
    • Process smaller batches
    • Use model quantization

### Error Messages

- **"Models not loaded"**: Wait for model loading to complete
- **"ML libraries not available"**: Install the missing dependencies
- **"CUDA out of memory"**: Reduce the batch size or use the CPU

## 📊 Sample Data

Use the included `sample_resumes.csv` for testing:

- 5 sample resumes covering different roles
- Realistic job experience and skills
- Good for testing all features

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • BAAI for the BGE embedding model
  • Microsoft for the MS-Marco cross-encoder
  • Alibaba for the Qwen language model
  • Streamlit for the web framework
  • Hugging Face for model hosting and transformers library

## 📞 Support

For issues and questions:

1. Check the troubleshooting section
2. Review error messages in the sidebar
3. Open an issue on GitHub
4. Check model compatibility

*Built with ❤️ using Streamlit and state-of-the-art AI models*