---
title: Resumescreener V2
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: AI-powered resume screener built with Streamlit
---

# 🤖 AI Resume Screener

An advanced Streamlit application that automatically ranks candidate resumes against job descriptions using a sophisticated multi-stage AI pipeline.

## 🚀 Features

### Multi-Stage AI Pipeline

1. **FAISS Recall**: Semantic similarity search using BGE embeddings (top 50 candidates)
2. **Cross-Encoder Reranking**: Deep semantic matching using the MS-Marco model (top 20 candidates)
3. **BM25 Scoring**: Traditional keyword-based relevance scoring
4. **Intent Analysis**: AI-powered candidate interest assessment using the Qwen LLM
5. **Final Ranking**: Weighted combination of all scores
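The recall stage can be sketched in plain NumPy. This is a stand-in for the app's FAISS index over BGE embeddings, not its actual code: it ranks candidates by cosine similarity to the job vector, which is the same idea at small scale.

```python
import numpy as np

def recall_top_k(job_vec, resume_vecs, k=50):
    """Rank resumes by cosine similarity to the job vector (stage 1 stand-in)."""
    job = job_vec / np.linalg.norm(job_vec)
    docs = resume_vecs / np.linalg.norm(resume_vecs, axis=1, keepdims=True)
    sims = docs @ job                    # cosine similarity per resume
    order = np.argsort(-sims)[:k]        # indices of the k best matches
    return order, sims[order]

# Toy vectors: resume 0 points the same way as the job, resume 2 is opposite.
job = np.array([1.0, 0.0])
resumes = np.array([[2.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])
idx, scores = recall_top_k(job, resumes, k=2)
print(idx)  # best-matching resume indices first
```

In the app, the surviving top-50 candidates then go to the cross-encoder for reranking.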

### Advanced AI Models

- **Embedding Model**: `BAAI/bge-large-en-v1.5` for semantic understanding
- **Cross-Encoder**: `cross-encoder/ms-marco-MiniLM-L6-v2` for precise reranking
- **LLM**: Qwen2-1.5B with 4-bit quantization for intent analysis

### Multiple Input Methods

- **File Upload**: PDF, DOCX, and TXT files
- **CSV Upload**: Bulk resume processing
- **Hugging Face Datasets**: Direct integration with HF datasets

### Comprehensive Analysis

- **Skills Extraction**: Technical skills and job-specific keywords
- **Score Breakdown**: Detailed analysis of each scoring component
- **Interactive Visualizations**: Charts and metrics for insights
- **Export Capabilities**: Download results as CSV
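Skills extraction can be illustrated as a keyword intersection against a predefined skills list, similar in spirit to the one in the `ResumeScreener` class. The skills shown here are assumptions for the sketch, not the app's actual list.

```python
import re

# Hypothetical skills list for illustration; the app defines its own.
SKILLS = {"python", "java", "javascript", "sql", "docker", "aws"}

def extract_skills(resume_text):
    """Return the known skills mentioned in the resume text."""
    tokens = set(re.findall(r"[a-z+#]+", resume_text.lower()))
    return SKILLS & tokens

found = extract_skills("Built ETL jobs in Python and SQL, deployed with Docker.")
print(sorted(found))  # ['docker', 'python', 'sql']
```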

## 📋 Requirements

### System Requirements

- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
- 8 GB+ RAM (16 GB+ recommended)
- 10 GB+ disk space for models

### Dependencies

All dependencies are listed in `requirements.txt`:

- `streamlit`
- `sentence-transformers`
- `transformers`
- `torch`
- `faiss-cpu`
- `rank-bm25`
- `nltk`
- `pdfplumber`
- `PyPDF2`
- `python-docx`
- `datasets`
- `plotly`
- `pandas`
- `numpy`

๐Ÿ› ๏ธ Installation

  1. Clone the repository:
git clone <repository-url>
cd resumescreener_v2
  1. Install dependencies:
pip install -r requirements.txt
  1. Run the application:
streamlit run src/streamlit_app.py

## 📖 Usage Guide

### Step 1: Model Loading

- Models are loaded automatically when the app starts
- The first run may take 5-10 minutes to download models
- Check the sidebar for model loading status

### Step 2: Job Description

- Enter the complete job description in the text area
- Include requirements, responsibilities, and desired skills
- More detailed descriptions yield better matching results

### Step 3: Load Resumes

Choose from three options:

#### Option A: File Upload

- Upload PDF, DOCX, or TXT files
- Supports multiple file selection
- Automatic text extraction

#### Option B: CSV Upload

- Upload a CSV containing resume texts
- Select the text and name columns
- Bulk processing capability
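Behind the scenes, Option B amounts to reading the chosen columns out of the CSV. A minimal stdlib sketch, where the column names `name` and `resume_text` are assumptions (in the app you pick them interactively):

```python
import csv
import io

# Inline sample standing in for an uploaded file.
csv_data = (
    "name,resume_text\n"
    "Alice,Python developer with 5 years experience\n"
    "Bob,Java backend engineer\n"
)

resumes = []
with io.StringIO(csv_data) as f:
    for row in csv.DictReader(f):
        # Keep only the user-selected name and text columns.
        resumes.append({"name": row["name"], "text": row["resume_text"]})

print(len(resumes))  # 2
```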

#### Option C: Hugging Face Dataset

- Load from public datasets
- Specify the dataset name and columns
- Limited to 100 resumes for performance

### Step 4: Run Pipeline

- Click **"Run Advanced Ranking Pipeline"**
- Monitor progress through the 5 stages
- Results appear in three tabs

### Step 5: Analyze Results

#### Summary Tab

- Top-ranked candidates table
- Key metrics and scores
- CSV download option

#### Detailed Analysis Tab

- Individual candidate breakdowns
- Explanation of each score component
- Skills and keywords analysis
- Resume excerpts

#### Visualizations Tab

- Score distribution charts
- Comparative analysis
- Intent distribution
- Average metrics

## 🧮 Scoring Formula

```
Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent
```

### Score Components

1. **Cross-Encoder Score (50%)**
   - Deep semantic matching between the job and the resume
   - Considers context and meaning
   - Range: 0-1 (normalized)
2. **BM25 Score (30%)**
   - Traditional keyword-based relevance
   - Based on term frequency and inverse document frequency
   - Range: 0-1 (normalized)
3. **Intent Score (20%)**
   - AI-assessed candidate interest level
   - Based on experience-job alignment
   - Categories: Yes (0.9), Maybe (0.5), No (0.1)

## 🎯 Best Practices

### For Optimal Results

1. **Detailed Job Descriptions**: Include specific requirements, technologies, and responsibilities
2. **Quality Resume Data**: Ensure resumes contain relevant information
3. **Appropriate Batch Size**: Process 20-100 resumes at a time for best performance
4. **Clear Requirements**: Specify must-have vs. nice-to-have skills

### Performance Tips

1. **GPU Usage**: Enable CUDA for faster processing
2. **Memory Management**: Use the cleanup controls for large batches
3. **Model Caching**: Models are cached after the first load
4. **Batch Processing**: Process resumes in smaller batches if memory is limited
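The batch-processing tip boils down to chunking the resume list before scoring. A small helper sketch (the batch size of 20 is an illustrative assumption, not a value from the app):

```python
def batches(items, batch_size=20):
    """Yield successive fixed-size chunks so each scoring pass fits in memory."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# 45 resumes split into chunks of at most 20.
chunks = list(batches(list(range(45)), batch_size=20))
print([len(c) for c in chunks])  # [20, 20, 5]
```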

## 🔧 Configuration

### Model Configuration

Models can be customized by modifying the `load_models()` function:

- Change model names for different embeddings
- Adjust quantization settings
- Modify device mapping

### Scoring Weights

Adjust the weights in `calculate_final_scores()`:

```python
final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
```

### Skills List

Customize the predefined skills list in the `ResumeScreener` class:

```python
self.skills_list = [
    'python', 'java', 'javascript',
    # Add your specific skills
]
```

๐Ÿ› Troubleshooting

Common Issues

  1. Model Loading Errors

    • Check internet connection for model downloads
    • Ensure sufficient disk space
    • Verify CUDA compatibility
  2. Memory Issues

    • Reduce batch size
    • Use CPU-only mode
    • Clear cache between runs
  3. File Processing Errors

    • Check file formats (PDF, DOCX, TXT)
    • Ensure files are not corrupted
    • Verify text extraction quality
  4. Performance Issues

    • Enable GPU acceleration
    • Process smaller batches
    • Use model quantization

### Error Messages

- **"Models not loaded"**: Wait for model loading to complete
- **"ML libraries not available"**: Install the missing dependencies
- **"CUDA out of memory"**: Reduce the batch size or use the CPU

## 📊 Sample Data

Use the included `sample_resumes.csv` for testing:

- 5 sample resumes covering different roles
- Realistic job experience and skills
- Good for testing all features

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • BAAI for the BGE embedding model
  • Microsoft for the MS-Marco cross-encoder
  • Alibaba for the Qwen language model
  • Streamlit for the web framework
  • Hugging Face for model hosting and transformers library

## 📞 Support

For issues and questions:

1. Check the troubleshooting section
2. Review error messages in the sidebar
3. Open an issue on GitHub
4. Check model compatibility

*Built with ❤️ using Streamlit and state-of-the-art AI models*