---
title: Resumescreener V2
emoji: 🤖
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: AI-powered resume screening app
---
# 🤖 AI Resume Screener

An advanced Streamlit application that automatically ranks candidate resumes against job descriptions using a multi-stage AI pipeline.
## 🚀 Features

### Multi-Stage AI Pipeline
- FAISS Recall: Semantic similarity search using BGE embeddings (top 50 candidates)
- Cross-Encoder Reranking: Deep semantic matching using MS-Marco model (top 20 candidates)
- BM25 Scoring: Traditional keyword-based relevance scoring
- Intent Analysis: AI-powered candidate interest assessment using Qwen LLM
- Final Ranking: Weighted combination of all scores
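The funnel above can be sketched in plain Python, with each stage's scorer passed in as a callable (the model-backed scorers of the real app are stand-ins here; all function and parameter names are illustrative):

```python
def rank_pipeline(resumes, job_desc, recall_fn, rerank_fn, bm25_fn, intent_fn):
    """Illustrative funnel: broad recall -> precise rerank -> weighted blend."""
    # Stage 1: cheap semantic recall keeps the 50 most similar resumes.
    pool = sorted(resumes, key=lambda r: recall_fn(job_desc, r), reverse=True)[:50]
    # Stage 2: the expensive cross-encoder reranks and keeps the top 20.
    pool = sorted(pool, key=lambda r: rerank_fn(job_desc, r), reverse=True)[:20]
    # Stages 3-5: blend the three signals with the weights from the scoring formula.
    scored = [
        (r, 0.5 * rerank_fn(job_desc, r)
            + 0.3 * bm25_fn(job_desc, r)
            + 0.2 * intent_fn(job_desc, r))
        for r in pool
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Passing the scorers in keeps the funnel testable with cheap stand-in functions before wiring up the real models.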
### Advanced AI Models
- Embedding Model: BAAI/bge-large-en-v1.5 for semantic understanding
- Cross-Encoder: cross-encoder/ms-marco-MiniLM-L6-v2 for precise ranking
- LLM: Qwen2-1.5B with 4-bit quantization for intent analysis
### Multiple Input Methods
- File Upload: PDF, DOCX, TXT files
- CSV Upload: Bulk resume processing
- Hugging Face Datasets: Direct integration with HF datasets
### Comprehensive Analysis
- Skills Extraction: Technical skills and job-specific keywords
- Score Breakdown: Detailed analysis of each scoring component
- Interactive Visualizations: Charts and metrics for insights
- Export Capabilities: Download results as CSV
## 📋 Requirements

### System Requirements
- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models
### Dependencies

All dependencies are listed in `requirements.txt`:
- streamlit
- sentence-transformers
- transformers
- torch
- faiss-cpu
- rank-bm25
- nltk
- pdfplumber
- PyPDF2
- python-docx
- datasets
- plotly
- pandas
- numpy
## 🛠️ Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd resumescreener_v2
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   streamlit run src/streamlit_app.py
   ```
## 📖 Usage Guide

### Step 1: Model Loading
- Models are automatically loaded when the app starts
- First run may take 5-10 minutes to download models
- Check the sidebar for model loading status
### Step 2: Job Description
- Enter the complete job description in the text area
- Include requirements, responsibilities, and desired skills
- More detailed descriptions yield better matching results
### Step 3: Load Resumes

Choose from three options:

#### Option A: File Upload
- Upload PDF, DOCX, or TXT files
- Supports multiple file selection
- Automatic text extraction
#### Option B: CSV Upload
- Upload CSV with resume texts
- Select text and name columns
- Bulk processing capability
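Option B's bulk path boils down to reading the selected columns out of the uploaded CSV. A minimal sketch using only the standard library (the column names `name` and `resume_text` are assumptions; the app lets you select them in the UI):

```python
import csv
import io

def load_resumes_from_csv(csv_text, text_col="resume_text", name_col="name"):
    """Parse uploaded CSV content into (name, resume_text) pairs.

    Rows with an empty resume text column are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[name_col], row[text_col]) for row in reader if row.get(text_col)]

sample = "name,resume_text\nAda,Python developer\nBob,Java engineer\n"
pairs = load_resumes_from_csv(sample)  # [("Ada", "Python developer"), ("Bob", "Java engineer")]
```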
#### Option C: Hugging Face Dataset
- Load from public datasets
- Specify dataset name and columns
- Limited to 100 resumes for performance
### Step 4: Run Pipeline
- Click "Run Advanced Ranking Pipeline"
- Monitor progress through 5 stages
- Results appear in three tabs
### Step 5: Analyze Results

#### Summary Tab
- Top-ranked candidates table
- Key metrics and scores
- CSV download option
#### Detailed Analysis Tab
- Individual candidate breakdowns
- Score components explanation
- Skills and keywords analysis
- Resume excerpts
#### Visualizations Tab
- Score distribution charts
- Comparative analysis
- Intent distribution
- Average metrics
## 🧮 Scoring Formula

```
Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent
```
### Score Components

#### Cross-Encoder Score (50%)
- Deep semantic matching between job and resume
- Considers context and meaning
- Range: 0-1 (normalized)
#### BM25 Score (30%)
- Traditional keyword-based relevance
- Term frequency and document frequency
- Range: 0-1 (normalized)
#### Intent Score (20%)
- AI-assessed candidate interest level
- Based on experience-job alignment
- Categories: Yes (0.9), Maybe (0.5), No (0.1)
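Putting the three components together, a minimal sketch of the blend, assuming the raw cross-encoder and BM25 scores are min-max normalized to the 0-1 ranges stated above (function names are illustrative, not the app's exact API):

```python
INTENT_SCORES = {"Yes": 0.9, "Maybe": 0.5, "No": 0.1}

def minmax_normalize(scores):
    """Scale raw scores to the 0-1 range used by the final formula."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                      # all candidates tied: no signal either way
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def blend_scores(ce_raw, bm25_raw, intents):
    """Final Score = 0.5 x Cross-Encoder + 0.3 x BM25 + 0.2 x Intent."""
    ce = minmax_normalize(ce_raw)
    bm25 = minmax_normalize(bm25_raw)
    intent = [INTENT_SCORES[label] for label in intents]
    return [0.5 * c + 0.3 * b + 0.2 * i for c, b, i in zip(ce, bm25, intent)]
```

Normalizing first matters: raw cross-encoder logits and raw BM25 scores live on different scales, so blending them unnormalized would let one component dominate regardless of the weights.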
## 🎯 Best Practices

### For Optimal Results
- Detailed Job Descriptions: Include specific requirements, technologies, and responsibilities
- Quality Resume Data: Ensure resumes contain relevant information
- Appropriate Batch Size: Process 20-100 resumes for best performance
- Clear Requirements: Specify must-have vs. nice-to-have skills
### Performance Tips
- GPU Usage: Enable CUDA for faster processing
- Memory Management: Use cleanup controls for large batches
- Model Caching: Models are cached after first load
- Batch Processing: Process resumes in smaller batches if memory is limited
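The batch-processing tip amounts to chunking the resume list and scoring one chunk at a time; a minimal helper (names are illustrative):

```python
def batched(items, batch_size=20):
    """Yield fixed-size chunks so each scoring pass fits in memory."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Score each chunk, collect the results, and release any large intermediate tensors between chunks (on GPU, something like `torch.cuda.empty_cache()` can help reclaim cached memory).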
## 🔧 Configuration

### Model Configuration

Models can be customized by modifying the `load_models()` function:
- Change model names for different embeddings
- Adjust quantization settings
- Modify device mapping
### Scoring Weights

Adjust the weights in `calculate_final_scores()`:

```python
final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
```
### Skills List

Customize the predefined skills list in the `ResumeScreener` class:

```python
self.skills_list = [
    'python', 'java', 'javascript',
    # Add your specific skills
]
```
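A minimal sketch of how such a list can drive skills extraction, assuming simple case-insensitive whole-word matching (the app's actual matcher may differ; the function name is illustrative):

```python
import re

def extract_skills(resume_text, skills_list):
    """Return every listed skill that appears as a whole word, case-insensitively."""
    text = resume_text.lower()
    return [s for s in skills_list
            if re.search(r"\b" + re.escape(s) + r"\b", text)]
```

Whole-word matching (`\b` boundaries) avoids false positives such as counting "java" inside "javascript".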
## 🐛 Troubleshooting

### Common Issues

#### Model Loading Errors
- Check internet connection for model downloads
- Ensure sufficient disk space
- Verify CUDA compatibility
#### Memory Issues
- Reduce batch size
- Use CPU-only mode
- Clear cache between runs
#### File Processing Errors
- Check file formats (PDF, DOCX, TXT)
- Ensure files are not corrupted
- Verify text extraction quality
#### Performance Issues
- Enable GPU acceleration
- Process smaller batches
- Use model quantization
### Error Messages
- "Models not loaded": Wait for model loading to complete
- "ML libraries not available": Install missing dependencies
- "CUDA out of memory": Reduce batch size or use CPU
## 📊 Sample Data

Use the included `sample_resumes.csv` for testing:
- 5 sample resumes with different roles
- Realistic job experience and skills
- Good for testing all features
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- BAAI for the BGE embedding model
- Microsoft for the MS-Marco cross-encoder
- Alibaba for the Qwen language model
- Streamlit for the web framework
- Hugging Face for model hosting and transformers library
## 📞 Support

For issues and questions:
- Check the troubleshooting section
- Review error messages in the sidebar
- Open an issue on GitHub
- Check model compatibility
Built with ❤️ using Streamlit and state-of-the-art AI models