---
title: Resumescreener V2
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit template space
---

# 🤖 AI Resume Screener

A Streamlit application that ranks candidate resumes against a job description using a multi-stage pipeline: embedding-based recall, cross-encoder reranking, BM25 keyword scoring, and LLM-based intent analysis.

## 🚀 Features

### Multi-Stage AI Pipeline
1. **FAISS Recall**: Semantic similarity search using BGE embeddings (top 50 candidates)
2. **Cross-Encoder Reranking**: Deep semantic matching using MS-Marco model (top 20 candidates)
3. **BM25 Scoring**: Traditional keyword-based relevance scoring
4. **Intent Analysis**: AI-powered candidate interest assessment using Qwen LLM
5. **Final Ranking**: Weighted combination of all scores
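
As a rough illustration of stage 1, the recall step amounts to a nearest-neighbour search over embedding vectors. The sketch below uses brute-force cosine similarity in plain Python as a stand-in for FAISS, with tiny made-up vectors in place of BGE embeddings. `recall_top_k` is a hypothetical helper, not a function from this app.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recall_top_k(query_vec, resume_vecs, k=50):
    """Return (index, similarity) pairs for the k most similar resumes."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(resume_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings" (illustrative values only).
job = [1.0, 0.0, 1.0]
resumes = [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
top = recall_top_k(job, resumes, k=2)
print([i for i, _ in top])  # [0, 2]
```

The survivors of this step would then be passed to the cross-encoder, which is slower but scores each job/resume pair jointly.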

### Advanced AI Models
- **Embedding Model**: BAAI/bge-large-en-v1.5 for semantic understanding
- **Cross-Encoder**: cross-encoder/ms-marco-MiniLM-L6-v2 for precise ranking
- **LLM**: Qwen2-1.5B with 4-bit quantization for intent analysis

### Multiple Input Methods
- **File Upload**: PDF, DOCX, TXT files
- **CSV Upload**: Bulk resume processing
- **Hugging Face Datasets**: Direct integration with HF datasets

### Comprehensive Analysis
- **Skills Extraction**: Technical skills and job-specific keywords
- **Score Breakdown**: Detailed analysis of each scoring component
- **Interactive Visualizations**: Charts and metrics for insights
- **Export Capabilities**: Download results as CSV

## 📋 Requirements

### System Requirements
- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models

### Dependencies
All dependencies are listed in `requirements.txt`:
- streamlit
- sentence-transformers
- transformers
- torch
- faiss-cpu
- rank-bm25
- nltk
- pdfplumber
- PyPDF2
- python-docx
- datasets
- plotly
- pandas
- numpy

## 🛠️ Installation

1. **Clone the repository**:
```bash
git clone <repository-url>
cd resumescreener_v2
```

2. **Install dependencies**:
```bash
pip install -r requirements.txt
```

3. **Run the application**:
```bash
streamlit run src/streamlit_app.py
```

## 📖 Usage Guide

### Step 1: Model Loading
- Models are automatically loaded when the app starts
- First run may take 5-10 minutes to download models
- Check the sidebar for model loading status

### Step 2: Job Description
- Enter the complete job description in the text area
- Include requirements, responsibilities, and desired skills
- More detailed descriptions yield better matching results

### Step 3: Load Resumes
Choose from three options:

#### Option A: File Upload
- Upload PDF, DOCX, or TXT files
- Supports multiple file selection
- Automatic text extraction
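
Text extraction for the three formats might look like the sketch below. It assumes files on disk, while Streamlit's uploader returns file-like objects, and `extract_text` is a hypothetical name rather than the app's actual function. The PDF and DOCX branches rely on `pdfplumber` and `python-docx` from the dependency list.

```python
from pathlib import Path

def extract_text(path):
    """Return plain text from a PDF, DOCX, or TXT file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    if suffix == ".pdf":
        import pdfplumber  # imported lazily so TXT-only use needs no extra packages
        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for pages without a text layer
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix == ".docx":
        import docx  # provided by the python-docx package
        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix}")
```

Scanned PDFs without a text layer yield empty strings here, so it is worth checking extraction quality before ranking.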

#### Option B: CSV Upload
- Upload CSV with resume texts
- Select text and name columns
- Bulk processing capability
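
A minimal sketch of the column-selection step, using only the standard library. The app itself may use pandas, and `load_resumes_from_csv` and the column names are assumptions for illustration.

```python
import csv
import io

def load_resumes_from_csv(csv_text, text_col, name_col):
    """Return (name, text) pairs, skipping rows whose text cell is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {text_col, name_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"Missing columns: {sorted(missing)}")
    return [
        ((row[name_col] or "").strip(), row[text_col].strip())
        for row in reader
        if (row[text_col] or "").strip()
    ]

sample = "name,resume\nAda,Python developer\nBob,\nCy,Data analyst\n"
rows = load_resumes_from_csv(sample, text_col="resume", name_col="name")
print(rows)  # [('Ada', 'Python developer'), ('Cy', 'Data analyst')]
```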

#### Option C: Hugging Face Dataset
- Load from public datasets
- Specify dataset name and columns
- Limited to 100 resumes for performance

### Step 4: Run Pipeline
- Click "Run Advanced Ranking Pipeline"
- Monitor progress through 5 stages
- Results appear in three tabs

### Step 5: Analyze Results

#### Summary Tab
- Top-ranked candidates table
- Key metrics and scores
- CSV download option

#### Detailed Analysis Tab
- Individual candidate breakdowns
- Score components explanation
- Skills and keywords analysis
- Resume excerpts

#### Visualizations Tab
- Score distribution charts
- Comparative analysis
- Intent distribution
- Average metrics

## 🧮 Scoring Formula

**Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent**
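
A worked example of the formula, using the intent values listed under Score Components. `final_score` is a hypothetical helper written for this illustration.

```python
# Intent labels mapped to the numeric values listed under Score Components.
INTENT_VALUES = {"Yes": 0.9, "Maybe": 0.5, "No": 0.1}

def final_score(ce, bm25, intent_label):
    """Weighted sum of the three components, each expected in the 0-1 range."""
    return 0.5 * ce + 0.3 * bm25 + 0.2 * INTENT_VALUES[intent_label]

# 0.5 * 0.8 + 0.3 * 0.6 + 0.2 * 0.9 = 0.40 + 0.18 + 0.18
score = final_score(0.8, 0.6, "Yes")
print(round(score, 2))  # 0.76
```

Because the weights sum to 1 and every component lies in 0-1, the final score also stays in 0-1.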

### Score Components

1. **Cross-Encoder Score (50%)**
   - Deep semantic matching between job and resume
   - Considers context and meaning
   - Range: 0-1 (normalized)

2. **BM25 Score (30%)**
   - Traditional keyword-based relevance
   - Based on term frequency and inverse document frequency
   - Range: 0-1 (normalized)

3. **Intent Score (20%)**
   - AI-assessed candidate interest level
   - Based on experience-job alignment
   - Categories: Yes (0.9), Maybe (0.5), No (0.1)
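
To make the BM25 component concrete, here is a plain-Python sketch of Okapi BM25 followed by min-max scaling to the 0-1 range. The IDF variant, the `k1` and `b` defaults, and min-max as the normalization method are all assumptions. The app uses the `rank-bm25` package, whose details may differ.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumption; other variants exist).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

def min_max(values):
    """Scale values linearly to 0-1; all-equal input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

docs = [
    "python developer with streamlit experience".split(),
    "accountant with excel experience".split(),
    "senior python engineer python pandas".split(),
]
query = "python streamlit".split()
normalized = min_max(bm25_scores(query, docs))
```

In this toy run the first resume matches both query terms and scales to 1.0, the second matches neither and scales to 0.0, and the third lands in between.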

## 🎯 Best Practices

### For Optimal Results
1. **Detailed Job Descriptions**: Include specific requirements, technologies, and responsibilities
2. **Quality Resume Data**: Ensure resumes contain relevant information
3. **Appropriate Batch Size**: Process 20-100 resumes for best performance
4. **Clear Requirements**: Specify must-have vs. nice-to-have skills

### Performance Tips
1. **GPU Usage**: Enable CUDA for faster processing
2. **Memory Management**: Use cleanup controls for large batches
3. **Model Caching**: Models are cached after first load
4. **Batch Processing**: Process resumes in smaller batches if memory limited
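
One way to split a large resume set into smaller batches is sketched below. `batched` is a hypothetical helper and the batch size of 20 is only an example.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

resumes = [f"resume {i}" for i in range(45)]
sizes = [len(batch) for batch in batched(resumes, 20)]
print(sizes)  # [20, 20, 5]
```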

## 🔧 Configuration

### Model Configuration
Models can be customized by modifying the `load_models()` function:
- Change model names for different embeddings
- Adjust quantization settings
- Modify device mapping
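
The sketch below shows roughly where those three settings would live. It is not the app's actual `load_models()`: the function name, the `MODEL_NAMES` dictionary, and the exact Qwen checkpoint ID are assumptions. Four-bit loading additionally needs the `bitsandbytes` and `accelerate` packages and a CUDA GPU.

```python
MODEL_NAMES = {
    "embedder": "BAAI/bge-large-en-v1.5",
    "cross_encoder": "cross-encoder/ms-marco-MiniLM-L6-v2",
    "llm": "Qwen/Qwen2-1.5B",  # assumption: the README only says "Qwen2-1.5B"
}

def load_models_sketch(names=MODEL_NAMES, device=None):
    """Load the three models; heavy imports are deferred until the call."""
    import torch
    from sentence_transformers import CrossEncoder, SentenceTransformer
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    embedder = SentenceTransformer(names["embedder"], device=device)
    cross_encoder = CrossEncoder(names["cross_encoder"], device=device)
    tokenizer = AutoTokenizer.from_pretrained(names["llm"])

    llm_kwargs = {}
    if device == "cuda":
        # Quantization settings and device mapping only apply on GPU.
        llm_kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
        )
        llm_kwargs["device_map"] = "auto"
    llm = AutoModelForCausalLM.from_pretrained(names["llm"], **llm_kwargs)
    return embedder, cross_encoder, tokenizer, llm
```

Swapping in a different embedding model would then be a one-line change to `MODEL_NAMES`.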

### Scoring Weights
Adjust weights in `calculate_final_scores()`:
```python
final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
```

### Skills List
Customize the predefined skills list in the `ResumeScreener` class:
```python
self.skills_list = [
    'python', 'java', 'javascript', 
    # Add your specific skills
]
```
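
How the skills list is matched against resume text is not shown in this README. A simple whole-term matcher could look like the sketch below, where `extract_skills` is a hypothetical function.

```python
import re

def extract_skills(text, skills_list):
    """Return the skills from skills_list that occur in text as whole terms."""
    lowered = text.lower()
    found = []
    for skill in skills_list:
        # Lookarounds keep "java" from matching inside "javascript".
        pattern = r"(?<![a-z0-9])" + re.escape(skill.lower()) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.append(skill)
    return found

skills = ["python", "java", "javascript"]
print(extract_skills("Built JavaScript dashboards and Python ETL jobs.", skills))
# ['python', 'javascript']
```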

## 🐛 Troubleshooting

### Common Issues

1. **Model Loading Errors**
   - Check internet connection for model downloads
   - Ensure sufficient disk space
   - Verify CUDA compatibility

2. **Memory Issues**
   - Reduce batch size
   - Switch to CPU-only mode if GPU memory is the limiting factor
   - Clear cache between runs

3. **File Processing Errors**
   - Check file formats (PDF, DOCX, TXT)
   - Ensure files are not corrupted
   - Verify text extraction quality

4. **Performance Issues**
   - Enable GPU acceleration
   - Process smaller batches
   - Use model quantization

### Error Messages
- **"Models not loaded"**: Wait for model loading to complete
- **"ML libraries not available"**: Install missing dependencies
- **"CUDA out of memory"**: Reduce batch size or use CPU

## 📊 Sample Data

Use the included `sample_resumes.csv` for testing:
- 5 sample resumes with different roles
- Realistic job experience and skills
- Good for testing all features

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- **BAAI** for the BGE embedding model
- **Microsoft** for the MS MARCO dataset on which the cross-encoder was trained
- **Alibaba** for the Qwen language model
- **Streamlit** for the web framework
- **Hugging Face** for model hosting and transformers library

## 📞 Support

For issues and questions:
1. Check the troubleshooting section
2. Review error messages in the sidebar
3. Open an issue on GitHub
4. Check model compatibility

---

**Built with ❤️ using Streamlit and state-of-the-art AI models**