---
title: Resumescreener V2
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit template space
---
# 🤖 AI Resume Screener
A Streamlit application that ranks candidate resumes against a job description using a multi-stage AI pipeline that combines semantic recall, cross-encoder reranking, BM25 keyword scoring, and LLM-based intent analysis.
## Features
### Multi-Stage AI Pipeline
1. **FAISS Recall**: Semantic similarity search using BGE embeddings (top 50 candidates)
2. **Cross-Encoder Reranking**: Deep semantic matching using MS-Marco model (top 20 candidates)
3. **BM25 Scoring**: Traditional keyword-based relevance scoring
4. **Intent Analysis**: AI-powered candidate interest assessment using Qwen LLM
5. **Final Ranking**: Weighted combination of the cross-encoder, BM25, and intent scores
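The stages form a recall-then-rerank funnel: a cheap similarity search narrows the pool before the expensive cross-encoder runs. The sketch below shows only that narrowing logic and is not the app's code. It assumes embeddings are already computed; inner product over L2-normalized vectors equals cosine similarity, which is the exhaustive search FAISS's `IndexFlatIP` performs. `recall_then_rerank` and `rerank_fn` are hypothetical names, with `rerank_fn` standing in for the cross-encoder.

```python
from __future__ import annotations

import heapq
import math
from typing import Callable, Sequence


def normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def recall_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    rerank_fn: Callable[[int], float],  # hypothetical stand-in for the cross-encoder
    recall_k: int = 50,
    rerank_k: int = 20,
) -> list[tuple[int, float]]:
    """Return (doc_index, rerank_score) pairs for the best rerank_k documents."""
    q = normalize(query_vec)
    # Stage 1 (recall): exhaustive inner-product search over normalized vectors.
    sims = [
        (sum(a * b for a, b in zip(q, normalize(doc))), idx)
        for idx, doc in enumerate(doc_vecs)
    ]
    recalled = [idx for _, idx in heapq.nlargest(recall_k, sims)]
    # Stage 2 (rerank): only recalled documents reach the expensive model.
    reranked = [(idx, rerank_fn(idx)) for idx in recalled]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:rerank_k]
```

For example, with document vectors `[[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]`, query `[1.0, 0.1]`, reranker scores `[0.2, 0.9, 0.5]`, `recall_k=2`, and `rerank_k=1`, the result is `[(2, 0.5)]`. Document 1 would win the rerank but never reaches it, because recall drops it; that is the trade-off the funnel accepts for speed.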
### Advanced AI Models
- **Embedding Model**: BAAI/bge-large-en-v1.5 for semantic understanding
- **Cross-Encoder**: cross-encoder/ms-marco-MiniLM-L6-v2 for precise ranking
- **LLM**: Qwen2-1.5B with 4-bit quantization for intent analysis
### Multiple Input Methods
- **File Upload**: PDF, DOCX, TXT files
- **CSV Upload**: Bulk resume processing
- **Hugging Face Datasets**: Load resumes directly from datasets on the Hugging Face Hub
### Comprehensive Analysis
- **Skills Extraction**: Technical skills and job-specific keywords
- **Score Breakdown**: Detailed analysis of each scoring component
- **Interactive Visualizations**: Charts and metrics for insights
- **Export Capabilities**: Download results as CSV
## Requirements
### System Requirements
- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models
### Dependencies
All dependencies are listed in `requirements.txt`:
- streamlit
- sentence-transformers
- transformers
- torch
- faiss-cpu
- rank-bm25
- nltk
- pdfplumber
- PyPDF2
- python-docx
- datasets
- plotly
- pandas
- numpy
## 🛠️ Installation
1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd resumescreener_v2
   ```
2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
3. **Run the application**:
   ```bash
   streamlit run src/streamlit_app.py
   ```
## Usage Guide
### Step 1: Model Loading
- Models are automatically loaded when the app starts
- First run may take 5-10 minutes to download models
- Check the sidebar for model loading status
### Step 2: Job Description
- Enter the complete job description in the text area
- Include requirements, responsibilities, and desired skills
- More detailed descriptions yield better matching results
### Step 3: Load Resumes
Choose from three options:
#### Option A: File Upload
- Upload PDF, DOCX, or TXT files
- Supports multiple file selection
- Automatic text extraction
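Text extraction by file type might look like the following minimal sketch. It assumes `pdfplumber` for PDFs and `python-docx` (imported as `docx`) for DOCX, both in the dependency list; the app may use `PyPDF2` instead or as a fallback, and `extract_text` is a hypothetical name. The heavy imports are deferred so TXT files work without them.

```python
from __future__ import annotations

import io
from pathlib import Path


def extract_text(filename: str, data: bytes) -> str:
    """Extract plain text from an uploaded PDF, DOCX, or TXT file."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".txt":
        # Replace undecodable bytes instead of failing on odd encodings.
        return data.decode("utf-8", errors="replace")
    if suffix == ".pdf":
        import pdfplumber  # deferred: only needed for PDFs

        with pdfplumber.open(io.BytesIO(data)) as pdf:
            # extract_text() can return None, e.g. for image-only pages.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        document = docx.Document(io.BytesIO(data))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    raise ValueError(f"Unsupported file type: {suffix or filename}")
```

In Streamlit, `st.file_uploader(..., accept_multiple_files=True)` returns uploaded-file objects exposing `.name` and `.getvalue()`, so each file can be passed as `extract_text(f.name, f.getvalue())`.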
#### Option B: CSV Upload
- Upload CSV with resume texts
- Select text and name columns
- Bulk processing capability
#### Option C: Hugging Face Dataset
- Load from public datasets
- Specify dataset name and columns
- Limited to 100 resumes for performance
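Options B and C both reduce to the same step: turning table rows into name and text pairs. A minimal sketch, assuming the name column is optional and that the Hugging Face dataset has a `train` split; `rows_to_resumes`, `load_csv_resumes`, and `load_hf_resumes` are hypothetical helpers, and the 100-row cap mirrors the limit stated above.

```python
from __future__ import annotations

import csv
import io
from itertools import islice
from typing import Iterable, Mapping


def rows_to_resumes(
    rows: Iterable[Mapping[str, object]],
    text_col: str,
    name_col: str | None = None,
    limit: int | None = None,
) -> list[dict[str, str]]:
    """Convert table rows into {"name", "text"} dicts, skipping empty resumes."""
    resumes = []
    # limit caps the rows read, so skipped empty rows still count toward it.
    for idx, row in enumerate(islice(rows, limit)):
        text = str(row.get(text_col) or "").strip()
        if not text:
            continue
        name = str(row.get(name_col) or "").strip() if name_col else ""
        resumes.append({"name": name or f"Candidate {idx + 1}", "text": text})
    return resumes


def load_csv_resumes(csv_text: str, text_col: str, name_col: str | None = None) -> list[dict[str, str]]:
    """Parse uploaded CSV text; the first row must be a header."""
    return rows_to_resumes(csv.DictReader(io.StringIO(csv_text)), text_col, name_col)


def load_hf_resumes(
    dataset_name: str, text_col: str, name_col: str | None = None, limit: int = 100
) -> list[dict[str, str]]:
    """Load rows from a Hugging Face dataset; the 'train' split is an assumption."""
    from datasets import load_dataset  # deferred: needs the datasets package

    dataset = load_dataset(dataset_name, split="train")
    return rows_to_resumes(dataset, text_col, name_col, limit=limit)
```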
### Step 4: Run Pipeline
- Click "Run Advanced Ranking Pipeline"
- Monitor progress through 5 stages
- Results appear in three tabs
### Step 5: Analyze Results
#### Summary Tab
- Top-ranked candidates table
- Key metrics and scores
- CSV download option
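The CSV download can be produced with the standard library alone. This sketch assumes results arrive as dictionaries already sorted by final score; `results_to_csv` and the column names are hypothetical.

```python
from __future__ import annotations

import csv
import io
from typing import Mapping, Sequence


def results_to_csv(results: Sequence[Mapping[str, object]], columns: Sequence[str]) -> str:
    """Serialize ranked results to CSV text, adding a 1-based 'rank' field."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys not listed in columns (including 'rank').
    writer = csv.DictWriter(
        buffer, fieldnames=list(columns), extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for rank, row in enumerate(results, start=1):
        writer.writerow({**row, "rank": rank})
    return buffer.getvalue()
```

The returned string can be passed to `st.download_button("Download CSV", data=csv_text, file_name="ranked_candidates.csv", mime="text/csv")`; the file name is illustrative.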
#### Detailed Analysis Tab
- Individual candidate breakdowns
- Score components explanation
- Skills and keywords analysis
- Resume excerpts
#### Visualizations Tab
- Score distribution charts
- Comparative analysis
- Intent distribution
- Average metrics
## 🧮 Scoring Formula
**Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent**
### Score Components
1. **Cross-Encoder Score (50%)**
   - Deep semantic matching between job and resume
   - Considers context and meaning
   - Range: 0-1 (normalized)
2. **BM25 Score (30%)**
   - Traditional keyword-based relevance
   - Based on term frequency, inverse document frequency, and document-length normalization
   - Range: 0-1 (normalized)
3. **Intent Score (20%)**
   - AI-assessed candidate interest level
   - Based on experience-job alignment
   - Categories: Yes (0.9), Maybe (0.5), No (0.1)
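The BM25 component can be illustrated with a from-scratch Okapi BM25 plus min-max scaling to the 0-1 range listed above. This is a teaching sketch, not the app's code: the app depends on `rank-bm25`, whose `BM25Okapi` uses a slightly different IDF formula, so absolute scores may differ, and min-max scaling is an assumption about how "normalized" is implemented. `k1=1.5` and `b=0.75` are common default values.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter, digit, '+' or '#'."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Lucene-style IDF; the +1 keeps it positive for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Saturating term frequency, normalized by document length.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def min_max(values: list[float]) -> list[float]:
    """Rescale to [0, 1]; if all values are equal, return zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```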
## 🎯 Best Practices
### For Optimal Results
1. **Detailed Job Descriptions**: Include specific requirements, technologies, and responsibilities
2. **Quality Resume Data**: Ensure resumes contain relevant information
3. **Appropriate Batch Size**: Process 20-100 resumes for best performance
4. **Clear Requirements**: Specify must-have vs. nice-to-have skills
### Performance Tips
1. **GPU Usage**: Enable CUDA for faster processing
2. **Memory Management**: Use cleanup controls for large batches
3. **Model Caching**: Models are cached after first load
4. **Batch Processing**: Process resumes in smaller batches if memory limited
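Batching with cleanup between batches can be sketched as below, assuming PyTorch is the backend. `batched` and `free_memory` are hypothetical helpers (Python 3.12+ also ships `itertools.batched`, which yields tuples). `torch.cuda.empty_cache()` releases unoccupied memory held by PyTorch's caching allocator; it cannot free tensors that are still referenced.

```python
from __future__ import annotations

import gc
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def free_memory() -> None:
    """Run garbage collection and, if CUDA is available, empty PyTorch's cache."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return  # CPU-only environment without torch: nothing more to release
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

A scoring loop might call `free_memory()` at the end of each `for batch in batched(resumes, 25):` iteration; the batch size of 25 is arbitrary.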
## 🔧 Configuration
### Model Configuration
Models can be customized by modifying the `load_models()` function:
- Change model names to use a different embedding model, cross-encoder, or LLM
- Adjust quantization settings
- Modify device mapping
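A `load_models()` along these lines would expose the three knobs above (model names, quantization, device mapping). It is a hedged sketch rather than the app's function: the Qwen repository ID `Qwen/Qwen2-1.5B-Instruct` is an assumption, since the README only says Qwen2-1.5B, and 4-bit loading additionally needs the `bitsandbytes` and `accelerate` packages, which the dependency list above does not name.

```python
from __future__ import annotations

# The first two IDs appear in this README; the Qwen ID is an assumption.
MODEL_NAMES = {
    "embedding": "BAAI/bge-large-en-v1.5",
    "cross_encoder": "cross-encoder/ms-marco-MiniLM-L6-v2",
    "llm": "Qwen/Qwen2-1.5B-Instruct",
}


def load_models(names: dict[str, str] | None = None, quantize: bool = True) -> dict:
    """Load the embedder, cross-encoder, tokenizer, and LLM into a dict."""
    names = names or MODEL_NAMES
    # Deferred imports: this module can be imported without the ML stack.
    import torch
    from sentence_transformers import CrossEncoder, SentenceTransformer
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    device = "cuda" if torch.cuda.is_available() else "cpu"
    llm_kwargs = {}
    if device == "cuda":
        llm_kwargs["device_map"] = "auto"  # requires the accelerate package
        if quantize:
            # 4-bit weights via bitsandbytes; generally needs a CUDA GPU.
            llm_kwargs["quantization_config"] = BitsAndBytesConfig(
                load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
            )
    return {
        "embedder": SentenceTransformer(names["embedding"], device=device),
        "cross_encoder": CrossEncoder(names["cross_encoder"], device=device),
        "tokenizer": AutoTokenizer.from_pretrained(names["llm"]),
        "llm": AutoModelForCausalLM.from_pretrained(names["llm"], **llm_kwargs),
    }
```

In Streamlit, decorating such a loader with `@st.cache_resource` keeps the models in memory across reruns, which is one way to get the "cached after first load" behavior described under Performance Tips.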
### Scoring Weights
Adjust weights in `calculate_final_scores()`:
```python
final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
```
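A standalone stand-in for that weighting step might validate its inputs and convert intent labels to numbers. In this sketch the 0.9/0.5/0.1 values come from the Score Components list; treating unrecognized LLM output as "Maybe" is an assumption, and `intent_to_score` and `weighted_final_scores` are hypothetical names. Requiring non-negative weights that sum to 1 keeps final scores in the 0-1 range whenever every input is in that range.

```python
from __future__ import annotations

from typing import Sequence

# Category values from the "Score Components" list.
INTENT_VALUES = {"yes": 0.9, "maybe": 0.5, "no": 0.1}


def intent_to_score(label: str) -> float:
    """Map an LLM label such as 'Yes.' to its score; unknown labels count as 'maybe' (assumption)."""
    key = label.strip().rstrip(".").lower()
    return INTENT_VALUES.get(key, INTENT_VALUES["maybe"])


def weighted_final_scores(
    ce_scores: Sequence[float],
    bm25_scores: Sequence[float],
    intent_scores: Sequence[float],
    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
) -> list[float]:
    """Weighted sum per candidate; inputs are expected to be in [0, 1]."""
    if not len(ce_scores) == len(bm25_scores) == len(intent_scores):
        raise ValueError("score lists must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    w_ce, w_bm25, w_intent = weights
    return [
        w_ce * ce + w_bm25 * bm25 + w_intent * intent
        for ce, bm25, intent in zip(ce_scores, bm25_scores, intent_scores)
    ]
```

For a candidate with a cross-encoder score of 0.8, a BM25 score of 0.6, and intent "Yes": 0.5 × 0.8 + 0.3 × 0.6 + 0.2 × 0.9 = 0.40 + 0.18 + 0.18 = 0.76.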
### Skills List
Customize the predefined skills list in the `ResumeScreener` class:
```python
self.skills_list = [
    'python', 'java', 'javascript',
    # Add your specific skills
]
```
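How the app matches `skills_list` against text is not documented here; a simple whole-token keyword match is one plausible approach, sketched below. `extract_skills` and `skill_overlap` are hypothetical names. The lookarounds replace `\b` so that skills ending in symbols, such as `c++` or `c#`, still match, while `java` does not match inside `javascript`.

```python
from __future__ import annotations

import re


def extract_skills(text: str, skills_list: list[str]) -> list[str]:
    """Return the skills from skills_list that occur in text as whole tokens."""
    lowered = text.lower()
    found = []
    for skill in skills_list:
        # Not preceded or followed by a letter/digit, so 'java' != 'javascript'.
        pattern = r"(?<![a-z0-9])" + re.escape(skill.lower()) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.append(skill)
    return found


def skill_overlap(job_text: str, resume_text: str, skills_list: list[str]) -> list[str]:
    """Skills mentioned in the job description that the resume also mentions, sorted."""
    required = set(extract_skills(job_text, skills_list))
    return sorted(required & set(extract_skills(resume_text, skills_list)))
```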
## Troubleshooting
### Common Issues
1. **Model Loading Errors**
   - Check internet connection for model downloads
   - Ensure sufficient disk space
   - Verify CUDA compatibility
2. **Memory Issues**
   - Reduce batch size
   - Use CPU-only mode
   - Clear cache between runs
3. **File Processing Errors**
   - Check file formats (PDF, DOCX, TXT)
   - Ensure files are not corrupted
   - Verify text extraction quality
4. **Performance Issues**
   - Enable GPU acceleration
   - Process smaller batches
   - Use model quantization
### Error Messages
- **"Models not loaded"**: Wait for model loading to complete
- **"ML libraries not available"**: Install missing dependencies
- **"CUDA out of memory"**: Reduce batch size or use CPU
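The "ML libraries not available" message suggests an import failed; the README does not say which one. A quick check like the following sketch lists which packages from `requirements.txt` are not importable. The pip-name to module-name mapping is needed because several differ (for example, `python-docx` is imported as `docx` and `faiss-cpu` as `faiss`); `missing_packages` is a hypothetical helper.

```python
from __future__ import annotations

import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED_MODULES = {
    "streamlit": "streamlit",
    "sentence-transformers": "sentence_transformers",
    "transformers": "transformers",
    "torch": "torch",
    "faiss-cpu": "faiss",
    "rank-bm25": "rank_bm25",
    "nltk": "nltk",
    "pdfplumber": "pdfplumber",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "datasets": "datasets",
    "plotly": "plotly",
    "pandas": "pandas",
    "numpy": "numpy",
}


def missing_packages(required: dict[str, str] | None = None) -> list[str]:
    """Return pip names of packages whose top-level modules cannot be found."""
    required = REQUIRED_MODULES if required is None else required
    # find_spec locates top-level modules without importing them.
    return [pkg for pkg, module in required.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```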
## Sample Data
Use the included `sample_resumes.csv` for testing:
- 5 sample resumes with different roles
- Realistic job experience and skills
- Good for testing all features
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- **BAAI** for the BGE embedding model
- **Microsoft** for the MS MARCO dataset used to train the cross-encoder
- **Alibaba** for the Qwen language model
- **Streamlit** for the web framework
- **Hugging Face** for model hosting and transformers library
## Support
For issues and questions:
1. Check the troubleshooting section
2. Review error messages in the sidebar
3. Open an issue on GitHub
4. Check model compatibility
---
**Built with ❤️ using Streamlit and state-of-the-art AI models**