---
title: Resumescreener V2
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit template space
---

# 🤖 AI Resume Screener

A Streamlit application that ranks candidate resumes against a job description using a multi-stage pipeline: embedding-based recall, cross-encoder reranking, BM25 keyword scoring, and LLM-based intent analysis.

## 🚀 Features

### Multi-Stage AI Pipeline
1. **FAISS Recall**: Semantic similarity search using BGE embeddings (top 50 candidates)
2. **Cross-Encoder Reranking**: Deep semantic matching using MS-Marco model (top 20 candidates)
3. **BM25 Scoring**: Traditional keyword-based relevance scoring
4. **Intent Analysis**: AI-powered candidate interest assessment using Qwen LLM
5. **Final Ranking**: Weighted combination of the cross-encoder, BM25, and intent scores
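
The recall-then-rerank funnel in stages 1 and 2 can be sketched in plain Python. This is a minimal illustration, not the app's actual code: `top_k` and `funnel` are hypothetical helpers, and the two scoring callables stand in for the FAISS similarity search and the cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str], float]


def top_k(items: Sequence[str], score: Scorer, k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring items as (item, score) pairs, best first."""
    scored = [(item, score(item)) for item in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def funnel(
    resumes: Sequence[str],
    recall_score: Scorer,   # stand-in for embedding similarity (stage 1)
    rerank_score: Scorer,   # stand-in for the cross-encoder (stage 2)
    recall_k: int = 50,
    rerank_k: int = 20,
) -> List[Tuple[str, float]]:
    """Cheap recall over all resumes, then a costlier rerank of the survivors."""
    recalled = [resume for resume, _ in top_k(resumes, recall_score, recall_k)]
    return top_k(recalled, rerank_score, rerank_k)


if __name__ == "__main__":
    job_terms = {"python", "sql"}
    resumes = ["python sql developer", "java engineer", "python analyst"]
    overlap = lambda text: float(len(job_terms & set(text.split())))
    density = lambda text: overlap(text) / len(text.split())
    print(funnel(resumes, overlap, density, recall_k=2, rerank_k=1))
```

The point of the two-stage design is cost: the expensive scorer only runs on the candidates that survive the cheap one.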

### Advanced AI Models
- **Embedding Model**: BAAI/bge-large-en-v1.5 for semantic understanding
- **Cross-Encoder**: cross-encoder/ms-marco-MiniLM-L6-v2 for precise ranking
- **LLM**: Qwen2-1.5B with 4-bit quantization for intent analysis

### Multiple Input Methods
- **File Upload**: PDF, DOCX, TXT files
- **CSV Upload**: Bulk resume processing
- **Hugging Face Datasets**: Direct integration with HF datasets

### Comprehensive Analysis
- **Skills Extraction**: Technical skills and job-specific keywords
- **Score Breakdown**: Detailed analysis of each scoring component
- **Interactive Visualizations**: Charts and metrics for insights
- **Export Capabilities**: Download results as CSV

## 📋 Requirements

### System Requirements
- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models

### Dependencies
All dependencies are listed in `requirements.txt`:
- streamlit
- sentence-transformers
- transformers
- torch
- faiss-cpu
- rank-bm25
- nltk
- pdfplumber
- PyPDF2
- python-docx
- datasets
- plotly
- pandas
- numpy

## 🛠️ Installation

1. **Clone the repository**:
```bash
git clone <repository-url>
cd resumescreener_v2
```

2. **Install dependencies**:
```bash
pip install -r requirements.txt
```

3. **Run the application**:
```bash
streamlit run src/streamlit_app.py
```

## 📖 Usage Guide

### Step 1: Model Loading
- Models are automatically loaded when the app starts
- First run may take 5-10 minutes to download models
- Check the sidebar for model loading status

### Step 2: Job Description
- Enter the complete job description in the text area
- Include requirements, responsibilities, and desired skills
- More detailed descriptions yield better matching results

### Step 3: Load Resumes
Choose from three options:

#### Option A: File Upload
- Upload PDF, DOCX, or TXT files
- Supports multiple file selection
- Automatic text extraction

#### Option B: CSV Upload
- Upload CSV with resume texts
- Select text and name columns
- Bulk processing capability
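
A rough sketch of what selecting the text and name columns amounts to, using only the standard library. The function `read_resumes` and the column names `name` and `resume_text` are assumptions for illustration; the app itself lets you pick the columns in the UI.

```python
import csv
import io
from typing import List, Tuple


def read_resumes(csv_text: str, text_col: str, name_col: str) -> List[Tuple[str, str]]:
    """Parse CSV text into (candidate name, resume text) pairs.

    Rows with an empty resume text are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {text_col, name_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")

    rows: List[Tuple[str, str]] = []
    for row in reader:
        text = (row[text_col] or "").strip()
        if text:
            rows.append(((row[name_col] or "").strip(), text))
    return rows


if __name__ == "__main__":
    sample = "name,resume_text\nAda,Python developer\nBob,\n"
    print(read_resumes(sample, text_col="resume_text", name_col="name"))
```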

#### Option C: Hugging Face Dataset
- Load from public datasets
- Specify dataset name and columns
- Limited to 100 resumes for performance

### Step 4: Run Pipeline
- Click "Run Advanced Ranking Pipeline"
- Monitor progress through 5 stages
- Results appear in three tabs

### Step 5: Analyze Results

#### Summary Tab
- Top-ranked candidates table
- Key metrics and scores
- CSV download option

#### Detailed Analysis Tab
- Individual candidate breakdowns
- Score components explanation
- Skills and keywords analysis
- Resume excerpts

#### Visualizations Tab
- Score distribution charts
- Comparative analysis
- Intent distribution
- Average metrics

## 🧮 Scoring Formula

**Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent**
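
As a worked example of the formula, a candidate with a cross-encoder score of 0.8, a BM25 score of 0.6, and an intent score of 0.9 gets 0.40 + 0.18 + 0.18 = 0.76. The helper name `final_score` is illustrative only.

```python
def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Weighted sum from the formula above; all inputs are expected in [0, 1]."""
    return 0.5 * cross_encoder + 0.3 * bm25 + 0.2 * intent


if __name__ == "__main__":
    print(round(final_score(0.8, 0.6, 0.9), 2))  # 0.76
```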

### Score Components

1. **Cross-Encoder Score (50%)**
   - Deep semantic matching between job and resume
   - Considers context and meaning
   - Range: 0-1 (normalized)

2. **BM25 Score (30%)**
   - Traditional keyword-based relevance
   - Based on term frequency and inverse document frequency
   - Range: 0-1 (normalized)

3. **Intent Score (20%)**
   - AI-assessed candidate interest level
   - Based on experience-job alignment
   - Categories: Yes (0.9), Maybe (0.5), No (0.1)
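
The components above are described as normalized to 0-1, but the method is not stated. The sketch below assumes min-max scaling across the batch, and it assumes that an unrecognized intent label falls back to the "Maybe" value. Both choices, and the helper names, are illustrative rather than the app's confirmed behavior.

```python
from typing import Dict, List

# Intent categories and values as listed above.
INTENT_SCORES: Dict[str, float] = {"Yes": 0.9, "Maybe": 0.5, "No": 0.1}


def min_max(values: List[float]) -> List[float]:
    """Scale raw scores to [0, 1] (assumed normalization method)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All scores identical: no spread to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def intent_to_score(label: str) -> float:
    """Map an LLM intent label to its numeric score.

    Unknown labels fall back to the "Maybe" value (an assumption).
    """
    return INTENT_SCORES.get(label.strip().capitalize(), INTENT_SCORES["Maybe"])


if __name__ == "__main__":
    print(min_max([2.0, 4.0, 6.0]))   # [0.0, 0.5, 1.0]
    print(intent_to_score(" yes "))   # 0.9
```

Because min-max scaling is relative to the batch, a score of 1.0 means "best in this batch", not "perfect match".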

## 🎯 Best Practices

### For Optimal Results
1. **Detailed Job Descriptions**: Include specific requirements, technologies, and responsibilities
2. **Quality Resume Data**: Ensure resumes contain relevant information
3. **Appropriate Batch Size**: Process 20-100 resumes for best performance
4. **Clear Requirements**: Specify must-have vs. nice-to-have skills

### Performance Tips
1. **GPU Usage**: Enable CUDA for faster processing
2. **Memory Management**: Use cleanup controls for large batches
3. **Model Caching**: Models are cached after first load
4. **Batch Processing**: Process resumes in smaller batches if memory is limited
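
For the batch-processing tip, a small helper like the following can split a resume list into fixed-size chunks. `batched` is a hypothetical name, not a function from the app.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    for batch in batched(["r1", "r2", "r3", "r4", "r5"], batch_size=2):
        print(batch)
```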

## 🔧 Configuration

### Model Configuration
Models can be customized by modifying the `load_models()` function:
- Change model names for different embeddings
- Adjust quantization settings
- Modify device mapping

### Scoring Weights
Adjust weights in `calculate_final_scores()`:
```python
final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
```

### Skills List
Customize the predefined skills list in the `ResumeScreener` class:
```python
self.skills_list = [
    'python', 'java', 'javascript', 
    # Add your specific skills
]
```
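
How the skills list is matched against resume text is not shown here. One plausible approach, sketched under that assumption, is case-insensitive matching on whole tokens so that `java` does not match inside `javascript`. `extract_skills` is a hypothetical helper.

```python
import re
from typing import Iterable, List


def extract_skills(text: str, skills: Iterable[str]) -> List[str]:
    """Return the skills that appear in text as whole tokens, in list order."""
    lowered = text.lower()
    found: List[str] = []
    for skill in skills:
        # Lookarounds reject matches embedded in a longer alphanumeric word.
        pattern = r"(?<![a-z0-9])" + re.escape(skill.lower()) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.append(skill)
    return found


if __name__ == "__main__":
    skills_list = ["python", "java", "javascript"]
    print(extract_skills("Senior JavaScript and Python engineer", skills_list))
    # ['python', 'javascript']
```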

## 🐛 Troubleshooting

### Common Issues

1. **Model Loading Errors**
   - Check internet connection for model downloads
   - Ensure sufficient disk space
   - Verify CUDA compatibility

2. **Memory Issues**
   - Reduce batch size
   - Use CPU-only mode
   - Clear cache between runs

3. **File Processing Errors**
   - Check file formats (PDF, DOCX, TXT)
   - Ensure files are not corrupted
   - Verify text extraction quality

4. **Performance Issues**
   - Enable GPU acceleration
   - Process smaller batches
   - Use model quantization

### Error Messages
- **"Models not loaded"**: Wait for model loading to complete
- **"ML libraries not available"**: Install missing dependencies
- **"CUDA out of memory"**: Reduce batch size or use CPU

## 📊 Sample Data

Use the included `sample_resumes.csv` for testing:
- 5 sample resumes with different roles
- Realistic job experience and skills
- Good for testing all features

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- **BAAI** for the BGE embedding model
- **Microsoft** for the MS MARCO dataset used to train the cross-encoder
- **Alibaba** for the Qwen language model
- **Streamlit** for the web framework
- **Hugging Face** for model hosting and transformers library

## 📞 Support

For issues and questions:
1. Check the troubleshooting section
2. Review error messages in the sidebar
3. Open an issue on GitHub
4. Check model compatibility

---

**Built with ❤️ using Streamlit and state-of-the-art AI models**