# Whisper Multilingual Fine-tuned Model
This is a fine-tuned version of OpenAI's Whisper model for multilingual speech recognition.
## Supported Languages
- English (en)
- Hindi (hi)
- Bengali (bn)
- Marathi (mr)
- Tamil (ta)
- Telugu (te)
## Model Details
- Base Model: distil-whisper/distil-large-v3
- Fine-tuned on: Custom multilingual dataset
- Training Framework: Transformers
- Model Type: Speech-to-Text
## Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# Load the fine-tuned model and its processor
processor = WhisperProcessor.from_pretrained("TheKingMonarch/whisper-multilang-finetuned")
model = WhisperForConditionalGeneration.from_pretrained("TheKingMonarch/whisper-multilang-finetuned")

# Clear any forced decoder IDs from the generation config
model.generation_config.forced_decoder_ids = None

# Load audio at Whisper's expected 16 kHz sampling rate
audio, _ = librosa.load("audio.wav", sr=16000)

# Transcribe
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
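Alternatively, the high-level `pipeline` API handles feature extraction, chunking of long audio, and decoding in one call. A minimal sketch; the `chunk_length_s` value here is an assumption, adjust it for your recordings:

```python
from transformers import pipeline

# High-level ASR pipeline; chunk_length_s windows long audio into
# 30-second segments (an assumed value, adjust as needed)
asr = pipeline(
    "automatic-speech-recognition",
    model="TheKingMonarch/whisper-multilang-finetuned",
    chunk_length_s=30,
)
result = asr("audio.wav")
print(result["text"])
```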
## Language-specific Usage

```python
# Force a specific language (e.g., Hindi) instead of auto-detection
forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe")
predicted_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_decoder_ids)
```
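On recent transformers releases, Whisper's `generate` also accepts the language and task directly, which has the same effect as passing forced decoder IDs; a minimal equivalent sketch:

```python
# Equivalent on recent transformers releases: pass language/task to generate
predicted_ids = model.generate(inputs.input_features, language="hi", task="transcribe")
```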
## Training Details

- Fine-tuned on a custom multilingual speech dataset
- Optimized for Indian languages and English
- Training Steps: 600
- Final WER: 27.08% (at step 600)
- Best WER: 26.73% (at step 550)
## Training Metrics

| Step | Training Loss | Validation Loss | WER (%) |
|------|---------------|-----------------|---------|
| 50   | 2.075000      | 1.930286        | 133.45  |
| 100  | 1.206600      | 1.275027        | 89.54   |
| 150  | 0.793800      | 0.712475        | 93.42   |
| 200  | 0.528700      | 0.562679        | 88.92   |
| 250  | 0.379900      | 0.473467        | 89.27   |
| 300  | 0.289400      | 0.369892        | 69.88   |
| 350  | 0.244300      | 0.291235        | 49.58   |
| 400  | 0.268800      | 0.249055        | 42.80   |
| 450  | 0.122200      | 0.209867        | 36.29   |
| 500  | 0.084700      | 0.173593        | 31.44   |
| 550  | 0.073400      | 0.155249        | 26.73   |
| 600  | 0.044300      | 0.148559        | 27.08   |
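The WER column is the standard word error rate (word-level edit distance divided by reference word count). A minimal sketch of how it is typically computed with the Hugging Face `evaluate` library; the example strings are illustrative, not from the training data:

```python
import evaluate

# Word error rate: word-level edit distance over reference word count
wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["the model transcribed this"],   # illustrative
    references=["the model transcribed these"],   # illustrative
)
print(f"WER: {100 * wer:.2f}%")
```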
## Training Configuration

- Base Model: distil-whisper/distil-large-v3
- Learning Rate: tuned during training (exact value not reported)
- Batch Size: tuned for throughput (exact value not reported)
- Training Duration: 600 steps
- Evaluation Strategy: every 50 steps
- Early Stopping: based on WER improvement
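For reference, a hedged sketch of `Seq2SeqTrainingArguments` matching the configuration above. Only `max_steps`, the 50-step evaluation cadence, and the WER-based best-model selection come from this card; `output_dir`, batch size, and learning rate are illustrative assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-multilang-finetuned",  # assumed
    max_steps=600,                               # from this card
    evaluation_strategy="steps",
    eval_steps=50,                               # evaluate every 50 steps
    save_steps=50,
    per_device_train_batch_size=16,              # assumed
    learning_rate=1e-5,                          # assumed
    predict_with_generate=True,                  # needed to compute WER
    load_best_model_at_end=True,                 # keep the lowest-WER checkpoint
    metric_for_best_model="wer",
    greater_is_better=False,
)
```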
## Limitations
- Performance may vary across different accents and dialects
- Best results on clear audio with minimal background noise
- Optimized for the specific languages listed above
## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-multilang-finetuned,
  author    = {Your Name},
  title     = {Whisper Multilingual Fine-tuned Model},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/TheKingMonarch/whisper-multilang-finetuned}
}
```