---
base_model:
- google/gemma-3n-E4B-it
license: gemma
---
# gemma-3n-e4b-it-audio-encoder

Audio encoder extracted from [google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it). It compresses audio heavily: it emits about 6.5 tokens per second (TPS) of audio, while the Whisper encoder emits 50 TPS.
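
To get a feel for what these rates mean for sequence length, here is a minimal sketch that estimates how many encoder tokens a clip produces. The helper `estimate_tokens` and the constant names are hypothetical, introduced only for illustration; the rates are the approximate figures quoted above, so treat the results as rough estimates rather than exact output lengths.

```python
import math

# Token rates quoted above, in encoder output tokens per second of audio.
GEMMA_3N_TPS = 6.5   # approximate figure for this encoder
WHISPER_TPS = 50.0   # Whisper encoder

def estimate_tokens(duration_s: float, tokens_per_second: float) -> int:
    """Rough estimate of encoder output length for a clip of the given duration.

    Hypothetical helper, not part of this repository or of transformers.
    """
    return math.ceil(duration_s * tokens_per_second)

clip_seconds = 30.0
gemma_tokens = estimate_tokens(clip_seconds, GEMMA_3N_TPS)
whisper_tokens = estimate_tokens(clip_seconds, WHISPER_TPS)
print(gemma_tokens, whisper_tokens)  # 195 1500
```

Under these assumptions, a 30-second clip yields roughly 195 tokens instead of 1500, which is the main reason to prefer this encoder when the downstream context length is tight.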

## how to use

```python
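import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

model_id = "mesolitica/gemma-3n-e4b-it-audio-encoder"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
# trust_remote_code=True allows loading the custom model code from the repository
encoder = AutoModel.from_pretrained(model_id, trust_remote_code=True).cuda()

# Load the audio and resample it to the rate the feature extractor expects
y, sr = librosa.load('test.mp3', sr=feature_extractor.sampling_rate)
features = feature_extractor([y], return_tensors='pt')
# Move the inputs to the same device as the encoder
features['input_features'] = features['input_features'].cuda()
features['input_features_mask'] = features['input_features_mask'].cuda()

# Inference only, so skip gradient tracking
with torch.no_grad():
    output = encoder(**features)
print(output[0].shape)  # [B, L, D]: batch, encoded frames, hidden size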
```