---
base_model:
- google/gemma-3n-E4B-it
license: gemma
---

# gemma-3n-e4b-it-audio-encoder

Audio encoder extracted from [google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it). It compresses audio heavily: about 6.5 tokens per second (TPS) of audio, while the Whisper encoder produces 50 TPS.

## how to use

```python
from transformers import AutoFeatureExtractor, AutoModel
import librosa

model_id = "mesolitica/gemma-3n-e4b-it-audio-encoder"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
# trust_remote_code is required because the encoder class ships with the model repo
encoder = AutoModel.from_pretrained(model_id, trust_remote_code = True).cuda()

# load the audio, resampled to the rate the feature extractor expects
y, sr = librosa.load('test.mp3', sr = feature_extractor.sampling_rate)
features = feature_extractor([y], return_tensors = 'pt')
# move the inputs to the same device as the encoder
features['input_features'] = features['input_features'].cuda()
features['input_features_mask'] = features['input_features_mask'].cuda()
output = encoder(**features)
print(output[0].shape)  # [B, L, D]
```
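
To get a feel for what the token rates above mean for sequence length, here is a rough sketch that estimates how many encoder tokens a clip of a given duration yields. The rates are the approximate figures quoted in this card, and `estimated_tokens` is a hypothetical helper for illustration only; the actual `L` returned by the encoder may differ slightly because of padding and framing.

```python
import math

# Approximate token rates quoted in this card (tokens per second of audio).
GEMMA_3N_TPS = 6.5
WHISPER_TPS = 50.0

def estimated_tokens(duration_seconds: float, tokens_per_second: float) -> int:
    """Hypothetical helper: rough upper estimate of encoder output length."""
    return math.ceil(duration_seconds * tokens_per_second)

duration = 30.0  # seconds of audio
gemma_len = estimated_tokens(duration, GEMMA_3N_TPS)
whisper_len = estimated_tokens(duration, WHISPER_TPS)

print(gemma_len, whisper_len)
print(f"compression vs. Whisper: {whisper_len / gemma_len:.1f}x")
```

For a 30-second clip this comes to roughly 195 tokens versus 1500 for Whisper, so downstream models see a sequence about 7 to 8 times shorter.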