title: SATE
emoji: ⚡
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
short_description: Speech Annotation and Transcription Enhancer
SATE: Speech Annotation and Transcription Enhancer (MVP)
This is the Minimum Viable Product (MVP) version of SATE, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.
Overview
Main Entry: main_socket.py
Input: Entire audio file (.mp3, .wav, etc.)
Output: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations, and syllables.
Preprocessing:
- Audio segmentation
- Speaker diarization
- Transcription using CrisperWhisper
Annotation:
- Pause
- Repetition
- Filler Words
- Syllable Structure
- Mispronunciation Sequence (PLM container is needed)
Feature Extraction
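To make the annotation step more concrete, here is a minimal Python sketch of how pauses, repetitions, and filler words could be derived from word-level timestamps. It is an illustration only, not SATE's actual implementation: the Word type, the function names, the FILLERS set, and the rule that a gap of at least pause_threshold seconds counts as a pause are all assumptions.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical filler vocabulary; the real list used by SATE is not documented here.
FILLERS = {"um", "uh", "er"}


@dataclass
class Word:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


@dataclass
class Pause:
    after_word: str   # the word spoken just before the silence
    start: float      # time at which the silence begins, in seconds
    duration: float   # length of the silence, in seconds


def find_pauses(words: List[Word], pause_threshold: float = 0.25) -> List[Pause]:
    """Report every gap between consecutive words of at least pause_threshold seconds."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start - prev.end
        if gap >= pause_threshold:
            pauses.append(Pause(prev.text, prev.end, round(gap, 3)))
    return pauses


def find_repetitions(words: List[Word]) -> List[int]:
    """Return indices of words that repeat the immediately preceding word."""
    return [
        i for i in range(1, len(words))
        if words[i].text.lower() == words[i - 1].text.lower()
    ]


def find_fillers(words: List[Word]) -> List[int]:
    """Return indices of words that appear in the (assumed) filler set."""
    return [i for i, w in enumerate(words) if w.text.lower() in FILLERS]


sample = [
    Word("I", 0.00, 0.20),
    Word("I", 0.25, 0.40),
    Word("um", 0.50, 0.70),
    Word("agree", 1.30, 1.70),
]
pauses = find_pauses(sample, pause_threshold=0.25)
repetitions = find_repetitions(sample)
fillers = find_fillers(sample)
```

With this sample, only the silence between "um" and "agree" exceeds the threshold, the second "I" is flagged as a repetition, and "um" is flagged as a filler.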
Getting Started
Installation
1. Clone the repo
git clone https://github.com/SwenHou/SATE.git
2. Install packages
conda env create -f environment_sate_0.11.yml
3. Start the inference API on your local machine
Set up your Hugging Face token:
export HF_TOKEN=<your_token_here>
Start API:
python main_socket.py
Usage
1. Get Annotations
curl -X POST http://localhost:7860/process \
-F "audio_file=@<your local path to audio file>" \
-F "device=cuda" \
-F "pause_threshold=0.25"
The annotation file is also available in SATE/session_data/
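The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above (same endpoint and the same audio_file, device, and pause_threshold form fields). The function names build_multipart and request_annotations and the 600-second timeout are assumptions made for illustration, and the response is returned as raw bytes because its format is not described here.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_field: str,
                    filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f"Content-Type: {mime}\r\n\r\n").encode() + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def request_annotations(audio_path: str,
                        url: str = "http://localhost:7860/process",
                        device: str = "cuda",
                        pause_threshold: float = 0.25,
                        timeout: float = 600.0) -> bytes:
    """POST an audio file to the /process endpoint and return the raw response body."""
    path = Path(audio_path)
    body, content_type = build_multipart(
        {"device": device, "pause_threshold": str(pause_threshold)},
        "audio_file", path.name, path.read_bytes(),
    )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


# Example (requires the API to be running):
# raw = request_annotations("<your local path to audio file>")
```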
🐳 Use Docker
1. Build Docker Image
In Dockerfile, delete the line ENV HF_HOME=/data/.huggingface and add ENV HF_TOKEN=<your_token_here>.
Run the following command in the project root directory:
docker build -t sate_0.11 .
2. Run the Docker Container
docker run --gpus all -it --rm \
-p 7860:7860 \
sate_0.11
3. Usage
Usage is the same as with the local API, but the annotation file is deleted when the container exits.
curl -X POST http://localhost:7860/process \
-F "audio_file=@<your local path to audio file>" \
-F "device=cuda" \
-F "pause_threshold=0.25"
🤗 Use API from Hugging Face Spaces
curl -X POST https://Sven33-SATE.hf.space/process \
-F "audio_file=@<your local path to audio file>" \
-F "device=cuda" \
-F "pause_threshold=0.25"
Hugging Face Space URL: https://huggingface.co/spaces/Sven33/SATE
Due to Hugging Face's GPU scheduling latency, the first request after a cold start takes around 5-8 minutes. If the service receives no requests within five minutes after startup, it goes back into sleep mode.
For a 10-minute audio sample, inference on a T4 small GPU takes just under two minutes.
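Because the Space may be asleep when the first request arrives, a client may want to retry for a while before giving up. The following is a minimal sketch of such a retry loop, not part of SATE itself. The helper name call_with_retry, the 10-minute max_wait, and the 30-second interval are assumptions chosen to cover the 5-8 minute startup window mentioned above, and how a sleeping Space actually responds (connection error, HTTP error, or a slow reply) has not been verified here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(send: Callable[[], T],
                    max_wait: float = 600.0,
                    interval: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> T:
    """Call send() until it succeeds or max_wait seconds have elapsed.

    send is any zero-argument callable that performs the POST to /process
    and raises OSError on failure (urllib's URLError and HTTPError, as well
    as socket timeouts, are all OSError subclasses).
    """
    deadline = clock() + max_wait
    while True:
        try:
            return send()
        except OSError:
            # Give up if another wait would run past the deadline.
            if clock() + interval > deadline:
                raise
            sleep(interval)
```

The sleep and clock parameters exist only so the loop can be tested without real waiting; in normal use the defaults are fine.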