metadata

title: SATE
emoji: ⚡
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
short_description: Speech Annotation and Transcription Enhancer

SATE: Speech Annotation and Transcription Enhancer (MVP)

This is the Minimum Viable Product (MVP) version of SATE, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.

Overview

Main Entry: main_socket.py
Input: Entire audio file (.mp3, .wav, etc.)
Output: Word-level timestamped transcription with annotations for pauses, repetitions, filler words, mispronunciations, and syllables.
Preprocessing:
- Audio segmentation
- Speaker diarization
- Transcription using CrisperWhisper
Annotation:
- Pause
- Repetition
- Filler Words
- Syllable Structure
- Mispronunciation Sequence (PLM container is needed)
Feature Extraction
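
As an illustration of what the pause annotation step works from, the following is a minimal sketch that flags silent gaps between consecutive words in a word-level timestamped transcript. It is not SATE's actual implementation: the `Word` class, the `find_pauses` function, and the gap rule (next word's start minus previous word's end) are assumptions made for this example. Only the default threshold of 0.25 seconds is taken from the `pause_threshold` value used in the requests in this README.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record; SATE's real output schema may differ."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def find_pauses(words, pause_threshold=0.25):
    """Return (previous_word, next_word, gap_seconds) for each gap at or above the threshold."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start - prev.end
        if gap >= pause_threshold:
            pauses.append((prev.text, nxt.text, round(gap, 3)))
    return pauses


words = [
    Word("so", 0.00, 0.20),
    Word("um", 0.25, 0.50),
    Word("hello", 1.10, 1.50),
]
pauses = find_pauses(words)  # only the gap between "um" and "hello" exceeds 0.25 s
```

Raising `pause_threshold` makes the annotation stricter, so fewer gaps are reported as pauses.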

Getting Started

Installation

1. Clone the repo

git clone https://github.com/SwenHou/SATE.git

2. Install packages

conda env create -f environment_sate_0.11.yml

3. Start the inference API on your local computer

Set up your Hugging Face token:

export HF_TOKEN=<your_token_here>

Start API:

python main_socket.py

Usage

1. Get Annotations

curl -X POST http://localhost:7860/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"

The annotation file is also saved in SATE/session_data/.
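
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the form fields of the curl command (`audio_file`, `device`, `pause_threshold`). The helper names `build_multipart` and `process_audio` and the 600-second timeout are assumptions for this example, and because the response format is not documented here, the body is returned as raw text.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def process_audio(path, url="http://localhost:7860/process",
                  device="cuda", pause_threshold=0.25, timeout=600):
    """POST an audio file to the /process endpoint and return the raw response text."""
    audio = Path(path)
    body, content_type = build_multipart(
        {"device": device, "pause_threshold": pause_threshold},
        "audio_file", audio.name, audio.read_bytes(),
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the API running, `process_audio("<your local path to audio file>")` sends the request. Pass a different `url` to target another deployment.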

🐳 Use Docker

1. Build Docker Image

In the Dockerfile, delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=<your_token_here>`.

Run the following command in the project root directory:

docker build -t sate_0.11 .

2. Run the Docker Container

docker run --gpus all -it --rm \
  -p 7860:7860 \
  sate_0.11

3. Usage

Usage is the same as with the local API, but the annotation file is deleted when the container exits.

curl -X POST http://localhost:7860/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"

🤗 Use API from Hugging Face Spaces

curl -X POST https://Sven33-SATE.hf.space/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"

Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE`

Due to Hugging Face's GPU scheduling latency, the first request after a cold start takes around 5-8 minutes. If the Space receives no requests within five minutes after startup, it goes back to sleep.
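
Because the first request may fail or time out while the Space is waking up, a client can retry for a bounded period. The following is a minimal sketch of such a retry loop. The function name `call_with_retry`, the 10-minute budget, and the 30-second interval are assumptions chosen to cover the 5-8 minute startup window, not values prescribed by SATE.

```python
import time


def call_with_retry(send_request, max_wait_s=600, interval_s=30,
                    sleep=time.sleep, clock=time.monotonic):
    """Call send_request() until it succeeds or max_wait_s would be exceeded."""
    deadline = clock() + max_wait_s
    while True:
        try:
            return send_request()
        except OSError as error:
            # urllib.error.URLError, connection errors, and socket timeouts
            # are all OSError subclasses.
            if clock() + interval_s > deadline:
                raise TimeoutError(
                    f"service not ready after {max_wait_s} s"
                ) from error
            sleep(interval_s)
```

`send_request` is any zero-argument callable that performs the HTTP call, for example a `lambda` wrapping a `urllib.request.urlopen` call against the Space URL.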

For a 10-minute audio sample, inference on a T4 small GPU takes under two minutes.
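
To compare this figure with other setups, it can be expressed as a real-time factor (processing time divided by audio duration). The sketch below treats two minutes as an upper bound, so the values it computes are bounds, not measurements. The function name `real_time_factor` is introduced here for illustration.

```python
def real_time_factor(processing_s, audio_s):
    """Processing time divided by audio duration; values below 1 are faster than real time."""
    return processing_s / audio_s


# 10 minutes of audio processed in at most 2 minutes.
rtf = real_time_factor(processing_s=2 * 60, audio_s=10 * 60)  # upper bound: 0.2
speedup = 1 / rtf  # lower bound: about 5x faster than real time
```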