Shuwei Hou
metadata
title: SATE
emoji: 
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
short_description: Speech Annotation and Transcription Enhancer

SATE: Speech Annotation and Transcription Enhancer (MVP)

This is the Minimum Viable Product (MVP) version of SATE, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.


Overview

  • Main Entry: main_socket.py

  • Input: a complete audio file (.mp3, .wav, etc.)

  • Output: word-level timestamped transcription annotated with pauses, repetitions, filler words, mispronunciations, and syllable structure.

  • Preprocessing:

    • Audio segmentation
    • Speaker diarization
    • Transcription using CrisperWhisper
  • Annotation:

    • Pause
    • Repetition
    • Filler Words
    • Syllable Structure
    • Mispronunciation Sequence (requires the PLM container)
  • Feature Extraction
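The pause, repetition, and filler-word annotations listed above can be illustrated with a minimal sketch. It assumes word-level timestamps in seconds and treats a pause as any gap between consecutive words longer than pause_threshold (the same parameter name the API accepts). The Word class, the FILLERS set, and the find_* functions are hypothetical names for illustration, not SATE's actual implementation, which may use different rules.

```python
from dataclasses import dataclass

# Hypothetical filler inventory for illustration; SATE's actual list may differ.
FILLERS = {"um", "uh", "er", "erm", "hmm"}


@dataclass
class Word:
    """Hypothetical word-level record; times are in seconds."""
    text: str
    start: float
    end: float


def find_pauses(words, pause_threshold=0.25):
    """Return (start, end) silences between consecutive words longer than pause_threshold."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        if nxt.start - prev.end > pause_threshold:
            pauses.append((prev.end, nxt.start))
    return pauses


def find_repetitions(words):
    """Return indices of words identical (case-insensitive) to the word right before them."""
    return [
        i for i in range(1, len(words))
        if words[i].text.lower() == words[i - 1].text.lower()
    ]


def find_fillers(words, fillers=FILLERS):
    """Return indices of words that appear in the filler inventory."""
    return [i for i, w in enumerate(words) if w.text.lower() in fillers]


if __name__ == "__main__":
    sample = [
        Word("I", 0.00, 0.10),
        Word("I", 0.15, 0.25),
        Word("um", 0.30, 0.45),
        Word("want", 0.90, 1.10),
        Word("cake", 1.15, 1.50),
    ]
    print(find_pauses(sample))       # [(0.45, 0.9)]
    print(find_repetitions(sample))  # [1]
    print(find_fillers(sample))      # [2]
```

Raising pause_threshold makes the pause rule stricter; with pause_threshold=1.0 the sample above yields no pauses.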


Getting Started

Installation

1. Clone the repo
git clone https://github.com/SwenHou/SATE.git
2. Install packages
conda env create -f environment_sate_0.11.yml
3. Start the inference API on your local machine

Set your Hugging Face token:

export HF_TOKEN=<your_token_here>

Start the API:

python main_socket.py

Usage

1. Get Annotations
curl -X POST http://localhost:7860/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"

The annotation file is also saved in SATE/session_data/.
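The curl request above can also be sent from Python. The sketch below uses only the standard library and mirrors the curl command: the /process path and the audio_file, device, and pause_threshold form fields come from that example, while the function names build_multipart and process_audio are hypothetical. The response format is not documented here, so the sketch tries JSON first and falls back to plain text.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, file_path):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    path = Path(file_path)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{path.name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    parts.append(header + path.read_bytes() + b"\r\n")
    # The closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def process_audio(audio_path, base_url="http://localhost:7860", device="cuda",
                  pause_threshold=0.25, timeout=600):
    """POST an audio file to the /process endpoint, mirroring the curl example."""
    body, content_type = build_multipart(
        {"device": device, "pause_threshold": str(pause_threshold)},
        "audio_file",
        audio_path,
    )
    request = urllib.request.Request(
        f"{base_url}/process",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read().decode("utf-8")
    # Response format is an assumption: try JSON, otherwise return raw text.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw
```

For example, process_audio("interview.wav") (a hypothetical file name) targets the local server, and passing base_url="https://Sven33-SATE.hf.space" targets the Hugging Face Space instead.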


🐳 Use Docker

1. Build Docker Image

In the Dockerfile, delete the line ENV HF_HOME=/data/.huggingface and add ENV HF_TOKEN=<your_token_here>.

Run the following command in the project root directory:

docker build -t sate_0.11 .

2. Run the Docker Container

docker run --gpus all -it --rm \
  -p 7860:7860 \
  sate_0.11

3. Usage

Usage is the same as for the local API, except that the annotation file is deleted when the container exits, because the container is started with --rm.

curl -X POST http://localhost:7860/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"

🤗 Use API from Hugging Face Spaces

curl -X POST https://Sven33-SATE.hf.space/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
Hugging Face Space URL: https://huggingface.co/spaces/Sven33/SATE

Because of Hugging Face's GPU scheduling latency, the first request takes about 5-8 minutes while the Space starts up. If the Space receives no requests within five minutes after startup, it goes back to sleep.
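Because the Space can take 5-8 minutes to wake, a script that calls it may see failed requests until startup finishes. A minimal retry sketch with capped exponential backoff follows. It assumes that a sleeping Space shows up as network errors or HTTP 5xx responses, which is not confirmed here; backoff_delays and call_with_retry are hypothetical helper names.

```python
import time
import urllib.error


def backoff_delays(max_total_wait, base=5.0, cap=60.0):
    """Exponential backoff delays (base, 2*base, ...) capped at `cap`.

    The delays sum to at most max_total_wait seconds.
    """
    delays, total, delay = [], 0.0, base
    while total + delay <= max_total_wait:
        delays.append(delay)
        total += delay
        delay = min(delay * 2, cap)
    return delays


def call_with_retry(fn, delays, sleep=time.sleep):
    """Call fn(); after a network error or HTTP 5xx, sleep and retry.

    Client errors (HTTP 4xx) are raised immediately, and the last error is
    re-raised once all delays are used up.
    """
    for delay in [*delays, None]:
        try:
            return fn()
        except urllib.error.HTTPError as err:
            # HTTPError subclasses OSError, so it must be handled first.
            if err.code < 500 or delay is None:
                raise
        except OSError:
            # Covers connection refused/reset and socket timeouts.
            if delay is None:
                raise
        sleep(delay)
```

To use it, pass the zero-argument function that sends the /process request, e.g. call_with_retry(send_request, backoff_delays(600)), where send_request is a hypothetical wrapper around the POST. A 600-second budget covers the upper end of the 5-8 minute startup window, not counting the time spent inside each attempt.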

With a T4 small GPU, inference on a 10-minute audio sample takes under two minutes.
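The timing figures above can be turned into a rough client timeout. Under two minutes for ten minutes of audio corresponds to a real-time factor below 0.2, so the sketch uses 0.2 as an upper bound, adds the worst-case 8-minute cold start, and applies a hypothetical 1.5x safety margin. The function name estimate_timeout_s is illustrative, and the result is an estimate from the two figures quoted here, not a measured guarantee.

```python
def estimate_timeout_s(audio_minutes, rtf=0.2, cold_start_minutes=8.0, margin=1.5):
    """Rough request timeout in seconds for the Hugging Face Space.

    rtf: processing time / audio duration (0.2 from the T4 small figure above).
    cold_start_minutes: startup delay when the Space is asleep (0 if already awake).
    margin: hypothetical safety factor.
    """
    return (cold_start_minutes + rtf * audio_minutes) * 60 * margin


print(estimate_timeout_s(10))                        # 900.0
print(estimate_timeout_s(10, cold_start_minutes=0))  # 180.0
```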