---
title: SATE
emoji:
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
short_description: Speech Annotation and Transcription Enhancer
---
# SATE: Speech Annotation and Transcription Enhancer (MVP)
This is the **Minimum Viable Product (MVP)** version of **SATE**, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.
---
## Overview
- **Main Entry**: `main_socket.py`
- **Input**: Entire audio file (`.mp3`, `.wav`, etc.)
- **Output**: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations, and syllables.
- **Preprocessing**:
- Audio segmentation
- Speaker diarization
- Transcription using Crisper Whisper
- **Annotation**:
- Pause
- Repetition
- Filler Words
- Syllable Structure
- Mispronunciation Sequence (PLM container is needed)
- **Feature Extraction**
---
## Getting Started
#### Installation
##### 1. Clone the repo
```bash
git clone https://github.com/SwenHou/SATE.git
```
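Then change into the repository:
```bash
# The folder name follows from the clone URL.
cd SATE
```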
##### 2. Install packages
```bash
conda env create -f environment_sate_0.11.yml
```
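Then activate the environment. The environment name is defined by the `name:` field inside `environment_sate_0.11.yml`; `sate` below is only a placeholder:
```bash
# Replace "sate" with the actual "name:" value from environment_sate_0.11.yml.
conda activate sate
```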
##### 3. Start the Inference API on Your Local Machine
Set up your Hugging Face token:
```bash
export HF_TOKEN=<your_token_here>
```
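Optionally, verify the token is visible to your current shell before starting the server:
```bash
# Succeeds only if HF_TOKEN is set and non-empty in this shell.
[ -n "$HF_TOKEN" ] && echo "HF_TOKEN is set" || echo "HF_TOKEN is missing"
```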
Start the API:
```bash
python main_socket.py
```
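If you want to keep the terminal free, you can instead launch the server in the background and wait until the port accepts connections. Port 7860 matches the `curl` examples below; the log filename is arbitrary, and the root path used for the probe may return 404 even while the server is up:
```bash
# Launch the API in the background and capture its output in a log file.
python main_socket.py > sate_api.log 2>&1 &

# Poll until something accepts TCP connections on port 7860.
until curl -s -o /dev/null http://localhost:7860/; do
  sleep 2
done
echo "API is reachable on port 7860"
```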
#### Usage
##### 1. Get Annotations
```bash
curl -X POST http://localhost:7860/process \
-F "audio_file=@<your local path to audio file>" \
-F "device=cuda" \
-F "pause_threshold=0.25"
```
The annotation file is also saved to `SATE/session_data/`.
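If the endpoint also returns the annotations in the response body (assumed here to be JSON), you can save and inspect them directly; `annotations.json` is an arbitrary output name:
```bash
# Save the response to a file and pretty-print it (assumes a JSON response body).
curl -X POST http://localhost:7860/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25" \
  -o annotations.json
python -m json.tool annotations.json

# The per-session outputs are also written under SATE/session_data/.
ls SATE/session_data/
```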
---
## 🐳 Use Docker
### 1. Build Docker Image
In the `Dockerfile`, delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=<your_token_here>`.
Run the following command in the project root directory:
```bash
docker build -t sate_0.11 .
```
### 2. Run the Docker Container
```bash
docker run --gpus all -it --rm \
-p 7860:7860 \
sate_0.11
```
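If you prefer not to bake the token into the image, an alternative (still removing the `ENV HF_HOME=/data/.huggingface` line in step 1, but skipping the `ENV HF_TOKEN=...` line) is to pass the token at runtime, assuming the application reads it from the `HF_TOKEN` environment variable as in the local setup:
```bash
# Pass the Hugging Face token at container start instead of hard-coding it in the image.
docker run --gpus all -it --rm \
  -p 7860:7860 \
  -e HF_TOKEN=$HF_TOKEN \
  sate_0.11
```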
### 3. Usage
Usage is the same as with the local API, but the annotation file will be deleted when the container exits (to persist it, see the volume-mount sketch after the example below).
```bash
curl -X POST http://localhost:7860/process \
-F "audio_file=@<your local path to audio file>" \
-F "device=cuda" \
-F "pause_threshold=0.25"
```
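To keep the annotation files after the container exits, one option is to bind-mount a host directory over the session data folder. This is a sketch: the in-container path `/app/session_data` is an assumption and should be adjusted to the `WORKDIR` used in the `Dockerfile`:
```bash
# Mount a host directory so annotation files survive container shutdown.
# /app/session_data is assumed; adjust to match the Dockerfile's working directory.
docker run --gpus all -it --rm \
  -p 7860:7860 \
  -v "$(pwd)/session_data:/app/session_data" \
  sate_0.11
```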
---
## 🤗 Use API from Hugging Face Spaces
```bash
curl -X POST https://Sven33-SATE.hf.space/process \
-F "audio_file=@<your local path to audio file>" \
-F "device=cuda" \
-F "pause_threshold=0.25"
```
##### Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE`
Due to Hugging Face's GPU scheduling latency, the first request after a cold start takes around 5-8 minutes. If the Space receives no requests for five minutes after startup, it goes back to sleep.
For a 10-minute audio sample, inference on a T4 small GPU typically takes under two minutes.
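Given the cold start, it can help to wake the Space with a cheap request before uploading audio, and to give the real request a generous client-side timeout. This is only a sketch; it assumes any HTTP request to the Space URL triggers a wake-up, and the 900-second `--max-time` value is an arbitrary example:
```bash
# Wake the Space with a lightweight request (its response status does not matter),
# then send the actual job with a long client-side timeout.
curl -s -o /dev/null https://Sven33-SATE.hf.space/ || true
curl -X POST https://Sven33-SATE.hf.space/process \
  --max-time 900 \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```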