|
--- |
|
title: SATE |
|
emoji: ⚡ |
|
colorFrom: purple |
|
colorTo: blue |
|
sdk: docker |
|
pinned: false |
|
license: apache-2.0 |
|
short_description: Speech Annotation and Transcription Enhancer |
|
--- |
|
|
|
|
|
# SATE: Speech Annotation and Transcription Enhancer (MVP) |
|
|
|
This is the **Minimum Viable Product (MVP)** version of **SATE**, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application. |
|
|
|
--- |
|
|
|
## Overview |
|
|
|
- **Main Entry**: `main_socket.py` |
|
- **Input**: Entire audio file (`.mp3`, `.wav`, etc.) |
|
- **Output**: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations and syllables. |
|
|
|
- **Preprocessing**: |
|
- Audio segmentation |
|
- Speaker diarization |
|
- Transcription using CrisperWhisper |
|
|
|
- **Annotation**: |
|
- Pause |
|
- Repetition |
|
- Filler Words |
|
- Syllable Structure |
|
- Mispronunciation Sequence (PLM container is needed) |
|
|
|
- **Feature Extraction** |
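
To illustrate how the pause annotation relates to word-level timestamps and the `pause_threshold` parameter, here is a minimal sketch. The `Word` schema and the `find_pauses` helper are hypothetical names chosen for illustration, and the `>=` comparison is an assumption; SATE's actual implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with timestamps in seconds (hypothetical schema)."""
    text: str
    start: float
    end: float


def find_pauses(words, pause_threshold=0.25):
    """Return (previous_word, next_word, gap_seconds) for every silent gap
    between consecutive words that is at least pause_threshold seconds long."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start - prev.end
        if gap >= pause_threshold:  # assumption: threshold is inclusive
            pauses.append((prev.text, nxt.text, round(gap, 3)))
    return pauses


words = [
    Word("so", 0.00, 0.20),
    Word("um", 0.25, 0.50),
    Word("yes", 1.10, 1.40),
]
print(find_pauses(words))  # [('um', 'yes', 0.6)]
```

Raising `pause_threshold` reports fewer, longer pauses; lowering it also picks up short hesitations.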
|
|
|
--- |
|
|
|
|
|
## Getting Started |
|
|
|
#### Installation |
|
|
|
##### 1. Clone the repo |
|
```bash |
|
git clone https://github.com/SwenHou/SATE.git |
|
``` |
|
##### 2. Install packages |
|
```bash |
|
conda env create -f environment_sate_0.11.yml |
|
``` |
|
##### 3. Start the inference API on your local machine |
|
Set your Hugging Face token: |
|
```bash |
|
export HF_TOKEN=<your_token_here> |
|
``` |
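
If you launch the API from a Python wrapper script, a small preflight check can catch a missing token before any model download starts. This is only a sketch: `require_hf_token` is a hypothetical helper and is not part of SATE.

```python
import os


def require_hf_token(env=os.environ):
    """Return the HF_TOKEN value, or raise if it is missing or empty."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before starting the API")
    return token


# Demonstration with an explicit mapping instead of the real environment.
print(require_hf_token({"HF_TOKEN": "hf_example"}))  # hf_example
```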
|
Start API: |
|
```bash |
|
python main_socket.py |
|
``` |
|
#### Usage |
|
##### 1. Get Annotations |
|
|
|
```bash |
|
curl -X POST http://localhost:7860/process \ |
|
-F "audio_file=@<your local path to audio file>" \ |
|
-F "device=cuda" \ |
|
-F "pause_threshold=0.25" |
|
``` |
|
The annotation file is also saved to `SATE/session_data/`. |
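
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the form fields of the `curl` call (`audio_file`, `device`, `pause_threshold`). The helper names `encode_multipart` and `process_audio` are hypothetical, and because the response format is not described here, the function returns the raw response bytes.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def process_audio(audio_path, url="http://localhost:7860/process",
                  device="cuda", pause_threshold=0.25, timeout=600):
    """POST an audio file to the /process endpoint and return the raw response."""
    path = Path(audio_path)
    body, content_type = encode_multipart(
        {"device": device, "pause_threshold": pause_threshold},
        "audio_file", path.name, path.read_bytes(),
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Example (requires a running server and a real file):
# print(process_audio("sample.wav").decode())
```

The 600-second default timeout is an arbitrary choice to leave room for long recordings; adjust it to your audio length and hardware.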
|
|
|
--- |
|
|
|
|
|
## 🐳 Use Docker |
|
|
|
### 1. Build Docker Image |
|
In `Dockerfile`, delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=<your_token_here>`. |
|
|
|
Run the following command in the project root directory: |
|
|
|
```bash |
|
docker build -t sate_0.11 . |
|
``` |
|
|
|
### 2. Run the Docker Container |
|
```bash |
|
docker run --gpus all -it --rm \ |
|
-p 7860:7860 \ |
|
sate_0.11 |
|
``` |
|
|
|
### 3. Usage |
|
Usage is the same as with the local API, but the annotation file is deleted when the container exits. |
|
|
|
```bash |
|
curl -X POST http://localhost:7860/process \ |
|
-F "audio_file=@<your local path to audio file>" \ |
|
-F "device=cuda" \ |
|
-F "pause_threshold=0.25" |
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 🤗 Use API from Hugging Face Spaces |
|
|
|
```bash |
|
curl -X POST https://Sven33-SATE.hf.space/process \ |
|
-F "audio_file=@<your local path to audio file>" \ |
|
-F "device=cuda" \ |
|
-F "pause_threshold=0.25" |
|
``` |
|
##### Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE` |
|
|
|
Due to Hugging Face's GPU scheduling latency, the first request after a cold start takes around 5 to 8 minutes. If the Space receives no requests within five minutes after startup, it goes back into sleep mode. |
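
Because the first request may fail or time out while the Space is waking up, a client can simply retry with a pause between attempts. The following is a minimal sketch: `call_with_retry` is a hypothetical helper, and the defaults (5 attempts, 60 seconds apart) are assumptions picked to roughly cover the startup window described above.

```python
import time


def call_with_retry(send, attempts=5, delay=60.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    Waits `delay` seconds between failed attempts and re-raises the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except OSError as error:
            # urllib.error.URLError, connection errors, and socket timeouts
            # are all OSError subclasses.
            last_error = error
            if attempt < attempts:
                sleep(delay)
    raise last_error


# Example: wrap any zero-argument callable that performs the HTTP request.
# result = call_with_retry(lambda: my_request_function())
```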
|
|
|
For a 10-minute audio sample, inference on a T4 small GPU takes just under two minutes. |