Shuwei Hou

add_hf_configuration

9857c7b about 2 months ago

	---
	title: SATE
	emoji: ⚡
	colorFrom: purple
	colorTo: blue
	sdk: docker
	pinned: false
	license: apache-2.0
	short_description: Speech Annotation and Transcription Enhancer
	---


	# SATE: Speech Annotation and Transcription Enhancer (MVP)

	This is the Minimum Viable Product (MVP) version of SATE, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.

	---

	## Overview

	- Main Entry: `main_socket.py`
	- Input: Entire audio file (`.mp3`, `.wav`, etc.)
	- Output: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations and syllables.

	- Preprocessing:
	  - Audio segmentation
	  - Speaker diarization
	  - Transcription using Crisper Whisper

	- Annotation:
	  - Pause
	  - Repetition
	  - Filler Words
	  - Syllable Structure
	  - Mispronunciation Sequence (PLM container is needed)

	- Feature Extraction
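
To make the pause annotation and the `pause_threshold` request parameter more concrete, the following is a minimal sketch of gap-based pause detection over word-level timestamps. It is an illustration only, not SATE's actual implementation: the `Word` record, the `find_pauses` helper, and the rule "a pause is a silent gap between consecutive words of at least `threshold` seconds" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level record; SATE's real output schema may differ."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def find_pauses(words, threshold=0.25):
    """Return (previous_word, next_word, gap_seconds) for every silent gap
    between consecutive words that lasts at least `threshold` seconds."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start - prev.end
        if gap >= threshold:
            pauses.append((prev.text, nxt.text, round(gap, 3)))
    return pauses


# Made-up timestamps, purely for illustration.
words = [
    Word("so", 0.00, 0.20),
    Word("um", 0.30, 0.50),
    Word("yesterday", 1.10, 1.70),
]
print(find_pauses(words))  # only the gap between "um" and "yesterday" qualifies
```

Under this assumed rule, lowering the threshold reports shorter gaps as pauses, and raising it keeps only the longer silences.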

	---


	## Getting Started

	#### Installation

	##### 1. Clone the repo
	```bash
	git clone https://github.com/SwenHou/SATE.git
	```
	##### 2. Install packages
	```bash
	conda env create -f environment_sate_0.11.yml
	```
	##### 3. Start the Inference API on Your Local Machine
	Set your Hugging Face token:
	```bash
	export HF_TOKEN=<your_token_here>
	```
	Start API:
	```bash
	python main_socket.py
	```
	#### Usage
	##### 1. Get Annotations

	```bash
	curl -X POST http://localhost:7860/process \
	-F "audio_file=@<your local path to audio file>" \
	-F "device=cuda" \
	-F "pause_threshold=0.25"
	```
	The annotation file is also saved in `SATE/session_data/`.
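
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the form fields of the `curl` call (`audio_file`, `device`, `pause_threshold`). The helper names `build_multipart` and `annotate` are hypothetical, and because this README does not describe the response format, the response body is returned as raw bytes rather than parsed.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, file_path, boundary=None):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    path = Path(file_path)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{path.name}"\r\n'
        f'Content-Type: {mime}\r\n\r\n'.encode()
        + path.read_bytes()
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def annotate(audio_path, url="http://localhost:7860/process",
             device="cuda", pause_threshold=0.25, timeout=600):
    """POST an audio file to the /process endpoint and return the raw response body."""
    body, content_type = build_multipart(
        {"device": device, "pause_threshold": pause_threshold},
        "audio_file",
        audio_path,
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Example (requires the API to be running; the file names are placeholders):
# result = annotate("sample.wav")
# Path("annotations.out").write_bytes(result)
```

The generous default `timeout` is a guess meant to leave room for long recordings; adjust it to your audio length and hardware.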

	---


	## 🐳 Use Docker

	### 1. Build Docker Image
	In the `Dockerfile`:
	Delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=<your_token_here>`.

	Run the following command in the project root directory:

	```bash
	docker build -t sate_0.11 .
	```

	### 2. Run the Docker Container
	```bash
	docker run --gpus all -it --rm \
	-p 7860:7860 \
	sate_0.11
	```

	### 3. Usage
	Usage is the same as with the local API, but the annotation file is deleted when the container exits.

	```bash
	curl -X POST http://localhost:7860/process \
	-F "audio_file=@<your local path to audio file>" \
	-F "device=cuda" \
	-F "pause_threshold=0.25"
	```


	---


	## 🤗 Use API from Hugging Face Spaces

	```bash
	curl -X POST https://Sven33-SATE.hf.space/process \
	-F "audio_file=@<your local path to audio file>" \
	-F "device=cuda" \
	-F "pause_threshold=0.25"
	```
	##### Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE`

	Due to Hugging Face's GPU scheduling latency, the first request after the Space wakes up takes around 5-8 minutes to start being served. If the Space receives no visits within five minutes after startup, it goes back to sleep.

	For a 10-minute audio sample, inference on a T4 small GPU takes roughly two minutes or less.
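
Because of the cold start described above, a client may want to wait for the Space to wake up before uploading audio. The following is a minimal polling sketch, not part of SATE. The helpers `http_probe` and `wait_until_ready` are hypothetical, and so is the readiness rule: this README does not say what a sleeping Space returns, so the probe simply treats any non-5xx HTTP answer as "awake". The default `max_wait` of 10 minutes is chosen to cover the 5-8 minute startup time mentioned above.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=30):
    """Return True if `url` answers with a non-5xx HTTP status (assumed readiness rule)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered; treat 5xx as "still starting up".
        return err.code < 500
    except OSError:
        # Connection refused, DNS failure, timeout, and similar errors.
        return False


def wait_until_ready(probe, max_wait=600.0, interval=15.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call `probe()` every `interval` seconds until it returns True.

    Returns True on success, or False once another wait would exceed `max_wait` seconds.
    """
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() + interval > deadline:
            return False
        sleep(interval)


# Example (performs real network requests; probing the root path is an assumption):
# ready = wait_until_ready(lambda: http_probe("https://Sven33-SATE.hf.space/"))
# print("Space is awake" if ready else "Gave up waiting")
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; in normal use the defaults are sufficient.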