Shuwei Hou committed on
Commit
75814d9
·
1 Parent(s): a3b7803

update_readme

Files changed (1)
  1. README.md +107 -9
README.md CHANGED
@@ -1,12 +1,110 @@
  ---
- title: SATE
- emoji: ⚡
- colorFrom: purple
- colorTo: blue
- sdk: docker
- pinned: false
- license: apache-2.0
- short_description: Speech Annotatin and Transcription Enhancer
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # SATE: Speech Annotation and Transcription Enhancer (MVP)
+
+ This is the **Minimum Viable Product (MVP)** version of **SATE**, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.
+
+ ---
+
+ ## Overview
+
+ - **Main Entry**: `main_socket.py`
+ - **Input**: A complete audio file (`.mp3`, `.wav`, etc.)
+ - **Output**: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations, and syllables.
+
+ - **Preprocessing**:
+   - Audio segmentation
+   - Speaker diarization
+   - Transcription using CrisperWhisper
+
+ - **Annotation**:
+   - Pause
+   - Repetition
+   - Filler Words
+   - Syllable Structure
+   - Mispronunciation Sequence (requires the PLM container)
+
+ - **Feature Extraction**
+
+ ---
+
+
+ ## Getting Started
+
+ #### Installation
+
+ ##### 1. Clone the repo
+ ```bash
+ git clone https://github.com/SwenHou/SATE.git
+ ```
+ ##### 2. Install packages
+ ```bash
+ conda env create -f environment_sate_0.11.yml
+ ```
+ ##### 3. Start the Inference API on Your Local Computer
+ Set up your Hugging Face token:
+ ```bash
+ export HF_TOKEN=<your_token_here>
+ ```
+ Start the API:
+ ```bash
+ python main_socket.py
+ ```
+ #### Usage
+ ##### 1. Get Annotations
+
+ ```bash
+ curl -X POST http://localhost:7860/process \
+ -F "audio_file=@<your local path to audio file>" \
+ -F "device=cuda" \
+ -F "pause_threshold=0.25"
+ ```
+ The annotation file is also available in `SATE/session_data/`.
+
  ---
+
+
+ ## 🐳 Use Docker
+
+ ### 1. Build Docker Image
+ In `Dockerfile`:
+ Delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=<your_token_here>`.
+
+ Run the following command in the project root directory:
+
+ ```bash
+ docker build -t sate_0.11 .
+ ```
+
+ ### 2. Run the Docker Container
+ ```bash
+ docker run --gpus all -it --rm \
+ -p 7860:7860 \
+ sate_0.11
+ ```
+
+ ### 3. Usage
+ Usage is the same as with the local API, but the annotation file is deleted when the container exits.
+
+ ```bash
+ curl -X POST http://localhost:7860/process \
+ -F "audio_file=@<your local path to audio file>" \
+ -F "device=cuda" \
+ -F "pause_threshold=0.25"
+ ```
+
+
  ---
 
+
+ ## 🤗 Use API from Hugging Face Spaces
+
+ ```bash
+ curl -X POST https://Sven33-SATE.hf.space/process \
+ -F "audio_file=@<your local path to audio file>" \
+ -F "device=cuda" \
+ -F "pause_threshold=0.25"
+ ```
+ ##### Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE`
+
+ Because of GPU scheduling latency on Hugging Face, the first request after a cold start takes around 5-8 minutes. If the Space receives no requests for five minutes after startup, it goes back to sleep.
+
+ On a T4 small GPU, inference for a 10-minute audio sample takes under two minutes.
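
The `curl` calls in the README can also be issued from Python. The following is a minimal sketch using only the standard library, not part of the commit. The endpoint path and the form fields (`audio_file`, `device`, `pause_threshold`) are taken from the README; the helper names `encode_multipart`, `build_process_request`, and `send_request` are hypothetical. The README does not document the response format, so the sketch returns the body as raw text.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, file_field, file_path):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    path = Path(file_path)
    file_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{path.name}"\r\n'
            f"Content-Type: {file_type}\r\n\r\n"
        ).encode("utf-8")
    )
    parts.append(path.read_bytes())
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_process_request(audio_path, base_url="http://localhost:7860",
                          device="cuda", pause_threshold=0.25):
    """Build a POST request mirroring the README's curl example."""
    body, content_type = encode_multipart(
        {"device": device, "pause_threshold": str(pause_threshold)},
        "audio_file",
        audio_path,
    )
    return urllib.request.Request(
        f"{base_url}/process",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def send_request(request, timeout=900):
    """Send the request and return the response body as text.

    The generous default timeout is an assumption, chosen because the README
    reports a 5-8 minute cold start on Hugging Face Spaces.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

To target the hosted Space instead of a local server, pass `base_url="https://Sven33-SATE.hf.space"` to `build_process_request` and hand the result to `send_request`.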