muzzz committed on
Commit bfebc17 · 1 Parent(s): a233652

update README and small tweaks

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +53 -2
  3. app.py +2 -2
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
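The added line routes every `.jpg` file through Git LFS, alongside the patterns already present. The sketch below is a rough way to check which paths the four patterns in this hunk would cover. It uses Python's `fnmatchcase` as a stand-in for Git's own matcher, which is an assumption: gitattributes patterns follow gitignore-style rules that `fnmatch` only approximates. The helper name `is_lfs_tracked` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns visible in this hunk of .gitattributes (the full file lists more).
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.jpg"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's base name match any LFS pattern?

    Patterns without a slash are compared against the base name only,
    so they apply at any directory depth. Matching is case-sensitive.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical sample paths, for illustration only.
    for sample in ["assets/diagram.jpg", "app.py", "logs/events.out.tfevents.1"]:
        print(sample, is_lfs_tracked(sample))
```

For an authoritative answer in a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.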
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  title: ClipScript
- emoji: 👀
+ emoji: '🎬'
  colorFrom: pink
  colorTo: gray
  sdk: gradio
@@ -8,7 +8,58 @@ sdk_version: 5.33.1
  app_file: app.py
  pinned: false
  license: mit
- short_description: The one-stop shop for converting your videos into blog posts
+ short_description: The one-stop shop for converting your videos into blog posts.
+ tags:
+ - agent-demo-track
+ video_overview: https://www.youtube.com/
  ---
 
+ # 🎬 ClipScript: Video-to-Blog Transformer
+
+ ClipScript is a powerful application that transforms any video or audio content into a polished, ready-to-publish blog post. Simply provide a YouTube URL or upload an audio file, and let our AI agent handle the rest.
+
+ ### Video Overview
+
+ [Watch a short video demonstrating how to use ClipScript here!]()
+
+ ## Features
+
+ - **YouTube & File Uploads**: Works with YouTube links or direct audio/video file uploads.
+ - **AI-Powered Transcription**: Utilizes a state-of-the-art ASR model for highly accurate transcription.
+ - **Agentic Blog Generation**: An expert AI writing agent converts the raw transcript into a structured, engaging blog post, automatically removing conversational filler and adding SEO-friendly formatting.
+ - **Interactive Refinement**: Chat with the AI agent to refine the generated blog post until it's perfect.
+ - **Secure & Scalable**: Powered by [Modal](https://modal.com) for secure, scalable, and efficient backend processing.
+
+ ## Hugging Face Agent Demo Track
+
+ This application has been submitted to the **Agent Demo Track**. It showcases an "AI agent" that acts as an expert blog writer and editor, taking a high-level goal (transforming a transcript) and executing a series of steps to achieve it.
+
+ ## 🛠️ Core Technology
+
+ ### Speech-to-Text: NVIDIA Parakeet TDT 0.6B V2
+
+ The transcription engine is powered by `nvidia/parakeet-tdt-0.6b-v2`. This model is **ranked #1 on the Hugging Face Open ASR Leaderboard**, achieving the best overall average Word Error Rate (WER) and RTFx (real-time factor) score, making it one of the fastest and most accurate ASR models available.
+
+ For a deep dive into the model's architecture and performance, check out the [official model card](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) and the [Open ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard).
+
+ ### Content Generation: AI Writing Agent
+
+ An AI writing agent, accessed via OpenRouter, converts the raw transcript into a polished, structured blog post, ready for publishing.
+
+ ### Backend Infrastructure: Modal
+
+ The backend is built on [Modal](https://modal.com) for security, scalability, and performance.
+
+ - **Secure Sandboxed Execution**: All media processing occurs in isolated Modal environments, keeping potentially malicious files separate from the Gradio server.
+
+ - **High-Performance File System**: Modal Volumes provide fast, reliable file transfer and access for user uploads.
+
+ This architecture keeps the frontend lightweight while offloading intensive tasks to secure, scalable cloud resources.
+
+ ## Architecture
+
+ The following diagram illustrates the complete data flow, from user input in the Gradio application to the final blog post generation.
+
+ ![Application Architecture Diagram](https://ibb.co/SDW7NPHg)
+
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
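The README changes above extend the Space's YAML front matter with `tags` and `video_overview`. As a quick sanity check on such a block, the following is a minimal sketch that reads it using only the standard library. It assumes flat `key: value` pairs and simple `- item` lists, which is all this front matter uses; a real YAML parser would be the more robust choice. The function name `read_front_matter` and the abridged `README_HEAD` sample are hypothetical.

```python
def read_front_matter(text: str) -> dict:
    """Parse flat `key: value` pairs and `- item` lists between the first two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file

    config = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter reached
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # A list item belongs to the most recently seen key.
            if not isinstance(config[current_key], list):
                config[current_key] = []
            config[current_key].append(stripped[2:].strip())
            continue
        # Split on the first colon only, so URL values keep their own colons.
        key, _, value = stripped.partition(":")
        current_key = key.strip()
        config[current_key] = value.strip().strip("'\"")
    return config


# Abridged copy of the front matter as it stands after this commit.
README_HEAD = """---
title: ClipScript
emoji: '🎬'
sdk: gradio
tags:
- agent-demo-track
video_overview: https://www.youtube.com/
---

# ClipScript
"""

if __name__ == "__main__":
    print(read_front_matter(README_HEAD))
```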
app.py CHANGED
@@ -279,12 +279,12 @@ with gr.Blocks(title="ClipScript", theme=theme) as demo:
      with gr.Row():
          # Column 1: File input, URL input, and thumbnail
          with gr.Column(scale=1):
-             file_input = gr.File(label="Upload any audio file", type="filepath", height=200, file_types=["audio", ".webm", ".mp3", ".mp4", ".m4a", ".ogg", ".wav"])
+             file_input = gr.File(label="Upload any audio file (Recommended)", type="filepath", height=200, file_types=["audio", ".webm", ".mp3", ".mp4", ".m4a", ".ogg", ".wav"])
 
              with gr.Row():
                  with gr.Column():
                      url_input = gr.Textbox(
-                         label="YouTube(Recommended) or Direct Audio URL",
+                         label="YouTube or Direct Audio URL",
                          placeholder="youtube.com/watch?v=... OR xyz.com/audio.mp3",
                          scale=2
                      )
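The `file_types` list on `gr.File` combines the broad `"audio"` category with explicit extensions. The sketch below shows one way the same allow-list could be rechecked on the server before a file is handed to the backend. This is an assumption for illustration, not something `app.py` is shown to do in this diff; `is_accepted_upload` and `ALLOWED_EXTENSIONS` are hypothetical names, and `mimetypes` guesses from the file name only, without inspecting file contents.

```python
import mimetypes
from pathlib import Path

# Explicit extensions copied from the file_types list in the diff above.
ALLOWED_EXTENSIONS = {".webm", ".mp3", ".mp4", ".m4a", ".ogg", ".wav"}


def is_accepted_upload(filepath: str) -> bool:
    """Return True if the path has an allowed extension or a guessed audio MIME type."""
    suffix = Path(filepath).suffix.lower()
    if suffix in ALLOWED_EXTENSIONS:
        return True
    # Fall back to the "audio" category: any name whose guessed type is audio/*.
    guessed_type, _ = mimetypes.guess_type(filepath)
    return bool(guessed_type) and guessed_type.startswith("audio/")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ["talk.mp3", "clip.MP4", "notes.txt"]:
        print(name, is_accepted_upload(name))
```

A name-based check like this is only a first filter; a renamed file would still pass it.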