Space: Agents-MCP-Hackathon/ClipScript

muzzz committed on Jun 10

Commit

bfebc17

1 Parent(s): a233652

update README and small tweaks

Files changed (3)

.gitattributes +1 -0
README.md +53 -2
app.py +2 -2

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: ClipScript
-emoji: 👀
+emoji: '🎬'
 colorFrom: pink
 colorTo: gray
 sdk: gradio
@@ -8,7 +8,58 @@ sdk_version: 5.33.1
 app_file: app.py
 pinned: false
 license: mit
-short_description: The one-stop shop for converting your videos into blog posts
+short_description: The one-stop shop for converting your videos into blog posts.
+tags:
+  - agent-demo-track
+video_overview: https://www.youtube.com/
 ---
+# 🎬 ClipScript: Video-to-Blog Transformer
+ClipScript is a powerful application that transforms any video or audio content into a polished, ready-to-publish blog post. Simply provide a YouTube URL or upload an audio file, and let our AI agent handle the rest.
+### Video Overview
+[Watch a short video demonstrating how to use ClipScript here!]()
+## Features
+- **YouTube & File Uploads**: Works with YouTube links or direct audio/video file uploads.
+- **AI-Powered Transcription**: Utilizes a state-of-the-art ASR model for highly accurate transcription.
+- **Agentic Blog Generation**: An expert AI writing agent converts the raw transcript into a structured, engaging blog post, automatically removing conversational filler and adding SEO-friendly formatting.
+- **Interactive Refinement**: Chat with the AI agent to refine the generated blog post until it's perfect.
+- **Secure & Scalable**: Powered by [Modal](https://modal.com) for secure, scalable, and efficient backend processing.
+## Hugging Face Agent Demo Track
+This application has been submitted to the **Agent Demo Track**. It showcases an "AI agent" that acts as an expert blog writer and editor, taking a high-level goal (transforming a transcript) and executing a series of steps to achieve it.
+## 🛠️ Core Technology
+### Speech-to-Text: NVIDIA Parakeet TDT 0.6B V2
+The transcription engine is powered by `nvidia/parakeet-tdt-0.6b-v2`. This model is **ranked #1 on the Hugging Face Open ASR Leaderboard**, achieving the best overall average Word Error Rate (WER) and RTFx (real-time factor) score, making it one of the fastest and most accurate ASR models available.
+For a deep dive into the model's architecture and performance, check out the [official model card](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) and the [Open ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard).
+### Content Generation: AI Writing Agent
+An AI writing agent, accessed via OpenRouter, converts the raw transcript into a polished, structured blog post, ready for publishing.
+### Backend Infrastructure: Modal
+The backend is built on [Modal](https://modal.com) for security, scalability, and performance.
+- **Secure Sandboxed Execution**: All media processing occurs in isolated Modal environments, keeping potentially malicious files separate from the Gradio server.
+- **High-Performance File System**: Modal Volumes provide fast, reliable file transfer and access for user uploads.
+This architecture keeps the frontend lightweight while offloading intensive tasks to secure, scalable cloud resources.
+## Architecture
+The following diagram illustrates the complete data flow, from user input in the Gradio application to the final blog post generation.
+![Application Architecture Diagram](https://ibb.co/SDW7NPHg)
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
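
The data flow the new README describes (media input, transcription on the backend, blog generation by the writing agent, then chat-based refinement) could be sketched roughly as follows. This is only an illustration: `Pipeline`, `transcribe`, `write_post`, and `refine` are hypothetical names, and the stages are injected as plain callables because the app's actual Modal and OpenRouter calls are not part of this commit.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pipeline:
    """Hypothetical stand-in for the README's data flow; not the app's real code."""

    transcribe: Callable[[str], str]    # media source -> raw transcript
    write_post: Callable[[str], str]    # raw transcript -> blog draft
    refine: Callable[[str, str], str]   # (current draft, user feedback) -> new draft
    history: List[str] = field(default_factory=list)

    def run(self, source: str) -> str:
        """Produce the first draft from a URL or file path."""
        transcript = self.transcribe(source)
        draft = self.write_post(transcript)
        self.history.append(draft)
        return draft

    def chat(self, feedback: str) -> str:
        """Apply one round of user feedback to the latest draft."""
        if not self.history:
            raise RuntimeError("run() must be called before chat()")
        draft = self.refine(self.history[-1], feedback)
        self.history.append(draft)
        return draft
```

Keeping the stages as injected callables means the Gradio layer can be exercised with stubs, while the heavy work stays behind whatever remote functions the backend exposes.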

app.py CHANGED

@@ -279,12 +279,12 @@ with gr.Blocks(title="ClipScript", theme=theme) as demo:
     with gr.Row():
         # Column 1: File input, URL input, and thumbnail
         with gr.Column(scale=1):
-            file_input = gr.File(label="Upload any audio file", type="filepath", height=200, file_types=["audio", ".webm", ".mp3", ".mp4", ".m4a", ".ogg", ".wav"])
+            file_input = gr.File(label="Upload any audio file (Recommended)", type="filepath", height=200, file_types=["audio", ".webm", ".mp3", ".mp4", ".m4a", ".ogg", ".wav"])
             with gr.Row():
                 with gr.Column():
                     url_input = gr.Textbox(
-                        label="YouTube(Recommended) or Direct Audio URL",
+                        label="YouTube or Direct Audio URL",
                         placeholder="youtube.com/watch?v=... OR xyz.com/audio.mp3",
                         scale=2
                     )
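
Since the URL textbox accepts either a YouTube link or a direct audio URL, the app presumably has to tell the two apart somewhere. The sketch below shows one way that routing might look, using only the standard library. `classify_url` and the `YOUTUBE_HOSTS` set are assumptions made for illustration, not code from this repository; the extension list is copied from the `file_types` argument in the hunk above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions copied from the file_types list in the diff above.
ALLOWED_EXTENSIONS = {".webm", ".mp3", ".mp4", ".m4a", ".ogg", ".wav"}
# Assumed set of YouTube hosts; the app's real check is not shown in this commit.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_url(raw: str) -> str:
    """Return "youtube", "direct_audio", or "unsupported" for a pasted URL."""
    text = raw.strip()
    if not text:
        return "unsupported"
    # The placeholder shows scheme-less input, so add a scheme before parsing.
    if "://" not in text:
        text = "https://" + text
    parsed = urlparse(text)
    host = (parsed.hostname or "").lower()
    if host in YOUTUBE_HOSTS:
        return "youtube"
    # Compare the path's extension against the same list the uploader accepts.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if host and suffix in ALLOWED_EXTENSIONS:
        return "direct_audio"
    return "unsupported"
```

Reusing the uploader's extension list for direct URLs keeps the two input paths consistent, so a file that can be uploaded can also be fetched by link.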