Spaces:

Agents-MCP-Hackathon
/

ClipScript

Running

App Files Files Community

ClipScript / README.md

muzzz

vidoe link added

5dc46ee 2 months ago

preview code

raw

history blame contribute delete

3.47 kB

	---
	title: ClipScript
	emoji: '🎬'
	colorFrom: pink
	colorTo: gray
	sdk: gradio
	sdk_version: 5.33.1
	app_file: app.py
	pinned: false
	license: mit
	short_description: Transforms videos and audio into ready-to-publish blogs.
	tags:
	- agent-demo-track
	video_overview: https://youtu.be/8DUxlj79NqM
	---

	# 🎬 ClipScript: Video-to-Blog Transformer

	ClipScript is a powerful application that transforms any video or audio content into a polished, ready-to-publish blog post. Simply provide a YouTube URL or upload an audio file, and let our AI agent handle the rest.

	### Video Overview

	[Watch a video demonstrating how to use ClipScript and what it is abut here!](https://youtu.be/8DUxlj79NqM)

	## Features

	- YouTube & File Uploads: Works with YouTube links or direct audio/video file uploads.
	- AI-Powered Transcription: Utilizes a state-of-the-art ASR model for highly accurate transcription.
	- Agentic Blog Generation: An expert AI writing agent converts the raw transcript into a structured, engaging blog post, automatically removing conversational filler and adding SEO-friendly formatting.
	- Interactive Refinement: Chat with the AI agent to refine the generated blog post until it's perfect.
	- Secure & Scalable: Powered by [Modal](https://modal.com) for secure, scalable, and efficient backend processing.

	## Hugging Face Agent Demo Track

	This application has been submitted to the Agent Demo Track. It showcases an "AI agent" that acts as an expert blog writer and editor, taking a high-level goal (transforming a transcript) and executing a series of steps to achieve it.

	## Core Technology

	### Speech-to-Text: NVIDIA Parakeet TDT 0.6B V2

	The transcription engine is powered by `nvidia/parakeet-tdt-0.6b-v2`. This model is ranked #1 on the Hugging Face Open ASR Leaderboard, achieving the best overall average Word Error Rate (WER) and RTFx (real-time factor) score, making it one of the fastest and most accurate ASR models available.

	For a deep dive into the model's architecture and performance, check out the [official model card](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) and the [Open ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard).


	For audio longer than 30 minutes, the SST model automatically segments content into optimal chunks and processes them in parallel, enabling fast transcription of hours-long content while maintaining accuracy and context.

	### Content Generation: AI Writing Agent

	An AI writing agent, accessed via OpenRouter, converts the raw transcript into a polished, structured blog post, ready for publishing.

	### Backend Infrastructure: Modal

	The backend is built on [Modal](https://modal.com) for security, scalability, and performance.

	- Secure Sandboxed Execution: All media processing occurs in isolated Modal environments, keeping potentially malicious files separate from the Gradio server.

	- High-Performance File System: Modal Volumes provide fast, reliable file transfer and access for user uploads.

	This architecture keeps the frontend lightweight while offloading intensive tasks to secure, scalable cloud resources.

	## Architecture

	The following diagram illustrates the complete data flow, from user input in the Gradio application to the final blog post generation.

	![Application Architecture Diagram](https://ibb.co/SDW7NPHg)

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference