AI & ML interests

None defined yet.

Xenova 
posted an update 11 days ago
view post
Post
2937
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold

... et voilà! 🥳

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.

Excited to see what the community builds with it!
Xenova 
posted an update 27 days ago
view post
Post
3986
The next generation of AI-powered websites is going to be WILD! 🤯

In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js: LiquidAI/LFM2-WebGPU

As always, the demo is open source (which you can find under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀
  • 1 reply
·
Xenova 
posted an update about 1 month ago
view post
Post
3038
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs and more
🔐 Runs on-device, meaning no data is sent to a server
🌎 Multilingual (8 languages)
🤗 Completely free (forever) & open source

That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥

Try it out yourself! 👇
webml-community/Voxtral-WebGPU
Xenova 
posted an update 3 months ago
view post
Post
7073
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯

🔐 Privacy by design (no data leaves your device)
💰 Completely free... forever
📦 Zero installation required, just visit a website
⚡️ Blazingly-fast WebGPU-accelerated inference

Try it out: webml-community/conversational-webgpu

For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text to speech

Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
·
Xenova 
posted an update 4 months ago
Xenova 
posted an update 5 months ago
view post
Post
2911
Reasoning models like o3 and o4-mini are advancing faster than ever, but imagine what will be possible when they can run locally in your browser! 🤯

Well, with 🤗 Transformers.js, you can do just that! Here's Zyphra's new ZR1 model running at over 100 tokens/second on WebGPU! ⚡️

Giving models access to browser APIs (like File System, Screen Capture, and more) could unlock an entirely new class of web experiences that are personalized, interactive, and run locally in a secure, sandboxed environment.

For now, try out the demo! 👇
webml-community/Zyphra-ZR1-WebGPU
  • 1 reply
·