AI & ML interests

None defined yet.

Recent Activity

Manmayย 
in ResembleAI/Chatterbox 6 days ago

Watermark

1
#18 opened 7 days ago by
eugenc33
Xenovaย 
posted an update 11 days ago
view post
Post
2919
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! ๐Ÿคฏ
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! ๐Ÿ˜

How does it work? ๐Ÿค”
1๏ธโƒฃ Generate and cache image features for each frame
2๏ธโƒฃ Create a list of embeddings for selected patch(es)
3๏ธโƒฃ Compute cosine similarity between each patch and the selected patch(es)
4๏ธโƒฃ Highlight those whose score is above some threshold

... et voilร ! ๐Ÿฅณ

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.

Excited to see what the community builds with it!
Xenovaย 
posted an update 26 days ago
view post
Post
3985
The next generation of AI-powered websites is going to be WILD! ๐Ÿคฏ

In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by ๐Ÿค— Transformers.js: LiquidAI/LFM2-WebGPU

As always, the demo is open source (which you can find under the "Files" tab), so I'm excited to see how the community builds upon this! ๐Ÿš€
  • 1 reply
ยท
Xenovaย 
posted an update about 1 month ago
view post
Post
3037
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! ๐Ÿคฏ
๐Ÿ—ฃ๏ธ Transcribe videos, meeting notes, songs and more
๐Ÿ” Runs on-device, meaning no data is sent to a server
๐ŸŒŽ Multilingual (8 languages)
๐Ÿค— Completely free (forever) & open source

That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! ๐Ÿ”ฅ

Try it out yourself! ๐Ÿ‘‡
webml-community/Voxtral-WebGPU
Manmayย 
in ResembleAI/Chatterbox about 2 months ago

Added to TTS Arena Fork

7
#14 opened 2 months ago by
Pendrokar
reach-vbย 
posted an update 3 months ago
view post
Post
4380
Excited to onboard FeatherlessAI on Hugging Face as an Inference Provider - they bring a fleet of 6,700+ LLMs on-demand on the Hugging Face Hub ๐Ÿคฏ

Starting today, you'd be able to access all those LLMs (OpenAI compatible) on HF model pages and via OpenAI client libraries too! ๐Ÿ’ฅ

Go, play with it today: https://huggingface.co/blog/inference-providers-featherless

P.S. They're also bringing on more GPUs to support all your concurrent requests!
  • 1 reply
ยท
Xenovaย 
posted an update 3 months ago
view post
Post
7072
NEW: Real-time conversational AI models can now run 100% locally in your browser! ๐Ÿคฏ

๐Ÿ” Privacy by design (no data leaves your device)
๐Ÿ’ฐ Completely free... forever
๐Ÿ“ฆ Zero installation required, just visit a website
โšก๏ธ Blazingly-fast WebGPU-accelerated inference

Try it out: webml-community/conversational-webgpu

For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text to speech

Powered by Transformers.js and ONNX Runtime Web! ๐Ÿค— I hope you like it!
ยท
reach-vbย 
posted an update 4 months ago
view post
Post
4319
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! ๐Ÿ’ฅ

as you know we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/ download speeds too): https://huggingface.co/blog/xet-on-the-hub and now that we are certain that the backend can scale with even big models like Llama 4/ Qwen 3 - we;re moving to the next phase of inviting impactful orgs and users on the hub over as you are a big part of the open source ML community - we would love to onboard you next and create some excitement about it in the community too!

in terms of actual steps - it should be as simple as one of the org admins to join hf.co/join/xet - we'll take care of the rest.

p.s. you'd need to have a the latest hf_xet version of huggingface_hub lib but everything else should be the same: https://huggingface.co/docs/hub/storage-backends#using-xet-storage

p.p.s. this is fully backwards compatible so everything will work as it should! ๐Ÿค—
ยท
Xenovaย 
posted an update 4 months ago
view post
Post
8399
Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. ๐Ÿคฏ A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! ๐Ÿค—

Check it out! ๐Ÿ‘‡
Demo: onnx-community/model-explorer
Dataset: onnx-community/model-explorer
Source code: https://github.com/xenova/model-explorer
Xenovaย 
posted an update 5 months ago
view post
Post
2910
Reasoning models like o3 and o4-mini are advancing faster than ever, but imagine what will be possible when they can run locally in your browser! ๐Ÿคฏ

Well, with ๐Ÿค— Transformers.js, you can do just that! Here's Zyphra's new ZR1 model running at over 100 tokens/second on WebGPU! โšก๏ธ

Giving models access to browser APIs (like File System, Screen Capture, and more) could unlock an entirely new class of web experiences that are personalized, interactive, and run locally in a secure, sandboxed environment.

For now, try out the demo! ๐Ÿ‘‡
webml-community/Zyphra-ZR1-WebGPU
  • 1 reply
ยท
Xenovaย 
posted an update 7 months ago
view post
Post
13886
We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. โšก๏ธ

Generate 10 seconds of speech in ~1 second for $0.

What will you build? ๐Ÿ”ฅ
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
โœ‚๏ธ Implement sentence splitting, allowing for streamed responses
๐ŸŒ Multilingual support (only phonemization left)

Who wants to help?
ยท
Xenovaย 
posted an update 8 months ago
view post
Post
8814
Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by ๐Ÿค— Transformers.js. WebGPU support coming soon!
๐Ÿ‘‰ npm i kokoro-js ๐Ÿ‘ˆ

Try it out yourself: webml-community/kokoro-web
Link to models/samples: onnx-community/Kokoro-82M-ONNX

You can get started in just a few lines of code!
import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text,
  { voice: "af_sky" }, // See `tts.list_voices()`
);
audio.save("audio.wav");

Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports and Hexgrad for the amazing project! None of this would be possible without you all! ๐Ÿค—

The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality! ๐Ÿคฏ
ยท
Xenovaย 
posted an update 8 months ago
view post
Post
8616
First project of 2025: Vision Transformer Explorer

I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings! ๐Ÿคฏ

Try it out yourself! ๐Ÿ‘‡
webml-community/attention-visualization

Source code: https://github.com/huggingface/transformers.js-examples/tree/main/attention-visualization
Xenovaย 
posted an update 9 months ago
view post
Post
4592
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
๐Ÿš€ Faster and more accurate than Whisper
๐Ÿ”’ Privacy-focused (no data leaves your device)
โšก๏ธ WebGPU accelerated (w/ WASM fallback)
๐Ÿ”ฅ Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
ยท
Xenovaย 
posted an update 9 months ago
view post
Post
3527
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! ๐Ÿ”ฅ High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. ๐Ÿค— Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)
reach-vbย 
posted an update 9 months ago
view post
Post
7186
VLMs are going through quite an open revolution AND on-device friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLabs w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ FlorenceVL - 3B & 8B: @jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/

What a time to be alive! ๐Ÿ”ฅ