AI & ML interests

The AI community building the future.

Recent Activity

lysandre updated a dataset about 3 hours ago
huggingface/transformers-metadata
merve updated a dataset about 6 hours ago
huggingface/documentation-images
sayakpaul updated a dataset about 11 hours ago
huggingface/diffusers-metadata

sergiopaniego posted an update about 6 hours ago
Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL? 🌋

🧑‍🍳 We've got you covered!

NEW multimodal post-training recipe for aligning a VLM with TRL in @HuggingFace's Cookbook.

Go to the recipe 👉 https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl

Powered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images 🌋
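For a rough feel of the workflow, here is a minimal, text-only GRPO sketch with TRL. The model id, dataset, and reward function below are placeholders picked for illustration; the linked recipe shows the actual multimodal setup (image inputs, task-specific rewards) for Qwen2.5-VL-3B-Instruct.

```python
# Minimal, illustrative GRPO sketch with TRL -- placeholders only; see the linked
# recipe for the actual VLM (multimodal) configuration and rewards.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: prefer completions around 50 characters (stand-in for a real task reward).
def reward_len(completions, **kwargs):
    return [-abs(50 - len(completion)) for completion in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder text dataset

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small text model, just for the sketch
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-sketch"),
    train_dataset=dataset,
)
trainer.train()
```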
a-r-r-o-w posted an update about 10 hours ago
You would've implemented the 3-loop matrix multiplication many times as an ML practitioner, but the naive implementation is terrible for GPU performance. Modern GPUs achieve peak performance through careful memory access patterns and minimizing scheduling overhead.

In naive matmul (MxK . KxN), the computation happens in tiles - both for the output matrix and for how you read chunks from the input matrices. Each thread-block processes one output tile by loading corresponding tiles from input (for sum-reduction across K dimension), performing the computation, then terminating. The GPU launches many thread-blocks and schedules them across available streaming multiprocessors (SMs). When an SM finishes one tile, it gets assigned a new thread-block for the next uncomputed tile. This way, multiple output tiles are computed in parallel across the SMs, but we pay the cost for launching thread-blocks each time a new tile is computed.

Persistent matmul changes this approach. Instead of launching thread-blocks to compute some output tiles, computing the results on SMs in parallel, and repeating until all output tiles are computed, you launch only as many thread-blocks as you have SMs available (typically 80-132 on modern GPUs). These thread-blocks stay alive until all output tiles are computed, looping through multiple tiles sequentially. Each persistent thread-block may handle multiple output tiles.

The key benefit is the reduced thread-block launch latency. This persistence strategy, combined with other optimizations like coalesced memory loads/stores, block-tiling, warp-tiling, warp-specialization, double-buffering, ping-pong scheduling and other tricks, helps achieve peak performance. More on this in the future!

Code snippet for testing: https://gist.github.com/a-r-r-o-w/28339b442d164084506c0967029968a8
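To make the scheduling idea concrete without a GPU, here is a tiny NumPy illustration (not the gist's kernel): a handful of "persistent workers" stride over all output tiles, each reducing over K in tile-sized chunks, instead of launching one short-lived block per tile. The sizes and worker count are arbitrary stand-ins.

```python
# CPU-side illustration of persistent tile scheduling (not a GPU kernel; see the gist for real code).
import numpy as np

M, N, K, TILE = 256, 256, 256, 64  # problem and tile sizes (illustrative)
NUM_WORKERS = 8                    # stand-in for the number of SMs / persistent thread-blocks

A = np.random.rand(M, K).astype(np.float32)
B = np.random.rand(K, N).astype(np.float32)
C = np.zeros((M, N), dtype=np.float32)

# All output tiles that need to be computed.
tiles = [(i, j) for i in range(0, M, TILE) for j in range(0, N, TILE)]

# Persistent scheduling: each worker loops over many output tiles, instead of
# one worker (thread-block) being launched per tile.
for worker in range(NUM_WORKERS):
    for t in range(worker, len(tiles), NUM_WORKERS):
        i, j = tiles[t]
        acc = np.zeros((TILE, TILE), dtype=np.float32)
        for k in range(0, K, TILE):  # sum-reduction over K, one pair of input tiles at a time
            acc += A[i:i+TILE, k:k+TILE] @ B[k:k+TILE, j:j+TILE]
        C[i:i+TILE, j:j+TILE] = acc

assert np.allclose(C, A @ B, atol=1e-2)  # same result as the full matmul
```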

(Bonus: Since I've wanted to learn Manim for a while, this was a great opportunity to make a visualization of naive vs. persistent matmul. Enjoy ✨)
sergiopaniego posted an update about 12 hours ago
Just added example scripts for aligning models using GSPO (including a VLM example) 🙆‍♂️

GSPO is the latest RL alignment algorithm from @Alibaba_Qwen, and it's already supported in the latest TRL v0.20 release.

Super easy to get started with the example scripts below, go run them! 👩‍💻

🧑‍🎨 Script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo.py
🦄 VLM script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo_vlm.py
🧩 More TRL examples: https://huggingface.co/docs/trl/main/en/example_overview
🧙‍♂️ GSPO paper: Group Sequence Policy Optimization (2507.18071)
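If you just want the gist of what those scripts do: to my understanding, TRL implements GSPO on top of its GRPO trainer by switching the importance weighting to the sequence level. Below is a hedged sketch; the importance_sampling_level knob and the model/dataset/reward are assumptions made for illustration, and the linked scripts are the authoritative reference.

```python
# Hedged sketch: GSPO via TRL's GRPO trainer with sequence-level importance weighting.
# The importance_sampling_level value reflects my reading of TRL v0.20 -- check the
# linked example scripts for the exact, supported configuration.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward, stand-in for a real task reward.
def reward_len(completions, **kwargs):
    return [-abs(200 - len(completion)) for completion in completions]

training_args = GRPOConfig(
    output_dir="gspo-sketch",
    importance_sampling_level="sequence",  # assumed switch that turns GRPO into GSPO
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model for the sketch
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=load_dataset("trl-lib/tldr", split="train"),
)
trainer.train()
```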
AdinaY posted an update 4 days ago
🔥 July highlights from the Chinese AI community

zh-ai-community/july-2025-open-works-from-the-chinese-community-686586f1a8840797e477ae5a

✨ Another "DeepSeek moment" - Kimi K2 🙌

✨ Qwen goes fully matrixed - Instruct / Thinking / Coder models across 30B-480B 🤯

✨ The multimodal wave 🌊
- GLM-4.1V-Thinking: Image+Text > Text
- Intern-S1: Image+Text > Text
- Wan 2.2: Text+Image > Video
- Skywork-R1V3: Image+Text > Text
- Skywork-UniPic: Text > Image / Image > Text
- Tar-7B: Any-to-Any
- Ming-Lite-Omni-1.5: Any-to-Any
- Step3: Image+Text > Text
- HunyuanWorld-1: Image > 3D
- ThinkSound: Video > Audio
- Neta-Lumina: Text > Image

✨ Tiny & deployable models
- SmallThinker runs on 1GB RAM

✨ Agentic coding goes mainstream 💻
- Qwen3-Coder: fully spec'd tool calling
- GLM-4.5: browser agents, IDE assistant
- Qwen3 WebDev demo: text-to-frontend code

✨ Domain-specific & utility models / tools / datasets
- Science one S1: Scientific model
- Agentar DeepFinance: Finance dataset
- ObjectClear: Interactive Vision Tool
- Qwen3 MT Demo: Machine Translation Tool

✨ A big month not only for models, but for policy too 🏛️
- Announced the Global Action Plan for AI Governance
- Proposed setting up a World AI Cooperation Organization in Shanghai
- Released the International AI Open Source Collaboration Initiative
- Published Risk Assessment Guidelines for Endpoint AI Agents

✨ Big event: WAIC
- 355K offline visitors
- 108 new releases in 4 days
- 145 sessions across key domains

I've been tracking things closely, but July's open-source wave still blew me away. Can't wait to see what's coming next! 🚀
meg posted an update 4 days ago
🤖 👾 Thanks so much to BBC News and the stellar Suranjana Tewari for having me on to talk about the US-China relationship in AI, and what it means for AI ethics.
AdinaY posted an update 4 days ago
The Qwen team did it again!

They just released Qwen3-Coder-30B-A3B-Instruct on the Hub 🔥
Qwen/Qwen3-Coder-30B-A3B-Instruct

✨ Apache 2.0
✨ 30B total / 3.3B active (128 experts, top-k = 8)
✨ Native 256K context, extendable to 1M via YaRN
✨ Built for agentic coding
AdinaY posted an update 4 days ago
It's here! After the WAIC announcement, StepFun just dropped Step 3 🔥, their latest multimodal reasoning model, on the Hub.

Paper: Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding (2507.19427)
Model: stepfun-ai/step3

✨ 321B total / 32B active - Apache 2.0
✨ MFA + AFD: cutting decoding cost by up to 70% vs. DeepSeek-V3
✨ 4T image-text pretraining: strong vision-language grounding
✨ Modular, efficient, deployable: runs on just 8×48GB GPUs
angt posted an update 4 days ago
IlyasMoutawwakil posted an update 5 days ago
🚀 Optimum: The Last v1 Release 🚀
Optimum v1.27 marks the final major release in the v1 series. As we close this chapter, we're laying the groundwork for a more modular and community-driven future:
- Optimum v2: A lightweight core package for porting Transformers, Diffusers, or Sentence-Transformers to specialized AI hardware/software/accelerators.
- Optimum-ONNX: A dedicated package where the ONNX/ONNX Runtime ecosystem lives and evolves, faster-moving and decoupled from the Optimum core.

🎯 Why this matters:
- A clearer governance path for ONNX, fostering stronger community collaboration and an improved developer experience.
- Faster innovation in a more modular, open-source environment.

💡 What this means:
- More transparency, broader participation, and faster development driven by the community and key actors in the ONNX ecosystem (PyTorch, Microsoft, Joshua Lochner 👀, ...)
- A cleaner, more maintainable core Optimum, focused on extending HF libraries to specialized AI hardware/software/accelerator tooling and used by our partners (Intel Corporation, Amazon Web Services (AWS), AMD, NVIDIA, FuriosaAI, ...)

🛠️ Major updates I worked on in this release:
✅ Added support for Transformers v4.53 and SmolLM3 in ONNX/ONNX Runtime.
✅ Fixed batched inference/generation for all supported decoder model architectures (LLMs) - quick sketch below.
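As a rough illustration of what that enables (the model id, prompts, and generation settings are my own picks, and I'm assuming the usual Optimum export-on-load path):

```python
# Hedged sketch: export a decoder model to ONNX with Optimum and run batched
# generation through ONNX Runtime. Model id and prompts are illustrative.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "HuggingFaceTB/SmolLM3-3B"  # SmolLM3 support is new in this release, per the post
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # padding token needed for batched inputs

model = ORTModelForCausalLM.from_pretrained(model_id, export=True)  # export to ONNX on load

prompts = ["ONNX Runtime is", "Optimum v1.27 brings"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=20)  # batched generation
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```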

✨ Big shoutout to @echarlaix for leading the refactoring work that cleanly separated ONNX exporter logic and enabled the creation of Optimum-ONNX.

Release Notes: https://lnkd.in/gXtE_qji
📦 Optimum: https://lnkd.in/ecAezNT6
Optimum-ONNX: https://lnkd.in/gzjyAjSi
#Optimum #ONNX #OpenSource #HuggingFace #Transformers #Diffusers
jsulz posted an update 5 days ago
We've crossed 1 million repositories backed by Xet storage on Hugging Face! 🚀🚀🚀

You can follow along our progress converting the Hub from Git LFS to Xet at jsulz/ready-xet-go

We have a lot of repos left to migrate, which means I have plenty of time to add more animations 🤪
AdinaY posted an update 5 days ago
Qwen3-30B-A3B-Thinking-2507 🔥 the latest step in scaling thinking capabilities from the Alibaba Qwen team.

Qwen/Qwen3-30B-A3B-Thinking-2507-FP8

✨ 30B total / 3B active - Apache 2.0
✨ Native 256K context
✨ SOTA coding, alignment, agentic reasoning
sergiopaniego posted an update 5 days ago
Did you miss this? 👓

🧙‍♂️ The vLLM + transformers integration just got upgraded with direct VLM support.

Pick a VLM, set model_impl=transformers, and run it via vLLM!
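Roughly, that looks like the sketch below. The model id, image URL, and message format are illustrative; model_impl="transformers" is the switch the post refers to, but check the vLLM docs for the exact multimodal chat format your version expects.

```python
# Hedged sketch: loading a VLM through vLLM's transformers backend.
# Model id, image URL, and sampling settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct", model_impl="transformers")

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},  # placeholder image
        {"type": "text", "text": "Describe this image."},
    ],
}]
outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```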
AdinaY posted an update 6 days ago
Skywork UniPic 🔥 a unified autoregressive multimodal model for image understanding, generation, and editing, by Skywork 天工

Skywork/skywork-unipic-6888c0789cdb82457b2acf32

✨ 1.5B - MIT license
✨ Runs on RTX 4090
✨ Truly unified architecture
AdinaY posted an update 6 days ago
Qwen just released Qwen3-30B-A3B-Instruct-2507 🔥 an upgrade to the non-thinking-mode model.

Qwen/Qwen3-30B-A3B-Instruct-2507

✨ 30B MoE / 3.3B active - Apache 2.0
✨ Strong gains in reasoning, math, coding, & multilingual tasks
✨ Native support for 256K long-context inputs
giadap posted an update 6 days ago
💬 From Replika to everyday chatbots, millions of people are forming emotional bonds with AI, sometimes seeking comfort, sometimes seeking intimacy. But what happens when an AI tells you "I understand how you feel" and you actually believe it?

At Hugging Face, together with @frimelle and @yjernite, we dug into something we felt wasn't getting enough attention: the need to evaluate AI companionship behaviors. These are the subtle ways AI systems validate us, engage with us, and sometimes manipulate our emotional lives.

Here's what we found:
👉 Existing benchmarks (accuracy, helpfulness, safety) completely miss this emotional dimension.
👉 We mapped how leading AI systems actually respond to vulnerable prompts.
👉 We built the Interactions and Machine Attachment Benchmark (INTIMA): a first attempt at evaluating how models handle emotional dependency, boundaries, and attachment (with a full paper coming soon).

Check out the blog post: https://huggingface.co/blog/giadap/evaluating-companionship

🚢 We also shipped two visualization tools with Gradio to see how different models behave when things get emotionally intense:
- AI-companionship/intima-responses-2D
- giadap/INTIMA-responses
sergiopaniego posted an update 6 days ago
We just released TRL v0.20 with major multimodal upgrades!

👁️ VLM support for GRPO (highly requested by the community!)
🎞️ New GSPO trainer (from @Qwen, released last week, VLM-ready)
New MPO trainer (multimodal by design, as in the paper)

Full release notes here: https://github.com/huggingface/trl/releases/tag/v0.20.0
yjernite posted an update 7 days ago
First GPAI Model with EU Data Transparency Template? 🇪🇺

With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! 📊📚)

The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd 👀


In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for SmolLM3, which my colleagues at Hugging Face released earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💡)

Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations 🙌 Definitely a step forward for transparency.

To learn more, have a look at:

- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled-out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency