All HF Hub posts

codelion 
posted an update 1 day ago
I wanted to share a technique that's been working really well for recovering performance after INT4 quantization.

Quantizing an LLM to INT4 (unlike, say, INT8) typically incurs some accuracy loss at inference time. Instead of accepting the quality loss, we used the FP16 model as a teacher to train a tiny LoRA adapter (rank=16) for the quantized model. The cool part: the model generates its own training data using the Magpie technique, so no external datasets are needed. This is critical because we want to stay as close as possible to the distribution of the model's natural responses.
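To make the teacher-student setup concrete, here's a minimal sketch of the distillation objective (my own illustration, not the actual ellora code): the FP16 teacher's next-token distribution supervises the INT4+LoRA student via a KL-divergence loss over self-generated prompts.

```python
import math

def softmax(logits, temperature=1.0):
    # Convert raw logits to a probability distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student): the training signal that pulls the
    # INT4 + LoRA student toward the FP16 teacher, token by token.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give zero loss; any deviation gives a positive loss
# that gradient descent on the LoRA weights can reduce.
teacher = [2.0, 1.0, 0.1]
student = [1.8, 1.1, 0.3]
print(distill_kl(teacher, teacher))      # 0.0
print(distill_kl(teacher, student) > 0)  # True
```

In the real setup the logits come from two forward passes of the same model (FP16 vs. INT4+adapter) over Magpie-generated prompts, and only the adapter weights receive gradients.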

Apple's foundation models paper from last year (https://arxiv.org/pdf/2407.21075) proposed a similar technique and found that "By using accuracy-recovery LoRA adapters with only rank 16, Alpaca win rate can be improved by 7-18%, GSM8K accuracy is boosted by 5-10%" (page 47).

We saw similar results on Qwen3-0.6B:

Perplexity: 2.40 → 2.09 with the adapter (only 5.7% above the FP16 baseline)
Memory: Only 0.28GB vs 1.0GB for FP16 (72% reduction)
Speed: 3.0x faster inference than FP16
Quality: Generates correct, optimized code solutions

- Pre-trained adapter: codelion/Qwen3-0.6B-accuracy-recovery-lora
- GitHub repo: https://github.com/codelion/ellora

Happy to answer questions about the implementation or help anyone trying to replicate this. The key insight is that quantization errors are systematic and learnable - a small adapter can bridge the gap without negating the benefits of quantization.
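To illustrate why "systematic and learnable" matters, here's a tiny sketch (again illustrative, not from the repo) of symmetric INT4 quantization: the rounding error is a deterministic function of each weight and the scale, so re-quantizing reproduces it exactly, and a small adapter can learn to compensate.

```python
def quantize_int4(weights):
    # Symmetric per-tensor INT4 quantization: 16 levels in [-8, 7].
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    dequant = [qi * scale for qi in q]
    return dequant, scale

weights = [0.31, -0.87, 0.05, 0.52, -0.14]
dq, scale = quantize_int4(weights)
errors = [w - d for w, d in zip(weights, dq)]

# The error is a fixed function of the weight and the scale: running
# quantization again reproduces it exactly. It is structure, not noise,
# which is what makes it recoverable by a low-rank adapter.
dq2, _ = quantize_int4(weights)
print(dq == dq2)  # True
```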

Has anyone else experimented with self-distillation for quantization recovery? Would love to hear about different approaches!
AdinaY 
posted an update 3 days ago
MiniCPM-V 4.5 🚀 New MLLM for image, multi-image & video understanding, running even on your phone, released by OpenBMB

openbmb/MiniCPM-V-4_5

✨ SOTA vision language capability
✨ 96× video token compression → high-FPS & long-video reasoning
✨ Switchable fast vs deep thinking modes
✨ Strong OCR, document parsing, supports 30+ languages
ginipick 
posted an update 1 day ago
🍌 Nano Banana + Video: AI Image Style Transfer & Video Generation Tool

🎨 Key Features
1️⃣ Image Style Transfer

ginigen/Nano-Banana-Video

📸 Upload up to 2 images for style fusion
✨ High-quality image generation with Google Nano Banana model
🎭 Apply desired styles with text prompts

2️⃣ Video Generation

🎬 Convert generated images to videos
📐 Maintain original aspect ratio option
⏱️ Adjustable duration (1-4 seconds)

🚀 How to Use
Step-by-Step Guide
Step 1: Image Generation 🖼️

Enter style description
Upload 1-2 images (optional)
Click "Generate Magic ✨"

Step 2: Video Creation 📹

Send generated image to video tab
Set animation style
Generate video!

💡 Use Cases

🏞️ Transform landscape photos into artistic masterpieces
🤖 Bring static images to life
🎨 Mix styles from two different images
📱 Create short videos for social media

⚡ Tech Stack
Google Nano Banana Stable Video Diffusion Gradio Replicate API

#AIVideoGenerator #ImageToVideoConverter #StyleTransferAI #GoogleNanoBanana #StableVideoDiffusion #AIAnimationTool #TextToVideo #ImageAnimationSoftware #AIArtGenerator #VideoCreationTool #MachineLearningVideo #DeepLearningAnimation #HuggingFaceSpaces #ReplicateAPI #GradioApplication #ZeroGPUComputing #AIStyleMixing #AutomatedVideoProduction #NeuralStyleTransfer #AIPoweredCreativity
dhruv3006 
posted an update 2 days ago
Pair a vision grounding model with a reasoning LLM using Cua

Cua just shipped v0.4 of the Cua Agent framework with Composite Agents - you can now pair a vision/grounding model with a reasoning LLM using a simple modelA+modelB syntax. Best clicks + best plans.

The problem: every GUI model speaks a different dialect.
• some want pixel coordinates
• others want percentages
• a few spit out cursed tokens like <|loc095|>

We built a universal interface that works the same across Anthropic, OpenAI, Hugging Face, etc.:

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer]
)

But here’s the fun part: you can combine models by specialization.
Grounding model (sees + clicks) + Planning model (reasons + decides) →

agent = ComputerAgent(
    model="huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-4o",
    tools=[computer]
)

This gives GUI skills to models that were never built for computer use. One handles the eyes/hands, the other the brain. Think driver + navigator working together.

Two specialists beat one generalist. We’ve got a ready-to-run notebook demo - curious what combos you all will try.

Github : https://github.com/trycua/cua

Blog : https://www.trycua.com/blog/composite-agents
openfree 
posted an update 2 days ago
🍌 Nano Banana: Google AI Completely Free!

🎉 Finally, Google's Nano Banana AI is available for everyone - absolutely FREE!

🎯 Choose Your Perfect Version!
🌟 Free Nano Banana - For Everyone
Transform images with AI - It's that simple!

🚀 Start in 3 Seconds
1️⃣ Click Here 2️⃣ Upload Image 3️⃣ Enter Style → Done! ✨
No Sign-up ❌ | No Payment ❌ | No Ads ❌ | Just Free ⭕

📸 Simple drag & drop upload
✏️ Describe styles in any language
⚡ Results in under 30 seconds
🎨 Perfect for SNS, blogs, presentations

👉 Start Now: openfree/Free-Nano-Banana
🔍 Nano Banana Upscale - For Designers
Professional high-resolution output when you need it!

🖼️ 4x resolution upscaling (Real-ESRGAN)
🎯 Optimized for print & large displays
💎 Premium quality with preserved details
📐 Professional quality without Photoshop

👉 Create in HD: openfree/Nano-Banana-Upscale
💻 Nano Banana API - For Developers
Power your app with AI!

🔧 Instant RESTful API integration
📦 Python, JS, Java code examples included
⚙️ Batch processing & automation support
🚀 Unlimited usage with free API key

👉 Get API Access: aiqtech/Nano-Banana-API
🔗 Powered by Google's Official Model via Replicate API!
📌 100% Transparent Open Source
✨ We've integrated directly with Google's official Nano Banana model through Replicate API!

🔓 Full source code available on GitHub
📝 Complete Gradio interface implementation
🛠️ Detailed Replicate integration documentation
🎯 Fork and create your own version anytime

🚀 Start Your Journey Today!
Democratizing AI Technology - Built Together by the Community 💜
Made with ❤️ by Openfree AI Community
All code is open source. Let's grow together!
codelion 
posted an update 3 days ago
I recently added a recipe to ellora that improves the reasoning capabilities of Gemma-3-1B using self-supervised learning. The model now shows step-by-step thinking in <think> tags before answering.

Logic puzzle accuracy: 61% → 84%. 3 hours training on single GPU. 🧠

Used GRPO where model generates multiple responses and learns to prefer better reasoning. Works surprisingly well for making smaller models more transparent.
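The core of GRPO is the group-relative advantage: each sampled response is scored against the mean and standard deviation of its own group, so no separate value model is needed. A minimal sketch (illustrative, not the actual training script):

```python
def group_relative_advantages(rewards):
    # GRPO scores each sampled response relative to the group of
    # responses drawn for the same prompt -- no value network needed.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero when all rewards tie
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one logic puzzle, scored by a reward function
# (e.g. 1.0 if the final answer is correct, plus a format bonus for
# well-formed <think> tags).
rewards = [1.0, 0.0, 1.0, 0.2]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Responses with above-average rewards get positive advantages (their tokens are reinforced), below-average ones get negative advantages; the policy gradient then nudges the model to prefer the better reasoning traces.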

🔗 Colab: https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_2_Reasoning_LoRA_with_Self-Rewarding_GRPO.ipynb

🤗 Model: codelion/gemma-3-1b-it-reasoning-grpo-lora

💻 Code: https://github.com/codelion/ellora
takarajordan 
posted an update about 23 hours ago
I'm currently looking into what makes a scientific paper more popular than others on a platform like Hugging Face. I ran a large battery of tests: content length, time-based features, even semantic feature extraction, trying to get to some sort of answer around...

What actually drives popularity of these papers, why do some papers get zero upvotes and why do some get thousands?

The answer is: essentially nothing. Yes, that's right. Nothing I measured about the paper itself drives popularity; popularity is driven by external factors such as its authors, external marketing, and the like.

So next time you see a research paper with a lot of upvotes, just remember it's not necessarily a reflection of the authors' efforts. Remain objective.
tsungyi 
posted an update 1 day ago
Cosmos Reason just topped Physical Reasoning Leaderboard on Hugging Face. 👏🔥

Cosmos Reason is an open, customizable, commercial-ready 7B-parameter, reasoning vision language model (VLM) for physical AI and robotics. The VLM empowers robots and vision AI agents to reason like humans, leveraging prior knowledge, physics understanding, and common sense to understand and operate intelligently in the real world.

This model unlocks advanced capabilities for robotics, autonomous vehicles, and real-world operations—from cities to high-tech factories.

Key use cases include:
Data curation & annotation: Automate high-quality dataset curation and annotation at scale.
Robot planning & reasoning: Serve as the "brain" for deliberate, methodical decision-making with vision language action (VLA) models.
Video analytics AI agents: Extract actionable insights and perform root-cause analysis on massive video datasets.

Ready to build the next generation of physical AI? Get started 👉 nvidia/Cosmos-Reason1-7B
Try the preview here: https://build.nvidia.com/nvidia/cosmos-reason1-7b

MonsterMMORPG 
posted an update 2 days ago
Nano Banana (Gemini 2.5 Flash Image) Full Tutorial — 27 Unique Cases vs Qwen Image Edit — Free 2 Use : https://youtu.be/qPUreQxB8zQ

Nano Banana AI image editing model was published by Google today. It is officially named the Google Gemini 2.5 Flash Image model. It is the most advanced zero-shot image editing model ever made. I have conducted a thorough, in-depth review of this model with 27 unique cases. All prompts, images used, and results are demonstrated in real-time—live in this tutorial. Moreover, I have compared each result with the state-of-the-art (SOTA) best open-source, locally available, and free-to-use Qwen Image Edit model, so we can see which model performs better at which tasks.

Video Chapters

0:00 Introduction to Google's "Nano Banana" (Gemini 2.5 Flash)
0:28 Comparing Gemini vs. Qwen Image Edit Model (27 Test Cases)
1:33 Solving Gemini's Low Resolution with SUPIR Upscaling
2:28 Teaser: Upcoming Qwen Image LoRA Training Application
2:41 How to Access Gemini 2.5 Flash in Google AI Studio
2:55 Test Case 1: Text Conversion
3:31 Test Case 2: Photorealism Test (Portrait)
4:36 Test Case 3: Adding Sunglasses
5:44 Test Case 4: Adding Iron Man to a Surfer (Gemini Wins)
6:38 Test Case 5: Adding a Cat (Qwen Wins)
7:20 Test Case 6: Clothing Extraction (Gemini Fails)
8:02 Test Case 7: Character Back View (Qwen Wins on Accuracy)
9:24 Test Case 8: Photo to Anime Style (Gemini Wins on Resemblance)
10:18 Test Case 9: Changing Background to Night
11:37 Test Case 10: Outpainting a Portrait (Qwen Wins on Proportions)
13:22 Test Case 11: Adding a Lion to a Scene (Gemini Wins)
13:59 Test Cases 12 & 13: Stylization Failures (Pixel Art & Claymation)
15:44 Test Case 14: Adding a Knight's Helmet
16:47 Test Case 15: Adding Reflections (Qwen is More Accurate)
18:00 Test Case 16: Changing Day to Night (Window View)
19:33 Test Case 17: Adding a Wooden Sign
20:22 Test Case 18: Old Photo Restoration
merve 
posted an update 3 days ago