arXiv:2507.21802

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Published on Jul 29 · Submitted by tulvgengenr on Jul 31

AI-generated summary

MixGRPO, a novel framework integrating SDE and ODE sampling, enhances flow matching models for image generation by optimizing only within a sliding window of denoising steps, improving both efficiency and performance.

Abstract

Although GRPO substantially enhances flow matching models in human preference alignment for image generation, methods such as FlowGRPO remain inefficient because they must sample and optimize over all denoising steps specified by the Markov Decision Process (MDP). In this paper, we propose MixGRPO, a novel framework that leverages the flexibility of mixed sampling strategies by integrating stochastic differential equations (SDE) and ordinary differential equations (ODE). This streamlines the optimization process within the MDP, improving efficiency and boosting performance. Specifically, MixGRPO introduces a sliding-window mechanism, applying SDE sampling and GRPO-guided optimization only within the window and ODE sampling outside it. This design confines sampling randomness to the time-steps within the window, reducing optimization overhead and allowing more focused gradient updates that accelerate convergence. Additionally, because time-steps beyond the sliding window are not involved in optimization, higher-order solvers can be used for sampling there. We therefore present a faster variant, MixGRPO-Flash, which further improves training efficiency while achieving comparable performance. MixGRPO exhibits substantial gains across multiple dimensions of human preference alignment, outperforming DanceGRPO in both effectiveness and efficiency, with nearly 50% lower training time. Notably, MixGRPO-Flash reduces training time by a further 71%. Code and models are available at https://github.com/Tencent-Hunyuan/MixGRPO.
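
To make the abstract's core idea concrete, here is a minimal PyTorch sketch of a sliding-window mixed ODE-SDE rollout: stochastic (SDE) steps with GRPO log-prob bookkeeping only inside the window, and plain deterministic (ODE) Euler steps everywhere else. `ToyVelocityModel`, `mixed_ode_sde_rollout`, the window bounds, and the noise scale are all illustrative assumptions, not the code from the MixGRPO repository.

```python
import math
import torch

class ToyVelocityModel(torch.nn.Module):
    """Toy stand-in for a flow-matching velocity field v_theta(x, t)."""
    def __init__(self, dim=8):
        super().__init__()
        self.net = torch.nn.Linear(dim + 1, dim)

    def forward(self, x, t):
        t_col = t.expand(x.shape[0], 1)  # broadcast scalar t to each sample
        return self.net(torch.cat([x, t_col], dim=-1))


def mixed_ode_sde_rollout(model, x, timesteps, window, noise_scale=0.1):
    """Denoise x along `timesteps`. Steps whose index falls inside
    `window` take a stochastic Euler-Maruyama (SDE) step and keep a
    differentiable log-prob for GRPO; all other steps take a plain
    deterministic (ODE) Euler step with no gradient bookkeeping."""
    log_probs = []
    for i in range(len(timesteps) - 1):
        t, dt = timesteps[i], timesteps[i + 1] - timesteps[i]
        if window[0] <= i < window[1]:
            # SDE step: sample a Gaussian transition around the drift.
            mean = x + model(x, t) * dt
            std = noise_scale * dt.abs().sqrt()
            x = (mean + std * torch.randn_like(mean)).detach()
            # Log-prob of the realized sample under the current policy;
            # gradients flow through `mean` (likelihood-ratio form).
            log_probs.append(
                (-0.5 * ((x - mean) / std) ** 2
                 - torch.log(std) - 0.5 * math.log(2 * math.pi)).sum(-1)
            )
        else:
            # ODE step: deterministic, so it adds no randomness and needs
            # no optimization. This is also where a higher-order solver
            # could be swapped in, as the MixGRPO-Flash variant does.
            with torch.no_grad():
                x = x + model(x, t) * dt
    return x, log_probs
```

A GRPO update then normalizes rewards within the rollout group and weights the window log-probs by the resulting group-relative advantages, so gradients reach the model only through the SDE steps inside the window (the reward model below is a placeholder):

```python
model = ToyVelocityModel()
group = torch.randn(4, 8)                    # a group of 4 rollouts
ts = torch.linspace(1.0, 0.0, 26)            # 25 denoising steps
samples, logps = mixed_ode_sde_rollout(model, group, ts, window=(5, 10))

rewards = samples.pow(2).mean(-1)            # placeholder reward signal
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # group-relative
loss = -(adv.detach() * torch.stack(logps).sum(0)).mean()
loss.backward()  # gradients flow only through the window's SDE steps
```

Sliding the window across the denoising trajectory over the course of training is what lets every time-step eventually be optimized while keeping each update cheap.
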

Community

Paper author · Paper submitter
edited 4 days ago

Feel free to discuss Flow-GRPO, DanceGRPO, and MixGRPO 👏
👀 We will continue to post comparisons with Flow-GRPO, DPO, and other post-training techniques.
Please consider following MixGRPO and giving it an upvote 🙋‍♂️


Models citing this paper: 1

Datasets citing this paper: 0

Spaces citing this paper: 0

Collections including this paper: 2