---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- gen-robot/openvla-7b-rlvla-warmup
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvla-maniskill3-ppo
  results:
  - task:
      type: VLA
    dataset:
      type: maniskill-vision
      name: maniskill-vision
    metrics:
    - type: accuracy
      value: 82.0
  - task:
      type: VLA
    dataset:
      type: maniskill-semantic
      name: maniskill-semantic
    metrics:
    - type: accuracy
      value: 80.6
  - task:
      type: VLA
    dataset:
      type: maniskill-position
      name: maniskill-position
    metrics:
    - type: accuracy
      value: 89.3
---
|
|
|
<div align="center"> |
|
<img src="logo.svg" alt="RLinf-logo" width="500"/> |
|
</div> |
|
|
|
|
|
<div align="center"> |
|
<!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> --> |
|
<!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> --> |
|
<a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a> |
|
<a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a> |
|
<!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a> |
|
<a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&"></a> --> |
|
</div> |
|
|
|
<h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1> |
|
|
|
[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development. |
|
|
|
|
|
<div align="center"> |
|
<img src="overview.png" alt="RLinf-overview" width="600"/> |
|
</div> |
|
|
|
## Model Description |
|
This model was trained from the ``gen-robot/openvla-7b-rlvla-warmup`` base model with Proximal Policy Optimization (PPO) in the ManiSkill simulator.
|
|
|
## Full Out-of-Distribution (OOD) Evaluation Results
|
### Overall OOD Eval Results |
|
Note: *rl4vla* refers to the paper "What Can RL Bring to VLA Generalization? An Empirical Study" (VLA-RL-Study).
|
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | __PPO-openvla (this model)__ | GRPO-openvla |
|---------------|-----------|-----------------|----------------|------------------------------|---------------|
| Avg results | 0.7608 | 0.61484375 | 0.6453125 | **0.822135417** | 0.7546875 |
|
### OOD Eval on Vision |
|
|
|
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | __PPO-openvla (this model)__ | GRPO-openvla |
|---------------|-----------|-----------------|----------------|------------------------------|---------------|
| vision avg | 0.7656 | 0.846875 | 0.80546875 | **0.8203125** | 0.746875 |
| unseen table | 0.844 | 0.9140625 | 0.9453125 | **0.95703125** | 0.8984375 |
| dynamic texture (weak) | 0.833 | **0.91015625** | 0.82421875 | 0.85546875 | 0.7890625 |
| dynamic texture (strong) | 0.63 | **0.7734375** | 0.625 | 0.72265625 | 0.65625 |
| dynamic noise (weak) | 0.854 | 0.89453125 | **0.8984375** | 0.87109375 | 0.796875 |
| dynamic noise (strong) | 0.667 | **0.7421875** | 0.734375 | 0.6953125 | 0.59375 |
|
|
|
### OOD Eval on Semantic |
|
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | __PPO-openvla (this model)__ | GRPO-openvla |
|---------------|-----------|-----------------|----------------|------------------------------|---------------|
| object avg | 0.754 | 0.516113281 | 0.56640625 | **0.805664063** | 0.744140625 |
| train setting | 0.938 | 0.94140625 | 0.91796875 | **0.9609375** | 0.84375 |
| unseen objects | 0.714 | 0.8046875 | 0.77734375 | **0.81640625** | 0.765625 |
| unseen receptacles | 0.75 | 0.7421875 | 0.78125 | **0.8125** | 0.734375 |
| unseen instructions | 0.891 | 0.6796875 | 0.68359375 | **0.9453125** | 0.890625 |
| multi-object (both seen) | 0.75 | 0.3515625 | 0.4296875 | **0.84375** | 0.7578125 |
| multi-object (both unseen) | 0.578 | 0.3046875 | 0.38671875 | **0.62890625** | 0.578125 |
| distractive receptacle | 0.812 | 0.1875 | 0.31640625 | **0.828125** | 0.78125 |
| multi-receptacle (both unseen) | 0.599 | 0.1171875 | 0.23828125 | **0.609375** | 0.6015625 |
|
|
|
### OOD Eval on Position |
|
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | __PPO-openvla (this model)__ | GRPO-openvla |
|---------------|-----------|-----------------|----------------|------------------------------|---------------|
| position avg | 0.776 | 0.4296875 | 0.560546875 | **0.892578125** | 0.81640625 |
| unseen position (object & receptacle) | 0.807 | 0.40234375 | 0.50390625 | **0.86328125** | 0.75 |
| mid-episode object reposition | 0.745 | 0.45703125 | 0.6171875 | **0.921875** | 0.8828125 |
|
|
|
## How to Use |
|
Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_ppo_openvla.yaml``: |
|
|
|
- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint. |
|
|
|
Note: If you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``. |
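For reference, a minimal sketch of the relevant fields is shown below. The nesting is inferred from the dotted parameter names above and ``/path/to/RLinf-openvla-maniskill3-ppo`` is a placeholder for your local checkpoint directory; consult the actual ``maniskill_ppo_openvla.yaml`` for the full and authoritative structure.

```yaml
# Sketch of the fields to edit in examples/embodiment/config/maniskill_ppo_openvla.yaml.
# Nesting is assumed from the dotted parameter names; keep the rest of the file unchanged.
actor:
  checkpoint_load_path: /path/to/RLinf-openvla-maniskill3-ppo   # placeholder checkpoint path
  tokenizer:
    tokenizer_model: /path/to/RLinf-openvla-maniskill3-ppo      # same checkpoint directory
  model:
    is_lora: false          # set to false when evaluating the checkpoint directly
rollout:
  model_dir: /path/to/RLinf-openvla-maniskill3-ppo              # placeholder checkpoint path
```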
|
|
|
## License |
|
This code repository and the model weights are licensed under the MIT License. |
|
|