Aqui-open0-2.5
The first quantization-aware model from Aqui Solutions, built on the Qwen2.5 architecture with extended thinking capabilities. It delivers exceptional performance with ultra-low VRAM usage through native 8-bit optimization.
open0-2.5-32B
Revolutionary quantization-aware model based on Qwen2.5-32B with extended thinking capabilities, optimized for 8-bit inference from the ground up.
🚀 Breakthrough in Efficiency
First Quantization-Aware Model: Unlike traditional post-training quantization, our model was designed and trained with 8-bit precision in mind, delivering superior performance with dramatically reduced memory requirements.
Benchmark Performance
All evaluations were performed in 8-bit quantization for open0-2.5 and in full precision for the other models.
| Benchmark | Aqui-open0-2.5 32B | Qwen3 2507 235B | DeepSeek V3.1 Think 685B | GLM-4.5 358B | EXAONE 4.0 32B | KAT-V1-40B | Hermes 4 405B |
|---|---|---|---|---|---|---|---|
| MMLU-Pro | 84.1 | 84.3 | 85.1 | 83.5 | 81.8 | 78.9 | 80.5 |
| GPQA Diamond | 78.2 | 79.0 | 77.9 | 78.2 | 73.9 | 72.5 | 70.5 |
| Humanity's Last Exam | 16.7 | 15.0 | 13.0 | 12.2 | 10.5 | 7.8 | 9.7 |
| LiveCodeBench | 72.4 | 78.8 | 78.4 | 73.8 | 74.7 | 69.5 | 61.3 |
| AIME 2025 | 86.9 | 91.0 | 89.7 | 73.7 | 80.0 | 81.5 | 78.1 |
| Artificial Analysis Intelligence Index | 54.77 | 57.47 | 53.95 | 49.44 | 42.64 | 43.67 | 41.57 |
VRAM Efficiency Comparison
| Model | VRAM Usage (GiB) | Parameters |
|---|---|---|
| Aqui-open0-2.5 32B | 30.4 | 32B |
| Qwen3 2507 235B | 41.0 | 235B |
| DeepSeek V3.1 Think 685B | 59.6 | 685B |
| GLM-4.5 358B | 59.6 | 358B |
| EXAONE 4.0 32B | 68.9 | 32B |
| KAT-V1-40B | 74.5 | 40B |
| Hermes 4 405B | 754.4 | 405B |
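As a rough sanity check on the first row, 8-bit weights alone for a 32B-parameter model come to about 29.8 GiB, consistent with the 30.4 GiB figure; activations and the KV cache add to this at runtime. A quick back-of-the-envelope calculation:

```python
# Weight memory for a 32B-parameter model at common precisions.
# Runtime overhead (activations, KV cache) is not included.
PARAMS = 32e9

for name, bytes_per_param in [("fp16/bf16", 2), ("int8", 1)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{name}: {gib:.1f} GiB")  # fp16/bf16: 59.6 GiB, int8: 29.8 GiB
```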
Key Features
🧠 Extended Thinking
Built upon the Qwen2.5 architecture, with reasoning capabilities enhanced through extended thinking mechanisms.
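The card does not say how extended thinking is triggered at inference time. Assuming open0-2.5 inherits Qwen2.5's chat interface (an assumption, not confirmed here), a chat-style prompt can be built with the standard `apply_chat_template` helper; the example message is illustrative:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aquigpt/open0-2.5")
messages = [
    {"role": "user", "content": "Prove that the sum of two odd integers is even."},
]
# Formats the conversation with the model's chat markup; feed the resulting
# string to model.generate() as shown in the Usage section below.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```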
⚡ Quantization-Aware Training
First model from Aqui Solutions designed specifically for 8-bit inference, maintaining performance while drastically reducing memory usage.
💾 Ultra-Low VRAM
Runs efficiently on consumer hardware, requiring only 30.4 GiB of VRAM and making advanced AI accessible to more users.
📄 MIT Licensed
Complete freedom for commercial use, modification, and redistribution with minimal restrictions.
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer in 8-bit
tokenizer = AutoTokenizer.from_pretrained("aquigpt/open0-2.5")
model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2.5",
    load_in_8bit=True,
    device_map="auto",
)

# Generate text (do_sample=True so the temperature setting takes effect)
inputs = tokenizer("Solve this complex reasoning problem:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
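Note that newer transformers releases deprecate the bare `load_in_8bit` kwarg in favor of an explicit `BitsAndBytesConfig`; an equivalent load looks like this:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Equivalent 8-bit load via an explicit quantization config.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2.5",
    quantization_config=quant_config,
    device_map="auto",
)
```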
Training Details
The open0-2.5 model was built upon Qwen2.5-32B with significant enhancements:
- Extended thinking capabilities through architectural modifications
- Quantization-aware training from initialization (see the sketch after this list)
- Advanced fine-tuning on reasoning and mathematical datasets
- Optimized for 8-bit inference without performance degradation
- Constitutional AI alignment for safe and helpful responses
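The card does not detail the QAT recipe itself. As a general illustration of the technique, one common approach fake-quantizes weights to int8 during the forward pass and routes gradients around the rounding with a straight-through estimator; the helper below is a minimal sketch, not the actual training code:

```python
import torch

def fake_quant_int8(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor int8 fake quantization with a
    straight-through estimator (STE), a common QAT building block."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0                # map max |w| to 127
    q = torch.clamp(torch.round(w / scale), -127, 127) * scale   # quantize-dequantize
    # The forward pass sees quantized weights; gradients flow to w
    # as if quantization were the identity (the STE trick).
    return w + (q - w).detach()
```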
Note: This model represents a breakthrough in efficient AI deployment. All benchmark results were obtained using 8-bit quantization, demonstrating the effectiveness of our quantization-aware training approach.
Built with ❤️ by Aqui Solutions • MIT • September 2025