Quantization-Aware Model

Aqui-open0-2.5

The first quantization-aware model from Aqui Solutions, built on the Qwen2.5 architecture with extended thinking capabilities. It delivers strong performance at ultra-low VRAM usage through native 8-bit optimization.

🧠 Extended Thinking
⚡ 8-Bit Native
🔓 MIT Licensed
💾 Low VRAM

open0-2.5-32B

A quantization-aware model based on Qwen2.5-32B with extended thinking capabilities, optimized for 8-bit inference from the ground up.

🧠 32B parameters
⚡ 8-bit quantized
💾 30.4 GiB VRAM
🎯 Extended thinking

🚀 Breakthrough in Efficiency

First Quantization-Aware Model: unlike traditional post-training quantization, open0-2.5 was designed and trained with 8-bit precision in mind, preserving quality while dramatically reducing memory requirements.
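The card does not disclose the exact recipe, but quantization-aware training typically works by simulating 8-bit rounding in the forward pass while gradients update full-precision master weights via a straight-through estimator. A minimal PyTorch sketch of that general idea (illustrative only, not Aqui's actual training code):

    import torch

    class FakeQuant8bit(torch.autograd.Function):
        """Symmetric per-tensor 8-bit fake quantization."""

        @staticmethod
        def forward(ctx, w):
            scale = w.abs().max().clamp_min(1e-8) / 127.0   # map max |w| to 127
            q = (w / scale).round().clamp(-128, 127)        # simulate int8 rounding
            return q * scale                                # dequantize for the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output                              # straight-through estimator

    # Full-precision master weights receive gradients computed through the
    # quantized forward pass, so training "sees" 8-bit behavior from the start.
    w = torch.randn(4, 4, requires_grad=True)
    loss = FakeQuant8bit.apply(w).sum()
    loss.backward()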


Benchmark Performance

All evaluations were run in 8-bit quantization for open0-2.5 and in full precision for all other models.

| Benchmark | Aqui-open0-2.5 32B | Qwen3 2507 235B | DeepSeek V3.1 Think 685B | GLM-4.5 358B | EXAONE 4.0 32B | KAT-V1-40B | Hermes 4 405B |
|---|---|---|---|---|---|---|---|
| MMLU-Pro | 84.1 | 84.3 | 85.1 | 83.5 | 81.8 | 78.9 | 80.5 |
| GPQA Diamond | 78.2 | 79.0 | 77.9 | 78.2 | 73.9 | 72.5 | 70.5 |
| Humanity's Last Exam | 16.7 | 15.0 | 13.0 | 12.2 | 10.5 | 7.8 | 9.7 |
| LiveCodeBench | 72.4 | 78.8 | 78.4 | 73.8 | 74.7 | 69.5 | 61.3 |
| AIME 2025 | 86.9 | 91.0 | 89.7 | 73.7 | 80.0 | 81.5 | 78.1 |
| Artificial Analysis Intelligence Index | 54.77 | 57.47 | 53.95 | 49.44 | 42.64 | 43.67 | 41.57 |

VRAM Efficiency Comparison

| Model | VRAM Usage (GiB) | Parameters |
|---|---|---|
| Aqui-open0-2.5 32B | 30.4 | 32B |
| Qwen3 2507 235B | 41.0 | 235B |
| DeepSeek V3.1 Think 685B | 59.6 | 685B |
| GLM-4.5 358B | 59.6 | 358B |
| EXAONE 4.0 32B | 68.9 | 32B |
| KAT-V1-40B | 74.5 | 40B |
| Hermes 4 405B | 754.4 | 405B |
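These figures track the back-of-the-envelope rule that weight memory ≈ parameters × bytes per parameter (the 405B BF16 entry matches it exactly, and the 30.4 GiB figure implies roughly 32.6B actual parameters). A quick sanity check, counting weights only and ignoring activations and KV cache:

    def weight_vram_gib(params: float, bits_per_param: int) -> float:
        """Approximate weight memory in GiB: params x (bits / 8) bytes."""
        return params * bits_per_param / 8 / 2**30

    print(f"{weight_vram_gib(32.6e9, 8):.1f} GiB")   # ~32B model at 8-bit -> ~30.4 GiB
    print(f"{weight_vram_gib(405e9, 16):.1f} GiB")   # 405B model at BF16  -> ~754.4 GiB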

Key Features

🧠 Extended Thinking

Built upon Qwen2.5 architecture with enhanced reasoning capabilities through extended thinking mechanisms.

⚡ Quantization-Aware Training

First model from Aqui Solutions designed specifically for 8-bit inference, maintaining performance while drastically reducing memory usage.

💾 Ultra-Low VRAM

Runs efficiently with only a 30.4 GiB VRAM requirement, making advanced AI accessible to more users.
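A quick way to check whether a local GPU can hold the 8-bit weights (the 4 GiB of headroom for activations and KV cache is a rough assumption):

    import torch

    if torch.cuda.is_available():
        total_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
        print(f"GPU 0: {total_gib:.1f} GiB total")
        # 30.4 GiB of weights plus an assumed ~4 GiB for activations and KV cache
        print("Likely fits open0-2.5 in 8-bit:", total_gib >= 30.4 + 4.0)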

🔓 MIT Licensed

Complete freedom for commercial use, modification, and redistribution with minimal restrictions.


Usage

    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Load the tokenizer and the model with 8-bit weights (requires bitsandbytes)
    tokenizer = AutoTokenizer.from_pretrained("aquigpt/open0-2.5")
    model = AutoModelForCausalLM.from_pretrained(
        "aquigpt/open0-2.5",
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )

    # Generate text (temperature only takes effect with sampling enabled)
    inputs = tokenizer("Solve this complex reasoning problem:", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
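
Continuing from the snippet above, and assuming the repository ships a Qwen2.5-style chat template (not verified here), chat-style prompting would follow the standard transformers pattern:

    # Build a chat prompt with the model's template (assumed Qwen2.5-style)
    messages = [
        {"role": "user", "content": "Explain quantization-aware training in two sentences."}
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
    # Decode only the newly generated tokens, not the echoed prompt
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))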

Training Details

The open0-2.5 model was built upon Qwen2.5-32B with significant enhancements:

  • Extended thinking capabilities through architectural modifications
  • Quantization-aware training from initialization
  • Advanced fine-tuning on reasoning and mathematical datasets
  • Optimized for 8-bit inference without performance degradation
  • Constitutional AI alignment for safe and helpful responses

Note: All open0-2.5 benchmark results above were obtained under 8-bit quantization, demonstrating the effectiveness of the quantization-aware training approach to efficient deployment.

Built with ❤️ by Aqui Solutions • MIT • September 2025
