Quantization-Aware Model

Aqui-open0-2.5

The first quantization-aware model from Aqui Solutions, built on the Qwen2.5 architecture with extended thinking capabilities. It delivers strong performance at ultra-low VRAM usage through native 8-bit optimization.

🧠 Extended Thinking
⚡ 8-Bit Native
🔓 MIT Licensed
💾 Low VRAM

open0-2.5-32B

A quantization-aware model based on Qwen2.5-32B with extended thinking capabilities, optimized for 8-bit inference from the ground up.

🧠 32B parameters
⚡ 8-bit quantized
💾 30.4 GiB VRAM
🎯 Extended thinking

🚀 Breakthrough in Efficiency

First Quantization-Aware Model: unlike traditional post-training quantization, open0-2.5 was designed and trained with 8-bit precision in mind, preserving quality while dramatically reducing memory requirements.
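The card does not disclose the exact recipe, but quantization-aware training typically works by simulating 8-bit rounding in the forward pass while gradients update full-precision master weights via a straight-through estimator. A minimal PyTorch sketch of that general idea (illustrative only, not Aqui's actual training code):

    import torch

    class FakeQuant8bit(torch.autograd.Function):
        """Symmetric per-tensor 8-bit fake quantization."""

        @staticmethod
        def forward(ctx, w):
            scale = w.abs().max().clamp_min(1e-8) / 127.0   # map max |w| to 127
            q = (w / scale).round().clamp(-128, 127)        # simulate int8 rounding
            return q * scale                                # dequantize for the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output                              # straight-through estimator

    # Full-precision master weights receive gradients computed through the
    # quantized forward pass, so training "sees" 8-bit behavior from the start.
    w = torch.randn(4, 4, requires_grad=True)
    loss = FakeQuant8bit.apply(w).sum()
    loss.backward()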


Benchmark Performance

All evaluations were run in 8-bit quantization for open0-2.5 and in full precision for all other models.

| Benchmark | Aqui-open0-2.5 32B | Qwen3 2507 235B | DeepSeek V3.1 Think 685B | GLM-4.5 358B | EXAONE 4.0 32B | KAT-V1-40B | Hermes 4 405B |
|---|---|---|---|---|---|---|---|
| MMLU-Pro | 84.1 | 84.3 | 85.1 | 83.5 | 81.8 | 78.9 | 80.5 |
| GPQA Diamond | 78.2 | 79.0 | 77.9 | 78.2 | 73.9 | 72.5 | 70.5 |
| Humanity's Last Exam | 16.7 | 15.0 | 13.0 | 12.2 | 10.5 | 7.8 | 9.7 |
| LiveCodeBench | 72.4 | 78.8 | 78.4 | 73.8 | 74.7 | 69.5 | 61.3 |
| AIME 2025 | 86.9 | 91.0 | 89.7 | 73.7 | 80.0 | 81.5 | 78.1 |
| Artificial Analysis Intelligence Index | 54.77 | 57.47 | 53.95 | 49.44 | 42.64 | 43.67 | 41.57 |

VRAM Efficiency Comparison

| Model | VRAM Usage (GiB) | Parameters |
|---|---|---|
| Aqui-open0-2.5 32B | 30.4 | 32B |
| Qwen3 2507 235B | 41.0 | 235B |
| DeepSeek V3.1 Think 685B | 59.6 | 685B |
| GLM-4.5 358B | 59.6 | 358B |
| EXAONE 4.0 32B | 68.9 | 32B |
| KAT-V1-40B | 74.5 | 40B |
| Hermes 4 405B | 754.4 | 405B |
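These figures track the back-of-the-envelope rule that weight memory ≈ parameters × bytes per parameter (the 405B BF16 entry matches it exactly, and the 30.4 GiB figure implies roughly 32.6B actual parameters). A quick sanity check, counting weights only and ignoring activations and KV cache:

    def weight_vram_gib(params: float, bits_per_param: int) -> float:
        """Approximate weight memory in GiB: params x (bits / 8) bytes."""
        return params * bits_per_param / 8 / 2**30

    print(f"{weight_vram_gib(32.6e9, 8):.1f} GiB")   # ~32B model at 8-bit -> ~30.4 GiB
    print(f"{weight_vram_gib(405e9, 16):.1f} GiB")   # 405B model at BF16  -> ~754.4 GiB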

Key Features

🧠 Extended Thinking

Built upon Qwen2.5 architecture with enhanced reasoning capabilities through extended thinking mechanisms.

⚡ Quantization-Aware Training

First model from Aqui Solutions designed specifically for 8-bit inference, maintaining performance while drastically reducing memory usage.

💾 Ultra-Low VRAM

Runs efficiently with only a 30.4 GiB VRAM requirement, making advanced AI accessible to more users.
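A quick way to check whether a local GPU can hold the 8-bit weights (the 4 GiB of headroom for activations and KV cache is a rough assumption):

    import torch

    if torch.cuda.is_available():
        total_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
        print(f"GPU 0: {total_gib:.1f} GiB total")
        # 30.4 GiB of weights plus an assumed ~4 GiB for activations and KV cache
        print("Likely fits open0-2.5 in 8-bit:", total_gib >= 30.4 + 4.0)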

🔓 MIT Licensed

Complete freedom for commercial use, modification, and redistribution with minimal restrictions.


Usage

    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Load the tokenizer and the model with 8-bit weights (requires bitsandbytes)
    tokenizer = AutoTokenizer.from_pretrained("aquigpt/open0-2.5")
    model = AutoModelForCausalLM.from_pretrained(
        "aquigpt/open0-2.5",
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )

    # Generate text (temperature only takes effect with sampling enabled)
    inputs = tokenizer("Solve this complex reasoning problem:", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
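
Continuing from the snippet above, and assuming the repository ships a Qwen2.5-style chat template (not verified here), chat-style prompting would follow the standard transformers pattern:

    # Build a chat prompt with the model's template (assumed Qwen2.5-style)
    messages = [
        {"role": "user", "content": "Explain quantization-aware training in two sentences."}
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
    # Decode only the newly generated tokens, not the echoed prompt
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))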

Training Details

The open0-2.5 model was built upon Qwen2.5-32B with significant enhancements:

  • Extended thinking capabilities through architectural modifications
  • Quantization-aware training from initialization
  • Advanced fine-tuning on reasoning and mathematical datasets
  • Optimized for 8-bit inference without performance degradation
  • Constitutional AI alignment for safe and helpful responses

Note: All open0-2.5 benchmark results above were obtained under 8-bit quantization, demonstrating the effectiveness of the quantization-aware training approach to efficient deployment.

Built with ❤️ by Aqui Solutions • MIT • September 2025
