Aqui-VL 12B Qwistral
Aqui-VL 12B Qwistral is an advanced thinking-enabled language model based on Mistral NeMo 12B, distilled from Qwen3 235B. This model from Aqui Solutions, creators of AquiGPT, features built-in reasoning capabilities similar to Gemini 2.5 Pro and OpenAI o3. With 12 billion parameters, it delivers exceptional performance while remaining accessible on consumer-grade hardware.
Key Features
- Thinking Mode by Default: Built-in reasoning capabilities for complex problem solving
- Consumer Hardware Compatible: Runs efficiently on RTX 4090 or 24GB Mac
- 32K Context Window: Handle long documents and complex conversations
- Strong Mathematical Reasoning: Exceptional performance on mathematical and logical tasks
- Distilled from Qwen3: Benefits from knowledge transfer from 235B parameter model
Hardware Requirements
Minimum Requirements
- GPU: RTX 4090 (24GB VRAM) or equivalent
- Mac: 24GB unified memory (Apple Silicon recommended)
- RAM: 24GB system memory (for GPU setups)
- Storage: 15GB available space (for model and overhead)
Recommended Setup
- GPU: RTX 4090 with adequate cooling
- CPU: Modern multi-core processor
- RAM: 32GB+ for optimal performance
- Storage: NVMe SSD for faster model loading
Performance Benchmarks
Aqui-VL 12B Qwistral demonstrates competitive performance across multiple domains in thinking mode:
| Benchmark | Aqui-VL 12B Qwistral | Mistral Small 3.2 24B | Aqui-VL 24B Mistral | Qwen3 32B | Gemini 2.5 Flash-Lite | Llama 4 Maverick 400B |
|---|---|---|---|---|---|---|
| MMLU-Pro | 69.0% | 68.1% | 69.0% | 72.7% | 72.4% | 80.9% |
| AIME 2024 | 56.0% | 32.3% | 32.0% | 30.3% | 50.0% | 39.0% |
| LiveCodeBench | 43.0% | 27.5% | 28.0% | 28.8% | 40.0% | 39.7% |
| GPQA Diamond | 57.0% | 50.5% | 51.0% | 53.5% | 47.4% | 67.1% |
| Humanity's Last Exam | 9.3% | 4.3% | 4.3% | 4.3% | 3.7% | 4.8% |
| Average | 46.9% | 36.5% | 36.9% | 37.9% | 42.7% | 46.3% |
Comparison with Base Model (Mistral NeMo 12B)
| Benchmark | Aqui-VL 12B Qwistral | Mistral NeMo 12B | Improvement (points) |
|---|---|---|---|
| MMLU-Pro | 69.0% | 39.9% | +29.1 |
| AIME 2024 | 56.0% | 0.0% | +56.0 |
| LiveCodeBench | 43.0% | 5.7% | +37.3 |
| GPQA Diamond | 57.0% | 31.4% | +25.6 |
| Humanity's Last Exam | 9.3% | 4.4% | +4.9 |
| Average | 46.9% | 16.3% | +30.6 |
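The averages and per-benchmark deltas above can be reproduced directly from the table rows; a quick sketch:

```python
# Per-benchmark scores from the comparison table, in row order:
# MMLU-Pro, AIME 2024, LiveCodeBench, GPQA Diamond, Humanity's Last Exam
aqui = [69.0, 56.0, 43.0, 57.0, 9.3]   # Aqui-VL 12B Qwistral
nemo = [39.9, 0.0, 5.7, 31.4, 4.4]     # Mistral NeMo 12B (base)

# Mean score for each model
aqui_avg = round(sum(aqui) / len(aqui), 1)
nemo_avg = round(sum(nemo) / len(nemo), 1)

# Per-benchmark improvement in percentage points
deltas = [round(a - b, 1) for a, b in zip(aqui, nemo)]

print(aqui_avg)  # 46.9
print(nemo_avg)  # 16.3
print(deltas)    # [29.1, 56.0, 37.3, 25.6, 4.9]
```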
Model Specifications
- Parameters: 12 billion
- Context Window: 32,000 tokens
- Knowledge Cutoff: October 2024 (Qwen3) + December 2023 (Mistral NeMo)
- Architecture: mistral (transformer-based)
- Languages: Multilingual support with strong English, French and Portuguese performance
- Thinking Mode: Default behavior with built-in reasoning
Installation & Usage
Quick Start with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/aqui-vl-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate text with thinking (default behavior)
prompt = "Solve this math problem: If a train travels 120 miles in 2 hours, what is its speed?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=500,  # budget for new tokens, not total length
    do_sample=True,      # required for temperature to take effect
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
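Because thinking mode is on by default, the decoded text may contain the model's reasoning before the final answer. Assuming the reasoning is wrapped in `<think>…</think>` tags, as in Qwen3 (the teacher model — verify against this model's actual chat template), a small helper can separate the two parts:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Assumes Qwen3-style <think>...</think> tags; if no tags are
    present, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Example with a mock completion
raw = "<think>120 miles / 2 hours = 60 mph</think>The speed is 60 mph."
reasoning, answer = split_thinking(raw)
print(answer)  # The speed is 60 mph.
```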
Using with Ollama
```bash
# Pull the model (coming soon)
ollama pull aquiffoo/aqui-vl-12b

# Run interactive chat
ollama run aquiffoo/aqui-vl-12b
```
Thinking Mode Controls
- Default: Thinking mode is enabled (the model reasons before answering)
- Enable thinking: use the `/think` command
- Disable thinking: use the `/no_think` command
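If these switches work as plain-text commands appended to the user turn, as with Qwen3's soft switches (check the chat template to confirm), toggling could be wrapped in a small helper:

```python
def with_thinking(prompt: str, enabled: bool = True) -> str:
    """Append the thinking-mode switch to a user prompt.

    Hypothetical helper; assumes Qwen3-style /think and /no_think
    soft switches appended to the user message.
    """
    switch = "/think" if enabled else "/no_think"
    return f"{prompt} {switch}"

print(with_thinking("What is 2 + 2?", enabled=False))
# What is 2 + 2? /no_think
```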
Use Cases
Mathematical & Logical Reasoning
Exceptional performance on complex problems:
- Advanced mathematics (AIME 2024: 56.0%)
- Scientific reasoning (GPQA Diamond: 57.0%)
- Complex problem-solving with step-by-step thinking
Code Generation & Programming
Strong coding capabilities:
- Algorithm implementation and optimization
- Code debugging and review
- Technical documentation
- Live coding challenges
General Assistance
- Research and information synthesis
- Creative writing and content generation
- Multilingual translation and communication
- Educational tutoring with detailed explanations
Quantization
Aqui-VL 12B Qwistral will be available in multiple quantization formats:
- Full Precision (FP16/BF16): ~24GB VRAM usage
- Q8_0: ~13GB (high quality retention)
- Q4_K_M: ~8GB (optimized for consumer hardware)
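These memory figures follow from bits-per-weight arithmetic. A rough estimator for the weights alone (ignoring KV cache and activation overhead; the effective bit widths for Q8_0 and Q4_K_M below are approximations, since k-quants mix block scales with the quantized weights):

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate footprint of the weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

PARAMS = 12e9  # 12 billion parameters

print(round(model_size_gb(PARAMS, 16), 1))   # 24.0 GB -> FP16 full precision
print(round(model_size_gb(PARAMS, 8.5), 1))  # 12.8 GB -> ~Q8_0
print(round(model_size_gb(PARAMS, 5.0), 1))  # 7.5 GB  -> ~Q4_K_M
```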
Fine-tuning & Customization
Aqui-VL 12B Qwistral supports:
- Parameter-efficient fine-tuning (LoRA, QLoRA)
- Full fine-tuning for specialized domains
- Custom tokenizer training
- Thinking pattern customization
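For a sense of scale, LoRA keeps the 12B base weights frozen and trains only small low-rank adapter matrices. A back-of-the-envelope count (the hidden size, layer count, and choice of adapted projections below are illustrative assumptions, not confirmed specs of this model):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one weight matrix: a d_in x rank
    down-projection plus a rank x d_out up-projection."""
    return rank * (d_in + d_out)

# Illustrative numbers only -- assumptions, not confirmed specs
hidden = 5120   # assumed hidden size
layers = 40     # assumed transformer layer count
rank = 16

# Adapting four attention projections per layer (treated as square
# hidden x hidden matrices here for simplicity)
per_layer = 4 * lora_trainable_params(hidden, hidden, rank)
total = layers * per_layer
print(total)  # 26214400 trainable params, vs. 12B frozen
```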
Limitations
- Knowledge cutoff varies by component (Oct 2024 for Qwen3, Dec 2023 for Mistral NeMo)
- May occasionally produce hallucinations
- Thinking mode increases response time and token usage
- Requires significant computational resources for optimal performance
License
This model is released under the Apache 2.0 License, making it suitable for both research and commercial applications.
Support
For questions and support regarding Aqui-VL 12B Qwistral, please visit the Hugging Face repository and use the community discussions section.
Acknowledgments
Built upon Mistral NeMo 12B by Mistral AI and distilled from Qwen3 235B by Alibaba Cloud. Special thanks to the open-source community for tools and datasets that made this model possible.
Copyright 2025 Aqui Solutions. All rights reserved