Aqui-VL 12B Qwistral
Aqui-VL 12B Qwistral is an advanced thinking-enabled language model based on Mistral NeMo 12B, distilled from Qwen3 235B. This model from Aqui Solutions, creators of AquiGPT, features built-in reasoning capabilities similar to Gemini 2.5 Pro and OpenAI o3. With 12 billion parameters, it delivers exceptional performance while remaining accessible on consumer-grade hardware.
Key Features
- Thinking Mode by Default: Built-in reasoning capabilities for complex problem solving
- Consumer Hardware Compatible: Runs efficiently on RTX 4090 or 24GB Mac
- 32K Context Window: Handle long documents and complex conversations
- Strong Mathematical Reasoning: Exceptional performance on mathematical and logical tasks
- Distilled from Qwen3: Benefits from knowledge transfer from 235B parameter model
Hardware Requirements
Minimum Requirements
- GPU: RTX 4090 (24GB VRAM) or equivalent
- Mac: 24GB unified memory (Apple Silicon recommended)
- RAM: 24GB system memory (for GPU setups)
- Storage: 15GB available space (for model and overhead)
Recommended Setup
- GPU: RTX 4090 with adequate cooling
- CPU: Modern multi-core processor
- RAM: 32GB+ for optimal performance
- Storage: NVMe SSD for faster model loading
Performance Benchmarks
Aqui-VL 12B Qwistral demonstrates competitive performance across multiple domains in thinking mode:
| Benchmark | Aqui-VL 12B Qwistral | Mistral Small 3.2 24B | Aqui-VL 24B Mistral | Qwen3 32B | Gemini 2.5 Flash-Lite | Llama 4 Maverick 400B |
|---|---|---|---|---|---|---|
| MMLU-Pro | 69.0% | 68.1% | 69.0% | 72.7% | 72.4% | 80.9% |
| AIME 2024 | 56.0% | 32.3% | 32.0% | 30.3% | 50.0% | 39.0% |
| LiveCodeBench | 43.0% | 27.5% | 28.0% | 28.8% | 40.0% | 39.7% |
| GPQA Diamond | 57.0% | 50.5% | 51.0% | 53.5% | 47.4% | 67.1% |
| Humanity's Last Exam | 9.3% | 4.3% | 4.3% | 4.3% | 3.7% | 4.8% |
| Average | 46.9% | 36.5% | 36.9% | 37.9% | 42.7% | 46.3% |
Comparison with Base Model (Mistral NeMo 12B)
| Benchmark | Aqui-VL 12B Qwistral | Mistral NeMo 12B | Improvement (points) |
|---|---|---|---|
| MMLU-Pro | 69.0% | 39.9% | +29.1 |
| AIME 2024 | 56.0% | 0.0% | +56.0 |
| LiveCodeBench | 43.0% | 5.7% | +37.3 |
| GPQA Diamond | 57.0% | 31.4% | +25.6 |
| Humanity's Last Exam | 9.3% | 4.4% | +4.9 |
| Average | 46.9% | 16.3% | +30.6 |
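The averages and per-benchmark deltas above can be reproduced directly from the table rows; a quick sketch:

```python
# Per-benchmark scores from the comparison table, in row order:
# MMLU-Pro, AIME 2024, LiveCodeBench, GPQA Diamond, Humanity's Last Exam
aqui = [69.0, 56.0, 43.0, 57.0, 9.3]   # Aqui-VL 12B Qwistral
nemo = [39.9, 0.0, 5.7, 31.4, 4.4]     # Mistral NeMo 12B (base)

# Mean score for each model
aqui_avg = round(sum(aqui) / len(aqui), 1)
nemo_avg = round(sum(nemo) / len(nemo), 1)

# Per-benchmark improvement in percentage points
deltas = [round(a - b, 1) for a, b in zip(aqui, nemo)]

print(aqui_avg)  # 46.9
print(nemo_avg)  # 16.3
print(deltas)    # [29.1, 56.0, 37.3, 25.6, 4.9]
```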
Model Specifications
- Parameters: 12 billion
- Context Window: 32,000 tokens
- Knowledge Cutoff: October 2024 (Qwen3) + December 2023 (Mistral NeMo)
- Architecture: mistral (transformer-based)
- Languages: Multilingual support with strong English, French and Portuguese performance
- Thinking Mode: Default behavior with built-in reasoning
Installation & Usage
Quick Start with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/aqui-vl-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate text with thinking (default behavior)
prompt = "Solve this math problem: If a train travels 120 miles in 2 hours, what is its speed?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=500,  # budget for new tokens, not total length
    do_sample=True,      # required for temperature to take effect
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
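Because thinking mode is on by default, the decoded text may contain the model's reasoning before the final answer. Assuming the reasoning is wrapped in `<think>…</think>` tags, as in Qwen3 (the teacher model — verify against this model's actual chat template), a small helper can separate the two parts:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Assumes Qwen3-style <think>...</think> tags; if no tags are
    present, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Example with a mock completion
raw = "<think>120 miles / 2 hours = 60 mph</think>The speed is 60 mph."
reasoning, answer = split_thinking(raw)
print(answer)  # The speed is 60 mph.
```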
Using with Ollama
```bash
# Pull the model (coming soon)
ollama pull aquiffoo/aqui-vl-12b

# Run interactive chat
ollama run aquiffoo/aqui-vl-12b
```
Thinking Mode Controls
- Default: Thinking mode is enabled (the model reasons before answering)
- Enable thinking: use the `/think` command
- Disable thinking: use the `/no_think` command
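If these switches work as plain-text commands appended to the user turn, as with Qwen3's soft switches (check the chat template to confirm), toggling could be wrapped in a small helper:

```python
def with_thinking(prompt: str, enabled: bool = True) -> str:
    """Append the thinking-mode switch to a user prompt.

    Hypothetical helper; assumes Qwen3-style /think and /no_think
    soft switches appended to the user message.
    """
    switch = "/think" if enabled else "/no_think"
    return f"{prompt} {switch}"

print(with_thinking("What is 2 + 2?", enabled=False))
# What is 2 + 2? /no_think
```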
Use Cases
Mathematical & Logical Reasoning
Exceptional performance on complex problems:
- Advanced mathematics (AIME 2024: 56.0%)
- Scientific reasoning (GPQA Diamond: 57.0%)
- Complex problem-solving with step-by-step thinking
Code Generation & Programming
Strong coding capabilities:
- Algorithm implementation and optimization
- Code debugging and review
- Technical documentation
- Live coding challenges
General Assistance
- Research and information synthesis
- Creative writing and content generation
- Multilingual translation and communication
- Educational tutoring with detailed explanations
Quantization
Aqui-VL 12B Qwistral will be available in multiple quantization formats:
- Full Precision (FP16/BF16): ~24GB VRAM usage
- Q8_0: ~13GB (high quality retention)
- Q4_K_M: ~8GB (optimized for consumer hardware)
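These memory figures follow from bits-per-weight arithmetic. A rough estimator for the weights alone (ignoring KV cache and activation overhead; the effective bit widths for Q8_0 and Q4_K_M below are approximations, since k-quants mix block scales with the quantized weights):

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate footprint of the weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

PARAMS = 12e9  # 12 billion parameters

print(round(model_size_gb(PARAMS, 16), 1))   # 24.0 GB -> FP16 full precision
print(round(model_size_gb(PARAMS, 8.5), 1))  # 12.8 GB -> ~Q8_0
print(round(model_size_gb(PARAMS, 5.0), 1))  # 7.5 GB  -> ~Q4_K_M
```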
Fine-tuning & Customization
Aqui-VL 12B Qwistral supports:
- Parameter-efficient fine-tuning (LoRA, QLoRA)
- Full fine-tuning for specialized domains
- Custom tokenizer training
- Thinking pattern customization
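For a sense of scale, LoRA keeps the 12B base weights frozen and trains only small low-rank adapter matrices. A back-of-the-envelope count (the hidden size, layer count, and choice of adapted projections below are illustrative assumptions, not confirmed specs of this model):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one weight matrix: a d_in x rank
    down-projection plus a rank x d_out up-projection."""
    return rank * (d_in + d_out)

# Illustrative numbers only -- assumptions, not confirmed specs
hidden = 5120   # assumed hidden size
layers = 40     # assumed transformer layer count
rank = 16

# Adapting four attention projections per layer (treated as square
# hidden x hidden matrices here for simplicity)
per_layer = 4 * lora_trainable_params(hidden, hidden, rank)
total = layers * per_layer
print(total)  # 26214400 trainable params, vs. 12B frozen
```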
Limitations
- Knowledge cutoff varies by component (Oct 2024 for Qwen3, Dec 2023 for Mistral NeMo)
- May occasionally produce hallucinations
- Thinking mode increases response time and token usage
- Requires significant computational resources for optimal performance
License
This model is released under the Apache 2.0 License, making it suitable for both research and commercial applications.
Support
For questions and support regarding Aqui-VL 12B Qwistral, please visit the Hugging Face repository and use the community discussions section.
Acknowledgments
Built upon Mistral NeMo 12B by Mistral AI and distilled from Qwen3 235B by Alibaba Cloud. Special thanks to the open-source community for tools and datasets that made this model possible.
Copyright 2025 Aqui Solutions. All rights reserved