Aqui-VL 12B Qwistral

Aqui-VL 12B Qwistral is an advanced thinking-enabled language model built on Mistral NeMo 12B and distilled from Qwen3 235B. Developed by Aqui Solutions, creators of AquiGPT, it features built-in reasoning capabilities similar to those of Gemini 2.5 Pro and OpenAI o3. With 12 billion parameters, it delivers strong performance while remaining accessible on consumer-grade hardware.

Key Features

  • Thinking Mode by Default: Built-in reasoning capabilities for complex problem solving
  • Consumer Hardware Compatible: Runs efficiently on an RTX 4090 or a Mac with 24GB of unified memory
  • 32K Context Window: Handle long documents and complex conversations
  • Strong Mathematical Reasoning: Exceptional performance on mathematical and logical tasks
  • Distilled from Qwen3: Benefits from knowledge transfer from 235B parameter model

Hardware Requirements

Minimum Requirements

  • GPU: RTX 4090 (24GB VRAM) or equivalent
  • Mac: 24GB unified memory (Apple Silicon recommended)
  • RAM: 24GB system memory (for GPU setups)
  • Storage: 15GB available space (for model and overhead)

Recommended Setup

  • GPU: RTX 4090 with adequate cooling
  • CPU: Modern multi-core processor
  • RAM: 32GB+ for optimal performance
  • Storage: NVMe SSD for faster model loading

Performance Benchmarks

Aqui-VL 12B Qwistral demonstrates competitive performance across multiple domains in thinking mode:

[Performance comparison chart]

| Benchmark | Aqui-VL 12B Qwistral | Mistral Small 3.2 24B | Aqui-VL 24B Mistral | Qwen3 32B | Gemini 2.5 Flash-Lite | Llama 4 Maverick 400B |
|---|---|---|---|---|---|---|
| MMLU-Pro | 69.0% | 68.1% | 69.0% | 72.7% | 72.4% | 80.9% |
| AIME 2024 | 56.0% | 32.3% | 32.0% | 30.3% | 50.0% | 39.0% |
| LiveCodeBench | 43.0% | 27.5% | 28.0% | 28.8% | 40.0% | 39.7% |
| GPQA Diamond | 57.0% | 50.5% | 51.0% | 53.5% | 47.4% | 67.1% |
| Humanity's Last Exam | 9.3% | 4.3% | 4.3% | 4.3% | 3.7% | 4.8% |
| Average | 46.9% | 36.5% | 36.9% | 37.9% | 42.7% | 46.3% |

Comparison with Base Model (Mistral NeMo 12B)

| Benchmark | Aqui-VL 12B Qwistral | Mistral NeMo 12B | Improvement (points) |
|---|---|---|---|
| MMLU-Pro | 69.0% | 39.9% | +29.1 |
| AIME 2024 | 56.0% | 0.0% | +56.0 |
| LiveCodeBench | 43.0% | 5.7% | +37.3 |
| GPQA Diamond | 57.0% | 31.4% | +25.6 |
| Humanity's Last Exam | 9.3% | 4.4% | +4.9 |
| Average | 46.9% | 16.3% | +30.6 |

Model Specifications

  • Parameters: 12 billion
  • Context Window: 32,000 tokens
  • Knowledge Cutoff: October 2024 (Qwen3 teacher); December 2023 (Mistral NeMo base)
  • Architecture: mistral (transformer-based)
  • Languages: Multilingual support with strong English, French, and Portuguese performance
  • Thinking Mode: Default behavior with built-in reasoning
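
The advertised context window can be cross-checked against the repository configuration. A minimal sketch (note that the config field may report the base architecture's maximum positions rather than the recommended 32K window):

# Inspect the context length recorded in the model config
from transformers import AutoConfig

config = AutoConfig.from_pretrained("aquigpt/aqui-vl-12b")
print(config.max_position_embeddings)  # may exceed the recommended 32K window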

Installation & Usage

Quick Start with Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/aqui-vl-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate text with thinking (default behavior)
prompt = "Solve this math problem: If a train travels 120 miles in 2 hours, what is its speed?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # move inputs onto the model's device
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7)  # sampling so temperature applies
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
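
If the repository ships a chat template, generation through the messages API is often more reliable than raw prompts. A minimal sketch continuing from the snippet above (assumes a chat template is bundled with the tokenizer):

# Chat-style generation via the tokenizer's chat template
messages = [
    {"role": "user", "content": "If a train travels 120 miles in 2 hours, what is its speed?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=500, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))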

Using with Ollama

# Pull the model (coming soon)
ollama pull aquiffoo/aqui-vl-12b

# Run interactive chat
ollama run aquiffoo/aqui-vl-12b
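
Once the model is pulled, it can also be called from Python through the official ollama client (pip install ollama). A minimal sketch, assuming the tag above is available locally:

# Chat call through the ollama Python client
# (assumes `ollama pull aquiffoo/aqui-vl-12b` has already completed)
import ollama

response = ollama.chat(
    model="aquiffoo/aqui-vl-12b",
    messages=[{"role": "user", "content": "What is 17 * 24? Think step by step."}],
)
print(response["message"]["content"])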

Thinking Mode Controls

  • Default: Thinking mode enabled (the model reasons before answering)
  • Enable thinking: use the /think command
  • Disable thinking: use the /no_think command (see the sketch below)
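
A minimal sketch of toggling the mode per request, reusing the model and tokenizer loaded above (the exact placement of the control commands is an assumption based on the Qwen3-style convention):

# Per-request thinking control (command placement assumed: appended to the user prompt)
quick_prompt = "What is the capital of France? /no_think"  # skip reasoning
deep_prompt = "Prove that sqrt(2) is irrational. /think"   # force step-by-step reasoning

for prompt in (quick_prompt, deep_prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))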

Use Cases

Mathematical & Logical Reasoning

Exceptional performance on complex problems:

  • Advanced mathematics (AIME 2024: 56.0%)
  • Scientific reasoning (GPQA Diamond: 57.0%)
  • Complex problem-solving with step-by-step thinking

Code Generation & Programming

Strong coding capabilities:

  • Algorithm implementation and optimization
  • Code debugging and review
  • Technical documentation
  • Live coding challenges

General Assistance

  • Research and information synthesis
  • Creative writing and content generation
  • Multilingual translation and communication
  • Educational tutoring with detailed explanations

Quantization

Aqui-VL 12B Qwistral will be available in multiple quantization formats:

  • Full Precision: ~24GB VRAM usage
  • Q4_K_M: ~8GB (optimized for consumer hardware)
  • Q8_0: ~13GB (high quality retention)
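
Until the GGUF builds above are published, a comparable memory footprint can be approximated with on-the-fly 4-bit loading via bitsandbytes. A minimal sketch (this is not the Q4_K_M release itself, and the quality trade-offs differ):

# On-the-fly 4-bit quantized loading with bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normalized float 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/aqui-vl-12b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("aquigpt/aqui-vl-12b")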

Fine-tuning & Customization

Aqui-VL 12B Qwistral supports:

  • Parameter-efficient fine-tuning (LoRA, QLoRA)
  • Full fine-tuning for specialized domains
  • Custom tokenizer training
  • Thinking pattern customization
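
A minimal LoRA setup with the peft library might look like the following (the target module names are an assumption based on the standard Mistral attention layers):

# Attaching LoRA adapters with peft
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("aquigpt/aqui-vl-12b", device_map="auto")

lora_config = LoraConfig(
    r=16,                  # adapter rank
    lora_alpha=32,         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed Mistral attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights should be trainable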

Limitations

  • Knowledge cutoff varies by component (Oct 2024 for Qwen3, Dec 2023 for Mistral NeMo)
  • May occasionally produce hallucinations
  • Thinking mode increases response time and token usage
  • Requires significant computational resources for optimal performance

License

This model is released under the Apache 2.0 License, making it suitable for both research and commercial applications.

Support

For questions and support regarding Aqui-VL 12B Qwistral, please visit the Hugging Face repository and use the community discussions section.

Acknowledgments

Built upon Mistral NeMo 12B by Mistral AI and distilled from Qwen3 235B by Alibaba Cloud. Special thanks to the open-source community for tools and datasets that made this model possible.


Copyright 2025 Aqui Solutions. All rights reserved.
