Commit 48919c0 (verified) · Parent: 59d0f48
aquiffoo committed: Update README.md

Files changed (1): README.md (+193, −3)

README.md CHANGED:

---
license: apache-2.0
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
library_name: vllm
inference: false
---

# Aqui-VL 24B Mistral

Aqui-VL 24B Mistral is a language model based on Mistral Small 3.1, built to deliver strong performance while remaining accessible on consumer-grade hardware. It is the first open-weights model from Aqui Solutions, the company behind [AquiGPT](https://aquigpt.com.br). With 23.6 billion parameters, it runs efficiently on a single RTX 4090 GPU or a 32GB Mac, putting capable multimodal AI within reach of researchers, developers, and enthusiasts.

## Key Features

- **Consumer Hardware Compatible**: Runs on a single RTX 4090 or a 32GB Mac
- **Multimodal Capabilities**: Text, vision, chart analysis, and document understanding
- **128K Context Window**: Handles long documents and complex conversations
- **Strong Instruction Following**: Significantly improved over base Mistral Small 3.1 (65.3% vs. 55.6% on IFEval)
- **Exceptional Code Generation**: 92.5% on HumanEval, the strongest result in the comparison below

## Hardware Requirements

### Minimum Requirements
- **GPU**: RTX 4090 (24GB VRAM) or equivalent
- **Mac**: 32GB unified memory (Apple Silicon recommended)
- **RAM**: 32GB system memory (for GPU setups)
- **Storage**: 20GB available space (for the model and overhead)

### Recommended Setup
- **GPU**: RTX 4090 with adequate cooling
- **CPU**: Modern multi-core processor
- **RAM**: 64GB+ for optimal performance
- **Storage**: NVMe SSD for faster model loading

## Performance Benchmarks

Aqui-VL 24B Mistral demonstrates competitive performance across multiple domains:

| Benchmark | Aqui-VL 24B Mistral | Mistral Small 3.1 | Llama 3.1 70B |
|-----------|---------------------|-------------------|---------------|
| **IFEval** (Instruction Following) | 65.3% | 55.6% | **87.5%** |
| **MMLU** (General Knowledge) | 80.9% | 80.5% | **86.0%** |
| **GPQA** (Science Q&A) | 44.7% | 44.4% | **46.7%** |
| **HumanEval** (Coding) | **92.5%** | 88.9% | 80.5% |
| **MATH** (Mathematics) | 69.3% | **69.5%** | 68.0% |
| **MMMU** (General Vision) | **64.0%** | 62.5% | N/A* |
| **ChartQA** (Chart Analysis) | **87.6%** | 86.2% | N/A* |
| **DocVQA** (Document Analysis) | **94.9%** | 94.1% | N/A* |
| **Average Text Performance** | 70.5% | 67.8% | **73.7%** |
| **Average Vision Performance** | **82.2%** | 80.9% | N/A* |

\*Llama 3.1 70B does not include vision capabilities.

## Model Specifications

- **Parameters**: 23.6 billion
- **Context Window**: 128,000 tokens
- **Knowledge Cutoff**: December 2023
- **Architecture**: Mistral (transformer-based, with vision support)
- **Languages**: Multilingual, with particularly strong English, French, and Portuguese performance

## Installation & Usage

### Quick Start with Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer (text-only usage shown here;
# image inputs go through the model's processor instead)
model_name = "aquigpt/aqui-vl-24b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate text; do_sample=True is required for temperature to take effect,
# and max_new_tokens bounds the completion rather than the total length
prompt = "Explain quantum computing in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

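Since the card lists vLLM as the serving library, here is a minimal offline-inference sketch with vLLM's Python API; the sampling values are illustrative, and `max_model_len` is capped well below the full 128K window so the KV cache fits comfortably in 24GB of VRAM:

```python
from vllm import LLM, SamplingParams

# Load the model; cap the context so the KV cache fits on a 24GB card
llm = LLM(model="aquigpt/aqui-vl-24b", max_model_len=8192)
params = SamplingParams(temperature=0.7, max_tokens=200)

outputs = llm.generate(["Explain quantum computing in simple terms:"], params)
print(outputs[0].outputs[0].text)
```

The same checkpoint can be exposed over an OpenAI-compatible endpoint with `vllm serve aquigpt/aqui-vl-24b`.
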
### Using with Ollama

```bash
# Pull the model
ollama pull aquiffoo/aqui-vl-24b

# Run interactive chat
ollama run aquiffoo/aqui-vl-24b
```

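Once the Ollama server is running, the model can also be queried programmatically over its local REST API; a minimal sketch, assuming the default port 11434:

```bash
# One-shot completion via Ollama's HTTP API (stream disabled for a single JSON reply)
curl http://localhost:11434/api/generate -d '{
  "model": "aquiffoo/aqui-vl-24b",
  "prompt": "Explain quantum computing in simple terms:",
  "stream": false
}'
```
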
### Using with llama.cpp

```bash
# Download the quantized model (Q4_K_M, 14.4GB)
wget https://huggingface.co/aquigpt/aqui-vl-24b/resolve/main/aqui-vl-24b-q4_k_m.gguf

# Run with llama.cpp (the binary is named llama-cli in current builds; ./main in older ones)
./llama-cli -m aqui-vl-24b-q4_k_m.gguf -p "Your prompt here" -n 100
```

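llama.cpp can also serve the model as a local HTTP service; a minimal sketch using its bundled `llama-server` binary, with an 8K context chosen to stay well within a 24GB card:

```bash
# Serve the GGUF over an OpenAI-compatible endpoint on port 8080
./llama-server -m aqui-vl-24b-q4_k_m.gguf -c 8192 --port 8080
```
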
## Use Cases

### Code Generation & Programming
With a 92.5% score on HumanEval, Aqui-VL 24B Mistral excels at:
- Writing clean, efficient code in multiple languages
- Debugging and code review
- Algorithm implementation
- Technical documentation

### Document & Chart Analysis
Strong vision capabilities (see the sketch after this list) enable:
- PDF document analysis and Q&A
- Chart and graph interpretation
- Scientific paper comprehension
- Business report analysis

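A hedged sketch of image Q&A through Transformers, assuming the checkpoint ships a processor and follows the image-text-to-text chat-template convention of recent releases; the image URL is a placeholder:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_name = "aquigpt/aqui-vl-24b"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForImageTextToText.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Chat-style message mixing an image and a text question
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},  # placeholder image
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
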
### General Assistance
- Research and information synthesis
- Creative writing and content generation
- Mathematical problem solving
- Multilingual translation and communication

## Quantization

Aqui-VL 24B Mistral is available exclusively in Q4_K_M quantization, chosen as the best balance of output quality and hardware compatibility:

- **Format**: Q4_K_M quantization
- **Size**: 14.4GB
- **VRAM Usage**: ~16GB (with overhead)
- **Compatible with**: RTX 4090, 32GB Mac, and similar hardware
- **Performance**: Excellent quality retention at 4-bit precision

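The GGUF file from the llama.cpp section can also be loaded from Python; a minimal sketch using the llama-cpp-python bindings, with the context size and offload setting as illustrative choices:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="aqui-vl-24b-q4_k_m.gguf",
    n_ctx=8192,        # well below the 128K maximum, to keep memory use modest
    n_gpu_layers=-1,   # offload all layers to the GPU
)
out = llm("Explain quantum computing in simple terms:", max_tokens=200, temperature=0.7)
print(out["choices"][0]["text"])
```
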
## Fine-tuning & Customization

Aqui-VL 24B Mistral supports:
- Parameter-efficient fine-tuning (LoRA, QLoRA); see the sketch after this list
- Full fine-tuning for specialized domains
- Custom tokenizer training
- Multi-modal fine-tuning for specific vision tasks

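A minimal QLoRA setup sketch using the peft and bitsandbytes libraries; the rank, alpha, and `target_modules` values are illustrative assumptions, with the module names matching the attention projections of Mistral-style checkpoints:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 4-bit NF4 so a 24GB card can hold it during training
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/aqui-vl-24b", quantization_config=bnb, device_map="auto"
)

# Attach low-rank adapters to the attention projections
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only adapter weights are trainable
```
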
## Limitations

- Knowledge cutoff at December 2023
- May occasionally produce hallucinations
- Q4_K_M quantization trades some output quality for memory savings relative to full precision
- Requires significant computational resources for optimal performance

## License

This model is released under the [Apache 2.0 License](LICENSE), making it suitable for both research and commercial applications.

## Support

For questions and support regarding Aqui-VL 24B Mistral, please visit the [Hugging Face repository](https://huggingface.co/aquigpt/aqui-vl-24b) and use the community discussions section.

## Acknowledgments

Built upon the excellent foundation of Mistral Small 3.1 by Mistral AI. Special thanks to the open-source community for the tools and datasets that made this model possible.

---

*Copyright 2025 Aqui Solutions. All rights reserved.*