|
--- |
|
license: apache-2.0 |
|
tags: |
|
- text-generation |
|
- llama.cpp |
|
- gguf |
|
- quantized |
|
- q3_k_s |
|
model_type: llama |
|
inference: false |
|
base_model: |
|
- sarvamai/sarvam-m |
|
--- |
|
|
|
# sarvam-m-24b - Q3_K_S GGUF |
|
|
|
This repository contains the **Q3_K_S** quantized version of sarvam-m-24b in GGUF format. |
|
|
|
## Model Details |
|
- **Quantization**: Q3_K_S |
|
- **File Size**: ~9.7GB |
|
- **Description**: Small model with substantial quality loss |
|
- **Format**: GGUF (compatible with llama.cpp) |
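
To double-check that a downloaded file is intact and really a GGUF checkpoint, you can inspect its header with the `gguf` Python package (`pip install gguf`). This is an optional sketch; the filename assumes the download step shown below.

```python
# Optional sanity check: read the GGUF header without loading any weights
from gguf import GGUFReader

reader = GGUFReader("sarvam-m-24b-Q3_K_S.gguf")
print(f"metadata keys: {len(reader.fields)}")
print(f"tensor count : {len(reader.tensors)}")
```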
|
|
|
## Usage |
|
|
|
### With llama.cpp |
|
```bash
# Download the model file into the current directory
huggingface-cli download tifin-india/sarvam-m-24b-q3_k_s-gguf sarvam-m-24b-Q3_K_S.gguf --local-dir .

# Run inference (older llama.cpp builds name this binary ./main)
./llama-cli -m sarvam-m-24b-Q3_K_S.gguf -p "Your prompt here"
```
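
The download step can also be scripted from Python with `huggingface_hub`, which returns the local path of the cached file; a minimal sketch:

```python
# Download the single GGUF file programmatically
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="tifin-india/sarvam-m-24b-q3_k_s-gguf",
    filename="sarvam-m-24b-Q3_K_S.gguf",
)
print(model_path)  # pass this path to llama.cpp or llama-cpp-python
```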
|
|
|
### With Python (llama-cpp-python) |
|
```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./sarvam-m-24b-Q3_K_S.gguf",
    n_ctx=2048,       # Context length
    n_gpu_layers=35,  # Adjust based on your GPU
    verbose=False
)

# Generate text
response = llm("Your prompt here", max_tokens=100)
print(response['choices'][0]['text'])
```
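
For chat-style prompting, llama-cpp-python can apply the chat template stored in the GGUF metadata via `create_chat_completion`; a short example continuing from the `llm` object above:

```python
# Chat-style generation; the response mirrors the OpenAI schema
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain GGUF quantization in two sentences."},
]
chat = llm.create_chat_completion(messages=messages, max_tokens=200)
print(chat["choices"][0]["message"]["content"])
```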
|
|
|
### With Transformers (GGUF loading)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Requires a recent transformers release with GGUF support and the `gguf` package.
# Note: transformers dequantizes the GGUF weights on load, so memory use is much
# higher than with llama.cpp.
model_id = "tifin-india/sarvam-m-24b-q3_k_s-gguf"
gguf_file = "sarvam-m-24b-Q3_K_S.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
```
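
Once loaded, generation uses the standard transformers API; for example:

```python
# Greedy decoding with the dequantized model; move it to a GPU first if available
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```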
|
|
|
## Performance Characteristics |
|
|
|
| Aspect | Rating | |
|
|--------|--------| |
|
| **Speed** | ⭐⭐⭐⭐ | |
|
| **Quality** | ⭐⭐ | |
|
| **Memory** | ⭐⭐⭐⭐ | |
|
|
|
## Original Model |
|
|
|
This is a quantized version of [sarvamai/sarvam-m](https://huggingface.co/sarvamai/sarvam-m). For the full-precision weights and more details, please refer to the original model repository.
|
|
|
## Quantization Details |
|
|
|
This model was quantized using llama.cpp's quantization tools. Q3_K_S is one of the smaller K-quant formats: it prioritizes file size and memory footprint over output quality, so expect noticeable quality loss compared with higher-bit variants such as Q4_K_M. It is best suited to hardware where the larger quantizations do not fit.
|
|
|
## License |
|
|
|
This model follows the same license as the original model (Apache 2.0). |
|
|
|
## Citation |
|
|
|
If you use this model, please cite the original model authors and acknowledge the quantization. |