---
license: apache-2.0
tags:
- text-generation
- llama.cpp
- gguf
- quantized
- q3_k_s
model_type: llama
inference: false
base_model:
- sarvamai/sarvam-m
---
# sarvam-m-24b - Q3_K_S GGUF
This repository contains the **Q3_K_S** quantized version of sarvam-m-24b in GGUF format.
## Model Details
- **Quantization**: Q3_K_S
- **File Size**: ~9.7GB
- **Description**: Very small k-quant variant; noticeable quality loss compared with higher-bit quantizations, in exchange for a much smaller memory footprint
- **Format**: GGUF (compatible with llama.cpp)
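After downloading, the quantization type and architecture can be confirmed from the file's embedded metadata. A minimal sketch, assuming the `gguf` Python package (`pip install gguf`) and the filename used elsewhere in this card:

```python
from gguf import GGUFReader

# Read the GGUF header and print the embedded metadata keys
reader = GGUFReader("sarvam-m-24b-Q3_K_S.gguf")
for name in reader.fields:
    print(name)  # e.g. general.architecture, general.file_type
print(f"tensor count: {len(reader.tensors)}")
```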
## Usage
### With llama.cpp
```bash
# Download the model file into the current directory
huggingface-cli download tifin-india/sarvam-m-24b-q3_k_s-gguf sarvam-m-24b-Q3_K_S.gguf --local-dir .

# Run inference (the binary is named llama-cli in recent llama.cpp builds; older builds use ./main)
./llama-cli -m sarvam-m-24b-Q3_K_S.gguf -p "Your prompt here"
```
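For an HTTP endpoint instead of a one-shot prompt, recent llama.cpp builds also ship an OpenAI-compatible server. A short sketch (the context size and port here are assumptions, not requirements):

```bash
# Serve the model over an OpenAI-compatible HTTP API on port 8080
./llama-server -m sarvam-m-24b-Q3_K_S.gguf -c 2048 --port 8080
```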
### With Python (llama-cpp-python)
```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./sarvam-m-24b-Q3_K_S.gguf",
    n_ctx=2048,        # Context length
    n_gpu_layers=35,   # Layers to offload; adjust for your GPU (-1 offloads all)
    verbose=False,
)

# Generate text
response = llm("Your prompt here", max_tokens=100)
print(response["choices"][0]["text"])
```
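For multi-turn use, llama-cpp-python also exposes a chat-style API. A short sketch continuing the setup above (the message content is a placeholder):

```python
# Chat-style generation using the model's built-in chat template
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=100,
)
print(response["choices"][0]["message"]["content"])
```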
### With Transformers (GGUF dequantization)
Recent versions of Transformers can load GGUF checkpoints directly by dequantizing them on load (this requires the `gguf` package, and the dequantized 24B model needs substantial RAM):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "tifin-india/sarvam-m-24b-q3_k_s-gguf"
gguf_file = "sarvam-m-24b-Q3_K_S.gguf"

# The gguf_file argument tells Transformers to dequantize the GGUF weights on load
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
```
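Continuing the snippet above, generation then works as with any Transformers causal LM (the prompt is a placeholder):

```python
# Generate a completion from the dequantized model
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```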
## Performance Characteristics
| Aspect | Rating |
|--------|--------|
| **Speed** | ⭐⭐⭐⭐ |
| **Quality** | ⭐⭐ |
| **Memory** | ⭐⭐⭐⭐ |
## Original Model
This is a quantized version of [sarvamai/sarvam-m](https://huggingface.co/sarvamai/sarvam-m). For the full-precision weights and complete model details, please refer to the original repository.
## Quantization Details
This model was quantized using llama.cpp's quantization tools. Q3_K_S is one of the smallest k-quant formats: it prioritizes file size and memory use over output quality, which makes it best suited to memory-constrained setups where larger quants do not fit.
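For reference, producing a file like this one with llama.cpp's tooling looks roughly as follows. A sketch, assuming a full-precision GGUF export of the base model as input (the input filename is hypothetical):

```bash
# Quantize a full-precision GGUF to Q3_K_S (binary named llama-quantize in recent builds)
./llama-quantize sarvam-m-24b-F16.gguf sarvam-m-24b-Q3_K_S.gguf Q3_K_S
```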
## License
This model follows the same license as the original model (Apache 2.0).
## Citation
If you use this model, please cite the original model authors and acknowledge the quantization.