
# qwen2.5-7b-instruct-q4-k-m-gguf

Qwen2.5-7B-Instruct quantized to Q4_K_M (4-bit, medium quality) in GGUF format.

## Quick Start

1. Download the model:

```bash
wget https://huggingface.co/your-username/qwen2.5-7b-instruct-q4-k-m-gguf/resolve/main/qwen2.5-7b-instruct-q4-k-m-gguf.gguf
```

2. Run inference:

```bash
# With llama.cpp (recent builds ship the binary as llama-cli instead of main)
./main -m qwen2.5-7b-instruct-q4-k-m-gguf.gguf -n 512 -p "Hello!"

# With Python (requires llama-cpp-python)
python -c "
from llama_cpp import Llama
llm = Llama(model_path='./qwen2.5-7b-instruct-q4-k-m-gguf.gguf')
print(llm('Hello!', max_tokens=100)['choices'][0]['text'])
"
```

## Model Information

- **Base Model:** Qwen2.5-7B-Instruct
- **Architecture:** qwen2
- **Parameters:** 7.62B
- **Quantization:** Q4_K_M (4-bit)
- **File Size:** 4.4 GB
- **Format:** GGUF
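
The GGUF header is self-describing, so the entries above can be read straight from the file. A minimal sketch, assuming the `gguf` Python package published from the llama.cpp repo (`pip install gguf`):

```python
from gguf import GGUFReader

reader = GGUFReader("qwen2.5-7b-instruct-q4-k-m-gguf.gguf")

# Metadata keys stored in the header, e.g. general.architecture,
# qwen2.context_length, tokenizer.chat_template, ...
for name in reader.fields:
    print(name)

# Quantized weight tensors and their shapes
print(f"{len(reader.tensors)} tensors, e.g. {reader.tensors[0].name}")
```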

## Performance

Q4_K_M is a medium-quality 4-bit quantization that trades a small amount of accuracy for a much smaller footprint: the 4.4 GB file fits comfortably within 8 GB of RAM or VRAM, making CPU and consumer-GPU inference practical.
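
To put a number on that for your own hardware, here is a rough throughput check (a sketch, assuming `llama-cpp-python`; figures vary widely with CPU/GPU, threads, and context length):

```python
import time
from llama_cpp import Llama

llm = Llama(model_path="./qwen2.5-7b-instruct-q4-k-m-gguf.gguf", verbose=False)

start = time.perf_counter()
out = llm("Explain GGUF in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

# The OpenAI-style usage block reports how many tokens were generated
n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f} s -> {n_tokens / elapsed:.1f} tok/s")
```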
