
# qwen2.5-7b-instruct-q4-k-m-gguf

Qwen2.5-7B-Instruct quantized to Q4_K_M (4-bit, medium quality) in GGUF format.

## Quick Start

1. Download the model:

```bash
wget https://huggingface.co/your-username/qwen2.5-7b-instruct-q4-k-m-gguf/resolve/main/qwen2.5-7b-instruct-q4-k-m-gguf.gguf
```

2. Run inference:

```bash
# With llama.cpp (recent builds ship the binary as llama-cli instead of main)
./main -m qwen2.5-7b-instruct-q4-k-m-gguf.gguf -n 512 -p "Hello!"

# With Python (requires llama-cpp-python)
python -c "
from llama_cpp import Llama
llm = Llama(model_path='./qwen2.5-7b-instruct-q4-k-m-gguf.gguf')
print(llm('Hello!', max_tokens=100)['choices'][0]['text'])
"
```

## Model Information

- **Base Model:** Qwen2.5-7B-Instruct
- **Architecture:** qwen2
- **Parameters:** 7.62B
- **Quantization:** Q4_K_M (4-bit)
- **File Size:** 4.4 GB
- **Format:** GGUF
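
The GGUF header is self-describing, so the entries above can be read straight from the file. A minimal sketch, assuming the `gguf` Python package published from the llama.cpp repo (`pip install gguf`):

```python
from gguf import GGUFReader

reader = GGUFReader("qwen2.5-7b-instruct-q4-k-m-gguf.gguf")

# Metadata keys stored in the header, e.g. general.architecture,
# qwen2.context_length, tokenizer.chat_template, ...
for name in reader.fields:
    print(name)

# Quantized weight tensors and their shapes
print(f"{len(reader.tensors)} tensors, e.g. {reader.tensors[0].name}")
```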

## Performance

Q4_K_M is a medium-quality 4-bit quantization that trades a small amount of accuracy for a much smaller footprint: the 4.4 GB file fits comfortably within 8 GB of RAM or VRAM, making CPU and consumer-GPU inference practical.
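
To put a number on that for your own hardware, here is a rough throughput check (a sketch, assuming `llama-cpp-python`; figures vary widely with CPU/GPU, threads, and context length):

```python
import time
from llama_cpp import Llama

llm = Llama(model_path="./qwen2.5-7b-instruct-q4-k-m-gguf.gguf", verbose=False)

start = time.perf_counter()
out = llm("Explain GGUF in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

# The OpenAI-style usage block reports how many tokens were generated
n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f} s -> {n_tokens / elapsed:.1f} tok/s")
```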
