Model Details
This model is a mixed int4 quantization of openai/gpt-oss-20b, generated by intel/auto-round with group_size 128 and symmetric quantization.
Following the official model, non-expert layers fall back to 16 bits.
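As a rough intuition for what group-wise symmetric int4 quantization means, here is a minimal numeric sketch (this is not the AutoRound algorithm itself, which tunes the rounding via signed gradient descent; the helper names are illustrative): each group of 128 weights shares one scale, and values are rounded to the 16 integer levels in [-8, 7].

```python
# Illustrative sketch of group-wise symmetric int4 quantization
# (hypothetical helpers, not the AutoRound implementation).
import numpy as np

def quantize_group_sym_int4(w):
    # Symmetric int4: integer levels in [-8, 7]; one scale per group,
    # chosen so the largest-magnitude weight maps to level 7.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate fp32 weights from int4 levels and the group scale.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
group = rng.normal(size=128).astype(np.float32)  # one group of 128 weights
q, scale = quantize_group_sym_int4(group)
recon = dequantize(q, scale)
print("max abs reconstruction error:", np.abs(group - recon).max())
```

The reconstruction error per weight is bounded by half the group scale, which is why smaller groups (here 128) trade a little storage for better accuracy.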
How To Use
This model cannot currently run on vLLM.
INT4 Inference on CPU/Intel GPU/CUDA
from transformers import pipeline

model_id = "Intel/gpt-oss-20b-int4-AutoRound"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
outputs = pipe(
    messages,
    max_new_tokens=512,
)
print(outputs[0]["generated_text"][-1])
"""
{'role': 'assistant', 'content': "analysisThe user wants a clear and concise explanation of quantum mechanics. The user is likely not a specialist; they want a concise explanation. We should provide an overview of the key concepts: wave-particle duality, quantization, superposition, measurement, uncertainty principle, entanglement, etc. A concise but clear explanation. We should avoid heavy math. Possibly short paragraphs or bullet points. Use analogies. The answer should be concise but thorough enough. We can mention examples like the double-slit experiment, electron spin, Heisenberg uncertainty, Schrödinger equation in simple terms. Also mention the role of wave functions and probabilities. Also mention that quantum mechanics is a theory of the microscopic world, where classical mechanics breaks down. Provide a simple summary of the key principles. The user might want an explanation that is not too long but still covers major points. We'll aim for maybe a few paragraphs or bullet points. We should avoid jargon. We'll keep it accessible. Provide a quick summary of key concepts and maybe a small example of how measurement works. We'll do that.assistantfinal**Quantum mechanics in a nutshell**\n\n| Key idea | What it means (in everyday terms) | Why it matters |\n|----------|-----------------------------------|----------------|\n| **Wave‑particle duality** | Light and matter act both like particles and waves. | Explains phenomena such as diffraction and interference that classical particles can’t. |\n| **Quantization** | Energy, momentum, angular momentum come in discrete “chunks.” | Electrons in atoms occupy specific “shells”; photons have exact energies (colors). |\n| **Superposition** | A system can be in several states at once (e.g., an electron in two places). | Gives rise to interference patterns and the ability to encode information in many states. |\n| **Probabilistic nature** | We can only predict probabilities of outcomes, not certainties. 
| The wave‑function gives a probability distribution for finding a particle where. |\n| **Uncertainty principle** | You can’t know position and momentum exactly at the same time. | Limits precision of measurements; explains why atoms don’t collapse into a point. |\n| **Measurement collapses the wave‑function** | Observing a system forces it into one of its possible states. | Explains why a light‑bulb turns on/off when you look at it, and why a cat in a box is either alive or dead. |\n| **Entanglement** | Two or more particles can share"}
""
Ethical Considerations and Limitations
The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.
Therefore, before deploying any applications of the model, developers should perform safety testing.
Caveats and Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Here are a couple of useful links to learn more about Intel's AI software:
- Intel Neural Compressor
Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Cite
@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}
Model tree for Intel/gpt-oss-20b-int4-AutoRound
Base model: openai/gpt-oss-20b