Larger versions

#1
by freegheist - opened

Is there any chance you could make a GPTQ-Int8 quant of the 235B 2507 Thinking model? There's an Int4 mix, but it would be great to have your Int8 quant of the large model.

I’ll give it a try, but it depends on whether my resources are sufficient—stay tuned! In addition, the Int4-Mix format is also developed by our team. We’d love to hear your feedback, and we’ll actively work on improving compatibility with vLLM.

JunHowie changed discussion status to closed

I am using both QuantTrio/Qwen3-Coder-480B-A35B-Instruct-GPTQ-Int4-Int8Mix in Qwen Code and QuantTrio/Qwen3-235B-A22B-Thinking-2507-GPTQ-Int4-Int8Mix for chat. Both give great results in vLLM 0.10.1.1 with your latest configs. I appreciate your work, and I'm happy to test anything that will fit on my 8xA6000 Ampere setup here!
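For reference, launching one of these quants across an 8-GPU node with vLLM might look like the sketch below. The flags shown (`--tensor-parallel-size`, context length, memory utilization) are illustrative assumptions, not the exact configuration used above:

```shell
# Sketch: serve the Int4/Int8-mix quant on 8 GPUs with vLLM's OpenAI-compatible server.
# All flag values here are assumptions for illustration; tune them for your hardware.
vllm serve QuantTrio/Qwen3-235B-A22B-Thinking-2507-GPTQ-Int4-Int8Mix \
  --tensor-parallel-size 8 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.90
```

Tensor parallelism of 8 splits the weights evenly across the eight A6000s; a lower `--max-model-len` can be used if KV-cache memory runs tight.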