sglang deploy error

#8
by weitao0828 - opened

CUDA_VISIBLE_DEVICES="0" python3 -m sglang.launch_server \
    --model /home/AI-ModelScope/gemma-3-27b-it-qat-q4_0-gguf \
    --tp 1 \
    --load-format gguf \
    --trust-remote-code \
    --quantization gguf \
    --dtype bfloat16 \
    --max-total-tokens 20000 \
    --context-length 32768 \
    --kv-cache-dtype auto \
    --enable-p2p-check \
    --host 0.0.0.0 \
    --port 8801 \
    --mem-fraction-static 0.8 \
    --api-key yoursecret \
    --max-running-request 1000 \
    --enable-metrics

error:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 14, in
launch_server(server_args)
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py", line 679, in launch_server
tokenizer_manager, scheduler_info = _launch_subprocesses(server_args=server_args)
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 541, in _launch_subprocesses
tokenizer_manager = TokenizerManager(server_args, port_args)
File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 159, in init
self.model_config = ModelConfig(
File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 67, in init
self.hf_text_config = get_hf_text_config(self.hf_config)
File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 359, in get_hf_text_config
class_name = config.architectures[0]
TypeError: 'NoneType' object is not subscriptable

I copied config.json, preprocessor_config.json, tokenizer.json, tokenizer.model, and tokenizer_config.json into the model directory,

(screenshot attached)

and then ran:
CUDA_VISIBLE_DEVICES="0" python3 -m sglang.launch_server \
    --model /home/AI-ModelScope/gemma-3-27b-it-qat-q4_0-gguf/gemma-3-27b-it-q4_0.gguf \
    --tp 1 \
    --load-format gguf \
    --trust-remote-code \
    --quantization gguf \
    --dtype bfloat16 \
    --max-total-tokens 20000 \
    --context-length 32768 \
    --kv-cache-dtype auto \
    --enable-p2p-check \
    --host 0.0.0.0 \
    --port 8801 \
    --mem-fraction-static 0.8 \
    --api-key yoursecret \
    --max-running-request 1000 \
    --enable-metrics

error:

(error screenshot attached)

What's the problem, and how can I fix it?

Hi @weitao0828,

Apologies for the late reply, and welcome to the Gemma family of Google's open-source models. The error occurs at 'class_name = config.architectures[0]', which indicates that config.architectures is None. This usually means the config.json file for the model is missing, or its "architectures" key is absent or incorrectly formatted.
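For reference, the same failure can be reproduced outside SGLang. This is a minimal sketch, assuming a config.json is present in the directory but lacks the "architectures" key (the path is the one from your command); in that case transformers leaves config.architectures as None, and indexing it raises exactly this TypeError.

```python
# Minimal reproduction sketch (assumption: config.json exists in the directory
# but has no "architectures" key). transformers then leaves config.architectures
# as None, so indexing it fails the same way as in the traceback above.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "/home/AI-ModelScope/gemma-3-27b-it-qat-q4_0-gguf",
    trust_remote_code=True,
)
print(cfg.architectures)     # None when the key is missing
print(cfg.architectures[0])  # TypeError: 'NoneType' object is not subscriptable
```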

Please follow the steps below to investigate the issue:

1. Inspect config.json: verify that the config.json file in the model directory contains an "architectures" key and that its value is a non-empty list (e.g., ["Gemma3ForConditionalGeneration"] for Gemma 3), not null or missing.
2. Correct config.json: if the "architectures" key is missing or incorrect, manually add or fix it, e.g. {"architectures": ["Gemma3ForConditionalGeneration"], ...} (see the sketch after this list for a quick check).
3. Update SGLang: as a less likely but possible fix, make sure SGLang is up to date: pip install -U sglang.
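A minimal sketch of that check, assuming the HF-format config.json sits in the model directory from your command, and using "Gemma3ForConditionalGeneration" (the architecture name of the original gemma-3-27b-it checkpoint; adjust if your config differs):

```python
# Sanity-check / patch sketch for steps 1-2 (assumptions: config.json lives in the
# model directory used in the command above; "Gemma3ForConditionalGeneration" is
# the architecture name of the original gemma-3-27b-it checkpoint).
import json

path = "/home/AI-ModelScope/gemma-3-27b-it-qat-q4_0-gguf/config.json"
with open(path) as f:
    cfg = json.load(f)

print(cfg.get("architectures"))  # should print a non-empty list

if not cfg.get("architectures"):
    cfg["architectures"] = ["Gemma3ForConditionalGeneration"]
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
```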

Please also check whether you have downloaded all the files correctly and whether any of them are corrupted; a re-download sketch follows below.
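If in doubt, re-downloading the whole repository is the quickest way to rule out missing or partially downloaded files. A hedged sketch, assuming the files came from the Hugging Face repo google/gemma-3-27b-it-qat-q4_0-gguf (use the ModelScope equivalent if that is where you originally pulled them from):

```python
# Re-download sketch (assumption: the source repo is
# google/gemma-3-27b-it-qat-q4_0-gguf on Hugging Face).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="google/gemma-3-27b-it-qat-q4_0-gguf",
    local_dir="/home/AI-ModelScope/gemma-3-27b-it-qat-q4_0-gguf",
)
```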

Thanks.
