Issue with Kimi K2 Model Support
#40 · opened by jerin-scalers-ai
We hit an issue when running the moonshotai/Kimi-K2-Instruct model with vLLM v0.10.0, using the latest vLLM Docker image (vllm/vllm-openai:v0.10.0).
Deployment
vllm:
  container_name: vllm
  image: vllm/vllm-openai:v0.10.0
  environment:
    - HUGGING_FACE_HUB_TOKEN=hf_XXXXXXXXX
  command:
    - "--served-model-name=moonshotai/Kimi-K2-Instruct"
    - "--dtype=auto"
    - "--max-model-len=8192"
    - "--tensor-parallel-size=8"
    - "--trust-remote-code"
  volumes:
    - /mnt/models:/root/.cache/huggingface
  ports:
    - 8000:8000
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            capabilities: [gpu]
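
For reference, a roughly equivalent standalone docker run invocation (a sketch, assuming all eight GPUs on the host should be exposed and with the token redacted as above; --ipc=host is added because multi-GPU tensor parallelism needs the host's shared memory, as recommended in the vLLM Docker docs):

docker run --rm --gpus all --ipc=host \
  -e HUGGING_FACE_HUB_TOKEN=hf_XXXXXXXXX \
  -v /mnt/models:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:v0.10.0 \
  --served-model-name=moonshotai/Kimi-K2-Instruct \
  --dtype=auto \
  --max-model-len=8192 \
  --tensor-parallel-size=8 \
  --trust-remote-code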
Error
vllm | INFO 07-28 03:34:26 [__init__.py:235] Automatically detected platform cuda.
vllm | INFO 07-28 03:34:28 [api_server.py:1755] vLLM API server version 0.10.0
vllm | INFO 07-28 03:34:28 [cli_args.py:261] non-default args: {'model': 'moonshotai/Kimi-K2-Instruct', 'trust_remote_code': True, 'max_model_len': 8192, 'tensor_parallel_size': 8}
vllm | You are using a model of type kimi_k2 to instantiate a model of type deepseek_v3. This is not supported for all configurations of models and can yield errors.
vllm | INFO 07-28 03:34:28 [config.py:243] Replacing legacy 'type' key with 'rope_type'
vllm | INFO 07-28 03:34:35 [config.py:1604] Using max model len 8192
vllm | INFO 07-28 03:34:36 [config.py:2434] Chunked prefill is enabled with max_num_batched_tokens=8192.
vllm | Traceback (most recent call last):
vllm | File "/usr/local/lib/python3.12/dist-packages/tiktoken/load.py", line 11, in read_file
vllm | import blobfile
vllm | ModuleNotFoundError: No module named 'blobfile'
vllm |
vllm | The above exception was the direct cause of the following exception:
vllm |
vllm | Traceback (most recent call last):
vllm | File "<frozen runpy>", line 198, in _run_module_as_main
vllm | File "<frozen runpy>", line 88, in _run_code
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1856, in <module>
vllm | uvloop.run(run_server(args))
vllm | File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
vllm | return __asyncio.run(
vllm | ^^^^^^^^^^^^^^
vllm | File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
vllm | return runner.run(main)
vllm | ^^^^^^^^^^^^^^^^
vllm | File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
vllm | return self._loop.run_until_complete(task)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
vllm | File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
vllm | return await main
vllm | ^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1791, in run_server
vllm | await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1811, in run_server_worker
vllm | async with build_async_engine_client(args, client_config) as engine_client:
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm | return await anext(self.gen)
vllm | ^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
vllm | async with build_async_engine_client_from_engine_args(
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm | return await anext(self.gen)
vllm | ^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
vllm | async_llm = AsyncLLM.from_vllm_config(
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 163, in from_vllm_config
vllm | return cls(
vllm | ^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 100, in __init__
vllm | self.tokenizer = init_tokenizer_from_configs(
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer_group.py", line 111, in init_tokenizer_from_configs
vllm | return TokenizerGroup(
vllm | ^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer_group.py", line 24, in __init__
vllm | self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer.py", line 238, in get_tokenizer
vllm | tokenizer = AutoTokenizer.from_pretrained(
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 1035, in from_pretrained
vllm | return tokenizer_class.from_pretrained(
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 2014, in from_pretrained
vllm | return cls._from_pretrained(
vllm | ^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
vllm | tokenizer = cls(*init_inputs, **init_kwargs)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/root/.cache/huggingface/modules/transformers_modules/moonshotai/Kimi-K2-Instruct/0826e83ab45fac04e0360b328b1e431cc5094d92/tokenization_kimi.py", line 101, in __init__
vllm | mergeable_ranks = load_tiktoken_bpe(vocab_file)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/tiktoken/load.py", line 148, in load_tiktoken_bpe
vllm | contents = read_file_cached(tiktoken_bpe_file, expected_hash)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/tiktoken/load.py", line 63, in read_file_cached
vllm | contents = read_file(blobpath)
vllm | ^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib/python3.12/dist-packages/tiktoken/load.py", line 13, in read_file
vllm | raise ImportError(
vllm | ImportError: blobfile is not installed. Please install it by running pip install blobfile.
vllm exited with code 1
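
The immediate failure is the missing blobfile module: the Kimi K2 tokenizer (tokenization_kimi.py) loads its vocabulary through tiktoken's load_tiktoken_bpe, which imports blobfile to read the BPE file, and the vllm/vllm-openai:v0.10.0 image does not ship it. A minimal sketch of a workaround we could try, assuming it is acceptable to extend the official image (the Dockerfile name and the vllm-kimi tag are placeholders):

# Dockerfile.kimi - extend the official image with the missing tokenizer dependency
FROM vllm/vllm-openai:v0.10.0
RUN pip install --no-cache-dir blobfile

Build it and point the compose service at the derived image:

docker build -f Dockerfile.kimi -t vllm-kimi:v0.10.0 .
# then replace "image: vllm/vllm-openai:v0.10.0" with "image: vllm-kimi:v0.10.0" in the compose file

This only addresses the ImportError; the "model of type kimi_k2 to instantiate a model of type deepseek_v3" warning from the log is a separate question about Kimi K2 support in the image.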
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.