ValueError: "Cannot handle batch sizes > 1 if no padding token is defined" raised when running `query_engine.query(query)` in LlamaIndex
I'm trying to use `Qwen3-Embedding-0.6B` to query a vector index already built with that same embedding model, then rerank with `Qwen3-Reranker-4B` and answer with `qwen3:4b-q8_0` (Ollama), all three via LlamaIndex, in order to create a Q&A bot that can search, rerank, and accurately answer questions about a given text.
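For reference, the wiring looks roughly like this (a sketch, not my exact script; the package paths assume a recent LlamaIndex with the `llama-index-embeddings-huggingface` and `llama-index-llms-ollama` integrations installed, and `persist_dir`/`top_n` are placeholders):

```python
from llama_index.core import Settings, StorageContext, load_index_from_storage
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

Settings.embed_model = HuggingFaceEmbedding(model_name="Qwen/Qwen3-Embedding-0.6B")
Settings.llm = Ollama(model="qwen3:4b-q8_0")

# Load the prebuilt vector index and attach the reranker as a node postprocessor.
storage = StorageContext.from_defaults(persist_dir="./index")
index = load_index_from_storage(storage)
reranker = SentenceTransformerRerank(model="Qwen/Qwen3-Reranker-4B", top_n=5)

query_engine = index.as_query_engine(node_postprocessors=[reranker])
response = query_engine.query("What does the document say about X?")
```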
The problem occurs with long markdown files. When I build an index over a large markdown file, load it with the embedding model, and then send a query to the `query_engine`, it waits a few seconds and then returns the error above, which appears to be raised by `forward()` in `modeling_qwen3.py`:
```python
hidden_states = transformer_outputs.last_hidden_state
logits = self.score(hidden_states)

if input_ids is not None:
    batch_size = input_ids.shape[0]
else:
    batch_size = inputs_embeds.shape[0]

if self.config.pad_token_id is None and batch_size != 1:
    raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
```
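As far as I can tell, the check fires because the classification head needs `config.pad_token_id` to locate each sequence's last real token, so any batch with more than one sequence is rejected when it is unset; the reranker scores several retrieved chunks at once, which would explain why only the large index triggers it. A minimal repro along those lines (my assumption about the mechanism, using the small embedding checkpoint just to demonstrate):

```python
# Without config.pad_token_id, the sequence-classification head refuses
# batches of more than one sequence, which matches the error above.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "Qwen/Qwen3-Embedding-0.6B"  # small checkpoint, just to demonstrate
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)

model.config.pad_token_id = None  # simulate the unset pad token
batch = tok(["first chunk", "a second, longer chunk"],
            padding=True, return_tensors="pt")
model(**batch)  # ValueError: Cannot handle batch sizes > 1 if no padding token is defined.
```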
The only configuration that works is a smaller index built from a small markdown file; the big one always returns that error. I've tried changing `Settings.tokenizer` to different variants of the Qwen3 embedding/reranker tokenizers, but none of them seems to work.
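Roughly, the override looked like this (a sketch; my assumption is that `Settings.tokenizer` in recent LlamaIndex expects a callable mapping a string to token IDs, and the checkpoint name here is one of the variants I tried):

```python
from functools import partial
from llama_index.core import Settings
from transformers import AutoTokenizer

# Point the global tokenizer at the embedding model's tokenizer.
hf_tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
Settings.tokenizer = partial(hf_tok.encode, add_special_tokens=False)
```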
At this point I don't know whether the issue is the file itself, a tokenizer mistake on my part, or something in between.
I encountered the same error when trying to train a classification model from Qwen3-Embedding and solved it by setting `model.config.pad_token_id = tokenizer.pad_token_id`. Some other posts set it to `model.config.eos_token_id` or `tokenizer.eos_token_id` instead. I'm not sure whether this is the proper solution, but it works fine on my classification task.
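In full, the fix looks like this (a sketch of my classification setup; `num_labels` and the checkpoint name are placeholders for your own task and model):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-Embedding-0.6B", num_labels=2)

# If the tokenizer itself defines no pad token, fall back to EOS first.
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id
```

Reusing EOS as the pad token is a common fallback for decoder-only checkpoints; padded positions are masked out by the attention mask anyway, so the setting mainly lets the head find each sequence's last real token.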