ValueError: Cannot handle batch sizes > 1 if no padding token is defined. Raised when running `query_engine.query(query)` in LlamaIndex

#8
by theforgehermit

I'm trying to use Qwen3-Embedding-0.6B to query a vector index that was built with that same embedding model, together with Qwen3-Reranker-4B and qwen3:4b-q8_0 (via Ollama), all three wired up through LlamaIndex, in order to build a Q&A bot that can search, rerank, and answer accurately from the context of a given text. The setup looks roughly like the sketch below.
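
A sketch rather than my exact code: the file path is a placeholder, exact imports depend on the LlamaIndex version, and `SentenceTransformerRerank` is my assumption for the reranker wrapper.

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Embeddings: same model that originally built the index
Settings.embed_model = HuggingFaceEmbedding(model_name="Qwen/Qwen3-Embedding-0.6B")

# Generation: the quantized Qwen3 served by Ollama
Settings.llm = Ollama(model="qwen3:4b-q8_0", request_timeout=120.0)

# Reranking: applied to the retrieved chunks before they reach the LLM
reranker = SentenceTransformerRerank(model="Qwen/Qwen3-Reranker-4B", top_n=3)

documents = SimpleDirectoryReader(input_files=["big_file.md"]).load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(node_postprocessors=[reranker])
response = query_engine.query("...")
```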

The problem is with long markdown files. When I build an index from a large markdown file, load it with the embedding model, and send a query to `query_engine`, it waits a few seconds and then returns the error above, which appears to be raised by `forward()` in `modeling_qwen3.py`:

        hidden_states = transformer_outputs.last_hidden_state
        logits = self.score(hidden_states)

        if input_ids is not None:
            batch_size = input_ids.shape[0]
        else:
            batch_size = inputs_embeds.shape[0]

        if self.config.pad_token_id is None and batch_size != 1:
            raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
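
If I read that snippet correctly, the same check can be tripped outside LlamaIndex with plain transformers whenever more than one sequence is passed in a batch. A minimal sketch (the model name is from my setup, everything else is made up):

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-Embedding-0.6B", num_labels=1
)
model.config.pad_token_id = None  # simulate a config with no pad token set

input_ids = torch.randint(0, 1000, (2, 16))  # batch of two sequences
model(input_ids=input_ids)  # raises the ValueError quoted above
```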

It only seems to work when I use a smaller index built from a small markdown file; the big one always returns that error. Going by the check above, my guess is that the small file retrieves few enough chunks for the reranker to see a batch of size 1, while the large file sends several chunks at once. I've tried pointing `Settings.tokenizer` at different variants of the Qwen3 embedding/reranker tokenizers (sketched below), but none of them seems to help.
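
Concretely, those attempts looked roughly like this (with the model name swapped out each time):

```python
from transformers import AutoTokenizer
from llama_index.core import Settings

# LlamaIndex uses this callable for token counting when chunking
Settings.tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-Embedding-0.6B"
).encode
```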

At this point I don't know whether the issue is the file itself, a tokenizer mistake on my end, or something in between.

I encountered the same error when trying to train a classification model from Qwen3-Embedding and solved it by setting `model.config.pad_token_id = tokenizer.pad_token_id`. Some other posts set it to `model.config.eos_token_id` or `tokenizer.eos_token_id` instead. I'm not sure this is the proper solution, but it works fine on my classification task.
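
For reference, my workaround looked roughly like this (a sketch: the checkpoint name and `num_labels` come from my classification task, so adapt them):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "Qwen/Qwen3-Embedding-0.6B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Ensure a pad token exists, then propagate it to the model config so
# batched forward passes get past the batch-size check.
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token  # the EOS fallback mentioned above
model.config.pad_token_id = tokenizer.pad_token_id
```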
