How to disable reasoning?
I'm using this model with llama.cpp + openWebUI.
Is it possible to disable reasoning for this model?
I tried the "/nothink" tag in the prompt, but it doesn't do anything.
I wonder if I need to use a special prompt? Or is there another solution?
I don't think you can, because they fine-tuned the base version of the model.
If your front end supports something similar to SillyTavern's "Start Reply With" feature, you can pre-fill your own thinking block and the response will proceed straight to the answer. Here's the pre-fill I use for that:
<think>
Okay.
</think>
You don't even need to put anything inside the think tags like the "Okay." in your example; empty think tags will do. On the other hand, while this method works fairly well, in practice it means filling your context window with garbage tokens that will never really be useful for anything context-wise. In one-shot scenarios where you just want a straight answer from the AI to a single question, this is a good solution. However, in roleplay scenarios where you need back-and-forth conversation with the AI, adding these extra think-tag tokens just to prevent the AI from thinking before answering is less than ideal. It'd be best to fine-tune the model on roleplay data that overrides this thinking habit, to avoid filling the context window with garbage tokens.
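If your front end can't pre-fill the reply, you can do the same thing yourself by sending a raw prompt to llama.cpp's `/completion` endpoint instead of the chat endpoint. Here's a minimal sketch of building such a prompt; the template tokens below are my assumption of the DeepSeek-R1 chat format and should be verified against the model's `tokenizer_config.json`:

```python
def build_prefilled_prompt(user_message: str) -> str:
    """Build a raw completion prompt ending in an empty think block.

    Assumption: the DeepSeek-R1 chat template uses the special tokens
    below. With the empty <think></think> block already in place, the
    model continues straight to the answer instead of reasoning first.
    """
    return (
        "<|begin▁of▁sentence|>"
        f"<|User|>{user_message}"
        "<|Assistant|><think>\n\n</think>\n\n"
    )

prompt = build_prefilled_prompt("What is the capital of France?")
print(prompt)
```

You would then POST this string as the `prompt` field to llama.cpp's `/completion` endpoint, which gives you full control over the assistant prefix.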
@SlavikF
I don't currently have this model in my library, but did you try `/no_think` instead? I had an issue with Qwen3 models when mistyping it! Maybe they use the same special token; I would need to check their ...
Edit lol: looking at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/raw/main/tokenizer.json, the dictionary unfortunately doesn't contain `/think` or `/nothink` (nor `/no_think`), so it's all or nothing :D
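The tokenizer check above can be scripted. Here's a small sketch; the helper name is mine, and it assumes a switch token would appear either in the main vocab or in the `added_tokens` list of `tokenizer.json`:

```python
def find_think_switches(tokenizer_json: dict) -> list[str]:
    """Return which Qwen3-style think switches exist in a tokenizer.json dict.

    Assumption: if such a switch were a special token, it would appear
    either in model.vocab or in the added_tokens list.
    """
    switches = ("/think", "/no_think", "/nothink")
    vocab = tokenizer_json.get("model", {}).get("vocab", {})
    added = {t.get("content") for t in tokenizer_json.get("added_tokens", [])}
    return [s for s in switches if s in vocab or s in added]

# Example against a stub of the relevant tokenizer.json structure:
stub = {"model": {"vocab": {"hello": 0}}, "added_tokens": [{"content": "<think>"}]}
print(find_think_switches(stub))  # → []
```

Run against the actual `tokenizer.json` from the link above (loaded with `json.load`), this returns an empty list, matching the "all or nothing" finding.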
So the only solution is the one already mentioned (I think it's called hot steering or test-time steering). But OpenWebUI doesn't allow editing the thoughts, even afterward (that's sad, because this kind of steering can be useful when you see a mistake in the reasoning and want to force it to take another path, sort of taking advantage of "token-level continuous checkpointing").
You can write something like "Okay, I think I am ready to answer." to bypass thinking. Plus, some research concluded that the reasoning process isn't that important and the RL is just amplifying the likelihood of the "good" answers.