[Possible bug] Tokenizer removes thinking part

#31
by haritzpuerto - opened

Hi, I just noticed that when you apply the chat template to a conversation that contains a thinking part and tokenize it, the tokenizer removes the thinking part and keeps only the final answer. I don't think this is the expected behavior. Why is the thinking part removed?

Minimal reproducible example
https://colab.research.google.com/drive/1VAU_XIxaAdooXQ_DpoOL0cx-MGNV1ZgN?usp=sharing


I believe the problem comes from this line in the chat template.

image.png

Is this a bug? If not, how can I keep the reasoning traces (i.e., the thinking part)? Thanks!

@mgubri thinks this behavior might be intended for efficiency in multi-turn conversations. However, I am not building a multi-turn conversation: I want to fine-tune on data formatted with the original chat template, so I need to keep the reasoning traces. A parameter to keep the reasoning traces would therefore be useful.
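To make the request concrete, here is a minimal sketch in plain Python of what I think is happening and what such a parameter could look like. It does not call the real tokenizer or chat template. The `</think>` delimiter, the function name `render_assistant_turn`, and the `keep_reasoning` flag are all assumptions made up for illustration, not existing options.

```python
# Assumed closing tag of the thinking part; the real template may use another one.
THINK_END = "</think>"


def render_assistant_turn(content: str, keep_reasoning: bool = False) -> str:
    """Emulate how a chat template might render one assistant message.

    `keep_reasoning` is a hypothetical flag, not an existing tokenizer option.
    """
    if not keep_reasoning and THINK_END in content:
        # Emulates a template rule that keeps only the text after the
        # last closing tag, which drops the reasoning trace.
        content = content.split(THINK_END)[-1]
    return content


message = "<think>2 + 2 is 4.</think>The answer is 4."

# Current behavior as I understand it: only the final answer survives.
print(render_assistant_turn(message))

# Requested behavior: the full message, reasoning included, is kept.
print(render_assistant_turn(message, keep_reasoning=True))
```

With a flag like this defaulting to off, multi-turn inference would keep its current behavior, and fine-tuning data could opt in to keeping the full trace.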
