Better?
Is v1.4 better than the previous versions?
I tried the previous version (v1 or v1.1?) and wasn't impressed by the output.
Hey ;
Please clarify what you're using this model for - use case(s)?
NOTE:
Strict requirements for parameters and templates will make a large difference in performance for all versions.
Likewise, your use case(s) may need more (or fewer) experts activated than the default of 8.
Try changing the experts by 1 or 2, then retry.
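(If you're loading the GGUF directly rather than through a UI, one way to experiment with the expert count is a metadata override at load time. A minimal sketch below using llama-cpp-python; the model filename and the "qwen3moe.expert_used_count" key are assumptions - check your GGUF's metadata for the exact architecture prefix your frontend reports.)

```python
from llama_cpp import Llama

# Sketch: override the number of active experts when loading the GGUF.
# Assumptions: llama-cpp-python backend, and that this model's GGUF
# architecture prefix is "qwen3moe" - verify against the file's metadata.
llm = Llama(
    model_path="Qwen3-53B-A3B-TOTAL-RECALL-v1.4-128k-Q4_K_M.gguf",  # hypothetical filename
    n_ctx=32768,
    kv_overrides={"qwen3moe.expert_used_count": 6},  # default is 8; try +/- 1 or 2
)

out = llm("Write a short scene set in a rain-soaked city.", max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```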
Exact models are:
Qwen3-55B-A3B-TOTAL-RECALL-Deep-40X
Qwen3-53B-A3B-TOTAL-RECALL-v1.4-128k
Use cases at present are primarily RPing and story generation. I'm fairly new, so I'm not familiar enough with managing/changing experts; there's a high likelihood that's one of the things I need to familiarize myself with and experiment with.
As for parameters, I'm mostly using the defaults for SillyTavern and oobabooga's text-generation-webui.
So... webui: Temp: 0.6-0.7, top_p: 0.8, top_k: 20
SillyTavern: Temp: 0.65-0.80, top_p: 1, and a frequency penalty of 0.15.
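(For reference, those settings roughly translate to a request like the sketch below, assuming text-generation-webui is launched with --api and its OpenAI-compatible endpoint on the default local port; the URL and the extra top_k field are assumptions about that setup, not a guaranteed API.)

```python
import requests

# Sketch of the webui sampler settings as an OpenAI-compatible chat request.
# Assumptions: text-generation-webui started with --api on localhost:5000;
# top_k is a webui extension field, not part of the standard OpenAI schema.
payload = {
    "messages": [{"role": "user", "content": "Continue the story from where we left off."}],
    "max_tokens": 512,
    "temperature": 0.7,  # webui range mentioned above: 0.6-0.7
    "top_p": 0.8,
    "top_k": 20,
}

resp = requests.post("http://127.0.0.1:5000/v1/chat/completions", json=payload, timeout=300)
print(resp.json()["choices"][0]["message"]["content"])
```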
Suggest V1.7 ; this version is the most creative of this series.
Likewise, try it with 4 experts to start, then 5/6.
You may want to turn thinking OFF:
/no_think
in the prompt(s) and/or system prompt.
If you turn it off, use 6 or 8 experts... 12 at the outside.
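(A sketch of where the tag goes, assuming a standard chat-style message list; the exact wiring depends on your frontend, which may inject the system prompt differently.)

```python
# Sketch: placing /no_think so the model skips its reasoning block.
# Assumption: a plain chat-format message list - your frontend may build
# the prompt differently, so adjust where the tag lands accordingly.
messages = [
    {"role": "system", "content": "/no_think You are a creative co-writer. Stay in character."},
    {"role": "user", "content": "/no_think Continue the scene at the harbor."},
]
```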
Will try.
I'm not seeing any good references for managing the experts, but I'll get it sooner or later. For now I've put them all on the back burner; I might just drop MoEs as options, since most of them probably aren't fit for my purposes (not to say they aren't useful).
Yes ; the issue with this model (all versions, including the original) is that you don't know how many experts are "creative".
Same issue for coding, etc etc.
Alright, thanks. I've got my eye on a few more of your models that look very promising :)
Gave it another try with the recent 256k ctx quant release (using 32k, so that doesn't really change anything). Found 4 experts was the only semi-stable setting to use; otherwise it's jumping all over the place to unrelated questions and topics in its replies.
Now it is very fast compared to a 70B model (Anubis, Lemonade, etc.), but fighting with it to stop it thinking, and needing to refresh the output repeatedly (like over 20 refreshes sometimes), makes this a hard sell for me, at least for creative writing.
I'll still look forward to seeing updates. :) When I've got an actual coding project I'll try it again, or when there's a new model.
In testing I have run into the "over thinking issue" with modified and non-modified models.
This includes Qwen 2.5 and Qwen3 ; and Deepseek etc etc.
Rep pen, followed by temp, can curb this.
With a more complex prompt (or a multi-stage chat), raising temp can reduce the "think blocks".
Lowering rep pen to the 1.02-1.04 range helps too.
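(As a rough sketch of what that adjustment looks like in sampler settings - the field names follow common webui/SillyTavern naming and are assumptions about your frontend, not fixed values for every prompt.)

```python
# Sketch: sampler tweaks aimed at shrinking or removing "think blocks".
# Assumption: temperature / repetition_penalty field names as used by
# common frontends (text-generation-webui, SillyTavern).
baseline = {"temperature": 0.65, "repetition_penalty": 1.10}

reduce_thinking = {
    "temperature": 0.85,         # raise temp on complex / multi-turn prompts
    "repetition_penalty": 1.03,  # drop rep pen into the 1.02-1.04 range
}
```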
I should note this is problem-specific "tuning", so to speak.
Reducing the "thinking block" size (and repeats of them) is always a target of my ahh... adjustments.
The other option:
Turn off thinking altogether using the "/no_think" tag - especially useful in long / multi-turn convos.
There are other ways to stop thinking altogether, but they have costs too.
RE: 1.4 ; This model, and the Brainstorm adapter, were specifically designed/tested for coding.
There are, as of this writing, over 100 Brainstorm adapters - each with different configs.
1.4 is a Brainstorm 40x modified.
I am planning a 20x version - 20x is far more stable, but less creative.
20x is already available with other coders I have built at this moment.
Yeah, I have /no_think added in the background automatically, which it honors for just one post and then won't after that (for whatever reason?). Sometimes adding it explicitly again will work, but it's already present, so I don't see why it's having issues.
But doing 4 experts did make it somewhat more stable creativity-wise. So there is that.
I'll keep an eye out. I loved some of the other models you've put out.