This model is amazing, but almost too amazing
This model is insanely good for its functional size, especially on translation. But here's the issue: Google Translate seemingly hasn't been updated in a while, and this model, which runs fast on my phone, does a better job than Google Translate and undermines it for me, which is kinda funny. This is just a scream into the void asking Google to overhaul Google Translate, and also to say how awesome this model really is :)
P.S. An E8B-sized model would also be great!
This model is indeed very good. I am using the 6-bit quantized MLX version on my M4 Mac Mini and getting around 22 tokens/sec with the default 4K context window. The quality of answers so far has been pretty satisfactory (in fact, some have been surprisingly good). Haven't encountered many hallucinations either, thus far. Tried generating Golang and Python code, and it had absolutely no issues. Of course, the Golang and Python code was nothing very complex, just a fancy TCP/IP client-server with CLI options, some basic error checking, etc.
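In case anyone wants to reproduce a similar setup, here is a minimal sketch using the mlx-lm package (`pip install mlx-lm` on Apple silicon). The repo id below is a placeholder, not the actual model name, so swap in whichever 6-bit MLX community quant you downloaded:

```python
from mlx_lm import load, generate

# Placeholder repo id: replace with the actual 6-bit MLX quant you are using.
model, tokenizer = load("mlx-community/<model-name>-6bit")

# Instruction-tuned models expect their chat template, so apply it to the prompt.
messages = [{"role": "user", "content": "Write a short Go TCP echo server."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# max_tokens bounds the reply; verbose=True prints generation speed (tokens/sec).
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```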
That's fantastic to hear! We really appreciate you sharing your positive experience and detailed feedback. Thank you!