Is MXFP4_MOE more efficient than Q4_K_M? Which one should perform better?

#3
by nmkd - opened

Q4_K_M is a bit larger, but I was wondering whether MXFP4 has an efficiency advantage that compensates for that.

Which one would you recommend if VRAM is not an issue for either of them?

When I measured the perplexity of each, Q4_K_M's was lower (lower is better). I haven't tested extensively, though; both seemed to produce decent output.
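For anyone who wants to repeat the comparison: perplexity is the exponential of the average negative log-likelihood per token, so a lower value means the model assigned higher probability to the evaluation text. Here is a minimal Python sketch of that formula. The `perplexity` helper and the log-probability lists are hypothetical placeholders for illustration, not measurements from either quant; real numbers would come from running both files over the same evaluation text.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical per-token log-probabilities for the same text under two quants.
# These values are made up purely to show how the comparison works.
logprobs_quant_a = [-1.2, -0.4, -2.0, -0.9]
logprobs_quant_b = [-1.3, -0.5, -2.1, -1.0]

ppl_a = perplexity(logprobs_quant_a)
ppl_b = perplexity(logprobs_quant_b)

# The quant with the lower perplexity predicts this text better.
better = "A" if ppl_a < ppl_b else "B"
print(f"quant A: {ppl_a:.3f}  quant B: {ppl_b:.3f}  lower: {better}")
```

A small perplexity gap on one short text may not mean much, so it helps to use the same, reasonably long evaluation set for both files before drawing conclusions.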
