Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published Dec 23, 2024 • 42
SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM Paper • 2312.03788 • Published Dec 6, 2023 • 1
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Paper • 2501.01005 • Published Jan 2, 2025 • 1