- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 28
- Finch: Prompt-guided Key-Value Cache Compression
  Paper • 2408.00167 • Published • 18
- Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
  Paper • 2503.04973 • Published • 25
- A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression
  Paper • 2406.11430 • Published • 25

Collections including paper arxiv:2408.00167

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 30
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 15
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 51
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 34

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 24
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 17
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 10
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 12

- Fast Matrix Multiplications for Lookup Table-Quantized LLMs
  Paper • 2407.10960 • Published • 13
- ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
  Paper • 2407.14482 • Published • 27
- EVLM: An Efficient Vision-Language Model for Visual Understanding
  Paper • 2407.14177 • Published • 45
- Knowledge Mechanisms in Large Language Models: A Survey and Perspective
  Paper • 2407.15017 • Published • 35

- Ultra-Long Sequence Distributed Transformer
  Paper • 2311.02382 • Published • 6
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 20
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
  Paper • 2311.02103 • Published • 22
- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 16