Collections
Collections including paper arxiv:2406.09900

- OLMo: Accelerating the Science of Language Models • Paper • 2402.00838 • Published • 84
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context • Paper • 2403.05530 • Published • 67
- StarCoder: may the source be with you! • Paper • 2305.06161 • Published • 31
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • Paper • 2312.15166 • Published • 59

- Efficient LLM Inference on CPUs • Paper • 2311.00502 • Published • 7
- Exponentially Faster Language Modelling • Paper • 2311.10770 • Published • 119
- Cached Transformers: Improving Transformers with Differentiable Memory Cache • Paper • 2312.12742 • Published • 14
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • 2312.11514 • Published • 260

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset • Paper • 2309.04662 • Published • 24
- Neurons in Large Language Models: Dead, N-gram, Positional • Paper • 2309.04827 • Published • 17
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs • Paper • 2309.05516 • Published • 10
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs • Paper • 2309.03907 • Published • 12

- Compression Represents Intelligence Linearly • Paper • 2404.09937 • Published • 29
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies • Paper • 2404.06395 • Published • 23
- Long-context LLMs Struggle with Long In-context Learning • Paper • 2404.02060 • Published • 38
- Are large language models superhuman chemists? • Paper • 2404.01475 • Published • 19

- TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ • Text Generation • 2B • Updated • 1.37k • 318
- TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ • Text Generation • 6B • Updated • 292k • 138
- mistralai/Mixtral-8x7B-Instruct-v0.1 • 47B • Updated • 288k • 4.55k
- TheBloke/MixtralOrochi8x7B-GPTQ • Text Generation • 6B • Updated • 10 • 7
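
The model entries above are ordinary Hugging Face repositories, so they can be pulled with the standard transformers loading path. Below is a minimal, illustrative sketch (not part of the collection) that assumes transformers >= 4.32 together with accelerate, optimum, and auto-gptq installed, plus a GPU with enough memory for the quantized shards; the prompt text is only an example.

```python
# Minimal sketch: load one of the GPTQ checkpoints listed above and generate text.
# Assumes transformers >= 4.32, accelerate, optimum, and auto-gptq are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" lets accelerate place the quantized weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mixtral-Instruct expects the [INST] ... [/INST] chat format.
prompt = "[INST] Explain what GPTQ quantization does in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```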

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models • Paper • 2309.14509 • Published • 20
- LLM Augmented LLMs: Expanding Capabilities through Composition • Paper • 2401.02412 • Published • 39
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models • Paper • 2401.06066 • Published • 56
- Tuning Language Models by Proxy • Paper • 2401.08565 • Published • 24