DeBERTa: Decoding-enhanced BERT with Disentangled Attention • arXiv:2006.03654 • Published Jun 5, 2020
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • arXiv:1810.04805 • Published Oct 11, 2018
RoBERTa: A Robustly Optimized BERT Pretraining Approach • arXiv:1907.11692 • Published Jul 26, 2019
Training language models to follow instructions with human feedback • arXiv:2203.02155 • Published Mar 4, 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models • arXiv:2201.11903 • Published Jan 28, 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness • arXiv:2205.14135 • Published May 27, 2022