Efficient LLM Pretraining: Packed Sequences and Masked Attention
By sirluk • Oct 7, 2024