Megatron Memory Estimator v0.13

Note: This estimator measures only the GPU memory directly managed by PyTorch when running Megatron. It does not include additional memory consumed by NCCL communication buffers, kernel fusion, overlap optimizations, CUDA Graphs, and so on. Use the "Overhead per GPU" option below to account for these extra costs.
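To get a feel for how large that overhead is on a live job, you can compare the memory PyTorch manages with what the device actually reports. The snippet below is an illustrative sketch using standard torch.cuda queries; it is not part of the estimator, and the "outside PyTorch" figure it prints is only a rough guide for choosing an overhead value.

```python
import torch

# Compare PyTorch-managed memory with total device usage to gauge the gap
# that the "Overhead per GPU" option is meant to cover (NCCL buffers,
# CUDA context, fragmentation, etc.). Illustrative only.

assert torch.cuda.is_available()
GiB = 1024 ** 3

allocated = torch.cuda.memory_allocated() / GiB   # tensors currently held by PyTorch
reserved  = torch.cuda.memory_reserved() / GiB    # caching-allocator pool size
free, total = torch.cuda.mem_get_info()           # device-wide view from the CUDA driver
device_used = (total - free) / GiB

print(f"allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB, "
      f"device used={device_used:.2f} GiB, "
      f"outside PyTorch~{device_used - reserved:.2f} GiB")
```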

Configuration

For detailed explanations of each parameter, please see the Megatron-LM arguments documentation.

Model Config (Editable)

History

Model | Weight (GB) | Gradient (GB) | Optimizer (GB) | Activation (GB) | Total (GB/GPU) | Actions
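The per-component breakdown in the history table can be approximated by hand. The sketch below is a rough estimate only, assuming bf16 mixed-precision training with Adam and Megatron-style tensor/pipeline parallel sharding; the function name, the byte counts per parameter (2-byte bf16 weight, 4-byte fp32 gradient, 4-byte fp32 master weight, 8 bytes of Adam state), and the activation and overhead inputs are illustrative assumptions, not the estimator's actual implementation.

```python
# Rough per-GPU memory breakdown for bf16 + Adam training (illustrative only).
# Assumes weights/gradients are sharded across TP * PP ranks, and optimizer
# state is additionally sharded across DP ranks when the distributed
# optimizer is enabled.

GiB = 1024 ** 3

def estimate_per_gpu(num_params, tp, pp, dp, use_distributed_optimizer=True,
                     activation_gb=0.0, overhead_gb=0.0):
    params_per_gpu = num_params / (tp * pp)

    weight_gb   = params_per_gpu * 2 / GiB        # bf16 parameters
    gradient_gb = params_per_gpu * 4 / GiB        # fp32 gradient accumulation
    optim_bytes = params_per_gpu * (4 + 4 + 4)    # fp32 master weights + Adam m, v
    if use_distributed_optimizer:
        optim_bytes /= dp                         # optimizer state split across DP ranks
    optimizer_gb = optim_bytes / GiB

    total_gb = weight_gb + gradient_gb + optimizer_gb + activation_gb + overhead_gb
    return {"weight": weight_gb, "gradient": gradient_gb,
            "optimizer": optimizer_gb, "activation": activation_gb,
            "total": total_gb}

# Example: a 7B-parameter model with TP=2, PP=1, DP=8, assuming 20 GB of
# activations and 4 GB of overhead for NCCL buffers and fragmentation.
print(estimate_per_gpu(7e9, tp=2, pp=1, dp=8, activation_gb=20, overhead_gb=4))
```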