LoRA for Neuron
A LoRA (Low-Rank Adaptation) implementation optimized for distributed training on AWS Trainium devices. This module provides parameter-efficient fine-tuning with support for tensor parallelism and sequence parallelism.
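As background, LoRA freezes the pretrained weight matrix W and learns a low-rank update ΔW = B·A, with rank r much smaller than the layer dimensions, scaled by lora_alpha / r. The sketch below illustrates the forward pass in plain PyTorch; it is a conceptual illustration, not the Neuron implementation:

```python
import torch

def lora_forward(x, W, A, B, lora_alpha, r):
    """Conceptual LoRA forward pass.

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    base = x @ W.T                        # frozen pretrained path
    update = (x @ A.T) @ B.T              # low-rank path: d_in -> r -> d_out
    return base + (lora_alpha / r) * update
```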
PEFT Model Classes
NeuronPeftModel
class optimum.neuron.peft.NeuronPeftModel
( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', autocast_adapter_dtype: bool = True, **kwargs: Any )
NeuronPeftModelForCausalLM
class optimum.neuron.peft.NeuronPeftModelForCausalLM
( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', autocast_adapter_dtype: bool = True, **kwargs: Any )
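Both classes mirror the peft library's PeftModel API. A minimal instantiation sketch follows; the checkpoint and target module names are illustrative, and in practice most users go through get_peft_model, documented below:

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from optimum.neuron.peft import NeuronPeftModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # illustrative checkpoint
peft_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
peft_model = NeuronPeftModelForCausalLM(base_model, peft_config, adapter_name="default")
```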
LoRA Layer Implementations
Base LoRA Layer
class optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer
( base_layer: Module, ephemeral_gpu_offload: bool = False, **kwargs )
Parallel Linear LoRA
class optimum.neuron.peft.tuners.lora.layer.ParallelLinear
( base_layer, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, is_target_conv_1d_layer: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
GQA QKV Column Parallel LoRA
class optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear
( base_layer, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, is_target_conv_1d_layer: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
Parallel Embedding LoRA
class optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding
( base_layer: Module, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
LoRA Model
NeuronLoraModel
class optimum.neuron.peft.tuners.NeuronLoraModel
( model, config, adapter_name, low_cpu_mem_usage: bool = False )
Utility Functions
get_peft_model
optimum.neuron.peft.get_peft_model
( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', mixed: bool = False, autocast_adapter_dtype: bool = True, revision: str | None = None, low_cpu_mem_usage: bool = False )
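A minimal usage sketch, mirroring the peft library's get_peft_model; the checkpoint and target_modules values are illustrative and depend on the model architecture. For a causal LM, this is expected to return a NeuronPeftModelForCausalLM:

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from optimum.neuron.peft import get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # illustrative checkpoint
lora_config = LoraConfig(
    r=16,                    # rank of the update matrices
    lora_alpha=32,           # effective scaling is lora_alpha / r
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # standard peft helper
```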
Architecture Support
The Neuron LoRA implementation supports the following parallel layer types:
- ColumnParallelLinear: For layers that split weights along the output dimension
- RowParallelLinear: For layers that split weights along the input dimension
- ParallelEmbedding: For embedding layers distributed across ranks
- GQAQKVColumnParallelLinear: For fused grouped-query attention QKV projections, where the differing query and key/value head counts require special tensor-parallel handling
Each layer type has a corresponding LoRA implementation that maintains the parallelization strategy while adding low-rank adaptation capabilities.
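One way to see why this works: the adapter matrices can be sharded to match the base layer, so the low-rank update never breaks the tensor-parallel layout. The toy sketch below assumes a Megatron-style pattern for a column-parallel base layer (lora_A replicated, lora_B sharded along the output dimension); it illustrates the idea and is not the Neuron code:

```python
import torch

tp, d_in, d_out, r = 2, 8, 8, 2   # toy sizes; tp = tensor-parallel degree
x = torch.randn(1, d_in)

# Column-parallel base layer: each rank holds a (d_out // tp) slice of the output.
# The LoRA path mirrors that sharding: lora_A is replicated, lora_B is sharded
# along the output dimension, so each rank's update aligns with its base shard.
lora_A = torch.randn(r, d_in)                      # replicated on every rank
lora_B_shards = torch.randn(tp, d_out // tp, r)    # one output shard per rank

per_rank_updates = [(x @ lora_A.T) @ B.T for B in lora_B_shards]
full_update = torch.cat(per_rank_updates, dim=-1)  # what an all-gather would yield

# Sanity check: the sharded computation equals the unsharded LoRA update.
lora_B_full = lora_B_shards.reshape(d_out, r)
assert torch.allclose(full_update, (x @ lora_A.T) @ lora_B_full.T, atol=1e-5)
```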
Key Features
- Distributed Training: Full support for tensor parallelism and sequence parallelism
- Checkpoint Consolidation: Automatic conversion between sharded and consolidated checkpoints
- Weight Transformation: Seamless integration with model weight transformation specs
- Compatibility: Works with all supported custom modeling architectures in Optimum Neuron
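As a usage note on the checkpointing features, the sketch below continues from the get_peft_model example above. It assumes the standard peft save/load interface, with optimum-neuron taking care of consolidating sharded adapter weights:

```python
# Save the adapter; under tensor parallelism the library is responsible for
# consolidating the sharded adapter weights into a single checkpoint.
peft_model.save_pretrained("lora_adapter")

# Reload onto a fresh base model (illustrative checkpoint, standard peft API).
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
restored = PeftModel.from_pretrained(base, "lora_adapter")
```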