LoRA for Neuron
A LoRA (Low-Rank Adaptation) implementation optimized for distributed training on AWS Trainium devices. This module provides parameter-efficient fine-tuning with support for tensor parallelism and sequence parallelism.
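As background, LoRA freezes the pretrained weight matrix W and learns a low-rank update ΔW = B·A, with rank r much smaller than the layer dimensions, scaled by lora_alpha / r. The sketch below illustrates the forward pass in plain PyTorch; it is a conceptual illustration, not the Neuron implementation:

```python
import torch

def lora_forward(x, W, A, B, lora_alpha, r):
    """Conceptual LoRA forward pass.

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    base = x @ W.T                        # frozen pretrained path
    update = (x @ A.T) @ B.T              # low-rank path: d_in -> r -> d_out
    return base + (lora_alpha / r) * update
```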
PEFT Model Classes
NeuronPeftModel
class optimum.neuron.peft.NeuronPeftModel
( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', autocast_adapter_dtype: bool = True, **kwargs: Any )
NeuronPeftModelForCausalLM
class optimum.neuron.peft.NeuronPeftModelForCausalLM
( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', autocast_adapter_dtype: bool = True, **kwargs: Any )
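Both classes mirror the peft library's PeftModel API. A minimal instantiation sketch follows; the checkpoint and target module names are illustrative, and in practice most users go through get_peft_model, documented below:

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from optimum.neuron.peft import NeuronPeftModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # illustrative checkpoint
peft_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
peft_model = NeuronPeftModelForCausalLM(base_model, peft_config, adapter_name="default")
```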
LoRA Layer Implementations
Base LoRA Layer
class optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer
( base_layer: Module, ephemeral_gpu_offload: bool = False, **kwargs )
Parallel Linear LoRA
class optimum.neuron.peft.tuners.lora.layer.ParallelLinear
( base_layer, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, is_target_conv_1d_layer: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
GQA QKV Column Parallel LoRA
class optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear
( base_layer, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, is_target_conv_1d_layer: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
Parallel Embedding LoRA
class optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding
( base_layer: Module, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
LoRA Model
NeuronLoraModel
class optimum.neuron.peft.tuners.NeuronLoraModel
( model, config, adapter_name, low_cpu_mem_usage: bool = False )
Utility Functions
get_peft_model
optimum.neuron.peft.get_peft_model
( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', mixed: bool = False, autocast_adapter_dtype: bool = True, revision: str | None = None, low_cpu_mem_usage: bool = False )
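A minimal usage sketch, mirroring the peft library's get_peft_model; the checkpoint and target_modules values are illustrative and depend on the model architecture. For a causal LM, this is expected to return a NeuronPeftModelForCausalLM:

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from optimum.neuron.peft import get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # illustrative checkpoint
lora_config = LoraConfig(
    r=16,                    # rank of the update matrices
    lora_alpha=32,           # effective scaling is lora_alpha / r
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # standard peft helper
```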
Architecture Support
The Neuron LoRA implementation supports the following parallel layer types:
- ColumnParallelLinear: For layers that split weights along the output dimension
- RowParallelLinear: For layers that split weights along the input dimension
- ParallelEmbedding: For embedding layers distributed across ranks
- GQAQKVColumnParallelLinear: For fused grouped-query attention QKV projections, where the differing query and key/value head counts require special tensor-parallel handling
Each layer type has a corresponding LoRA implementation that maintains the parallelization strategy while adding low-rank adaptation capabilities.
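One way to see why this works: the adapter matrices can be sharded to match the base layer, so the low-rank update never breaks the tensor-parallel layout. The toy sketch below assumes a Megatron-style pattern for a column-parallel base layer (lora_A replicated, lora_B sharded along the output dimension); it illustrates the idea and is not the Neuron code:

```python
import torch

tp, d_in, d_out, r = 2, 8, 8, 2   # toy sizes; tp = tensor-parallel degree
x = torch.randn(1, d_in)

# Column-parallel base layer: each rank holds a (d_out // tp) slice of the output.
# The LoRA path mirrors that sharding: lora_A is replicated, lora_B is sharded
# along the output dimension, so each rank's update aligns with its base shard.
lora_A = torch.randn(r, d_in)                      # replicated on every rank
lora_B_shards = torch.randn(tp, d_out // tp, r)    # one output shard per rank

per_rank_updates = [(x @ lora_A.T) @ B.T for B in lora_B_shards]
full_update = torch.cat(per_rank_updates, dim=-1)  # what an all-gather would yield

# Sanity check: the sharded computation equals the unsharded LoRA update.
lora_B_full = lora_B_shards.reshape(d_out, r)
assert torch.allclose(full_update, (x @ lora_A.T) @ lora_B_full.T, atol=1e-5)
```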
Key Features
- Distributed Training: Full support for tensor parallelism and sequence parallelism
- Checkpoint Consolidation: Automatic conversion between sharded and consolidated checkpoints
- Weight Transformation: Seamless integration with model weight transformation specs
- Compatibility: Works with all supported custom modeling architectures in Optimum Neuron
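As a usage note on the checkpointing features, the sketch below continues from the get_peft_model example above. It assumes the standard peft save/load interface, with optimum-neuron taking care of consolidating sharded adapter weights:

```python
# Save the adapter; under tensor parallelism the library is responsible for
# consolidating the sharded adapter weights into a single checkpoint.
peft_model.save_pretrained("lora_adapter")

# Reload onto a fresh base model (illustrative checkpoint, standard peft API).
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
restored = PeftModel.from_pretrained(base, "lora_adapter")
```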