
LoRA for Neuron

A LoRA (Low-Rank Adaptation) implementation optimized for distributed training on AWS Trainium devices. This module provides parameter-efficient fine-tuning with support for tensor parallelism and sequence parallelism.
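
As a quick start, the sketch below wraps a causal language model with LoRA adapters via get_peft_model (documented later on this page). The checkpoint name and hyperparameter values are illustrative, and print_trainable_parameters is assumed to carry over from peft's PeftModel interface:

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM

from optimum.neuron.peft import get_peft_model

# Illustrative checkpoint: any architecture with a Neuron custom modeling
# implementation can serve as the base model on a Trainium instance.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# Standard peft LoRA configuration; the values here are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Wraps the base model with Neuron-aware LoRA layers.
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # assumed inherited from peft's PeftModel
```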

PEFT Model Classes

NeuronPeftModel

class optimum.neuron.peft.NeuronPeftModel

( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', autocast_adapter_dtype: bool = True, **kwargs: Any )

NeuronPeftModelForCausalLM

class optimum.neuron.peft.NeuronPeftModelForCausalLM

( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', autocast_adapter_dtype: bool = True, **kwargs: Any )
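
Both classes mirror peft's PeftModel constructors, and get_peft_model (below) is the usual entry point; direct construction is also possible. A minimal sketch, reusing model and lora_config from the quick start above:

```python
from optimum.neuron.peft import NeuronPeftModelForCausalLM

# Equivalent to get_peft_model(model, lora_config) for a causal LM;
# `model` and `lora_config` come from the quick-start sketch above.
peft_model = NeuronPeftModelForCausalLM(
    model,
    lora_config,
    adapter_name="default",
)
```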

LoRA Layer Implementations

Base LoRA Layer

class optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer

( base_layer: Module, ephemeral_gpu_offload: bool = False, **kwargs )

Parallel Linear LoRA

class optimum.neuron.peft.tuners.lora.layer.ParallelLinear

( base_layer, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, is_target_conv_1d_layer: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )

GQA QKV Column Parallel LoRA

class optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear

( base_layer, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, is_target_conv_1d_layer: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )

Parallel Embedding LoRA

class optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding

( base_layer: Module, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
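
All layer variants share the standard LoRA hyperparameters seen in the signatures above. As a reference, the scaling applied to the low-rank update follows the usual LoRA convention; this is a sketch of the math rather than Neuron-specific code:

```python
import math

r, lora_alpha, use_rslora = 16, 32, False

# Standard LoRA scaling: the low-rank update is scaled by alpha / r, or by
# alpha / sqrt(r) when rank-stabilized LoRA (use_rslora=True) is enabled.
scaling = lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

# Conceptual forward pass of a LoRA-wrapped linear layer:
#   y = base_layer(x) + scaling * lora_B(lora_A(dropout(x)))
```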

LoRA Model

NeuronLoraModel

class optimum.neuron.peft.tuners.NeuronLoraModel

( model, config, adapter_name, low_cpu_mem_usage: bool = False )

Utility Functions

get_peft_model

optimum.neuron.peft.get_peft_model

( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', mixed: bool = False, autocast_adapter_dtype: bool = True, revision: str | None = None, low_cpu_mem_usage: bool = False )
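
The adapter_name argument lets several adapters coexist on one model. Assuming NeuronPeftModel keeps peft's adapter-management interface (add_adapter, set_adapter), switching adapters looks like this sketch:

```python
from peft import LoraConfig

# `peft_model` is the wrapped model from the quick start above. add_adapter
# and set_adapter are assumed to carry over from peft's PeftModel interface.
second_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
peft_model.add_adapter("experiment-2", second_config)
peft_model.set_adapter("experiment-2")  # route forward passes through this adapter
```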

Architecture Support

The Neuron LoRA implementation supports the following parallel layer types:

  • ColumnParallelLinear: For layers that split weights along the output dimension
  • RowParallelLinear: For layers that split weights along the input dimension
  • ParallelEmbedding: For embedding layers distributed across ranks
  • GQAQKVColumnParallelLinear: For fused grouped-query attention query/key/value projections, where query and key/value heads require different sharding under tensor parallelism

Each layer type has a corresponding LoRA implementation that maintains the parallelization strategy while adding low-rank adaptation capabilities.
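
For example, in a Llama-style custom modeling implementation the attention projections are typically column parallel (or GQA QKV column parallel for grouped-query attention) and the output projection is row parallel; targeting them with a standard LoraConfig produces the matching parallel LoRA wrappers. The module names below are illustrative:

```python
from peft import LoraConfig

# Targeting parallel layers works with the usual peft configuration; the
# LoRA wrapper keeps each base layer's sharding strategy. Module names are
# illustrative and depend on the model implementation.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```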

Key Features

  • Distributed Training: Full support for tensor parallelism and sequence parallelism
  • Checkpoint Consolidation: Automatic conversion between sharded and consolidated checkpoints (see the sketch after this list)
  • Weight Transformation: Seamless integration with model weight transformation specs
  • Compatibility: Works with all supported custom modeling architectures in Optimum Neuron
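
A sketch of saving adapter weights after training, assuming save_pretrained follows peft's PeftModel interface; per the checkpoint consolidation feature above, sharded adapter weights are converted to a consolidated checkpoint:

```python
# `peft_model` is the trained model from the quick start above. Per the
# checkpoint consolidation feature, sharded (tensor-parallel) LoRA weights
# are consolidated when saving; save_pretrained is assumed to follow peft's
# PeftModel interface.
peft_model.save_pretrained("./lora_adapter")
```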