"""BLT model configuration"""

from ...configuration_utils import PretrainedConfig
from ...utils import logging


logger = logging.get_logger(__name__)


class BLTLocalEncoderConfig(PretrainedConfig):
    """
    Configuration class for the BLT Local Encoder component.
    """

    model_type = "blt_local_encoder"

    def __init__(
        self,
        vocab_size=256,
        cross_attn_all_layers=False,
        cross_attn_k=2,
        hidden_size_global=512,
        hidden_size=512,
        num_attention_heads=8,
        num_key_value_heads=None,
        num_hidden_layers=1,
        norm_eps=1e-5,
        dropout=0.0,
        max_position_embeddings=1024,
        rope_theta=10000.0,
        rope_scaling=None,
        hidden_act="silu",
        intermediate_size=None,
        _attn_implementation="sdpa",
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.cross_attn_all_layers = cross_attn_all_layers
        self.cross_attn_k = cross_attn_k
        self.hidden_size_global = hidden_size_global
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads or num_attention_heads
        self.head_dim = hidden_size // num_attention_heads
        self.intermediate_size = intermediate_size or int(8 * hidden_size / 3)
        self.num_hidden_layers = num_hidden_layers
        self.norm_eps = norm_eps
        self.dropout = dropout
        self.max_position_embeddings = max_position_embeddings
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling or {"rope_type": "default"}
        self.hidden_act = hidden_act
        self._attn_implementation = _attn_implementation
        super().__init__(**kwargs)


class BLTLocalDecoderConfig(PretrainedConfig):
    """
    Configuration class for the BLT Local Decoder component.
    """

    model_type = "blt_local_decoder"

    def __init__(
        self,
        vocab_size=256,
        cross_attn_all_layers=False,
        cross_attn_k=2,
        hidden_size_global=512,
        hidden_size=512,
        num_attention_heads=8,
        num_key_value_heads=None,
        num_hidden_layers=1,
        norm_eps=1e-5,
        dropout=0.0,
        max_position_embeddings=1024,
        rope_theta=10000.0,
        rope_scaling=None,
        hidden_act="silu",
        intermediate_size=None,
        _attn_implementation="sdpa",
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.cross_attn_all_layers = cross_attn_all_layers
        self.cross_attn_k = cross_attn_k
        self.hidden_size_global = hidden_size_global
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads or num_attention_heads
        self.head_dim = hidden_size // num_attention_heads
        self.intermediate_size = intermediate_size or int(8 * hidden_size / 3)
        self.num_hidden_layers = num_hidden_layers
        self.norm_eps = norm_eps
        self.dropout = dropout
        self.max_position_embeddings = max_position_embeddings
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling or {"rope_type": "default"}
        self.hidden_act = hidden_act
        self._attn_implementation = _attn_implementation
        super().__init__(**kwargs)


class BLTGlobalTransformerConfig(PretrainedConfig):
    """
    Configuration class for the BLT Global Transformer component.
    """

    model_type = "blt_global_transformer"

    def __init__(
        self,
        hidden_size=512,
        num_attention_heads=8,
        num_key_value_heads=None,
        num_hidden_layers=8,
        norm_eps=1e-5,
        dropout=0.0,
        max_position_embeddings=1024,
        rope_theta=10000.0,
        rope_scaling=None,
        hidden_act="silu",
        intermediate_size=None,
        _attn_implementation="sdpa",
        **kwargs,
    ):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads or num_attention_heads
        self.head_dim = hidden_size // num_attention_heads
        self.intermediate_size = intermediate_size or int(8 * hidden_size / 3)
        self.num_hidden_layers = num_hidden_layers
        self.norm_eps = norm_eps
        self.dropout = dropout
        self.max_position_embeddings = max_position_embeddings
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling or {"rope_type": "default"}
        self.hidden_act = hidden_act
        self._attn_implementation = _attn_implementation
        super().__init__(**kwargs)
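# Usage sketch (illustrative; values shown assume the defaults above): the
# local/global sub-configs derive their attention and feed-forward shapes from
# the constructor arguments, so resizing a component only requires
# `hidden_size` and `num_attention_heads`:
#
#   encoder = BLTLocalEncoderConfig(hidden_size=1024, num_attention_heads=16)
#   encoder.head_dim             # 64   == 1024 // 16
#   encoder.num_key_value_heads  # 16, falls back to num_attention_heads
#   encoder.intermediate_size    # 2730 == int(8 * 1024 / 3), the SwiGLU width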
class BLTPatcherConfig(PretrainedConfig):
    r"""
    Configuration class for the BLT Patcher/Entropy model component.

    Args:
        vocab_size (`int`, *optional*, defaults to 256):
            Vocabulary size for the entropy model used in patching.
        hidden_size (`int`, *optional*, defaults to 512):
            Hidden dimension for the entropy model.
        num_hidden_layers (`int`, *optional*, defaults to 8):
            Number of layers in the entropy model.
        num_attention_heads (`int`, *optional*, defaults to 8):
            Number of attention heads in the entropy model.
        num_key_value_heads (`int`, *optional*):
            Number of key-value heads in the entropy model. Defaults to `num_attention_heads`.
        max_position_embeddings (`int`, *optional*, defaults to 1024):
            Maximum sequence length for the entropy model.
        norm_eps (`float`, *optional*, defaults to 1e-5):
            Layer normalization epsilon for the entropy model.
        dropout (`float`, *optional*, defaults to 0.0):
            Dropout probability for the entropy model.
        rope_theta (`float`, *optional*, defaults to 10000.0):
            RoPE theta parameter for the entropy model.
        attn_impl (`str`, *optional*, defaults to `"sdpa"`):
            Attention implementation for the entropy model.
        attn_bias_type (`str`, *optional*, defaults to `"causal"`):
            Attention bias type for the entropy model.
        intermediate_size (`int`, *optional*):
            Feedforward dimension for the entropy model. Defaults to `int(8 * hidden_size / 3)`.
    """

    model_type = "blt_patcher"

    def __init__(
        self,
        vocab_size=256,
        hidden_size=512,
        num_hidden_layers=8,
        num_attention_heads=8,
        num_key_value_heads=None,
        max_position_embeddings=1024,
        norm_eps=1e-5,
        dropout=0.0,
        rope_theta=10000.0,
        attn_impl="sdpa",
        attn_bias_type="causal",
        intermediate_size=None,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.head_dim = hidden_size // num_attention_heads
        self.num_key_value_heads = num_key_value_heads if num_key_value_heads is not None else num_attention_heads
        self.max_position_embeddings = max_position_embeddings
        self.norm_eps = norm_eps
        self.dropout = dropout
        self.attn_impl = attn_impl
        self.attn_bias_type = attn_bias_type
        self.rope_theta = rope_theta
        self.hidden_act = "silu"
        self.intermediate_size = intermediate_size or int(8 * self.hidden_size / 3)
        self.rope_scaling = {"rope_type": "default"}
        super().__init__(**kwargs)
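# Usage sketch (illustrative; values shown assume the defaults above): the
# patcher is a small causal transformer whose per-byte entropies drive dynamic
# patching, and a reduced variant can be configured directly:
#
#   patcher = BLTPatcherConfig(hidden_size=256, num_attention_heads=4, num_hidden_layers=4)
#   patcher.head_dim           # 64  == 256 // 4
#   patcher.intermediate_size  # 682 == int(8 * 256 / 3)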
class BLTConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`BLTModel`]. It is used to instantiate a
    BLT model according to the specified arguments, defining the model architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 256):
            Vocabulary size of the BLT model. Defines the number of different tokens (bytes) that can be represented.
        max_position_embeddings (`int`, *optional*, defaults to 1024):
            The maximum sequence length that this model can handle.

        # Patching configuration
        patch_in_forward (`bool`, *optional*, defaults to `False`):
            Whether to perform patching during the forward pass.
        patch_size (`float`, *optional*):
            Size of patches for static patching.
        patching_mode (`str`, *optional*):
            Mode for patching ("entropy", "static", etc.).
        patching_threshold (`float`, *optional*):
            Threshold for entropy-based patching.
        patching_batch_size (`int`, *optional*, defaults to 1):
            Batch size for patching operations.
        patching_device (`str`, *optional*, defaults to `"cuda"`):
            Device to use for patching operations.
        max_patch_length (`int`, *optional*):
            Maximum length of patches.

        # Cross attention configurations
        cross_attn_k (`int`, *optional*, defaults to 2):
            Number of cross attention components.

        # Encoder configurations
        encoder_hash_byte_group_size (`List[int]`, *optional*):
            Hash byte group sizes for the encoder. Defaults to `[3, 4, 5, 6, 8]`.
        encoder_hash_byte_group_vocab (`int`, *optional*, defaults to 30000):
            Vocabulary size for hash byte groups.
        encoder_hash_byte_group_nb_functions (`int`, *optional*, defaults to 3):
            Number of hash functions for byte groups.

        # Component configurations
        patcher_config (`Union[BLTPatcherConfig, dict]`, *optional*):
            Configuration for the BLT patcher/entropy model component.
        encoder_config (`Union[BLTLocalEncoderConfig, dict]`, *optional*):
            Configuration for the BLT local encoder component.
        decoder_config (`Union[BLTLocalDecoderConfig, dict]`, *optional*):
            Configuration for the BLT local decoder component.
        global_config (`Union[BLTGlobalTransformerConfig, dict]`, *optional*):
            Configuration for the BLT global transformer component.
        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
            Whether to tie the input and output word embeddings.

    Example:

    ```python
    >>> from transformers import BLTModel, BLTConfig

    >>> # Initializing a BLT configuration
    >>> configuration = BLTConfig()

    >>> # Initializing a model from the configuration
    >>> model = BLTModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""

    model_type = "blt"
    keys_to_ignore_at_inference = ["past_key_values"]
    sub_configs = {
        "patcher_config": BLTPatcherConfig,
        "encoder_config": BLTLocalEncoderConfig,
        "decoder_config": BLTLocalDecoderConfig,
        "global_config": BLTGlobalTransformerConfig,
    }

    def __init__(
        self,
        vocab_size=256,
        max_position_embeddings=1024,
        patch_in_forward=False,
        patch_size=None,
        patching_mode=None,
        patching_threshold=None,
        patching_batch_size=1,
        patching_device="cuda",
        max_patch_length=None,
        cross_attn_k=2,
        encoder_hash_byte_group_size=None,
        encoder_hash_byte_group_vocab=30000,
        encoder_hash_byte_group_nb_functions=3,
        patcher_config=None,
        encoder_config=None,
        decoder_config=None,
        global_config=None,
        tie_word_embeddings=False,
        **kwargs,
    ):
        self.tie_word_embeddings = tie_word_embeddings
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.patch_in_forward = patch_in_forward
        self.patch_size = patch_size
        self.patching_mode = patching_mode
        self.patching_threshold = patching_threshold
        self.patching_batch_size = patching_batch_size
        self.patching_device = patching_device
        self.max_patch_length = max_patch_length
        self.cross_attn_k = cross_attn_k
        self.encoder_hash_byte_group_size = encoder_hash_byte_group_size or [3, 4, 5, 6, 8]
        self.encoder_hash_byte_group_vocab = encoder_hash_byte_group_vocab
        self.encoder_hash_byte_group_nb_functions = encoder_hash_byte_group_nb_functions

        if patcher_config is None:
            self.patcher_config = BLTPatcherConfig()
            logger.info("patcher_config is None, using default BLT patcher config")
        elif isinstance(patcher_config, dict):
            self.patcher_config = BLTPatcherConfig(**patcher_config)
        elif isinstance(patcher_config, BLTPatcherConfig):
            self.patcher_config = patcher_config

        if encoder_config is None:
            self.encoder_config = BLTLocalEncoderConfig()
            logger.info("encoder_config is None, using default BLT encoder config")
        elif isinstance(encoder_config, dict):
            self.encoder_config = BLTLocalEncoderConfig(**encoder_config)
        elif isinstance(encoder_config, BLTLocalEncoderConfig):
            self.encoder_config = encoder_config

        if decoder_config is None:
            self.decoder_config = BLTLocalDecoderConfig()
            logger.info("decoder_config is None, using default BLT decoder config")
        elif isinstance(decoder_config, dict):
            self.decoder_config = BLTLocalDecoderConfig(**decoder_config)
        elif isinstance(decoder_config, BLTLocalDecoderConfig):
            self.decoder_config = decoder_config

        if global_config is None:
            self.global_config = BLTGlobalTransformerConfig()
            logger.info("global_config is None, using default BLT global config")
        elif isinstance(global_config, dict):
            self.global_config = BLTGlobalTransformerConfig(**global_config)
        elif isinstance(global_config, BLTGlobalTransformerConfig):
            self.global_config = global_config

        super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)


__all__ = [
    "BLTConfig",
    "BLTPatcherConfig",
    "BLTLocalEncoderConfig",
    "BLTLocalDecoderConfig",
    "BLTGlobalTransformerConfig",
]
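# Usage sketch (illustrative; "./blt-checkpoint" is a hypothetical path):
# sub-configs may be passed as plain dicts, which `BLTConfig.__init__` coerces
# to their config classes, and the composite config round-trips through the
# standard `PretrainedConfig` JSON serialization:
#
#   config = BLTConfig(patcher_config={"num_hidden_layers": 4})
#   isinstance(config.patcher_config, BLTPatcherConfig)       # True
#   config.save_pretrained("./blt-checkpoint")                # writes config.json
#   reloaded = BLTConfig.from_pretrained("./blt-checkpoint")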