hdlm-group/hdlm-base-epsilon-0.05

This is an epsilon_hybrid diffusion language model trained on OpenWebText, with WikiText-103 used for validation (see the configuration below).

Model Details

  • Model Type: epsilon_hybrid
  • Architecture: Diffusion-based language model
  • Training Method: Epsilon-hybrid diffusion training

Configuration

hf_model_id: hdlm-group/hdlm-base-epsilon-0.0
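# note: points at the epsilon-0.0 checkpoint, presumably the fine-tuning initialization (cf. reset_step_for_finetuning)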
reset_step_for_finetuning: true
ngpus: 4
type: aligned
gradient_accumulation_steps: 8
model_type: epsilon_hybrid
tokenizer:
  tokens: 50257
  model: gpt2
training:
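  # batch_size is presumably global: 512 / (ngpus 4 * accum 8) = micro-batch of 16 per GPU per forward pass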
  batch_size: 512
  accum: ${gradient_accumulation_steps}
  n_iters: 500000
  snapshot_freq: 5000
  log_freq: 500
  eval_freq: 5000
  snapshot_freq_for_preemption: 1000
  snapshot_sampling: true
  ema: 0.9999
  warmup_iter: 50000
  loss_type: hybrid
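  # epsilon and lambda parameterize the hybrid loss; epsilon matches the 0.05 in the model name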
  epsilon: 0.05
  lambda: 5.0
  lr: 1.0e-05
data:
  train: openwebtext-train
  valid: wikitext103
  cache_dir: /home/toolkit/research-diffcodegen/data
  debug: false
annealing:
  type: none
  efficient: false
  width: 1024
  tau: 1024
  eval_tau: 1024
  sampling_method: sdlm
  sampling_eps: 0.0001
  attention:
    context_type: block_causal
    block_type: full
  match_inference: true
eval:
  batch_size: 32
  perplexity: true
  perplexity_batch_size: 16
optim:
  weight_decay: 0.1
  optimizer: AdamW
  lr: 5.0e-05
  beta1: 0.9
  beta2: 0.95
  eps: 1.0e-08
  warmup: 10000
  grad_clip: 1.0
  scheduler: cosine
experiment:
  name: ft_epsilon_0.05_lambda_5.0
  wandb_project: Hybrid-SDLM-ALIGNED
model:
  name: epsilon_hdlm
  type: ddit
  hidden_size: 768
  cond_dim: 128
  length: 1024
  n_blocks: 12
  n_heads: 12
  dropout: 0.1
  scale_by_sigma: false
  transformer_sigma_conditioning: false
  hybrid_sigma_embedding: false
  post_process_logits: false
  use_timestep_embedding: false
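
The accum: ${gradient_accumulation_steps} entry uses OmegaConf-style interpolation, so the value resolves from the top-level key. Below is a minimal sketch of loading and inspecting the config; the filename config.yaml and the use of OmegaConf are assumptions inferred from the ${...} syntax, not documented by the framework.

from omegaconf import OmegaConf

# Load the configuration shown above; the filename is hypothetical.
cfg = OmegaConf.load("config.yaml")

# ${gradient_accumulation_steps} resolves against the top-level key.
assert cfg.training.accum == cfg.gradient_accumulation_steps  # 8

# Global batch 512 over 4 GPUs with 8 accumulation steps
# gives a per-GPU micro-batch of 16.
micro_batch = cfg.training.batch_size // (cfg.ngpus * cfg.training.accum)
print(micro_batch)  # 16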

Usage

from our.hf_utils import smart_model_loader

# Load the model
model, config, device, accelerator, metaschedule = smart_model_loader(
    "hdlm-group/hdlm-base-epsilon-0.05",
    model_type="epsilon_hybrid"
)

# Use the model for text generation.
# smart_model_loader returns the model together with its config, the target
# device, an accelerator handle, and the sampling metaschedule.
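
The accelerator in the returned tuple suggests the loader prepares a Hugging Face Accelerate handle for device placement; for the sampling and generation entry points, consult the research-diffcodegen repository, as the generation API is not documented on this card.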

Training Details

This model was trained using the research-diffcodegen framework.
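
For reference, the optim block in the configuration corresponds to a standard AdamW setup with linear warmup and cosine decay. Below is a minimal PyTorch sketch of that schedule; it is an illustration, not the framework's actual implementation, and the Linear module stands in for the real network. (Note that training.lr, 1.0e-05, and optim.lr, 5.0e-05, differ in the config; the sketch follows the optim block.)

import math
import torch

model = torch.nn.Linear(768, 768)  # hypothetical stand-in for the real model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5.0e-05,
    betas=(0.9, 0.95),
    eps=1.0e-08,
    weight_decay=0.1,
)

warmup_steps, total_steps = 10_000, 500_000

def lr_lambda(step):
    # Linear warmup over the first 10k steps, then cosine decay to zero.
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Gradients are clipped to a max norm of 1.0 each step (grad_clip: 1.0):
# torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)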

Citation

If you use this model in your research, please cite the original paper and this implementation.

License

This model is released under the MIT License.
