hdlm-group/hdlm-base-epsilon-0.05

This is an epsilon_hybrid diffusion language model trained on OpenWebText, with WikiText-103 used for validation (see the configuration below).

Model Details

  • Model Type: epsilon_hybrid
  • Architecture: Diffusion-based language model
  • Training Method: Epsilon-hybrid diffusion training

Configuration

hf_model_id: hdlm-group/hdlm-base-epsilon-0.0
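# note: points at the epsilon-0.0 checkpoint, presumably the fine-tuning initialization (cf. reset_step_for_finetuning)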
reset_step_for_finetuning: true
ngpus: 4
type: aligned
gradient_accumulation_steps: 8
model_type: epsilon_hybrid
tokenizer:
  tokens: 50257
  model: gpt2
training:
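  # batch_size is presumably global: 512 / (ngpus 4 * accum 8) = micro-batch of 16 per GPU per forward pass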
  batch_size: 512
  accum: ${gradient_accumulation_steps}
  n_iters: 500000
  snapshot_freq: 5000
  log_freq: 500
  eval_freq: 5000
  snapshot_freq_for_preemption: 1000
  snapshot_sampling: true
  ema: 0.9999
  warmup_iter: 50000
  loss_type: hybrid
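  # epsilon and lambda parameterize the hybrid loss; epsilon matches the 0.05 in the model name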
  epsilon: 0.05
  lambda: 5.0
  lr: 1.0e-05
data:
  train: openwebtext-train
  valid: wikitext103
  cache_dir: /home/toolkit/research-diffcodegen/data
  debug: false
annealing:
  type: none
  efficient: false
  width: 1024
  tau: 1024
  eval_tau: 1024
  sampling_method: sdlm
  sampling_eps: 0.0001
  attention:
    context_type: block_causal
    block_type: full
  match_inference: true
eval:
  batch_size: 32
  perplexity: true
  perplexity_batch_size: 16
optim:
  weight_decay: 0.1
  optimizer: AdamW
  lr: 5.0e-05
  beta1: 0.9
  beta2: 0.95
  eps: 1.0e-08
  warmup: 10000
  grad_clip: 1.0
  scheduler: cosine
experiment:
  name: ft_epsilon_0.05_lambda_5.0
  wandb_project: Hybrid-SDLM-ALIGNED
model:
  name: epsilon_hdlm
  type: ddit
  hidden_size: 768
  cond_dim: 128
  length: 1024
  n_blocks: 12
  n_heads: 12
  dropout: 0.1
  scale_by_sigma: false
  transformer_sigma_conditioning: false
  hybrid_sigma_embedding: false
  post_process_logits: false
  use_timestep_embedding: false
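
The accum: ${gradient_accumulation_steps} entry uses OmegaConf-style interpolation, so the value resolves from the top-level key. Below is a minimal sketch of loading and inspecting the config; the filename config.yaml and the use of OmegaConf are assumptions inferred from the ${...} syntax, not documented by the framework.

from omegaconf import OmegaConf

# Load the configuration shown above; the filename is hypothetical.
cfg = OmegaConf.load("config.yaml")

# ${gradient_accumulation_steps} resolves against the top-level key.
assert cfg.training.accum == cfg.gradient_accumulation_steps  # 8

# Global batch 512 over 4 GPUs with 8 accumulation steps
# gives a per-GPU micro-batch of 16.
micro_batch = cfg.training.batch_size // (cfg.ngpus * cfg.training.accum)
print(micro_batch)  # 16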

Usage

from our.hf_utils import smart_model_loader

# Load the model
model, config, device, accelerator, metaschedule = smart_model_loader(
    "hdlm-group/hdlm-base-epsilon-0.05",
    model_type="epsilon_hybrid"
)

# Use the model for text generation.
# smart_model_loader returns the model together with its config, the target
# device, an accelerator handle, and the sampling metaschedule.
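
The accelerator in the returned tuple suggests the loader prepares a Hugging Face Accelerate handle for device placement; for the sampling and generation entry points, consult the research-diffcodegen repository, as the generation API is not documented on this card.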

Training Details

This model was trained using the research-diffcodegen framework.
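
For reference, the optim block in the configuration corresponds to a standard AdamW setup with linear warmup and cosine decay. Below is a minimal PyTorch sketch of that schedule; it is an illustration, not the framework's actual implementation, and the Linear module stands in for the real network. (Note that training.lr, 1.0e-05, and optim.lr, 5.0e-05, differ in the config; the sketch follows the optim block.)

import math
import torch

model = torch.nn.Linear(768, 768)  # hypothetical stand-in for the real model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5.0e-05,
    betas=(0.9, 0.95),
    eps=1.0e-08,
    weight_decay=0.1,
)

warmup_steps, total_steps = 10_000, 500_000

def lr_lambda(step):
    # Linear warmup over the first 10k steps, then cosine decay to zero.
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Gradients are clipped to a max norm of 1.0 each step (grad_clip: 1.0):
# torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)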

Citation

If you use this model in your research, please cite the original paper and this implementation.

License

This model is released under the MIT License.
