Built With Llama!

Built With Axolotl!

Overview

We fine-tuned SmileyLlama with DPO to improve its adherence to directions in the prompt.

For more details, see the arXiv preprint: https://arxiv.org/abs/2409.02231

How to use

The model can be loaded the same way as Llama-3.1, and its memory requirements are the same as Llama-3.1-8B.

The "properties" options that SmileyLlama was trained on are:

  • ( <= 3, <= 4, <= 5, <= 7, > 7) H-bond donors
  • ( <= 3, <= 4, <= 5, <= 10, <= 15) H-bond acceptors
  • ( <= 300, <= 400, <= 500, <= 600, > 600) Molecular weight
  • ( <= 3, <= 4, <= 5, <= 10, <= 15, > 15) logP
  • ( <= 7, <= 10, > 10) Rotatable bonds
  • ( < 0.4, > 0.4, > 0.5, > 0.6) Fraction sp3
  • ( <= 90, <= 140, <= 200, > 200) TPSA
  • (a macrocycle, no macrocycles)
  • (has, lacks) bad SMARTS
  • lacks covalent warheads
  • has covalent warheads: (sulfonyl fluorides, acrylamides, ...) (see below for details)
  • A substructure of *SMILES_STRING*
  • A chemical of *CHEMICAL_FORMULA*
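
These options slot into the natural-language property clause of the prompt. As a minimal sketch (the `build_prompt` helper is hypothetical; the prompt template follows the usage example on this card, and the exact training-time wording is documented in the preprint):

```python
# Hypothetical helper: join chosen property options into the
# "### Instruction / ### Input / ### Response" prompt template
# used in the usage example on this card.
SYSTEM_TXT = "You love and excel at generating SMILES strings of drug-like molecules"

def build_prompt(properties):
    user_txt = ("Output a SMILES string for a drug like molecule "
                "with the following properties: " + ", ".join(properties) + ":")
    return f"### Instruction:\n{SYSTEM_TXT}\n\n### Input:\n{user_txt}\n\n### Response:\n"

prompt = build_prompt(["<= 5 H-bond donors", "<= 10 H-bond acceptors",
                       "<= 500 molecular weight", "<= 5 logP"])
```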

List of possible warheads:

  • sulfonyl fluorides: [#16](=[#8])(=[#8])-[#9]
  • chloroacetamides: [#8]=[#6](-[#6]-[#17])-[#7]
  • cyanoacrylamides: [#7]-[#6](=[#8])-[#6](-[#6]#[#7])=[#6]
  • epoxides: [#6]1-[#6]-[#8]-1
  • aziridines: [#6]1-[#6]-[#7]-1
  • disulfides: [#16]-[#16]
  • aldehydes: [#6](=[#8])-[#1]
  • vinyl sulfones: [#6]=[#6]-[#16](=[#8])(=[#8])-[#7]
  • boronic acids/esters: [#6]-[#5](-[#8])-[#8]
  • acrylamides: [#6]=[#6]-[#6](=[#8])-[#7]
  • cyanamides: [#6]-[#7](-[#6]#[#7])-[#6]
  • chlorofluoroacetamides: [#7]-[#6](=[#8])-[#6](-[#9])-[#17]
  • butynamides: [#6]#[#6]-[#6](=[#8])-[#7]-[#6]
  • chloropropionamides: [#7]-[#6](=[#8])-[#6](-[#6])-[#17]
  • fluorosulfates: [#8]=[#16](=[#8])(-[#9])-[#8]
  • beta lactams: [#7]1-[#6]-[#6]-[#6]-1=[#8]
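
The SMARTS patterns above can be matched against generated molecules with RDKit (not a dependency stated on this card, just a common choice) to check which, if any, warheads a SMILES string contains:

```python
from rdkit import Chem

# A few of the warhead SMARTS from the list above
WARHEADS = {
    "acrylamides": "[#6]=[#6]-[#6](=[#8])-[#7]",
    "epoxides": "[#6]1-[#6]-[#8]-1",
    "disulfides": "[#16]-[#16]",
}

def find_warheads(smiles):
    """Return the names of listed warheads present in a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return []  # unparseable SMILES
    return [name for name, smarts in WARHEADS.items()
            if mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))]

print(find_warheads("C=CC(=O)Nc1ccccc1"))  # N-phenylacrylamide
```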

Generating a drug-like molecule that obeys Lipinski's rule of five

import torch
import transformers

model_id = "THGLab/Llama-3.1-8B-SmileyLlama-1.1-Prompt-Following"  # or a local path

system_txt = "You love and excel at generating SMILES strings of drug-like molecules"
user_txt = "Output a SMILES string for a drug like molecule with the following properties: <= 5 H-bond donors, <= 10 H-bond acceptors, <= 500 molecular weight, <= 5 logP:"
prompt = f"### Instruction:\n{system_txt}\n\n### Input:\n{user_txt}\n\n### Response:\n"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    do_sample=True,  # sampling must be enabled for temperature to take effect
    temperature=1.0
)

outputs = pipeline(
    prompt,
    max_new_tokens=128,
    num_return_sequences=4
)
for out in outputs:
    # strip the prompt so only the generated SMILES string is printed
    print(out["generated_text"][len(prompt):])

You can use num_return_sequences to generate many SMILES strings in a single call, though the batch size is limited by available memory.
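
Generated text is not guaranteed to parse as valid SMILES, so a post-generation filter is useful. One way to keep only parseable, deduplicated molecules (using RDKit, assumed installed; `filter_valid` is a hypothetical helper):

```python
from rdkit import Chem

def filter_valid(smiles_list):
    """Keep only parseable SMILES, deduplicated by canonical form."""
    seen, kept = set(), []
    for s in smiles_list:
        mol = Chem.MolFromSmiles(s)
        if mol is None:
            continue  # skip unparseable generations
        can = Chem.MolToSmiles(mol)  # canonicalize so duplicates collapse
        if can not in seen:
            seen.add(can)
            kept.append(can)
    return kept
```

Canonicalizing before deduplication matters because the model may emit different SMILES strings for the same molecule.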

