# Model Card

## Summary

- Base model: h2oai/h2o-danube2-1.8b-base
## Usage

To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` library installed:

```bash
pip install transformers==4.50.3
```
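The examples below load the model with a `device_map`, which relies on the `accelerate` package. If it is not already present in your environment (an assumption about your setup, not a requirement stated by this card), install it as well:

```bash
pip install accelerate
```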
Also make sure you are providing your Hugging Face token to the pipeline if the model is hosted in a private repo.

- Either leave `token=True` in the `pipeline` and log in to the Hugging Face Hub by running

  ```python
  import huggingface_hub
  huggingface_hub.login(<ACCESS_TOKEN>)
  ```

- Or directly pass your access token to `token` in the `pipeline`, as sketched right after this list.
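A minimal sketch of the second option; `<ACCESS_TOKEN>` is a placeholder for your personal Hugging Face access token, not a value provided by this card:

```python
from transformers import pipeline

generate_text = pipeline(
    model="SaffalPoosh/Nexus-multihop-1B",
    trust_remote_code=True,
    token="<ACCESS_TOKEN>",  # placeholder: your personal Hugging Face access token
)
```

The full example below uses `token=True` instead, which reuses the token stored by `huggingface_hub.login`.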
```python
from transformers import pipeline

generate_text = pipeline(
    model="SaffalPoosh/Nexus-multihop-1B",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map={"": "cuda:0"},
    token=True,
)

# generate configuration can be modified to your needs
# generate_text.model.generation_config.min_new_tokens = 70
# generate_text.model.generation_config.max_new_tokens = 424
# generate_text.model.generation_config.do_sample = True
# generate_text.model.generation_config.num_beams = 2
# generate_text.model.generation_config.temperature = float(0.2)
# generate_text.model.generation_config.repetition_penalty = float(1.0)

messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

res = generate_text(
    messages,
    renormalize_logits=True,
)
print(res[0]["generated_text"][-1]["content"])
```
You can print a sample prompt after applying the chat template to see how it is fed to the tokenizer:
```python
print(generate_text.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
))
```
You may also load the model and tokenizer yourself and handle the preprocessing steps explicitly (a pipeline built from the same objects is sketched after the example):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SaffalPoosh/Nexus-multihop-1B"  # either local folder or Hugging Face model name

# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=True,
)
model.cuda().eval()

# generate configuration can be modified to your needs
# model.generation_config.min_new_tokens = 70
# model.generation_config.max_new_tokens = 424
# model.generation_config.do_sample = True
# model.generation_config.num_beams = 2
# model.generation_config.temperature = float(0.2)
# model.generation_config.repetition_penalty = float(1.0)

# apply the chat template and tokenize the conversation
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")

tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    renormalize_logits=True,
)[0]

# strip the prompt tokens and decode only the newly generated answer
tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)
```
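If you prefer the pipeline interface on top of the objects loaded above, a minimal sketch (assuming the same `model`, `tokenizer`, and `messages` variables from the example) is:

```python
from transformers import pipeline

generate_text = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

res = generate_text(messages, renormalize_logits=True)
print(res[0]["generated_text"][-1]["content"])
```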
## Quantization and sharding

You can load the model using quantization by specifying `load_in_8bit=True` or `load_in_4bit=True`. Also, sharding on multiple GPUs is possible by setting `device_map="auto"`.
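A minimal sketch of loading with 4-bit quantization and automatic sharding; it assumes the `bitsandbytes` package is installed and at least one GPU is visible (on newer `transformers` versions you may need to pass a `BitsAndBytesConfig` via `quantization_config` instead):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SaffalPoosh/Nexus-multihop-1B"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,   # 4-bit quantization via bitsandbytes (assumed installed)
    device_map="auto",   # shard layers across all visible GPUs
    trust_remote_code=True,
)
```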
## Model Architecture

The model is based on the Mistral architecture; training was continued from the base model checkpoint.
```
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 2560, padding_idx=0)
    (layers): ModuleList(
      (0-23): 24 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear(in_features=2560, out_features=2560, bias=False)
          (k_proj): Linear(in_features=2560, out_features=640, bias=False)
          (v_proj): Linear(in_features=2560, out_features=640, bias=False)
          (o_proj): Linear(in_features=2560, out_features=2560, bias=False)
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=2560, out_features=6912, bias=False)
          (up_proj): Linear(in_features=2560, out_features=6912, bias=False)
          (down_proj): Linear(in_features=6912, out_features=2560, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((2560,), eps=1e-05)
        (post_attention_layernorm): MistralRMSNorm((2560,), eps=1e-05)
      )
    )
    (norm): MistralRMSNorm((2560,), eps=1e-05)
    (rotary_emb): MistralRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2560, out_features=32000, bias=False)
)
```
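You can reproduce this printout after loading the model as shown above:

```python
print(model)
```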