# MedScholar-1.5B

MedScholar-1.5B is a compact, instruction-aligned medical question-answering model fine-tuned on 1 million randomly selected examples from the MIRIAD-4.4M dataset. It is based on the Qwen/Qwen2.5-1.5B-Instruct model and designed for efficient, in-context clinical knowledge exploration, not diagnosis.
## Model Details
- Base Model: Qwen2.5-1.5B-Instruct-unsloth-bnb-4bit
- Fine-tuning Dataset: MIRIAD-4.4M
- Samples Used: 1,000,000 examples randomly selected from the full set
- Prompt Style: Minimal QA format (see below)
- Training Framework: Unsloth with QLoRA (illustrative sketch below)
- License: Apache-2.0 (inherits from base model); dataset is ODC-By 1.0
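
For reference, the snippet below is a minimal, illustrative sketch of an Unsloth + QLoRA setup on MIRIAD. The LoRA hyperparameters, the shuffling seed, and the MIRIAD column names (`question`, `answer`) are assumptions, not the exact configuration used to train this model.

```python
# Illustrative sketch only -- not the exact training script or hyperparameters.
from unsloth import FastLanguageModel
from datasets import load_dataset

# Load the 4-bit base model (QLoRA keeps the base weights quantized).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-1.5B-Instruct-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of parameters is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Format MIRIAD rows into the minimal QA prompt (column names are assumptions).
def to_prompt(example):
    return {
        "text": f"### Question:\n{example['question']}\n"
                f"### Answer:\n{example['answer']}{tokenizer.eos_token}"
    }

dataset = (
    load_dataset("miriad/miriad-4.4M", split="train")
    .shuffle(seed=42)
    .select(range(1_000_000))
    .map(to_prompt)
)
# The formatted dataset can then be passed to TRL's SFTTrainer on the "text" field.
```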
## Prompt Format

```
### Question:
What is the role of LDL in cardiovascular health?
### Answer:
LDL plays a central role in the development of atherosclerosis by delivering cholesterol to peripheral tissues...
```
- The model expects the prompt to end with `### Answer:` and will generate only the answer text (see the helper sketch below).
- Do not include the answer in the prompt during inference.
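
As a small illustration of this prompt contract, here is a hypothetical helper sketch (the function names `build_prompt` and `extract_answer` are not part of any released code) that builds a prompt in this format and strips the echoed prompt from a generation:

```python
ANSWER_TAG = "### Answer:"

def build_prompt(question: str) -> str:
    """Wrap a question in the minimal QA format used during fine-tuning."""
    return f"### Question:\n{question}\n{ANSWER_TAG}\n"

def extract_answer(generated_text: str) -> str:
    """Return only the text after '### Answer:' (text-generation pipelines echo the prompt by default)."""
    return generated_text.split(ANSWER_TAG, 1)[-1].strip()
```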
## Dataset Consent & License

This model was fine-tuned on 1 million randomly selected examples from the MIRIAD-4.4M dataset, which is released under the ODC-By 1.0 License.
The MIRIAD dataset is intended exclusively for academic research and educational exploration. As stated by its authors:
"The outputs generated by models trained or fine-tuned on this dataset must not be used for medical diagnosis or decision-making involving real individuals."
## ⚠️ Intended Use
This model is for research, educational, and exploration purposes only. It is not a medical device and must not be used to provide clinical advice, diagnosis, or treatment.
## Example Inference (Python)

```python
from transformers import pipeline

# Load the model on the first GPU (use device=-1 for CPU inference).
pipe = pipeline("text-generation", model="yasserrmd/MedScholar-1.5B", device=0)

# The prompt must end with "### Answer:" so the model generates only the answer.
prompt = """### Question:
What are the symptoms of acute pancreatitis?
### Answer:
"""

response = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(response[0]["generated_text"])
```
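
Note that the pipeline echoes the prompt by default, so `response[0]["generated_text"]` includes the question. Passing the standard `return_full_text=False` pipeline argument returns only the newly generated answer:

```python
response = pipe(prompt, max_new_tokens=256, do_sample=True,
                temperature=0.7, return_full_text=False)
print(response[0]["generated_text"])  # answer text only, without the echoed prompt
```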
## Acknowledgements

- MIRIAD Dataset by Zheng et al. (2025): https://huggingface.co/datasets/miriad/miriad-4.4M
- Qwen2.5 by Alibaba: https://huggingface.co/Qwen
- Training infrastructure: Unsloth
## Citation

```bibtex
@misc{yasser2025medscholar,
  title        = {MedScholar-1.5B: Compact medical QA model fine-tuned on MIRIAD},
  author       = {Mohamed Yasser},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/yasserrmd/MedScholar-1.5B}},
}
```
This Qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.