---
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-7B-Instruct
tags:
- vision-language
- medical
- multimodal
- qwen2.5-vl
datasets:
- UCSC-VLAA/MedVLThinker-pmc_vqa-gpt_4o_reasoning-tokenized
- UCSC-VLAA/MedVLThinker-m23k-tokenized
- UCSC-VLAA/MedVLThinker-pmc_vqa
- UCSC-VLAA/MedVLThinker-Eval
language:
- en
pipeline_tag: image-text-to-text
---

# MedVLThinker-7B-SFT_PMC

Code: https://github.com/UCSC-VLAA/MedVLThinker

Project Page: https://ucsc-vlaa.github.io/MedVLThinker/

## Model Description

MedVLThinker-7B-SFT_PMC is a 7B-parameter medical vision-language model based on Qwen2.5-VL. It was trained with supervised fine-tuning (SFT) on the PMC-VQA dataset.

## Model Details

- **Base Model**: Qwen/Qwen2.5-VL-7B-Instruct
- **Model Size**: 7B parameters
- **Training Method**: Supervised fine-tuning (SFT)
- **Training Data**: PMC-VQA dataset

## Usage

Demo images are available here: https://github.com/UCSC-VLAA/MedVLThinker?tab=readme-ov-file#demo

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the model and processor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "UCSC-VLAA/MedVLThinker-7B-SFT_PMC",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("UCSC-VLAA/MedVLThinker-7B-SFT_PMC")

# Example conversation: a system prompt that asks for explicit reasoning, plus an image question
messages = [
    {
        "role": "system",
        "content": "You will solve a problem/request. You should provide your thoughts within <think> </think> tags before providing the answer. Write your final answer within <answer> </answer> tags.",
    },
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "path/to/medical/image.jpg",
            },
            {"type": "text", "text": "What can you see in this medical image?"},
        ],
    },
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generate a response and strip the prompt tokens from the output
generated_ids = model.generate(
    **inputs, max_new_tokens=2048, temperature=0.6, top_p=0.95, do_sample=True
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

## Citation

```bibtex
@article{medvlthinker2025,
  title={MedVLThinker: Simple Baselines for Multimodal Medical Reasoning},
  author={Huang, Xiaoke and Wu, Juncheng and Liu, Hui and Tang, Xianfeng and Zhou, Yuyin},
  journal={arXiv preprint},
  year={2025}
}
```

## License

This model is released under the Apache 2.0 license.