fine-blip-qa-model

This is a fine-tuned BLIP model for Visual Question Answering (VQA).

Model Description

This model is based on the Salesforce/blip-vqa-base architecture (or a similar BLIP VQA checkpoint) and has been fine-tuned for visual question answering. Given an image and a natural-language question, it generates a short free-text answer.

Intended Use

This model is intended for demonstration purposes on visual question answering tasks. It can be used to answer natural-language questions about the content of images.

How to Use


# Example usage
from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image

# 1. Point at the fine-tuned checkpoint on the Hugging Face Hub
model_id = "suc1dalspinach/fine_blip_gym"

# 2. Load the processor and model
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForQuestionAnswering.from_pretrained(model_id)

# 3. Prepare your input (image and question)
image = Image.open("path/to/your/image.jpg").convert("RGB")  # replace with the path to your image

question = "What is the name of the gym equipment?"

# 4. Process the inputs
inputs = processor(images=image, text=question, return_tensors="pt", truncation=True)

# 5. Generate the answer
out = model.generate(**inputs)

# 6. Decode and print the answer
answer = processor.decode(out[0], skip_special_tokens=True)
print(f"Question: {question}")
print(f"Answer: {answer}")
Model Details

Format: Safetensors
Model size: 385M params
Tensor type: F32
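
The checkpoint stores roughly 385M parameters in F32, i.e. about 1.5 GB of weights. Loading in half precision roughly halves that footprint; a minimal sketch, assuming fp16-capable hardware (typically a CUDA GPU), not something stated in the original card:

import torch
from transformers import BlipForQuestionAnswering

model_id = "suc1dalspinach/fine_blip_gym"

# Load the weights in half precision to roughly halve memory use
# (assumes an fp16-capable device; keep the default F32 on CPU)
model = BlipForQuestionAnswering.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")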
