AI & ML interests

A Family of Dynamic UltraFast Small Language Models Ready for Embodied Artificial General Intelligence!

Recent Activity

prithivMLmods posted an update about 18 hours ago
Comparing two captioners: DeepCaption-VLA-7B, built on Qwen2.5-VL-7B-Instruct, is tailored for image captioning and vision-language attribution, focusing on precise, descriptive captions of visual properties, object attributes, and scene details. In contrast, Qwen2.5-VL-7B-Abliterated-Caption-it is fine-tuned for abliterated (uncensored) captioning, generating highly detailed descriptions across diverse visual categories. A minimal loading sketch follows the model list below.

Models🤗
✦ DeepCaption-VLA-7B : prithivMLmods/DeepCaption-VLA-7B
✦ Qwen2.5-VL-7B-Abliterated-Caption-it : prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it
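Since both models share the Qwen2.5-VL-7B backbone, they load the same way. Here is a minimal captioning sketch, assuming the standard Qwen2.5-VL path in a recent transformers release plus the qwen-vl-utils helper package; the image URL is a placeholder:

```python
# Minimal captioning sketch for DeepCaption-VLA-7B (the same recipe works for
# the abliterated variant). Assumes transformers with Qwen2.5-VL support and
# the qwen-vl-utils package are installed.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "prithivMLmods/DeepCaption-VLA-7B"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/sample.jpg"},  # placeholder image
        {"type": "text", "text": "Describe the visual properties, object attributes, and scene details."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
caption = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(caption)
```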

Spaces⛵
➜ VisionScope-R2 : prithivMLmods/VisionScope-R2
➜ Qwen2.5-VL-Outpost : prithivMLmods/Qwen2.5-VL-Outpost

Collection🗞️
DeepCaption attr. : prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a
VL Abliterated-Caption : prithivMLmods/vl-abliterated-caption-68a0443b63182e97a15c47a3
Multimodal VLMs - Until July'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027

GitHub↗️
> DeepCaption-VLA-7B [4bit-notebook demo] : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb
> Qwen2.5-VL-3B-Abliterated-Caption-it(caption) : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Qwen2.5-VL-3B-Abliterated-Caption-it(caption)/Qwen2_5_VL_3B_Abliterated_Caption_it.ipynb

The community GPU grant was given by Hugging Face — special thanks to them. 🤗🚀

To learn more, visit the app page or the respective model pages!
prithivMLmods posted an update 4 days ago
Apple's FastVLMs are the talk of the week among edge-device and consumer-grade VLMs on the Hub, with some impressive live-captioning and inference demos available. Meanwhile, I'm still exploring one of the coolest edge-device multimodal releases: Liquid AI's LFM2-VL (450M and 1.6B). I've also made a live camera video inference demo that runs on Colab's free-tier T4 GPU; a rough per-frame captioning sketch follows the notebook links below.

🤗Live Captioning Notebooks:
➠ LiquidAI LFM2 VL 1.6B Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LiquidAI-LFM2-VL-Live-Cam/LiquidAI_LFM2_VL_1_6B_Live_Cam.ipynb

➠ LiquidAI LFM2 VL 450M Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LiquidAI-LFM2-VL-Live-Cam/LiquidAI_LFM2_VL_450M_Live_Cam.ipynb
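The core of the live-cam demos is a capture-then-caption loop. Below is a rough sketch of that loop, assuming LFM2-VL loads via transformers' AutoModelForImageTextToText (as on the Liquid AI model cards) and using OpenCV for local webcam capture; the Colab notebooks swap in a JavaScript capture widget instead:

```python
# Per-frame live captioning sketch for LFM2-VL-450M. Assumes a transformers
# release with LFM2-VL support; cv2 (opencv-python) provides the webcam feed.
import cv2
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "LiquidAI/LFM2-VL-450M"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

cap = cv2.VideoCapture(0)  # default local webcam
for _ in range(5):  # caption a handful of frames
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV yields BGR arrays; convert to an RGB PIL image for the processor.
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    conversation = [{"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Describe this frame in one sentence."},
    ]}]
    inputs = processor.apply_chat_template(
        conversation, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=48)
    print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
cap.release()
```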

✨I also made a demo for the FastVLM Live Captioning Notebook.
➠ FastVLM 0.5B Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Apple-FastVLM-0.5B-Live-Cam/apple_FastVLM_0_5B_live_cam.ipynb

↗️For more notebooks, visit the repository below.
➠ Multimodal Outpost Notebooks: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks

Feel free to fork, modify, and explore!
prithivMLmods posted an update 8 days ago
Introducing prithivMLmods/DeepCaption-VLA-7B, a multimodal VLM designed for long-form captioning and vision-language attribution. It focuses on defining visual properties, object attributes, and scene details across a wide spectrum of images and aspect ratios, generating attribute-rich image captions. The model supports creative, artistic, and technical applications that require detailed descriptions. 🤗🔥

✦︎ Models: prithivMLmods/DeepCaption-VLA-7B; the release also includes prithivMLmods/DeepAttriCap-VLA-3B, an experimental model for vision-language attribution.

✦︎ Try the demo here: prithivMLmods/VisionScope-R2

✦︎ Try it now on Google Colab, with T4 GPU support via 4-bit quantization (see the sketch after this list): https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb

✦︎ Collection: prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a
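For reference, here is a minimal sketch of the 4-bit loading used to fit the 7B model on a T4-class GPU, assuming the standard bitsandbytes quantization path in transformers:

```python
# 4-bit (nf4) loading sketch for DeepCaption-VLA-7B on a T4-class GPU.
# Assumes the bitsandbytes package is installed alongside transformers.
import torch
from transformers import (
    AutoProcessor,
    BitsAndBytesConfig,
    Qwen2_5_VLForConditionalGeneration,
)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # the 4-bit quant_type mentioned above
    bnb_4bit_compute_dtype=torch.float16,  # T4 GPUs lack bfloat16 support
    bnb_4bit_use_double_quant=True,        # shaves off a little more memory
)

model_id = "prithivMLmods/DeepCaption-VLA-7B"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
```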


To learn more, visit the respective model cards!
prithivMLmods posted an update 10 days ago
OpenGVLab's InternVL3.5 is a new family of open-source multimodal models with advanced versatility, reasoning, and efficiency. I have created demo notebooks for models ranging from 1B to 4B parameters, available in multiple versions (MPO, Instruct, Pre-trained) and in both "thinking" and "non-thinking" settings, with experimental compatibility for Tesla T4 GPUs.

➠InternVL3_5_2B_MPO_Thinking: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3.5-Thinking/1_InternVL3_5_2B_MPO_Thinking/1_InternVL3_5_2B_MPO_Thinking.ipynb
➠InternVL3_5_1B_Instruct_Thinking: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3.5-Thinking/2_InternVL3_5_1B_Instruct_Thinking/2_InternVL3_5_1B_Instruct_Thinking.ipynb

➠InternVL3_5-1B-MPO: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-MPO/InternVL3_5-1B-MPO/InternVL3_5_1B_MPO.ipynb
➠InternVL3_5-2B-MPO: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/tree/main/InternVL-3.5-Notebook/InternVL3_5-MPO/InternVL3_5-2B-MPO

➠InternVL3_5-1B-Instruct: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Instruct/InternVL3_5-1B-Instruct/InternVL3_5_1B_Instruct.ipynb
➠InternVL3_5-2B-Instruct: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Instruct/InternVL3_5-2B-Instruct/InternVL3_5_2B_Instruct.ipynb

➠InternVL3_5-1B-Pretrained: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Pretrained/InternVL3_5-1B-Pretrained/InternVL3_5_1B_Pretrained.ipynb
➠InternVL3_5-2B-Pretrained: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Pretrained/InternVL3_5-2B-Pretrained/InternVL3_5_2B_Pretrained.ipynb

Note: the notebooks run without flash_attention; a minimal loading sketch follows.
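This is roughly what the no-flash-attention setup looks like, assuming InternVL's published remote-code interface; the .chat() signature comes from the repository's custom code, so treat it as an assumption rather than a stable API:

```python
# Loading an InternVL3.5 notebook model without flash attention (e.g. on a
# Tesla T4). trust_remote_code pulls in OpenGVLab's custom modeling code.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL3_5-1B-Instruct"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.float16,  # T4 GPUs lack bfloat16
    trust_remote_code=True,
    use_flash_attn=False,       # fall back to eager attention
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# Pure-text chat (pixel_values=None) keeps the sketch self-contained; image
# inputs would be preprocessed into pixel_values as shown on the model card.
question = "Explain mixed preference optimization in one sentence."
response = model.chat(tokenizer, None, question, dict(max_new_tokens=128, do_sample=False))
print(response)
```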
prithivMLmods posted an update 11 days ago
OpenGVLab's InternVL3_5-2B-MPO (Mixed Preference Optimization) is a compact vision-language model in the InternVL3.5 series. You can now experience it in the Tiny VLMs Lab, an app featuring 15+ multimodal VLMs ranging from 250M to 4B parameters. These models support tasks such as OCR, reasoning, single-shot answering with small models, and captioning (including abliterated variants) across a broad range of visual categories. They can also handle images with complex, sensitive, or nuanced content, while adapting to varying aspect ratios and resolutions.

✨ Space/App : prithivMLmods/Tiny-VLMs-Lab
🫙 Model : OpenGVLab/InternVL3_5-2B-MPO
↗️ Collection: OpenGVLab/internvl35-68ac87bd52ebe953485927fb
🗞️ Paper : https://arxiv.org/pdf/2508.18265
↗️ Multimodal Space Collection : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the relevant spaces, collections, and model cards.
prithivMLmods posted an update 12 days ago
Dropping new adapters for Qwen-Image, including Qwen-Image-Studio-Realism, Qwen-Image-Anime-LoRA, Qwen-Image-Sketch-Smudge, Qwen-Image-Synthetic-Face, and Qwen-Image-Fragmented-Portraiture, with various style-intermix compatibilities. For more details, visit the model cards; a minimal diffusers sketch follows the collection links below.

⤷ Studio Realism : prithivMLmods/Qwen-Image-Studio-Realism
⤷ Image Anime LoRA : prithivMLmods/Qwen-Image-Anime-LoRA
⤷ Sketch Smudge : prithivMLmods/Qwen-Image-Sketch-Smudge
⤷ Synthetic Face : prithivMLmods/Qwen-Image-Synthetic-Face
⤷ Fragmented Portraiture : prithivMLmods/Qwen-Image-Fragmented-Portraiture

Try them here:
✦︎ Qwen-Image-LoRA-DLC : prithivMLmods/Qwen-Image-LoRA-DLC
✦︎ Qwen-Image-Diffusion : prithivMLmods/Qwen-Image-Diffusion

Collection
✦︎ Qwen-Image-Exp-LoRA : prithivMLmods/qwen-image-exp-lora-68a978fe11400bc3165b0c4d
✦︎ Image Gen Apps (Diffusion) - LastUpdated 08/18 : prithivMLmods/image-gen-apps-diffusion-lastupdated-08-18-68a2f4c5ef3e5e394eacc20a
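As referenced above, here is a minimal sketch of applying one of these adapters locally, assuming Qwen-Image loads through diffusers' DiffusionPipeline and the standard LoRA loading path; the prompt is a sample, and each model card lists its own trigger words:

```python
# LoRA inference sketch for Qwen-Image, assuming a diffusers release with
# Qwen-Image pipeline support.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("prithivMLmods/Qwen-Image-Studio-Realism")  # any adapter above

image = pipe(
    prompt="studio portrait of a ceramic teapot, softbox lighting",  # sample prompt
    num_inference_steps=30,
).images[0]
image.save("studio_realism.png")
```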


To learn more, visit the spaces, collections, and model cards above.
prithivMLmods posted an update 21 days ago
Excited to introduce the Tiny VLMs Lab app for experiencing 15+ multimodal VLMs, ranging from 250M to 4B parameters, covering tasks like OCR, reasoning, single-shot answering with small models, and (abliterated) captioning across a broad range of visual categories, including images with complex, sensitive, or nuanced content at varying aspect ratios and resolutions. 🧪

🤗 Space/App: prithivMLmods/Tiny-VLMs-Lab

✦︎ Also introducing prithivMLmods/Qwen2.5-VL-3B-Abliterated-Caption-it, tailored for abliterated / uncensored image captioning. This release is a lighter alternative to the existing prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it, making it usable on mid-range GPUs and even experimentally on T4 GPUs.

✦︎ Collection: prithivMLmods/vl-abliterated-caption-68a0443b63182e97a15c47a3
✦︎ GitHub: https://github.com/PRITHIVSAKTHIUR/Tiny-VLMs-Lab
To learn more, visit the app page or the respective model pages!
prithivMLmods posted an update 24 days ago
Try Liquid AI's all-new multimodal models, LFM2-VL-1.6B and LFM2-VL-450M! The demos come with a Gradio UI and ReportLab support, and both models are runnable on a T4 GPU. A tiny ReportLab sketch follows the notebook links below.

↗ LFM2-VL-1.6B-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-1.6B-LiquidAI/LFM2-VL-1.6B_ReportLab.ipynb

↗ LFM2-VL-450M-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-450M-LiquidAI/LFM2-VL-450M_ReportLab.ipynb
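The ReportLab side of these notebooks boils down to writing generated captions into a PDF. A tiny sketch, with a hypothetical caption standing in for real model output:

```python
# Write a generated caption into a one-page PDF report with ReportLab.
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

caption = "A person holding a red umbrella on a rainy street."  # hypothetical model output

c = canvas.Canvas("caption_report.pdf", pagesize=letter)
c.setFont("Helvetica-Bold", 14)
c.drawString(72, 720, "LFM2-VL Caption Report")  # (x, y) in points from bottom-left
c.setFont("Helvetica", 11)
c.drawString(72, 690, caption)
c.save()
```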

To learn more, visit the Multimodal Outpost Notebooks repository!