granite-docling-258M-mlx

Granite Docling is a multimodal Image-Text-to-Text model engineered for efficient document conversion. It preserves the core features of Docling while maintaining seamless integration with DoclingDocuments to ensure full compatibility.
This model was converted to MLX format from ibm-granite/granite-docling-258M
using mlx-vlm version 0.3.3.
Refer to the original model card for more details on the model.
💡 This MLX model is optimized to run efficiently on Apple Silicon Macs.
How to use this model with Docling
If you run through 🐥Docling, it will automatically choose the MLX version of the Granite-Docling model. You can select it with the CLI options shown below:
# Convert to HTML and Markdown:
docling --to html --to md --pipeline vlm --vlm-model granite_docling "https://arxiv.org/pdf/2501.17887" # accepts files, urls or directories
# Convert to HTML including layout visualization:
docling --to html_split_page --show-layout --pipeline vlm --vlm-model granite_docling "https://arxiv.org/pdf/2501.17887"
How to use this model with bare mlx-vlm
You can also run plain mlx-vlm to generate predictions.
To run with the mlx-vlm
CLI, use this command:
pip install mlx_vlm
python -m mlx_vlm.generate --model ibm-granite/granite-docling-258M-mlx --max-tokens 4096 --temperature 0.0 --prompt "Convert this page to docling." --image <path_to_image>
To run with the mlx-vlm
python SDK, parse the output as a DoclingDocument
and export to various formats (e.g. Markdown, HTML), please refer to the code below.
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "docling-core",
# "mlx-vlm",
# "pillow",
# "transformers",
# ]
# ///
import webbrowser
from pathlib import Path
from docling_core.types.doc import ImageRefMode
from docling_core.types.doc.document import DocTagsDocument, DoclingDocument
from mlx_vlm import load, stream_generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from transformers.image_utils import load_image
# Configuration
MODEL_PATH = "ibm-granite/granite-docling-258M-mlx"
PROMPT = "Convert this page to docling."
SHOW_IN_BROWSER = True
# Sample images (pick one...)
# SAMPLE_IMAGE = "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/assets/new_arxiv.png"
# SAMPLE_IMAGE = "https://ibm.biz/docling-page-with-list"
SAMPLE_IMAGE = "https://ibm.biz/docling-page-with-table"
# Load model and processor
print("Loading model...")
model, processor = load(MODEL_PATH)
config = load_config(MODEL_PATH)
# Prepare input image and prompt
print("Preparing input...")
pil_image = load_image(SAMPLE_IMAGE)
formatted_prompt = apply_chat_template(processor, config, PROMPT, num_images=1)
# Generate DocTags output
print("Generating DocTags...\n")
output = ""
for token in stream_generate(
model, processor, formatted_prompt, [pil_image], max_tokens=4096, verbose=False
):
output += token.text
print(token.text, end="")
if "</doctag>" in token.text:
break
print("\n\nProcessing output...")
# Create DoclingDocument from generated DocTags
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([output], [pil_image])
doc = DoclingDocument.load_from_doctags(doctags_doc, document_name="Sample Document")
# Export to different formats
print("\nMarkdown output:\n")
print(doc.export_to_markdown())
# Save as HTML with embedded images
output_path = Path("./output.html")
doc.save_as_html(output_path, image_mode=ImageRefMode.EMBEDDED)
print(f"\nHTML saved to: {output_path}")
# Open in browser
if SHOW_IN_BROWSER:
webbrowser.open(f"file:///{str(output_path.resolve())}")
- Downloads last month
- 3,349
Model tree for ibm-granite/granite-docling-258M-mlx
Base model
ibm-granite/granite-docling-258M