
IT-Blender



Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention
Wonwoong Cho1, Yanxia Zhang2, Yan-Ying Chen2, David Inouye1
1Elmore Family School of Electrical and Computer Engineering, Purdue University
2Toyota Research Institute

Features

IT-Blender is a text-to-image (T2I) diffusion adapter that automates the blending of visual and textual concepts to enhance human creativity.

  • Preserving detailed visual concepts from a reference image: We leverage the denoising network (both UNet-based and DiT-based) as an image encoder to maintain the details of visual concepts.
  • Disentangling textual and visual concepts: We design a novel Blended Attention on top of the image self-attention module, where textual concepts are physically separated, encouraging disentanglement of textual and visual concepts.
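
The card does not spell out how Blended Attention is formulated, so the following is only a minimal sketch of one plausible reading: queries from the image being generated attend over their own keys and values together with keys and values taken from the reference image, while text conditioning stays outside this module. The function name `blended_attention`, the concatenation of keys and values, the single-head layout, and all shapes are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def blended_attention(q_gen, k_gen, v_gen, k_ref, v_ref):
    """Hypothetical single-head sketch (an assumption, not the paper's code).

    q_gen, k_gen, v_gen: (n_gen, d) projections of the generated image tokens.
    k_ref, v_ref:        (n_ref, d) projections of the reference image tokens.
    Returns the attended output (n_gen, d) and the weights (n_gen, n_gen + n_ref).
    """
    d = q_gen.shape[-1]
    # Assumption: reference tokens are appended to the self-attention context.
    keys = np.concatenate([k_gen, k_ref], axis=0)
    values = np.concatenate([v_gen, v_ref], axis=0)
    # Standard scaled dot-product attention over the combined context.
    weights = softmax(q_gen @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_gen, n_ref, d = 4, 6, 8
    q = rng.normal(size=(n_gen, d))
    k = rng.normal(size=(n_gen, d))
    v = rng.normal(size=(n_gen, d))
    k_r = rng.normal(size=(n_ref, d))
    v_r = rng.normal(size=(n_ref, d))
    out, w = blended_attention(q, k, v, k_r, v_r)
    print(out.shape, w.shape)
```

In this sketch no text tokens enter the attention context, which is one way to picture the separation of textual and visual concepts described above; in a real adapter the text prompt would be handled by the base model's own conditioning path.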

Pretrained Models

Model                        Base model   Description                            Resolution
IT-Blender FLUX              FLUX.1-dev   The model used in the paper. 1.43 GB.  (512, 512)
IT-Blender StableDiffusion   SD 1.5       The model used in the paper. 99.1 MB.  (512, 512)

License

This project is released under a Purdue University license.
See the LICENSE file for full license terms.
