IT-Blender

Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention
Wonwoong Cho¹, Yanxia Zhang², Yan-Ying Chen², David Inouye¹
¹Elmore Family School of Electrical and Computer Engineering, Purdue University
²Toyota Research Institute

Features

IT-Blender is a T2I diffusion adapter that can automate the blending process of visual and textual concepts to enhance human creativity.

Preserving detailed visual concepts from a reference image: We leverage the denoising network (both UNet-based and DiT-based) as an image encoder to maintain the details of visual concepts.
Disentangling textual and visual concepts: We design a novel Blended Attention on top of the image self-attention module, where textual concepts are physically separated, encouraging disentanglement of textual and visual concepts.
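
To make the second feature more concrete, here is a minimal sketch of one way attention over image tokens could be blended with attention to reference-image tokens. It is an illustration under assumptions, not the released implementation. The names `attention`, `blended_attention`, and `blend_scale` are hypothetical, the learned query/key/value projections are omitted, and the additive blend is one plausible reading of the description above. Text tokens do not enter this function at all, which is the sense in which the textual concepts are kept separate here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def blended_attention(noisy_tokens, reference_tokens, blend_scale=1.0):
    """Hypothetical blend: image self-attention plus attention to reference tokens.

    noisy_tokens: features of the image being denoised.
    reference_tokens: features of the reference image, assumed to come from
    the same denoising network acting as an image encoder.
    """
    self_out = attention(noisy_tokens, noisy_tokens, noisy_tokens)
    ref_out = attention(noisy_tokens, reference_tokens, reference_tokens)
    return [
        [s + blend_scale * r for s, r in zip(s_row, r_row)]
        for s_row, r_row in zip(self_out, ref_out)
    ]
```

In this sketch, setting `blend_scale=0.0` recovers plain self-attention, so the reference image's contribution can be scaled up or down independently of the text conditioning.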

Pretrained Models

| Model | Base model | Description | Resolution |
| --- | --- | --- | --- |
| `IT-Blender FLUX` | FLUX.1-dev | The model used in the paper. 1.43 GB. | (512, 512) |
| `IT-Blender StableDiffusion` | SD 1.5 | The model used in the paper. 99.1 MB. | (512, 512) |

License

This project is licensed by Purdue University.
See the LICENSE file for full license terms.
