# MV-VTON

PyTorch implementation of **MV-VTON: Multi-View Virtual Try-On with Diffusion Models**

[![arXiv](https://img.shields.io/badge/arXiv-2404.17364-b31b1b.svg)](https://arxiv.org/abs/2404.17364)
[![Project](https://img.shields.io/badge/Project-Website-orange)](https://hywang2002.github.io/MV-VTON/)
![visitors](https://visitor-badge.laobi.icu/badge?page_id=hywang2002.MV-VTON)
[![LICENSE](https://img.shields.io/badge/license-CC--BY--NC--SA--4.0-lightgrey)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

## News
- πŸ”₯ The first multi-view virtual try-on dataset, MVG, is now available.
- πŸ”₯ Checkpoints for both the frontal-view and multi-view virtual try-on tasks are released.

## Overview

![](assets/framework.png)
> **Abstract:**
> The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given
> clothing. However, most existing methods focus solely on frontal try-on with frontal clothing. When the views of the
> clothing and the person are significantly inconsistent, particularly when the person's view is non-frontal, the
> results are unsatisfactory. To address this challenge, we introduce Multi-View Virtual Try-On (MV-VTON), which aims to
> reconstruct the dressing results of a person from multiple views using the given clothes. On the one hand, given that
> single-view clothes provide insufficient information for MV-VTON, we instead employ two images, i.e., the frontal and
> back views of the clothing, to encompass the complete view as much as possible. On the other hand, diffusion models,
> which have demonstrated superior abilities, are adopted to perform MV-VTON. In particular, we propose a view-adaptive
> selection method where hard selection and soft selection are applied to global and local clothing feature extraction,
> respectively. This ensures that the clothing features roughly fit the person's view. Subsequently, we propose a joint
> attention block to align and fuse clothing features with person features. Additionally, we collect an MV-VTON dataset,
> i.e., Multi-View Garment (MVG), in which each person has multiple photos with diverse views and poses. Experiments
> show that the proposed method not only achieves state-of-the-art results on the MV-VTON task using our MVG dataset,
> but also outperforms prior work on the frontal-view virtual try-on task using the VITON-HD and DressCode datasets.

## Getting Started

### Installation

1. Clone the repository

```shell
git clone https://github.com/hywang2002/MV-VTON.git
cd MV-VTON
```

2. Install Python dependencies

```shell
conda env create -f environment.yaml
conda activate mv-vton
```

3. Download the pretrained [vgg](https://drive.google.com/file/d/1rvow8jStPt8t2prDcSRlnf8yzXhrYeGo/view?usp=sharing)
   checkpoint and put it in `models/vgg/` for Multi-View VTON and in `Frontal-View VTON/models/vgg/` for Frontal-View VTON.
4. Download the pretrained models `mvg.ckpt` via [Baidu Cloud](https://pan.baidu.com/s/17SC8fHE5w2g7gEtzJgRRew?pwd=cshy) or [Google Drive](https://drive.google.com/file/d/1J91PoT8A9yqHWNxkgRe6ZCnDEhN-H9O6/view?usp=sharing)
   and `vitonhd.ckpt` via [Baidu Cloud](https://pan.baidu.com/s/1R2yGgm35UwTpnXPEU6-tlA?pwd=cshy) or [Google Drive](https://drive.google.com/file/d/13A0uzUY6PuvitLOqzyHzWASOh0dNXdem/view?usp=sharing).
   Put `mvg.ckpt` in `checkpoint/` and `vitonhd.ckpt` in `Frontal-View VTON/checkpoint/` (see the command-line sketch below).
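
If you prefer the command line, the Google Drive files above can be fetched with `gdown`. A minimal sketch follows; the file IDs come from the links above, while the vgg filename is a placeholder, so rename it to whatever your config expects:

```shell
pip install gdown
mkdir -p models/vgg "Frontal-View VTON/models/vgg" checkpoint "Frontal-View VTON/checkpoint"
# VGG checkpoint (placeholder filename); also copy it to the Frontal-View variant
gdown 1rvow8jStPt8t2prDcSRlnf8yzXhrYeGo -O models/vgg/vgg19.pth
cp models/vgg/vgg19.pth "Frontal-View VTON/models/vgg/"
# Try-on checkpoints, named as in the steps above
gdown 1J91PoT8A9yqHWNxkgRe6ZCnDEhN-H9O6 -O checkpoint/mvg.ckpt
gdown 13A0uzUY6PuvitLOqzyHzWASOh0dNXdem -O "Frontal-View VTON/checkpoint/vitonhd.ckpt"
```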

### Datasets

#### MVG

1. Fill in the `Dataset Request Form` via [Baidu Cloud](https://pan.baidu.com/s/12HAq0V4FfgpU_q8AeyZzwA?pwd=cshy) or [Google Drive](https://drive.google.com/file/d/1zWt6HYBz7Vzaxu8rp1bwkhRoBkxbwQjw/view?usp=sharing),
   and send it to `cshy2mvvton@outlook.com` to obtain the MVG dataset. Non-institutional email addresses
   (e.g. gmail.com) are not accepted; please use your institutional email address.

After this, the folder structure should look like the following (the `warp_feat_unpair*` directories are only included in the `test` directory):

```
β”œβ”€β”€ MVG
β”‚   β”œβ”€β”€ unpaired.txt
β”‚   β”œβ”€β”€ [train | test]
β”‚   β”‚   β”œβ”€β”€ image-wo-bg
β”‚   β”‚   β”œβ”€β”€ cloth
β”‚   β”‚   β”œβ”€β”€ cloth-mask
β”‚   β”‚   β”œβ”€β”€ warp_feat
β”‚   β”‚   β”œβ”€β”€ warp_feat_unpair
β”‚   β”‚   β”œβ”€β”€ ...
```
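
A quick sanity check that the expected subdirectories are in place (a sketch assuming the layout above; extend the directory list as needed):

```shell
# Check the directories common to both splits; warp_feat_unpair* exists only under test/
for split in train test; do
  for d in image-wo-bg cloth cloth-mask warp_feat; do
    [ -d "MVG/$split/$d" ] || echo "missing: MVG/$split/$d"
  done
done
```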

#### VITON-HD

1. Download the [VITON-HD](https://github.com/shadow2496/VITON-HD) dataset.
2. Download the pre-warped cloth images/masks via [Baidu Cloud](https://pan.baidu.com/s/1uQM0IOltOmbeqwdOKX5kCw?pwd=cshy) or [Google Drive](https://drive.google.com/file/d/18DTWfhxUnfg41nnwwpCKN--akC4eT9DM/view?usp=sharing)
   and put them under the VITON-HD dataset directory (see the sketch below).
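
The Google Drive archive can likewise be fetched with `gdown`; a sketch, where the archive name and format are placeholders to be adjusted to what you actually download:

```shell
gdown 18DTWfhxUnfg41nnwwpCKN--akC4eT9DM -O warped_cloth.zip  # placeholder name
unzip warped_cloth.zip -d VITON-HD/                          # if the download is a zip archive
```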

After these steps, the folder structure should look like the following (the `unpaired-cloth*` directories are only included in the `test` directory):

```
β”œβ”€β”€ VITON-HD
β”‚   β”œβ”€β”€ test_pairs.txt
β”‚   β”œβ”€β”€ train_pairs.txt
β”‚   β”œβ”€β”€ [train | test]
β”‚   β”‚   β”œβ”€β”€ image
β”‚   β”‚   β”‚   β”œβ”€β”€ [000006_00.jpg | 000008_00.jpg | ...]
β”‚   β”‚   β”œβ”€β”€ cloth
β”‚   β”‚   β”‚   β”œβ”€β”€ [000006_00.jpg | 000008_00.jpg | ...]
β”‚   β”‚   β”œβ”€β”€ cloth-mask
β”‚   β”‚   β”‚   β”œβ”€β”€ [000006_00.jpg | 000008_00.jpg | ...]
β”‚   β”‚   β”œβ”€β”€ cloth-warp
β”‚   β”‚   β”‚   β”œβ”€β”€ [000006_00.jpg | 000008_00.jpg | ...]
β”‚   β”‚   β”œβ”€β”€ cloth-warp-mask
β”‚   β”‚   β”‚   β”œβ”€β”€ [000006_00.jpg | 000008_00.jpg | ...]
β”‚   β”‚   β”œβ”€β”€ unpaired-cloth-warp
β”‚   β”‚   β”‚   β”œβ”€β”€ [000006_00.jpg | 000008_00.jpg | ...]
β”‚   β”‚   β”œβ”€β”€ unpaired-cloth-warp-mask
β”‚   β”‚   β”‚   β”œβ”€β”€ [000006_00.jpg | 000008_00.jpg | ...]
```

### Inference

#### MVG

To test in the paired setting (which uses `cp_dataset_mv_paired.py`), either modify `configs/viton512.yaml` and `main.py`
accordingly, or simply rename `cp_dataset_mv_paired.py` to `cp_dataset.py` (recommended). Then run:

```shell
sh test.sh
```

To test in the unpaired setting, rename `cp_dataset_mv_unpaired.py` to `cp_dataset.py` instead and run the same command, as sketched below.
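
Putting the two settings together (a sketch; copying rather than renaming keeps the original files in place):

```shell
# Paired setting
cp cp_dataset_mv_paired.py cp_dataset.py
sh test.sh

# Unpaired setting
cp cp_dataset_mv_unpaired.py cp_dataset.py
sh test.sh
```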

#### VITON-HD

To test in the paired setting, run `cd Frontal-View\ VTON/`, then:

```shell
sh test.sh
```

To test in the unpaired setting, run `cd Frontal-View\ VTON/`, add `--unpaired` to `test.sh`, and then run:

```shell
sh test.sh
```

#### Metrics

We compute `LPIPS`, `SSIM`, `FID`, and `KID` using the same tools as in [LaDI-VTON](https://github.com/miccunifi/ladi-vton).
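
For a quick standalone check, FID and KID can also be computed with [torch-fidelity](https://github.com/toshas/torch-fidelity). This is not the exact LaDI-VTON tooling, so scores may differ slightly, and `results/` is a placeholder for your output directory:

```shell
pip install torch-fidelity
# Compare generated images against the ground-truth person images
fidelity --gpu 0 --fid --kid --input1 results/ --input2 VITON-HD/test/image/
```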

### Training

#### MVG

We use Paint-by-Example as initialization. Please download its pretrained model
from [Google Drive](https://drive.google.com/file/d/15QzaTWsvZonJcXsNv-ilMRCYaQLhzR_i/view) and save it to the
`checkpoints` directory (it can also be fetched with `gdown`, as sketched after the command below).
Rename `cp_dataset_mv_paired.py` to `cp_dataset.py`, then run:

```shell
sh train.sh
```
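
The initialization checkpoint can be fetched from the command line as follows (a sketch; the output filename is a placeholder, so keep whatever name your training config expects):

```shell
mkdir -p checkpoints
gdown 15QzaTWsvZonJcXsNv-ilMRCYaQLhzR_i -O checkpoints/pbe_init.ckpt  # placeholder filename
```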

#### VITON-HD

Run `cd Frontal-View\ VTON/`, then:

```shell
sh train.sh
```

## Acknowledgements

Our code is heavily borrowed from [Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example)
and [DCI-VTON](https://github.com/bcmi/DCI-VTON-Virtual-Try-On). We also thank the previous works
[PF-AFN](https://github.com/geyuying/PF-AFN), [GP-VTON](https://github.com/xiezhy6/GP-VTON),
[LaDI-VTON](https://github.com/miccunifi/ladi-vton), and [StableVITON](https://github.com/rlawjdghek/StableVITON).

## LICENSE
MV-VTON: Multi-View Virtual Try-On with Diffusion Models Β© 2024 by Haoyu Wang, Zhilu Zhang, Donglin Di, Shiliang Zhang, and Wangmeng Zuo is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

## Citation

```
@article{wang2024mv,
  title={MV-VTON: Multi-View Virtual Try-On with Diffusion Models},
  author={Wang, Haoyu and Zhang, Zhilu and Di, Donglin and Zhang, Shiliang and Zuo, Wangmeng},
  journal={arXiv preprint arXiv:2404.17364},
  year={2024}
}
```