Collections
Collections including paper arxiv:2506.18898

-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 45 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 24
-
Seed-Coder: Let the Code Model Curate Data for Itself
Paper • 2506.03524 • Published • 6 -
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
Paper • 2504.13914 • Published • 4 -
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Paper • 2503.10772 • Published • 19 -
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
Paper • 2503.09949 • Published • 5
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
Fashion-VDM: Video Diffusion Model for Virtual Try-On
Paper • 2411.00225 • Published • 11 -
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
Paper • 2410.22901 • Published • 8 -
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Paper • 2506.18898 • Published • 33
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43
-
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Paper • 2506.18898 • Published • 33 -
Tar • 🚀 Unified MLLM with Text-Aligned Representations • 45 -
Tar • 🚀 Unified MLLM with Text-Aligned Representations • 3 -
Tar • 🚀 Unified MLLM with Text-Aligned Representations • 60
-
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
Paper • 2411.07126 • Published • 31 -
Modifying Large Language Model Post-Training for Diverse Creative Writing
Paper • 2503.17126 • Published • 37 -
Align Your Flow: Scaling Continuous-Time Flow Map Distillation
Paper • 2506.14603 • Published • 20 -
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Paper • 2506.18898 • Published • 33