jaycha committed
Commit 1d1e2d8 · verified · 1 Parent(s): 3a7cc6f

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -25,7 +25,7 @@ language:
 ## Introduction
 **VARCO-VISION-2.0** is a multimodal AI model capable of understanding both images and text to answer user queries. It supports multi-image inputs, enabling effective processing of complex content such as documents, tables, and charts. The model demonstrates strong comprehension in both Korean and English, with significantly improved text generation capabilities and a deeper understanding of Korean cultural context. Compared to its predecessor, performance has been notably enhanced across various benchmarks, and its usability in real-world scenarios—such as everyday Q&A and information summarization—has also improved.
 
-In addition to the 14B full-scale model, a lightweight 1.7B version is available for on-device use, making it accessible on personal devices such as smartphones and PCs. VARCO-VISION-2.0 is a powerful open-source AI model built for Korean users and is freely available for a wide range of applications.
+In addition to the 14B full-scale model, a lightweight 1.7B version is available for on-device use, making it accessible on personal devices such as smartphones and PCs. VARCO-VISION-2.0 is a powerful open-weight AI model built for Korean users and is freely available for a wide range of applications.
 
 ## 🚨News🎙️
 - 🛠️ 2025-08-22: We updated the checkpoint of VARCO-VISION-2.0-1.7B for improved performance.
@@ -57,7 +57,7 @@ In addition to the 14B full-scale model, a lightweight 1.7B version is available
 VARCO-VISION-2.0 follows the architecture of [LLaVA-OneVision](https://arxiv.org/abs/2408.03326).
 
 ## Evaluation
-We used [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) for evaluation whenever possible, and conducted our own implementations only for benchmarks not supported by the toolkit, **ensuring fair comparisons** with various open-source models.
+We used [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) for evaluation whenever possible, and conducted our own implementations only for benchmarks not supported by the toolkit, **ensuring fair comparisons** with various open-weight models.
 Please note that for certain benchmarks involving LLM-based evaluation (e.g., LLaVABench), results may not be exactly reproducible due to variations in the underlying LLM behavior.
 
 ### Korean Benchmark
 
25
  ## Introduction
26
  **VARCO-VISION-2.0** is a multimodal AI model capable of understanding both images and text to answer user queries. It supports multi-image inputs, enabling effective processing of complex content such as documents, tables, and charts. The model demonstrates strong comprehension in both Korean and English, with significantly improved text generation capabilities and a deeper understanding of Korean cultural context. Compared to its predecessor, performance has been notably enhanced across various benchmarks, and its usability in real-world scenarios—such as everyday Q&A and information summarization—has also improved.
27
 
28
+ In addition to the 14B full-scale model, a lightweight 1.7B version is available for on-device use, making it accessible on personal devices such as smartphones and PCs. VARCO-VISION-2.0 is a powerful open-weight AI model built for Korean users and is freely available for a wide range of applications.
29
 
30
  ## 🚨News🎙️
31
  - 🛠️ 2025-08-22: We updated the checkpoint of VARCO-VISION-2.0-1.7B for improved performance.
 
57
  VARCO-VISION-2.0 follows the architecture of [LLaVA-OneVision](https://arxiv.org/abs/2408.03326).
58
 
59
  ## Evaluation
60
+ We used [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) for evaluation whenever possible, and conducted our own implementations only for benchmarks not supported by the toolkit, **ensuring fair comparisons** with various open-weight models.
61
  Please note that for certain benchmarks involving LLM-based evaluation (e.g., LLaVABench), results may not be exactly reproducible due to variations in the underlying LLM behavior.
62
 
63
  ### Korean Benchmark
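Since the README states that VARCO-VISION-2.0 follows the LLaVA-OneVision architecture, checkpoints in this family can typically be loaded through the standard `transformers` LLaVA-OneVision classes. The following is a minimal inference sketch under that assumption; the repository id, image URL, and prompt below are illustrative placeholders, not values confirmed by this commit.

```python
# Minimal single-image inference sketch for a LLaVA-OneVision-style checkpoint.
# Assumptions: the repo id and image URL below are placeholders for illustration.
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "NCSOFT/VARCO-VISION-2.0-1.7B"  # assumed repo id for the 1.7B checkpoint
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a chat-style prompt with one image placeholder and a text query.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Summarize the table in this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

url = "https://example.com/chart.png"  # placeholder image URL
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

For the multi-image inputs mentioned in the introduction, the same processor call should accept a list of images paired with a conversation containing one image placeholder per image, per the usual LLaVA-OneVision processing convention.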