PIXIE-Spell-Reranker-Preview-0.6B

PIXIE-Spell-Reranker-Preview-0.6B is a decoder-based reranker trained on Korean and English information retrieval datasets, developed by TelePIX Co., Ltd. PIXIE stands for TelePIX Intelligent Embedding, representing TelePIX's high-performance embedding technology. The model is specifically optimized for semantic reranking in Korean and English and demonstrates strong performance in aerospace domain applications. Through extensive fine-tuning and domain-specific evaluation, PIXIE shows robust reranking quality for real-world use cases such as document understanding, technical QA, and semantic search in aerospace and related high-precision fields. It also performs competitively across a wide range of open-domain Korean and English retrieval benchmarks, making it a versatile foundation for multilingual reranking systems.

Model Description

  • Model Type: Cross Encoder
  • Maximum Sequence Length: 40,960 tokens
  • Language: Multilingual; optimized for high performance in Korean and English
  • Domain Specialization: Aerospace
  • License: apache-2.0
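
The model loads as a standard Sentence Transformers CrossEncoder (see Direct Use below). Few reranking workloads need the full 40,960-token window, so you can cap the sequence length at load time to reduce memory use. A minimal sketch, where the max_length value is an illustrative assumption rather than a recommended setting:

from sentence_transformers import CrossEncoder

# Cap the input length well below the 40,960-token maximum; 4096 here is
# an arbitrary illustrative choice, not a tuned recommendation.
model = CrossEncoder("telepix/PIXIE-Spell-Reranker-Preview-0.6B", max_length=4096)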

Quality Benchmarks

PIXIE-Spell-Reranker-Preview-0.6B is a multilingual reranker specialized for Korean and English reranking tasks. It delivers consistently strong performance across a diverse set of domain-specific and open-domain benchmarks in both languages, demonstrating its effectiveness in real-world reranking applications. The table below presents the reranking performance of several rerankers evaluated on a variety of Korean and English benchmarks. We report Normalized Discounted Cumulative Gain (NDCG) scores, which measure how well a ranked list of documents aligns with ground truth relevance. Higher values indicate better reranking quality.

  • Avg. NDCG: Average of NDCG@1, @3, @5, and @10 across all benchmark datasets.
  • NDCG@k: Relevance quality of the top-k retrieved results.

All evaluations were conducted using the open-source Korean-MTEB-Retrieval-Evaluators codebase to ensure consistent dataset handling, indexing, retrieval, and NDCG@k computation across models.
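
For reference, NDCG@k divides the discounted cumulative gain of the returned ranking by that of the ideal ranking. A minimal sketch of the computation (a simplified stand-in, not the evaluator's exact code):

import math

def ndcg_at_k(relevances, k):
    # `relevances` holds the ground-truth relevance of each returned document,
    # in ranked order (e.g. [0, 1, 0] if only the 2nd result is relevant).
    def dcg(rels):
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([0, 1, 0], k=3))  # ~0.63: the only relevant document is ranked 2nd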

6 Datasets of MTEB (Korean)

Our model, telepix/PIXIE-Spell-Reranker-Preview-0.6B, achieves strong performance across most metrics and benchmarks, demonstrating solid generalization across domains such as multi-hop QA, long-document retrieval, public health, and e-commerce.

| Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Reranker-Preview-0.6B | 0.6B | 0.7896 | 0.7494 | 0.7910 | 0.8022 | 0.8168 |
| BAAI/bge-reranker-v2-m3 | 0.5B | 0.7861 | 0.7448 | 0.7868 | 0.7998 | 0.8133 |
| dragonkue/bge-reranker-v2-m3-ko | 0.5B | 0.7849 | 0.7505 | 0.7843 | 0.7959 | 0.8089 |
| Alibaba-NLP/gte-multilingual-reranker-base | 0.3B | 0.7594 | 0.7067 | 0.7610 | 0.7778 | 0.7922 |
| jinaai/jina-reranker-v2-base-multilingual | 0.3B | 0.6879 | 0.6410 | 0.6888 | 0.7027 | 0.7192 |

Note: The first-stage SPLADE shortlist size was fixed at candidate_k = 100 for all experiments.

Descriptions of the benchmark datasets used for evaluation are as follows:

  • Ko-StrategyQA
    A Korean multi-hop open-domain question answering dataset designed for complex reasoning over multiple documents.
  • AutoRAGRetrieval
    A domain-diverse retrieval dataset covering finance, government, healthcare, legal, and e-commerce sectors.
  • MIRACLRetrieval
    A document retrieval benchmark built on Korean Wikipedia articles.
  • PublicHealthQA
    A retrieval dataset focused on medical and public health topics.
  • BelebeleRetrieval
    A dataset for retrieving relevant content from web and news articles in Korean.
  • MultiLongDocRetrieval
    A long-document retrieval benchmark based on Korean Wikipedia and mC4 corpus.

Note: While many benchmark datasets are available, for this project we used only those that provide clean positive documents for each query. Keep in mind that a benchmark dataset is just that: a benchmark. For real-world applications, it is best to construct an evaluation dataset tailored to your specific domain and evaluate reranker models such as PIXIE in that environment to determine the most suitable one.

7 Datasets of BEIR (English)

Our model, telepix/PIXIE-Spell-Reranker-Preview-0.6B, achieves strong performance on a wide range of tasks, including fact verification, multi-hop question answering, financial QA, and scientific document retrieval, demonstrating competitive generalization across diverse domains.

| Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Reranker-Preview-0.6B | 0.6B | 0.3635 | 0.3692 | 0.3663 | 0.3589 | 0.3594 |
| Alibaba-NLP/gte-multilingual-reranker-base | 0.3B | 0.3284 | 0.3238 | 0.3297 | 0.3282 | 0.3320 |
| BAAI/bge-reranker-v2-m3 | 0.5B | 0.3143 | 0.3129 | 0.3158 | 0.3124 | 0.3162 |
| jinaai/jina-reranker-v2-base-multilingual | 0.3B | 0.3118 | 0.3051 | 0.3132 | 0.3104 | 0.3187 |
| dragonkue/bge-reranker-v2-m3-ko | 0.5B | 0.3042 | 0.3033 | 0.3035 | 0.3016 | 0.3087 |

Note: The first-stage BM25 shortlist size was fixed at candidate_k = 100 for all experiments.
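
Both tables follow a retrieve-then-rerank protocol: a first-stage retriever produces the candidate_k shortlist, which the cross encoder then reorders. A minimal sketch of that setup, using the rank_bm25 package as a stand-in for the benchmark's BM25 stage (toy corpus and query; the format_queries / format_document helpers are the ones defined in the Direct Use section below):

from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "BM25 is a classic lexical ranking function.",
    "A cross encoder scores a query and a document jointly.",
    "The capital of France is Paris.",
]
query = "How does a cross encoder score documents?"

# Stage 1: lexical shortlist (the benchmarks fix candidate_k = 100).
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
shortlist = bm25.get_top_n(query.lower().split(), corpus, n=100)

# Stage 2: rerank the shortlist with the cross encoder.
model = CrossEncoder("telepix/PIXIE-Spell-Reranker-Preview-0.6B")
pairs = [[format_queries(query), format_document(doc)] for doc in shortlist]
scores = model.predict(pairs)
reranked = [doc for _, doc in sorted(zip(scores, shortlist), key=lambda p: p[0], reverse=True)]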

Descriptions of the benchmark datasets used for evaluation are as follows:

  • ArguAna
    A dataset for argument retrieval based on claim-counterclaim pairs from online debate forums.
  • FEVER
    A fact verification dataset using Wikipedia for evidence-based claim validation.
  • FiQA-2018
    A retrieval benchmark tailored to the finance domain with real-world questions and answers.
  • HotpotQA
    A multi-hop open-domain QA dataset requiring reasoning across multiple documents.
  • MSMARCO
    A large-scale benchmark using real Bing search queries and corresponding web documents.
  • NQ
    A Google QA dataset where user questions are answered using Wikipedia articles.
  • SCIDOCS
    A citation-based document retrieval dataset focused on scientific papers.

Direct Use (Semantic Search)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

# Requires transformers>=4.51.0
from sentence_transformers import CrossEncoder

def format_queries(query, instruction=None):
    # Wrap the query in the chat template the model was trained with; the
    # system prompt constrains the model to answer only "yes" or "no".
    prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
    if instruction is None:
        instruction = (
            "Given a web search query, retrieve relevant passages that answer the query"
        )
    return f"{prefix}<Instruct>: {instruction}\n<Query>: {query}\n"


def format_document(document):
    # Close the user turn and open an assistant turn with an empty <think>
    # block so the model immediately emits its yes/no judgment.
    suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
    return f"<Document>: {document}{suffix}"


model = CrossEncoder("telepix/PIXIE-Spell-Reranker-Preview-0.6B")

task = "Given a web search query, retrieve relevant passages that answer the query"

queries = [
    "텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",  # In which industries does TelePIX use satellite data?
    "국방 분야에 어떤 위성 서비스가 제공되나요?",  # What satellite services are offered for defense?
    "텔레픽스의 기술 수준은 어느 정도인가요?",  # How advanced is TelePIX's technology?
    "국방 분야에 어떤 위성 서비스가 제공되나요?",  # unrelated pairing example
    "텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",  # partially related pairing example
]

documents = [
    "텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",  # TelePIX analyzes satellite data for services across maritime, resources, agriculture, and more.
    "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다.",  # Precision defense analysis services from reconnaissance and surveillance satellite imagery.
    "TelePIX의 광학 탑재체 및 AI 분석 기술은 Global standard를 상회하는 수준으로 평가받고 있습니다.",  # TelePIX's optical payloads and AI analysis are rated above the global standard.
    "텔레픽스는 우주에서 수집한 정보를 분석하여 '우주 경제(Space Economy)'라는 새로운 가치를 창출하고 있습니다.",  # TelePIX creates new value, the "Space Economy," from data collected in space.
    "텔레픽스는 위성 영상 획득부터 분석, 서비스 제공까지 전 주기를 아우르는 솔루션을 제공합니다.",  # End-to-end solutions from image acquisition through analysis to service delivery.
]

pairs = [
    [format_queries(query, task), format_document(doc)]
    for query, doc in zip(queries, documents)
]

scores = model.predict(pairs)
print(scores.tolist())
# [0.9999946355819702, 0.8422356247901917, 0.8858100771903992, 0.3226671516895294, 0.6746261715888977]
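
In a typical pipeline you score many candidate documents against a single query rather than one-to-one pairs as above. Continuing with the variables already defined, a minimal sketch:

# Rerank every document against the first query and sort by relevance score.
q = queries[0]
candidate_pairs = [[format_queries(q, task), format_document(doc)] for doc in documents]
ranked = sorted(zip(model.predict(candidate_pairs), documents), key=lambda p: p[0], reverse=True)
for score, doc in ranked:
    print(f"{score:.4f}  {doc}")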

License

The PIXIE-Spell-Reranker-Preview-0.6B model is licensed under Apache License 2.0.

Citation

@software{TelePIX-PIXIE-Spell-Reranker-Preview-0.6B,
  title={PIXIE-Spell-Reranker-Preview-0.6B},
  author={TelePIX AI Research Team and Bongmin Kim},
  year={2025},
  url={https://huggingface.co/telepix/PIXIE-Spell-Reranker-Preview-0.6B}
}

Contact

If you have any suggestions or questions about PIXIE, please reach out to the authors at bmkim@telepix.net.
