---
tags:
- sentence-transformers
- sentence-similarity
- information-retrieval
- semantic-search
widget:
- source_sentence: >-
    Descrivi dettagliatamente il processo chimico e fisico che avviene durante la preparazione di un impasto per crostata
  sentences:
  - >-
    ## La Magia Chimica e Fisica nell'Impasto della Crostata: Un Viaggio Dagli Ingredienti Secchi al Trionfo del Forno

    La preparazione di una crostata, apparentemente un gesto semplice e familiare, cela in realtà un affascinante balletto di reazioni chimiche e trasformazioni fisiche...
  - >-
    ## L'Arte Effimera: Creare un Dolce Paesaggio Invernale

    Immergiamoci nel cuore pulsante della pasticceria festiva, dove l'arte culinaria si fonde con la creatività artistica...
  - >-
    Le piattaforme di comunicazione digitale, con la loro ubiquità crescente, si configurano come un'arma a doppio taglio nel panorama sociale contemporaneo...
pipeline_tag: sentence-similarity
library_name: sentence-transformers
language:
- it
license: apache-2.0
---

Ita-Search 🇮🇹

# Fine-tuned Qwen3-Embedding for Italian Semantic Retrieval

This model is a fine-tuned version of [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) specialized for Italian semantic retrieval, with particular emphasis on understanding Italian queries and ranking Italian documents.

## Model Description

- **Model Type**: Dense embedding model for semantic retrieval
- **Base Model**: [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B)
- **Output Dimensionality**: 1,024-dimensional dense vectors
- **Maximum Sequence Length**: 32,768 tokens
- **Primary Language**: Italian
- **Similarity Function**: Cosine similarity

## Capabilities

### Italian Semantic Retrieval

The model performs strongly at matching Italian queries to Italian documents and is particularly effective in technical and academic domains.

### Domain Coverage

Trained on a range of Italian knowledge domains, including:

- **Medical & Health Sciences**: Diagnostic imaging, clinical procedures, medical terminology
- **STEM Fields**: Physics, computer science, geology, engineering
- **Professional Domains**: Finance, law, agriculture, software development
- **Educational Content**: Historical studies, culinary arts, general knowledge

### Query Understanding

Enhanced comprehension of:

- Conversational and informal Italian query patterns
- Technical terminology in Italian across domains
- Italian semantic concepts and nuances
- Complex, multi-faceted questions in Italian

## Training Data

The model was fine-tuned on a curated corpus of Italian semantic data built from high-quality triplets designed to capture semantic nuances across multiple domains.
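To make the triplet format concrete, here is a minimal sketch of what one training triplet might look like and how a triplet-style objective would score it. The `Triplet` field names, the toy 3-dimensional vectors, and the cosine triplet margin loss with `margin=0.2` are all assumptions for illustration; this card does not state the dataset schema or the actual training loss. In practice the vectors would come from `model.encode(...)` with the query and passage prompts and would have 1,024 dimensions.

```python
import math
from dataclasses import dataclass


@dataclass
class Triplet:
    """Hypothetical schema: a query, a relevant passage and a hard negative."""
    query: str
    positive: str
    negative: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_margin_loss(q: list[float], pos: list[float], neg: list[float],
                        margin: float = 0.2) -> float:
    """Zero once cos(q, pos) exceeds cos(q, neg) by at least `margin`.

    Assumption: illustrative objective only; the card does not name the loss.
    """
    return max(0.0, margin - (cosine(q, pos) - cosine(q, neg)))


# Hypothetical triplet built from the geology passages in the usage example.
example = Triplet(
    query="Come si distingue una faglia trascorrente da una normale?",
    positive="Le faglie trascorrenti sono caratterizzate da movimento orizzontale...",
    negative="Le faglie normali si verificano a causa di stress estensionale...",
)

# Toy 3-d vectors standing in for model.encode(...) outputs (1,024-d in practice).
q_vec = [1.0, 0.0, 0.0]
pos_vec = [0.9, 0.1, 0.0]
hard_neg_vec = [0.8, 0.6, 0.0]  # same subject as the query: a hard negative
easy_neg_vec = [0.0, 1.0, 0.0]  # unrelated topic, e.g. investment portfolios

hard_loss = triplet_margin_loss(q_vec, pos_vec, hard_neg_vec)  # small but > 0
easy_loss = triplet_margin_loss(q_vec, pos_vec, easy_neg_vec)  # exactly 0.0
print(f"hard negative loss: {hard_loss:.4f}, easy negative loss: {easy_loss:.4f}")
```

With the unrelated passage the loss is already zero, so it contributes no gradient, while the topically close negative still incurs a penalty. That difference is the usual motivation for mining hard negatives rather than sampling random ones.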
The dataset emphasizes:

- **Hard negative mining**: Deliberate inclusion of semantically related but incorrect documents
- **Italian language focus**: Broad representation of Italian language patterns
- **Domain diversity**: Coverage of academic, professional, and conversational contexts in Italian
- **Quality curation**: Manual review and automated filtering for coherence and relevance

## Usage

### Basic Retrieval

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("DeepMount00/Ita-Search")

# Italian query-document matching
query = "Come si distingue una faglia trascorrente da una normale?"
documents = [
    "Le faglie trascorrenti sono caratterizzate da movimento orizzontale...",
    "Le faglie normali si verificano a causa di stress estensionale...",
    "Le strategie di gestione del portafoglio di investimenti...",
]

# Queries and documents use different prompts (see Prompt Templates below)
query_embedding = model.encode(query, prompt="Represent this search query for finding relevant passages: ")
doc_embeddings = model.encode(documents, prompt="Represent this passage for retrieval: ")

# Similarity matrix of shape (1, len(documents))
similarities = model.similarity(query_embedding, doc_embeddings)

# Print documents from most to least similar
for idx in similarities[0].argsort(descending=True):
    print(f"{similarities[0, idx].item():.4f}  {documents[int(idx)]}")
```

### Prompt Templates

The model is optimized for the following prompt templates:

- **Queries**: `"Represent this search query for finding relevant passages: "`
- **Documents**: `"Represent this passage for retrieval: "`

## Applications

- **Italian information retrieval systems**
- **Academic and technical document search in Italian**
- **Italian question-answering platforms**
- **Educational content recommendation for Italian speakers**
- **Professional knowledge base systems in Italian**

## Limitations

- **Language coverage**: Optimized specifically for Italian
- **Domain specificity**: Performance may vary on highly specialized domains not represented in the training data

## Acknowledgments

This work builds upon the Qwen3-Embedding architecture and advances in contrastive learning for dense retrieval.
We acknowledge the contributions of the Qwen team and the sentence-transformers community.

---

**License**: Inherits the licensing terms of the base Qwen/Qwen3-Embedding-0.6B model.