SentenceTransformer based on Alibaba-NLP/gte-multilingual-base

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-multilingual-base on the all-nli-tr dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-multilingual-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: all-nli-tr
  • Language: tr

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
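
Since pooling uses the CLS token and the final Normalize module L2-normalizes the output, cosine similarity and dot product coincide on these embeddings. For illustration, a minimal sketch of how the same three-module pipeline could be assembled by hand with the sentence-transformers modules API (the trust_remote_code flags are an assumption based on the base model's custom "NewModel" architecture):

from sentence_transformers import SentenceTransformer, models

# Module (0): the GTE transformer encoder, truncating inputs at 512 tokens.
transformer = models.Transformer(
    "Alibaba-NLP/gte-multilingual-base",
    max_seq_length=512,
    model_args={"trust_remote_code": True},      # assumed necessary: the base ships custom code
    tokenizer_args={"trust_remote_code": True},
    config_args={"trust_remote_code": True},
)
# Module (1): CLS-token pooling down to a single 768-dim vector per input.
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),  # 768
    pooling_mode="cls",
)
# Module (2): L2 normalization, so cosine similarity equals the dot product.
model = SentenceTransformer(modules=[transformer, pooling, models.Normalize()])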

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("newmindai/TurkEmbed4STS", trust_remote_code=True)
# trust_remote_code is likely required because the GTE base uses custom modeling code ("NewModel")
# Run inference
sentences = [
    'Ve gerçekten, baba haklıydı, oğlu zaten her şeyi tecrübe etmişti, her şeyi denedi ve daha az ilgileniyordu.',  # "And really, the father was right; his son had already experienced everything, tried everything, and was less interested."
    'Oğlu her şeye olan ilgisini kaybediyordu.',  # "His son was losing interest in everything."
    'Baba oğlunun tecrübe için hala çok şey olduğunu biliyordu.',  # "The father knew there was still a lot for his son to experience."
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
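
Because the embeddings are normalized, the same calls support semantic search directly: encode a query and a corpus separately, then rank the corpus by cosine similarity. A small sketch reusing sentences from the training data (the pairing here is illustrative):

query_embedding = model.encode(["Bir çocuk ata biniyor."])  # "A child is riding a horse."
corpus_embeddings = model.encode([
    "Küçük bir çocuk ata biniyor.",  # "A little child is riding a horse."
    "Bir adam flüt çalıyor.",        # "A man is playing a flute."
])
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
print(scores.argmax().item())  # index of the best-matching corpus sentence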

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.8966

Semantic Similarity

  • Datasets: sts-test, sts22-test, sts-dev-gte-multilingual-base, sts-test-gte-multilingual-base, stsb-dev-768, stsb-dev-512, stsb-dev-256, stsb-dev-128, stsb-dev-64, stsb-test-768, stsb-test-512, stsb-test-256, stsb-test-128 and stsb-test-64
  • Evaluated with EmbeddingSimilarityEvaluator

Dataset                          pearson_cosine  spearman_cosine
sts-test                         0.8134          0.8200
sts22-test                       0.6514          0.6827
sts-dev-gte-multilingual-base    0.8387          0.8428
sts-test-gte-multilingual-base   0.8134          0.8200
stsb-dev-768                     0.8703          0.8748
stsb-dev-512                     0.8697          0.8753
stsb-dev-256                     0.8645          0.8735
stsb-dev-128                     0.8591          0.8700
stsb-dev-64                      0.8479          0.8656
stsb-test-768                    0.8455          0.8535
stsb-test-512                    0.8465          0.8554
stsb-test-256                    0.8443          0.8550
stsb-test-128                    0.8364          0.8511
stsb-test-64                     0.8235          0.8461
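
The stsb-dev/stsb-test columns show the effect of Matryoshka training: truncating embeddings from 768 to 64 dimensions costs less than 0.01 Spearman on stsb-dev (0.8748 to 0.8656). A sketch of using a truncated dimension at inference, assuming the truncate_dim argument of recent sentence-transformers releases:

from sentence_transformers import SentenceTransformer

# Keep only the first 256 Matryoshka dimensions of every embedding.
model_256 = SentenceTransformer(
    "newmindai/TurkEmbed4STS",
    truncate_dim=256,
    trust_remote_code=True,  # assumed: the GTE base uses custom modeling code
)
embeddings = model_256.encode(["Bir uçak kalkıyor."])  # "A plane is taking off."
print(embeddings.shape)  # (1, 256)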

Triplet

Metric Value
cosine_accuracy 0.9352
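
cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor embedding is closer to the positive than to the negative under cosine similarity; per the training logs below, 0.8966 and 0.9352 are the scores on all-nli-tr-test before and after the first training run. The card does not name the evaluator, but a sketch with sentence_transformers' TripletEvaluator (the triplet texts are illustrative, and `model` is the model loaded as above) would look like:

from sentence_transformers.evaluation import TripletEvaluator

evaluator = TripletEvaluator(
    anchors=["Bir çocuk ata biniyor."],          # "A child is riding a horse."
    positives=["Küçük bir çocuk ata biniyor."],  # "A little child is riding a horse."
    negatives=["Bir adam flüt çalıyor."],        # "A man is playing a flute."
    name="all-nli-tr-test",
)
results = evaluator(model)  # e.g. {"all-nli-tr-test_cosine_accuracy": ...}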

Training Details

Training Dataset

all-nli-tr

  • Dataset: all-nli-tr at daeabfb
  • Size: 482,091 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    Column     Type    Min       Mean          Max
    sentence1  string  6 tokens  10.51 tokens  27 tokens
    sentence2  string  6 tokens  10.47 tokens  27 tokens
    score      float   0.0       2.23          5.0

  • Samples (English glosses added in parentheses):

    sentence1: Bir uçak kalkıyor. ("A plane is taking off.")
    sentence2: Bir hava uçağı kalkıyor. ("An airplane is taking off.")
    score: 5.0

    sentence1: Bir adam büyük bir flüt çalıyor. ("A man is playing a large flute.")
    sentence2: Bir adam flüt çalıyor. ("A man is playing a flute.")
    score: 3.8

    sentence1: Bir adam pizzaya rendelenmiş peynir yayıyor. ("A man is spreading shredded cheese on a pizza.")
    sentence2: Bir adam pişmemiş pizzaya rendelenmiş peynir yayıyor. ("A man is spreading shredded cheese on an uncooked pizza.")
    score: 3.8
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "CoSENTLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
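
These parameters translate directly into the sentence-transformers loss API: MatryoshkaLoss wraps CoSENTLoss and applies it to the embeddings truncated to each listed dimension, with equal weight per dimension. A sketch:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss, MatryoshkaLoss

model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)
base_loss = CoSENTLoss(model)
loss = MatryoshkaLoss(
    model,
    base_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],  # every dimension contributes equally
    n_dims_per_step=-1,                  # -1: train on all dimensions at every step
)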
    

Evaluation Dataset

all-nli-tr

  • Dataset: all-nli-tr at daeabfb
  • Size: 6,567 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    Column     Type    Min       Mean          Max
    sentence1  string  6 tokens  15.89 tokens  39 tokens
    sentence2  string  6 tokens  16.02 tokens  49 tokens
    score      float   0.0       2.1           5.0

  • Samples (English glosses added in parentheses):

    sentence1: Şapkalı bir adam dans ediyor. ("A man with a hat is dancing.")
    sentence2: Sert şapka takan bir adam dans ediyor. ("A man wearing a hard hat is dancing.")
    score: 5.0

    sentence1: Küçük bir çocuk ata biniyor. ("A little child is riding a horse.")
    sentence2: Bir çocuk ata biniyor. ("A child is riding a horse.")
    score: 4.75

    sentence1: Bir adam yılana fare yediriyor. ("A man is feeding a mouse to a snake.")
    sentence2: Adam yılana fare yediriyor. ("The man is feeding a mouse to the snake.")
    score: 5.0
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "CoSENTLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • warmup_steps: 144
  • bf16: True
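
These settings map one-to-one onto SentenceTransformerTrainingArguments (sentence-transformers v3); a sketch with an illustrative output_dir:

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="models/TurkEmbed4STS",  # illustrative path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=10,
    warmup_ratio=0.1,
    warmup_steps=144,  # when set, warmup_steps takes precedence over warmup_ratio
    bf16=True,
)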

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 144
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
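
Putting the dataset, loss, and hyperparameters together, the training run can be reproduced in outline with the v3 trainer API. A sketch, with a two-row toy dataset standing in for all-nli-tr and gold scores rescaled from the card's 0-5 range to [0, 1], as CoSENTLoss conventionally expects:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss, MatryoshkaLoss

model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)

# Toy stand-in for all-nli-tr; columns match the card: sentence1, sentence2, score.
train_dataset = Dataset.from_dict({
    "sentence1": ["Bir uçak kalkıyor.", "Bir adam büyük bir flüt çalıyor."],
    "sentence2": ["Bir hava uçağı kalkıyor.", "Bir adam flüt çalıyor."],
    "score": [5.0 / 5.0, 3.8 / 5.0],
})

loss = MatryoshkaLoss(model, CoSENTLoss(model), matryoshka_dims=[768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="models/TurkEmbed4STS",  # illustrative
    per_device_train_batch_size=32,
    learning_rate=1e-5,
    num_train_epochs=10,
    warmup_steps=144,
    bf16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()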

Training Logs

The logs appear to span two consecutive training runs: a first run of 7,533 steps per epoch evaluated on all-nli-tr-test, sts-test, sts22-test, and sts-dev/test-gte-multilingual-base, followed by a second run of 180 steps per epoch evaluated on the stsb splits at each Matryoshka dimension. Columns that are empty ("-") for a run are omitted below.

Run 1 (spearman_cosine on sts-dev-gte-multilingual-base):

Epoch   Step  Training Loss  Validation Loss  sts-dev-gte-multilingual-base
0.1327  1000  2.5299         3.3893           0.8318
0.2655  2000  2.1132         3.3050           0.8345
0.3982  3000  5.1488         2.7752           0.8481
0.5310  4000  5.4103         2.7242           0.8445
0.6637  5000  5.1896         2.6701           0.8451
0.7965  6000  5.0105         2.6489           0.8431
0.9292  7000  5.1059         2.6114           0.8428

Checkpoint evaluations for Run 1:

Step 0 (before training): all-nli-tr-test cosine_accuracy 0.8966; sts-test spearman_cosine 0.8041; sts22-test spearman_cosine 0.6694
Step 7533 (epoch 1.0): all-nli-tr-test cosine_accuracy 0.9352; sts-test spearman_cosine 0.8200; sts22-test spearman_cosine 0.6827; sts-test-gte-multilingual-base spearman_cosine 0.8200

Run 2 (spearman_cosine on stsb-dev at each Matryoshka dimension):

Epoch   Step  Training Loss  Validation Loss  768     512     256     128     64
1.1111  200   34.2828        29.8737          0.8671  0.8671  0.8639  0.8606  0.8546
2.2222  400   28.0380        28.8915          0.8740  0.8742  0.8720  0.8691  0.8648
3.3333  600   27.3829        29.3391          0.8747  0.8751  0.8728  0.8699  0.8653
4.4444  800   26.8070        30.0090          0.8756  0.8761  0.8741  0.8710  0.8665
5.5556  1000  26.4543        30.5886          0.8753  0.8757  0.8739  0.8705  0.8662
6.6667  1200  26.0413        31.3750          0.8744  0.8751  0.8730  0.8698  0.8655
7.7778  1400  25.8221        31.6515          0.8752  0.8758  0.8739  0.8706  0.8661
8.8889  1600  25.6656        31.9805          0.8746  0.8752  0.8733  0.8700  0.8655
10.0    1800  25.5355        32.0454          0.8748  0.8753  0.8735  0.8700  0.8656

Final test scores at step 1800 (spearman_cosine on stsb-test): 768: 0.8535; 512: 0.8554; 256: 0.8550; 128: 0.8511; 64: 0.8461

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.3.1
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}