RishuD7 committed (verified)
Commit 8851d73 · Parent: b667e36

Add new SentenceTransformer model
1_Pooling/config.json ADDED

{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
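The configuration above enables only `pooling_mode_cls_token`: the sentence embedding is the contextual embedding of the first ([CLS]) token. A minimal sketch of that operation on a hypothetical batch (illustrative only, not the library's actual `Pooling` module):

```python
import numpy as np

def cls_pooling(token_embeddings: np.ndarray) -> np.ndarray:
    """Illustrative CLS pooling: select the first ([CLS]) token's embedding.

    token_embeddings: (batch, seq_len, dim) array of contextual token embeddings.
    Returns a (batch, dim) array of sentence embeddings.
    """
    return token_embeddings[:, 0, :]

# Toy example: batch of 2 sequences, 4 tokens each, 768-dim embeddings
batch = np.random.rand(2, 4, 768)
pooled = cls_pooling(batch)
print(pooled.shape)  # (2, 768)
```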
README.md ADDED

---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:8031
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: '11.2 In addition, Whirlpool shall reimburse Supplier for Severance
    incurred by Supplier as a result of the termination of employment of Supplier
    employees working on Whirlpool''s account, subject to Section 11.3 below, upon
    the following events: (a) if whirlpool requires Master Services Agreement Schedule
    E (Compensation) 15#$#elimination of a position due to changes in scope or volume
    of work; (b) if whirlpool approves of the elimination of a position proposed by
    Supplier to create savings; or (c) following the disposition or closure of a Managed
    Facility unless a similar Managed Facility is added within the same geographic
    proximity of the Managed Facility that was disposed of or closed. ect to Severance:

    (b) Pay to Supplier the following (collectively, the "Termination Expenses"),
    as and. when determined: (i) Severance (as hereinafter defined) actually incurred
    by Supplier as a result of the termination of employment of employees working
    on Supplier''s account, subject to. Section 11.3 below, (ii) actual expenses reasonably
    incurred by Supplier as a result of such expiration or early termination including,
    without limitation, termination fees paid by Supplier pursuant to equipment and
    vehicle leases and software licensing agreements, (iii) unamortized transition
    costs incurred by Supplier, and (iv) other fees, expenses or reimbursements for
    Services to which Supplier is entitled under this Agreement through the date of
    expiration or termination, as the case may be.'
  sentences:
  - Assignment
  - MOO_Services provided
  - CBRE_Redundancy Detail
- source_sentence: 'Private and Confidential Adam Miller MOo Print Limited Unit 12,
    Thames Gateway Park Chequers Lane Essex RM9 6FB 2 December 2021 Our ref:MOO Print
    sOc2 Dear Adam SOC 2 Engagement#$#### 1 Introduction#$#### 1.1 This letter, together
    with the enclosures (the "Engagement Letter"), sets out the basis on which we
    are to provide professional services to Moo Print Limited (the "Engagement").
    2 Scope of Professional Services 2.1 Our role is to provide the professional services
    detailed in the enclosed schedule(s) (the "Services"). By accepting these terms,
    you are agreeing that they scope of the Services set out in the schedule(s) is
    appropriate for your needs. We will perform the Services with reasonable skill
    and care, but our duties and responsibilities shall be limited to the matters
    as set out in the schedule(s). 2.2 (a) (b) reviewing (or otherwise being responsible
    for) the services provided by any other professional advisers retained by you;
    (c)

    The fee for core services of this engagement will be between 42''50o - 55''ooo.
    excluding VAT, invoiced on a monthly basis based on time spent each month. If
    any additional costs or fees need to be incurred for travel, we will obtain written
    approval from Moo Print Limited prior to incurring the expense. The 2.5% support
    cost as stated in the terms of business section 2.1 will not apply to this engagement.
    MOO Print Limited sOc2 02 December 2021 14'
  sentences:
  - CBRE_WCP Status Criteria
  - MOO_Supply Category
  - VAT_Data Protection Details
- source_sentence: '13.3 Default by Client. Each of the following shall constitute
    a default by Client (an "Client Default'') under this Agreement: 13.3.1 Client
    fails to make a payment when due to CBRE, and such failure continues for a period
    of fifteen (15) days after written notice of such failure from CBRE; 13.3.2 Except
    as set forth in the preceding clause 13.3.1, Client defaults in the performance
    of or breaches any of its covenants, agreements or obligations under this Agreement
    in any material respect, and such default or breach continues for fifteen (15)
    days after written notice of such default or breach from CBRE, unless such default
    cannot reasonably be cured within such 15-day period. in which event Client shall
    have an additional thirty (30) days to cure such default, provided Client promptly
    commences such cure within such 15-day period and continuously proceeds with such
    cure in a diligent manner;

    13.1.2 If all or substantially all of the assets of CBRE are attached, seized,
    or levied upon. or come into the possession of any receiver, trustee, custodian
    or assignee for the benefit of creditors, and the same is not vacated, stayed,
    dismissed, set aside or otherwise remedied within thirty (30) days after the occurrence
    thereof: 13.1.3 If any petition is filed by or against CBRE under the United States
    Bankruptcy. Code or any similar state or federal law (and, in the case of involuntary
    proceedings, CBRE fails to cause the same to be vacated. stayed or set aside within
    thirty (30) days after filing). 13.2 Remedies upon CBRE Default. Upon the occurrence
    and continuance of an uncured CBRF Default, Client may terminate this Agreement
    and/or exercise whatever remedies that are available at law or in equity.'
  sentences:
  - Governing Law
  - CBRE_Termination Trigger - CBRE
  - MOO_Services provided
- source_sentence: 'If termination is made by Buyer for convenience, an equitable
    adjustment for completed Services and expenses and written authorized commitments
    shallbe made by aqreement between termination expenses incurred by Seller, including
    actual severance expenses incurred from the Agreement Term start date until Termination
    and such severance expenses shall be no. greater than one week of each terminated
    employee''s annual salary per year of service on Buyer''s account as documented
    by Seller and pro-rated for any partial years a, unamortized transition costs
    (amortized straight-line over the over the first three (3) years of the term of
    the Agreement), and any actual termination fees incurred due to early termination
    of vehicle and equipment leases "Termination Expenses) as Seller''s sole compensation.In
    the event Buyer approves a reduction in Seller''s personnel dedicated to Buyer''s
    account as part of a. savings initiative or any Seller personnel dedicated to
    Buyer''s. account are terminated as a result of a reduction of Portfolio or scope
    of Services, then Buyer shall review Seller''s business expenses caused by such
    terminations. In the event a savings initiative (including a reduction of Portfolio
    or scope of Services by Buyer), which explicitly calls out severance costs as
    a part of the business case, is approved by Buyer and results in a termination
    of Seller''s personnel dedicated to Buyer''s account then Buyer shall reimburse
    any severance costs incurred by. Seller relating thereto; provided that Seller
    shall first use best efforts to place the affected personnel in another position.Seller
    shall in any event make best efforts in good faith to outplace any terminated
    employees to avoid severance costs. If the termination is attributable to the
    default or nonperformance of Seller and without limiting Buyer''s other rights
    and remedies, Buyer shall not owe Seller any compensation after the effective
    date of termination. Seller shall reimburse Buyer for any reasonable costs or
    damages incurred; provided that Buyer shal have the duty to mitigate any damages.
    A "default" shall mean a party''s failure to: (a) perform its duties and obligations
    under the Contract Documents, or (b) observe and comply with the terms thereof. In
    the event of a default by Buyer that is not cured within thirty (30) days of receipt
    of notice from Seller, then Seller may terminate thisAgreement and Buyer shall
    reimburse Seller for any reasonable costs or damages incurred and the Termination
    Expenses. In addition, Seller will in good faith make efforts to outplace any
    terminated employees to avoid severance costs. In addition, the above language
    regarding reimbursement of termination expenses does not apply to any dark site
    included in the SOw

    ### PAYMENT TERMS#$#Buyer will pay all invoices net 45 days from receipt of invoice,
    based on an accurate, valid and complete invoice in acceptable format, calculated
    from the latest of the (i) Invoice Date (ii) Invoice Received Date, or (ii) Goods
    Received Date. Seller may invoice for charges payable under this Agreement fifteen
    (15) days prior to the Services being rendered. Such invoices shall be based upon
    the budget approved by the parties in advance. On a quarterly basis, Seller shall
    conduct a true up of the actual expenses incurred against the budgeted amounts
    invoiced to Buyer. If there is a shortfall or overpayment during such quarter,
    the difference shall be reflected as a separate line item on the next invoice.
    If there are multiple line item shipments on one invoice, then the last shipment
    received in Buyer''s system will be the Goods Received Date. If invoice is received
    on a weekend or holiday, then Invoice Received Date will be the next business
    day following the holiday and/or weekend. All invoices must be submitted within..
    180 calendar days of the latest of the 3 dates listed in this section to be considered
    valid for payment. Buyer will not be responsible for payment of any amounts that
    are not invoiced within one hundred eighty (180) days of the date that the goods
    or services are supplied or provided to Buyer, and Seller hereby waives recovery
    of such amounts. Any request for payment of taxes or reimbursement of taxes paid
    by Seller on behalf of Buyer shall be billed within one year of the date of assignment,
    transfer or conveyance to which such taxes apply. Under no circumstances shall
    Buyer be responsible for any fines, interest or penalties assessed against Seller
    due to Seller''s failure to pay such taxes, including as a result of Seller''s
    failure to request reimbursement from Buyer.'
  sentences:
  - CBRE_Change in Law detail
  - VAT_Arbitration Regulation
  - CBRE_Redundancy Detail
- source_sentence: '(d) Names. Service Provider will not use the name of Company,
    any Affiliate of Company, any Company employee or any employee of any Affiliate
    of Company, or any product or service of Company or any of its Affiliates in any
    press release, advertising or materials distributed to prospective or existing
    customers, annual reports or any other public disclosure, except with Company''s
    prior written authorization. Under no circumstances will Service Provider use
    the logos or other trademarks of Company or any of its Affiliates in any such
    materials or disclosures.

    Service Provider Personnel shall, comply with any written instructions issued
    by Company with respect to.. the use, storage and handling of the Company Materials.
    Service Provider will use best efforts to protect the Company Materials from any
    loss of or damage while such Company Materials are under Service Provider''s control,
    which control will be deemed to begin upon receipt of the Company Materials by.
    Service Provider; provided that Service Provider shall not be liable for any loss
    or damage to Company. Materials to the extent such loss or damage is caused by
    Service Provider''s compliance with such written. instructions.'
  sentences:
  - VAT_Confidentiality Remedies available
  - Publicity
  - CBRE_Termination Trigger - Client
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base En v1.5 Phase 5
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.006274509803921568
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.023529411764705882
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.03529411764705882
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.07607843137254902
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.006274509803921568
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.007843137254901959
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.007058823529411765
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.0076078431372549014
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.006274509803921568
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.023529411764705882
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.03529411764705882
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.07607843137254902
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.034339182667857376
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.02193370681605975
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.03589435856389882
      name: Cosine Map@100
---

# BGE base En v1.5 Phase 5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("RishuD7/bge-base-en-v1.5-82-keys-phase-7-exp_v1")
# Run inference
sentences = [
    "(d) Names. Service Provider will not use the name of Company, any Affiliate of Company, any Company employee or any employee of any Affiliate of Company, or any product or service of Company or any of its Affiliates in any press release, advertising or materials distributed to prospective or existing customers, annual reports or any other public disclosure, except with Company's prior written authorization. Under no circumstances will Service Provider use the logos or other trademarks of Company or any of its Affiliates in any such materials or disclosures.\nService Provider Personnel shall, comply with any written instructions issued by Company with respect to.. the use, storage and handling of the Company Materials. Service Provider will use best efforts to protect the Company Materials from any loss of or damage while such Company Materials are under Service Provider's control, which control will be deemed to begin upon receipt of the Company Materials by. Service Provider; provided that Service Provider shall not be liable for any loss or damage to Company. Materials to the extent such loss or damage is caused by Service Provider's compliance with such written. instructions.",
    'Publicity',
    'CBRE_Termination Trigger - Client',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

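Because the model ends with a `Normalize()` module, its embeddings are unit-length, so the cosine similarity that `model.similarity` computes by default reduces to a plain dot product. A small sketch with hypothetical random vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in for model.encode(sentences): 3 raw 768-dim vectors
emb = rng.normal(size=(3, 768))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # the model's final Normalize() step

# With unit-norm rows, cosine similarity is just a matrix product
similarities = emb @ emb.T
print(similarities.shape)  # (3, 3)
# Diagonal entries are self-similarities, exactly 1.0 for normalized vectors
```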
## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.0063     |
| cosine_accuracy@3   | 0.0235     |
| cosine_accuracy@5   | 0.0353     |
| cosine_accuracy@10  | 0.0761     |
| cosine_precision@1  | 0.0063     |
| cosine_precision@3  | 0.0078     |
| cosine_precision@5  | 0.0071     |
| cosine_precision@10 | 0.0076     |
| cosine_recall@1     | 0.0063     |
| cosine_recall@3     | 0.0235     |
| cosine_recall@5     | 0.0353     |
| cosine_recall@10    | 0.0761     |
| **cosine_ndcg@10**  | **0.0343** |
| cosine_mrr@10       | 0.0219     |
| cosine_map@100      | 0.0359     |

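Accuracy@k and recall@k coincide in the table above because each query has a single relevant document, and precision@k is then accuracy@k divided by k. A small sketch of that relationship on hypothetical ranks (illustrative, not the evaluator's implementation):

```python
import numpy as np

def metrics_at_k(rank_of_relevant: np.ndarray, k: int):
    """Each query has exactly one relevant document; rank_of_relevant holds its
    1-based rank in the retrieved list. With one relevant item per query,
    accuracy@k and recall@k coincide, and precision@k = accuracy@k / k."""
    hits = rank_of_relevant <= k
    accuracy = hits.mean()
    precision = accuracy / k
    return accuracy, precision

ranks = np.array([1, 4, 12, 2, 50])   # hypothetical ranks for 5 queries
acc10, prec10 = metrics_at_k(ranks, k=10)
print(acc10)   # 0.6  (3 of 5 relevant docs ranked in the top 10)
```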
## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 8,031 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:
  |         | positive                                                                              | anchor                                                                           |
  |:--------|:--------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                                | string                                                                           |
  | details | <ul><li>min: 170 tokens</li><li>mean: 377.79 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 8.22 tokens</li><li>max: 11 tokens</li></ul> |
* Samples:
  | positive | anchor |
  |:---------|:-------|
  | <code>In the event that the Contractor provides the Service during the incomplete period of the the service was provided 3. The Customer shall reimburse the Contractor for the expenses incurred for the purchase of spare parts, equipment and materials for the purpose of providing the Services increased by the Contractor's mark-up, the amount of which is specified in Appendix No 4 "Terms and Conditions". The purchase of spare parts, equipment and materials referred to above will take place after the Contractor's application has been accepted by the Customer. 1 Settlement for the undertaking by the Contractor Emergency interventions will take place in accordance with and the conditions indicated in Clause 4 "Terms and Conditions" 5. For the performance of additional works, the Contractor will receive the remuneration specified in the application or contract for the performance of additional works accepted by the Client.<br> Not later: within 7 (sleth) days from the date of termination of this Agre...</code> | <code>CBRE_Pricing Criteria</code> |
  | <code>4.1 The Contractor, despite a written warning issued by the Contractor by registered mail, violates the provisions of the Agreement and #$#cease the infringement within 14 (fourteen) days from the date of receipt of the summons from the Contractor, unless, . due to the nature of the breach. its removal. requires a longer period and the actions to remedy the breach are taken immediately. and duly by the Contractor:. 4.4 The Contractor shall cease to perform the duties resulting from the contract in part or in part for more than 3 days. 5. The Contractor may terminate the Contract with effect from the date of written service - under pain of non-wai:noscj - a statement of termination. if:. 5.1The Customer shall not comply with the obligation to submit the seals after the deadline for the payment of the two consecutive settlement periods specified on the invoice and after the deadline of fourteen days specified by the Contractor in the. written reminder; out business activity<br>### S5 TER...</code> | <code>CBRE_Termination Trigger - Client</code> |
  | <code>Works commissioned to the Contractor, which do not fall within the scope of the contract, are additionally valued as Additional Works after prior acceptance of the Contractor's offer within 30 days from the date of delivery of the duly issued invoice issued after the. completion of these works. 7. The amount of remuneration due as set out in Schedule No 4 "Terms and Conditions shall be the net amount and shall beand VAT and VAT.. 8. Any discounts, commissions and other bonuses that the Contractor receivesin connection with its global purchasing program will be retained by the Contractor and I wil not be subiect to settlement with the Principal. 9. In the event of changes in the Iaw resulting in an increase in costs related to the provision of Services on the part of the Contractor, the Customer undertakes to cover the above- mentioned costs, documented by the Contractor.<br>### S15 CONFIDENTIAL INFORMATION AND PROTECTION OF PERSONAL DATA 1.Any information obtained by the Customer or the C...</code> | <code>CBRE_WCP Status Criteria</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

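MultipleNegativesRankingLoss scores each anchor against every positive in the batch: the matching positive is the target and the other in-batch positives serve as negatives. A self-contained sketch of that computation with the `scale: 20.0` cosine setup listed above (illustrative, not the sentence-transformers implementation):

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Sketch of MultipleNegativesRankingLoss: row i of the score matrix ranks
    positives[i] (the true pair) against all other in-batch positives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # scaled cosine similarities, shape (batch, batch)
    # Cross-entropy with target class i for row i (matching pairs on the diagonal),
    # using a max-shifted log-sum-exp for numerical stability
    m = scores.max(axis=1, keepdims=True)
    log_probs = scores - (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True)))
    return float(-np.diag(log_probs).mean())

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 768))
loss = mnr_loss(anchors, anchors)  # identical anchor/positive pairs -> near-zero loss
```

Intuitively, the loss pushes each anchor's cosine similarity with its own positive above its similarity with every other positive in the batch, which is why the `no_duplicates` batch sampler matters: duplicate positives in a batch would be treated as false negatives.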
### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 30
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 30
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step    | Training Loss | dim_768_cosine_ndcg@10 |
|:----------:|:-------:|:-------------:|:----------------------:|
| 0.6375     | 10      | 2.4919        | -                      |
| 1.2749     | 20      | 1.576         | -                      |
| 1.7211     | 27      | -             | 0.0285                 |
| 1.1713     | 30      | 0.6111        | -                      |
| 1.8088     | 40      | 1.622         | -                      |
| 2.4462     | 50      | 0.4089        | -                      |
| 2.7012     | 54      | -             | 0.0300                 |
| 2.3426     | 60      | 0.7251        | -                      |
| 2.9801     | 70      | 0.864         | -                      |
| 3.6175     | 80      | 0.152         | -                      |
| 3.6813     | 81      | -             | 0.0299                 |
| 3.5139     | 90      | 0.7404        | -                      |
| 4.1514     | 100     | 0.5908        | -                      |
| 4.7251     | 109     | -             | 0.0304                 |
| 4.0478     | 110     | 0.1358        | -                      |
| 4.6853     | 120     | 0.7636        | -                      |
| 5.3227     | 130     | 0.3625        | -                      |
| 5.7052     | 136     | -             | 0.0332                 |
| 5.2191     | 140     | 0.2812        | -                      |
| 5.8566     | 150     | 0.6369        | -                      |
| 6.4940     | 160     | 0.1818        | -                      |
| 6.6853     | 163     | -             | 0.0327                 |
| 6.3904     | 170     | 0.3748        | -                      |
| 7.0279     | 180     | 0.5476        | -                      |
| 7.6653     | 190     | 0.0952        | -                      |
| 7.7291     | 191     | -             | 0.0334                 |
| 7.5618     | 200     | 0.5157        | -                      |
| 8.1992     | 210     | 0.4383        | -                      |
| **8.7092** | **218** | **-**         | **0.0362**             |
| 8.0956     | 220     | 0.1392        | -                      |
| 8.7331     | 230     | 0.5627        | -                      |
| 9.3705     | 240     | 0.2617        | -                      |
| 9.6892     | 245     | -             | 0.0336                 |
| 9.2669     | 250     | 0.2135        | -                      |
| 9.9044     | 260     | 0.5106        | -                      |
| 10.5418    | 270     | 0.1462        | -                      |
| 10.7331    | 273     | -             | 0.0343                 |
| 10.4382    | 280     | 0.2909        | -                      |
| 11.0757    | 290     | 0.4675        | -                      |
| 11.7131    | 300     | 0.075         | 0.0348                 |
| 11.6096    | 310     | 0.4271        | -                      |
| 12.2470    | 320     | 0.3571        | -                      |
| 12.6932    | 327     | -             | 0.0358                 |
| 12.1434    | 330     | 0.1183        | -                      |
| 12.7809    | 340     | 0.4438        | -                      |
| 13.4183    | 350     | 0.1956        | -                      |
| 13.7371    | 355     | -             | 0.0352                 |
| 13.3147    | 360     | 0.1887        | -                      |
| 13.9522    | 370     | 0.4342        | -                      |
| 14.5896    | 380     | 0.1177        | -                      |
| 14.7171    | 382     | -             | 0.0346                 |
| 14.4861    | 390     | 0.2633        | -                      |
| 15.1235    | 400     | 0.4205        | -                      |
| 15.6972    | 409     | -             | 0.0340                 |
| 15.0199    | 410     | 0.0649        | -                      |
595
+ | 15.6574 | 420 | 0.4102 | - |
596
+ | 16.2948 | 430 | 0.3021 | - |
597
+ | 16.7410 | 437 | - | 0.0343 |
598
+ | 16.1912 | 440 | 0.1288 | - |
599
+ | 16.8287 | 450 | 0.4247 | 0.0343 |
600
+
601
+ * The bold row denotes the saved checkpoint.
602
+
603
+ ### Framework Versions
604
+ - Python: 3.11.11
605
+ - Sentence Transformers: 3.3.1
606
+ - Transformers: 4.43.1
607
+ - PyTorch: 2.5.1+cu124
608
+ - Accelerate: 1.3.0
609
+ - Datasets: 2.19.1
610
+ - Tokenizers: 0.19.1
611
+
612
+ ## Citation
613
+
614
+ ### BibTeX
615
+
616
+ #### Sentence Transformers
617
+ ```bibtex
618
+ @inproceedings{reimers-2019-sentence-bert,
619
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
620
+ author = "Reimers, Nils and Gurevych, Iryna",
621
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
622
+ month = "11",
623
+ year = "2019",
624
+ publisher = "Association for Computational Linguistics",
625
+ url = "https://arxiv.org/abs/1908.10084",
626
+ }
627
+ ```
628
+
629
+ #### MultipleNegativesRankingLoss
630
+ ```bibtex
631
+ @misc{henderson2017efficient,
632
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
633
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
634
+ year={2017},
635
+ eprint={1705.00652},
636
+ archivePrefix={arXiv},
637
+ primaryClass={cs.CL}
638
+ }
639
+ ```
640
+
641
+ <!--
642
+ ## Glossary
643
+
644
+ *Clearly define terms in order to be accessible across audiences.*
645
+ -->
646
+
647
+ <!--
648
+ ## Model Card Authors
649
+
650
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
651
+ -->
652
+
653
+ <!--
654
+ ## Model Card Contact
655
+
656
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
657
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "BAAI/bge-base-en-v1.5",
3
+ "architectures": [
4
+ "BertModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "gradient_checkpointing": false,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 768,
12
+ "id2label": {
13
+ "0": "LABEL_0"
14
+ },
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 3072,
17
+ "label2id": {
18
+ "LABEL_0": 0
19
+ },
20
+ "layer_norm_eps": 1e-12,
21
+ "max_position_embeddings": 512,
22
+ "model_type": "bert",
23
+ "num_attention_heads": 12,
24
+ "num_hidden_layers": 12,
25
+ "pad_token_id": 0,
26
+ "position_embedding_type": "absolute",
27
+ "torch_dtype": "float32",
28
+ "transformers_version": "4.43.1",
29
+ "type_vocab_size": 2,
30
+ "use_cache": true,
31
+ "vocab_size": 30522
32
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.3.1",
4
+ "transformers": "4.43.1",
5
+ "pytorch": "2.5.1+cu124"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c14af1d2f71d1193076c6112f2152b42edfe2d43ae5f561f245c1418da0e16e9
3
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": true
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "mask_token": "[MASK]",
49
+ "model_max_length": 512,
50
+ "never_split": null,
51
+ "pad_token": "[PAD]",
52
+ "sep_token": "[SEP]",
53
+ "strip_accents": null,
54
+ "tokenize_chinese_chars": true,
55
+ "tokenizer_class": "BertTokenizer",
56
+ "unk_token": "[UNK]"
57
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff