Add/update the quantized ONNX model files and README.md for Transformers.js v3

## Applied Quantizations

### ❌ Based on `decoder_model.onnx` *with* slimming

```
Processing /tmp/tmp73cu4bel/decoder_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 217, in quantize_fp16
    model_fp16 = float16.convert_float_to_float16(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 273, in convert_float_to_float16
    process_graph_output(curr_graph, is_top_level, keep_io_types)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 372, in process_graph_output
    assert len(upstream_nodes) == 1 # Should be only one node
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
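
The traceback above ends at the assertion `len(upstream_nodes) == 1` in `process_graph_output`. The following is a minimal sketch of what such a check could mean, assuming `upstream_nodes` is the list of nodes that write the graph output's name (the log does not show that lookup). The `Node` class, the helper functions, and the toy graph are hypothetical stand-ins, not the ONNX API:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a graph node: a name plus the tensor names it writes."""
    name: str
    outputs: list[str] = field(default_factory=list)


def find_producers(nodes: list[Node], tensor_name: str) -> list[Node]:
    """Return every node that lists `tensor_name` among its outputs."""
    return [node for node in nodes if tensor_name in node.outputs]


def outputs_failing_check(nodes: list[Node], graph_outputs: list[str]) -> dict[str, int]:
    """Map each graph output without exactly one producer to its producer count."""
    counts = {name: len(find_producers(nodes, name)) for name in graph_outputs}
    return {name: count for name, count in counts.items() if count != 1}


# Toy graph: "logits" has one producer, while "passthrough" has none
# (for example, a graph input forwarded directly as a graph output).
toy_nodes = [Node("MatMul_0", ["hidden"]), Node("Add_0", ["logits"])]
failing = outputs_failing_check(toy_nodes, ["logits", "passthrough"])
print(failing)  # {'passthrough': 0}
```

Under that assumption, any graph output reported by a check like this would trip the assertion, whether it has zero producers or more than one.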

### ❌ Based on `decoder_model.onnx` *without* slimming

```
Processing /tmp/tmp7pqh64tg/decoder_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 217, in quantize_fp16
    model_fp16 = float16.convert_float_to_float16(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 273, in convert_float_to_float16
    process_graph_output(curr_graph, is_top_level, keep_io_types)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 372, in process_graph_output
    assert len(upstream_nodes) == 1 # Should be only one node
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
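
The `UserWarning` in the log above is separate from the `AssertionError`: it reports that the float32 value -3.4028234663852886e+38 was replaced with -10000.0. The following is a minimal standard-library sketch of why that value cannot be stored as float16. The `clamp_for_fp16` helper and its 1e4 bound are assumptions inferred from the warning text, not the code in `float16.py`:

```python
import struct

FLOAT32_MIN = -3.4028234663852886e+38  # value quoted in the warning
FP16_MAX_FINITE = 65504.0              # largest finite IEEE 754 half-precision value


def fits_in_fp16(value: float) -> bool:
    """Return True if `value` can be packed as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", value)
        return True
    except OverflowError:
        return False


def clamp_for_fp16(value: float, max_finite: float = 1e4) -> float:
    """Clamp `value` into [-max_finite, max_finite] (hypothetical helper)."""
    return max(-max_finite, min(max_finite, value))


print(fits_in_fp16(FLOAT32_MIN))      # False
print(fits_in_fp16(FP16_MAX_FINITE))  # True
print(clamp_for_fp16(FLOAT32_MIN))    # -10000.0
```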

### ❌ Based on `encoder_model.onnx` *with* slimming

```
None
```
↳ ✅ `fp16`: `encoder_model_fp16.onnx` (added)
↳ ✅ `q8`: `encoder_model_quantized.onnx` (added)
↳ ❌ `int8`: `encoder_model_int8.onnx` (added but JS-based E2E test failed)
```
dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/onnxruntime-node@1.21.0/node_modules/onnxruntime-node/dist/backend.js:25
__classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
^

Error: Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'
at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/onnxruntime-node@1.21.0/node_modules/onnxruntime-node/dist/backend.js:25:92)
at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/onnxruntime-node@1.21.0/node_modules/onnxruntime-node/dist/backend.js:67:29)
at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0
```
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)
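
The dtype labels and file names in the list above follow one pattern: the base model name plus a per-dtype suffix. The following is a minimal sketch of that mapping, built only from the names listed in this PR. The `quantized_filename` helper is hypothetical and not part of Transformers.js:

```python
# Suffixes as they appear in this PR's list; note that "q8" maps to "_quantized".
DTYPE_SUFFIX = {
    "fp16": "_fp16",
    "q8": "_quantized",
    "int8": "_int8",
    "uint8": "_uint8",
    "q4": "_q4",
    "q4f16": "_q4f16",
    "bnb4": "_bnb4",
}


def quantized_filename(base_name: str, dtype: str) -> str:
    """Build the ONNX file name for `base_name` quantized to `dtype`."""
    if dtype not in DTYPE_SUFFIX:
        raise ValueError(f"unknown dtype: {dtype!r}")
    return f"{base_name}{DTYPE_SUFFIX[dtype]}.onnx"


print(quantized_filename("encoder_model", "q8"))     # encoder_model_quantized.onnx
print(quantized_filename("encoder_model", "q4f16"))  # encoder_model_q4f16.onnx
```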

### ❌ Based on `encoder_model.onnx` *without* slimming

```
None
```
↳ ✅ `fp16`: `encoder_model_fp16.onnx` (added)
↳ ✅ `q8`: `encoder_model_quantized.onnx` (added)
↳ ❌ `int8`: `encoder_model_int8.onnx` (added but JS-based E2E test failed)
```
dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/onnxruntime-node@1.21.0/node_modules/onnxruntime-node/dist/backend.js:25
__classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
^

Error: Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'
at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/onnxruntime-node@1.21.0/node_modules/onnxruntime-node/dist/backend.js:25:92)
at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/onnxruntime-node@1.21.0/node_modules/onnxruntime-node/dist/backend.js:67:29)
at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0
```
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)

### ❌ Based on `decoder_with_past_model.onnx` *with* slimming

```
Processing /tmp/tmpmaahe0ic/decoder_with_past_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 217, in quantize_fp16
    model_fp16 = float16.convert_float_to_float16(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 273, in convert_float_to_float16
    process_graph_output(curr_graph, is_top_level, keep_io_types)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 372, in process_graph_output
    assert len(upstream_nodes) == 1 # Should be only one node
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```

### ❌ Based on `decoder_with_past_model.onnx` *without* slimming

```
Processing /tmp/tmpsrowz1_3/decoder_with_past_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 217, in quantize_fp16
    model_fp16 = float16.convert_float_to_float16(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 273, in convert_float_to_float16
    process_graph_output(curr_graph, is_top_level, keep_io_types)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 372, in process_graph_output
    assert len(upstream_nodes) == 1 # Should be only one node
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```

Files changed (6)

onnx/encoder_model_bnb4.onnx +3 -0
onnx/encoder_model_fp16.onnx +3 -0
onnx/encoder_model_q4.onnx +3 -0
onnx/encoder_model_q4f16.onnx +3 -0
onnx/encoder_model_quantized.onnx +3 -0
onnx/encoder_model_uint8.onnx +3 -0

onnx/encoder_model_bnb4.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a2e9afd5c068b30084262bc4399415216f34dae387ba55589d409b899758566a
+size 164073

onnx/encoder_model_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bdf73761fc49b24f437022e55adec9c1e89813a77c159ca497c7c88e27bf4621
+size 175141

onnx/encoder_model_q4.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4495d6476d144ede07065cbbd9f8f7a109f08e9ec5c044a08f33cfe9551a30da
+size 183164

onnx/encoder_model_q4f16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8527469d7e5f87cac0bab3e62597b7f3184f9ef2a1a255e8ebc5f10b4cb1c0fa
+size 118285

onnx/encoder_model_quantized.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:abdbb9cd2803830817e20e2760f5bd52905b85db8dcecbb2a6a861de14088337
+size 179449

onnx/encoder_model_uint8.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:abdbb9cd2803830817e20e2760f5bd52905b85db8dcecbb2a6a861de14088337
+size 179449
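
Each added file above is a Git LFS pointer with three fields: `version`, `oid`, and `size`. The following is a minimal sketch of parsing such pointers and comparing them; the `parse_lfs_pointer` helper is hypothetical. With the values from this PR, the `encoder_model_quantized.onnx` and `encoder_model_uint8.onnx` pointers carry the same hash and size:

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse "key value" lines of a Git LFS pointer, ignoring a leading diff '+'."""
    fields: dict[str, str] = {}
    for raw_line in text.strip().splitlines():
        line = raw_line.strip().lstrip("+")
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents copied from the diff in this PR.
QUANTIZED_POINTER = """
+version https://git-lfs.github.com/spec/v1
+oid sha256:abdbb9cd2803830817e20e2760f5bd52905b85db8dcecbb2a6a861de14088337
+size 179449
"""
UINT8_POINTER = """
+version https://git-lfs.github.com/spec/v1
+oid sha256:abdbb9cd2803830817e20e2760f5bd52905b85db8dcecbb2a6a861de14088337
+size 179449
"""

quantized = parse_lfs_pointer(QUANTIZED_POINTER)
uint8 = parse_lfs_pointer(UINT8_POINTER)
print(quantized["oid"] == uint8["oid"])  # True
print(int(quantized["size"]))            # 179449
```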