Update README.md
README.md CHANGED
@@ -81,6 +81,8 @@ python3 scripts/cross_tokenizer_distill.py \
 name=gemma2_to_byte_20k
 ```
 
+Training took ~10 hours on a TPUv4-32.
+
 ## Future Work
 
 The current version of this model is trained for 20k steps with 32*2048 bytes per batch (= 1.3B bytes ≈ 328M subword tokens total). It was unexpected that the model would perform as well as it does given such a short training run. We plan to train a new version for more steps (you can also do so yourself using [`tokenkit`](https://github.com/bminixhofer/tokenkit)).
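
For readers checking the byte-budget arithmetic in the added paragraph, here is a minimal sketch (plain Python, independent of tokenkit) that reproduces the quoted totals; the ~4 bytes-per-token ratio is derived from the README's own numbers, not from the training config:

```python
# Sanity check of the training budget quoted in Future Work:
# 20k steps, batches of 32 sequences x 2048 bytes each.
steps = 20_000
bytes_per_batch = 32 * 2048
total_bytes = steps * bytes_per_batch
print(f"{total_bytes:,} bytes")   # 1,310,720,000 -> ~1.3B bytes

# "~328M subword tokens" implies roughly 4 bytes per subword token:
print(total_bytes / 328e6)        # ~4.0
```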