benjamin committed · Commit 63c5558 · verified · 1 Parent(s): 5427dc3

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -81,6 +81,8 @@ python3 scripts/cross_tokenizer_distill.py \
   name=gemma2_to_byte_20k
 ```
 
+Training took ~10 hours on a TPUv4-32.
+
 ## Future Work
 
 The current version of this model is trained for 20k steps with 32*2048 bytes per batch (= 1.3B bytes ≈ 328M subword tokens total). It was unexpected that it performs as well as it does with this very short training procedure. We plan to train a new version for more steps (you can also do so yourself using [`tokenkit`](https://github.com/bminixhofer/tokenkit)).
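
As a rough sanity check on the training-volume figures quoted in the diff, here is a minimal sketch of the arithmetic. The ~4 bytes-per-subword-token ratio is an assumption inferred from the stated "1.3B bytes ≈ 328M subword tokens" totals, not something given explicitly in the commit.

```python
# Sanity check of the training-volume arithmetic from the README diff above.
steps = 20_000              # training steps
batch_sequences = 32        # sequences per batch
bytes_per_sequence = 2048   # bytes per sequence

total_bytes = steps * batch_sequences * bytes_per_sequence

# Assumption: roughly 4 bytes per subword token, inferred from the quoted totals.
assumed_bytes_per_token = 4.0
approx_subword_tokens = total_bytes / assumed_bytes_per_token

print(f"total bytes: {total_bytes:,}")                        # ~1.31B bytes
print(f"approx subword tokens: {approx_subword_tokens:,.0f}") # ~328M tokens
```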