electroglyph committed (verified)
Commit 5adb99b · 1 Parent(s): cd55711

Upload folder using huggingface_hub

Files changed (3):
  1. README.md +8 -2
  2. bigrams.txt +0 -0
  3. trigrams.txt +0 -0
README.md CHANGED
@@ -9,7 +9,7 @@ base_model: google/gemma-3-4b-it
 
 This is my first finetune. I used GRPO to reduce slop output.
 
-This is a LoRA adapter, it needs to be merged with google/gemma-3-4b-it.
+This is a LoRA adapter, it needs to be merged with [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
 
 I'll also upload a Q4_K_M GGUF made with unsloth's imatrix.
 
@@ -21,12 +21,18 @@ I added some of these to the reward function and penalized their use.
 
 I also added some regex filters for comma overuse, and some sloppy phrasing, etc.
 
-Halfway thru traning I activate lexical diversity comparison. It penalizes MTLD < 100, gives increasing rewards up to 120.
+200 steps into training I activate lexical diversity comparison. It penalizes MTLD < 100, gives increasing rewards up to 120.
 
 There's a callback for early stopping if reward stays high, but it didn't kick in this run.
 
+This was trained on ~15 million tokens on a single 3090. I'm sharing my code so people can try their own finetuning runs.
+
 I'll probably keep iterating on this a bit, and may update this model.
 
 training code: [train.py](./train.py)
 
 I can't share my dataset, but here's an example of what it looks like: [dataset_example.json](./dataset_example.json)
+
+Gemma 3 4b common bigrams, most common first: [bigrams.txt](./bigrams.txt)
+
+Gemma 3 4b common trigrams, most common first: [trigrams.txt](./trigrams.txt)
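
Merging the adapter, as the updated README describes, is the usual peft workflow. A minimal sketch, assuming the adapter id below is replaced with this repo's actual model id; the multimodal Gemma 3 checkpoint may also need its dedicated conditional-generation class rather than AutoModelForCausalLM:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# "path/to/this-adapter" is a placeholder for this repo's model id.
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it")
model = PeftModel.from_pretrained(base, "path/to/this-adapter")
merged = model.merge_and_unload()  # bakes the LoRA weights into the base model
merged.save_pretrained("gemma-3-4b-it-merged")
```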
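
The MTLD term the README describes (penalize scores below 100, growing reward up to 120) could look roughly like this. The mtld() helper is a standard single-direction MTLD; the penalty and bonus magnitudes are illustrative, not necessarily the values in train.py:

```python
def mtld(tokens, threshold=0.72):
    """Measure of Textual Lexical Diversity (forward pass only)."""
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:  # factor used up; start a new segment
            factors += 1
            types, count = set(), 0
    if count:  # credit the partial factor at the end
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(tokens) / factors if factors else 0.0

def mtld_reward(completion, floor=100.0, ceiling=120.0):
    score = mtld(completion.lower().split())
    if score < floor:
        return -1.0  # flat penalty below the floor (illustrative magnitude)
    return min(score - floor, ceiling - floor) / (ceiling - floor)  # 0..1, saturates at 120
```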
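
The early-stopping callback is the standard transformers TrainerCallback pattern. A sketch, assuming the trainer logs a "reward" metric the way TRL's GRPOTrainer does; the threshold and patience values are illustrative:

```python
from transformers import TrainerCallback

class RewardEarlyStop(TrainerCallback):
    def __init__(self, threshold=0.9, patience=50):
        self.threshold = threshold  # mean reward considered "high"
        self.patience = patience    # consecutive high-reward logs required
        self.streak = 0

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and logs.get("reward", float("-inf")) >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0
        if self.streak >= self.patience:
            control.should_training_stop = True  # end the run early
```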
bigrams.txt ADDED
The diff for this file is too large to render. See raw diff
 
trigrams.txt ADDED
The diff for this file is too large to render. See raw diff
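
The two n-gram files added in this commit are plain text, one n-gram per line, most common first, so a reward function can penalize hits against the head of each list. A rough sketch of that kind of slop penalty, with a comma-overuse check folded in; the cutoffs, weights, and tokenization are illustrative, not the exact logic in train.py:

```python
import re
from itertools import islice

def load_ngrams(path, top_k=500):
    # Most common n-grams come first, so the head of the file is the slop list.
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in islice(f, top_k) if line.strip()}

SLOP_BIGRAMS = load_ngrams("bigrams.txt")
SLOP_TRIGRAMS = load_ngrams("trigrams.txt")

def slop_penalty(text, per_hit=-0.1, per_extra_comma=-0.05):
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}
    trigrams = {" ".join(words[i:i + 3]) for i in range(len(words) - 2)}
    hits = len(bigrams & SLOP_BIGRAMS) + len(trigrams & SLOP_TRIGRAMS)
    # Rough comma-overuse check: more than ~1 comma per 15 words costs reward.
    extra_commas = max(0, text.count(",") - max(1, len(words) // 15))
    return per_hit * hits + per_extra_comma * extra_commas
```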