# Hierarchical BERT
A collection of eight BERT models with hierarchical attention, pre-trained on conversational data to process multiple utterances at once.
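The card itself includes no code, but the idea behind hierarchical attention over utterances can be sketched as a two-level encoder: BERT encodes each utterance independently, and a second attention layer then attends across the per-utterance embeddings. Below is a minimal sketch in PyTorch; the class name, the `bert-base-uncased` backbone, and the single `nn.TransformerEncoderLayer` at the utterance level are illustrative assumptions, not the collection's actual implementation:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class HierarchicalUtteranceEncoder(nn.Module):
    """Two-level encoder: BERT encodes each utterance; a second
    attention layer then attends across utterance embeddings."""

    def __init__(self, base_model: str = "bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(base_model)
        hidden = self.bert.config.hidden_size
        # Utterance-level attention: one Transformer encoder layer over
        # the sequence of per-utterance [CLS] vectors (an assumption).
        self.utterance_attn = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=8, batch_first=True
        )

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor):
        # input_ids, attention_mask: (batch, n_utterances, seq_len)
        b, n, s = input_ids.shape
        token_out = self.bert(
            input_ids=input_ids.view(b * n, s),
            attention_mask=attention_mask.view(b * n, s),
        )
        # Each utterance is summarized by its [CLS] embedding ...
        cls = token_out.last_hidden_state[:, 0].view(b, n, -1)
        # ... and the conversation is modeled by attending across them.
        return self.utterance_attn(cls)  # (batch, n_utterances, hidden)

# Usage: tokenize each utterance to the same length, then stack.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dialog = ["Hi, how can I help?", "My order never arrived.", "Let me check."]
enc = tokenizer(dialog, padding="max_length", max_length=32,
                truncation=True, return_tensors="pt")
model = HierarchicalUtteranceEncoder()
out = model(enc["input_ids"].unsqueeze(0), enc["attention_mask"].unsqueeze(0))
print(out.shape)  # torch.Size([1, 3, 768])
```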
This model was trained from scratch on conversational data (the auto-generated card does not record the specific dataset). It achieves the evaluation-set results shown in the final row of the training table below.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

More information needed

### Training results
| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
|:-------------:|:------:|:-----:|:---------------:|:--------:|
| 3.3688        | 0.7422 | 7500  | 3.1865          | 0.4371   |
| 3.0558        | 1.4844 | 15000 | 2.9189          | 0.4672   |
| 2.8314        | 2.2266 | 22500 | 2.6859          | 0.4975   |
| 2.6241        | 2.9688 | 30000 | 2.5454          | 0.5161   |
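The card does not say what the loss measures, but if, as is typical for BERT-style pre-training, the validation loss is the mean cross-entropy over masked tokens, the final checkpoint would correspond to a pseudo-perplexity of roughly exp(2.5454) ≈ 12.75. A quick check of that arithmetic (the MLM interpretation is an assumption, not stated in the card):

```python
import math

# Assumption: validation loss is mean cross-entropy over masked tokens
# (standard for BERT-style pre-training; the card does not say).
final_val_loss = 2.5454
print(f"pseudo-perplexity ≈ {math.exp(final_val_loss):.2f}")  # ≈ 12.75
```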