---
license: mit
language:
- de
- en
pipeline_tag: translation
tags:
- transformers
- PyTorch
- kaggle-dataset
- Multi30K
---

# Model card for Transformer_de_en_multi30K

## Model Description

This project contains my work on building a Transformer from scratch for German-to-English translation. <br>
This project uses the <a href="https://github.com/gordicaleksa/pytorch-original-transformer/tree/main">pytorch-original-transformer</a> work to understand the inner workings of the Transformer and how to build it from scratch. <br>
Along with the implementation, we refer to the <a href="https://arxiv.org/abs/1706.03762">original paper</a> to study the Transformer architecture.

## Model Details

The model and its training setup use the hyperparameters below, with names following the notation of the paper:

```
'dk': key dimension -> 32,
'dv': value dimension -> 32,
'h': number of parallel attention heads -> 8,
'src_vocab_size': source vocabulary size (German) -> 8500,
'target_vocab_size': target vocabulary size (English) -> 6500,
'src_pad_idx': source pad index -> 2,
'target_pad_idx': target pad index -> 2,
'num_encoders': number of encoder modules -> 3,
'num_decoders': number of decoder modules -> 3,
'dim_multiplier': dimension multiplier for the inner dimension of the position-wise FFN (dff = dk * h * dim_multiplier) -> 4,
'pdropout': dropout probability used throughout the network -> 0.1,
'lr': learning rate used to train the model -> 0.0003,
'N_EPOCHS': number of training epochs -> 50,
'CLIP': gradient clipping threshold -> 1,
'patience': early-stopping patience -> 5
```
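
These values are also what the `params.json` file read in the Usage section below is expected to contain. A minimal sketch of writing such a file, assuming the keys simply mirror the argument names above (the file shipped with the repository may differ), could look like this:

```python
import json

# Hypothetical sketch of params.json; keys mirror the hyperparameter names above.
config = {
    "dk": 32, "dv": 32, "h": 8,
    "src_vocab_size": 8500, "target_vocab_size": 6500,
    "src_pad_idx": 2, "target_pad_idx": 2,
    "num_encoders": 3, "num_decoders": 3,
    "dim_multiplier": 4, "pdropout": 0.1,
    "lr": 0.0003, "N_EPOCHS": 50, "CLIP": 1, "patience": 5,
}

with open("params.json", "w") as f:
    json.dump(config, f, indent=2)
```
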
We use the Adam optimizer along with CrossEntropyLoss to train the model.
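
A rough sketch of that setup is shown below; the actual training loop (batching, validation, early stopping) lives in the linked repository, and the forward-call signature and the use of `ignore_index` are assumptions on my part:

```python
import torch
import torch.nn as nn

# Sketch only: `model` and `config` are the objects built in the Usage snippet below.
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
# Assumption: padding positions are excluded from the loss via ignore_index.
criterion = nn.CrossEntropyLoss(ignore_index=config["target_pad_idx"])

def train_step(src, target):
    """One training step on a batch of source/target token-id tensors."""
    optimizer.zero_grad()
    # Assumed forward interface: logits over the target vocabulary,
    # with the target shifted by one position for teacher forcing.
    logits = model(src, target[:, :-1])
    loss = criterion(logits.reshape(-1, logits.size(-1)), target[:, 1:].reshape(-1))
    loss.backward()
    # CLIP from the hyperparameters above, applied as a gradient-clipping threshold.
    torch.nn.utils.clip_grad_norm_(model.parameters(), config["CLIP"])
    optimizer.step()
    return loss.item()
```
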
We evaluated the model on 1,000 held-out test sentences and observed a BLEU score of 30.8.
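
For reference, corpus-level BLEU can be computed along the following lines (a sketch using torchtext's `bleu_score`, not necessarily the exact evaluation code used here):

```python
from torchtext.data.metrics import bleu_score

# `hypotheses` is a list of predicted token lists; `references` holds one or more
# reference token lists per sentence, as bleu_score expects.
hypotheses = [["a", "man", "is", "riding", "a", "bike"]]
references = [[["a", "man", "is", "riding", "a", "bicycle"]]]

# torchtext returns BLEU in [0, 1]; multiply by 100 for the conventional scale.
print(bleu_score(hypotheses, references) * 100)
```
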
## Usage

Make sure to clone the repo, then use the following code snippet to load the Transformer model:

```python
# torch packages
import torch
from model.transformer import Transformer
import json

if __name__ == "__main__":
    """
    The following parameters are for the Multi30K dataset.
    """
    # Load the config containing the model input parameters
    with open('params.json') as json_data:
        config = json.load(json_data)
    print(config)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Instantiate the model
    model = Transformer(
        config["dk"],
        config["dv"],
        config["h"],
        config["src_vocab_size"],
        config["target_vocab_size"],
        config["num_encoders"],
        config["num_decoders"],
        config["dim_multiplier"],
        config["pdropout"],
        device=device)

    # Load the trained weights
    model.load_state_dict(torch.load('pytorch_transformer_model.pt',
                                     map_location=device))
    print(model)
```
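
A small follow-up that is not part of the snippet above: put the loaded model into evaluation mode before translating so that dropout (`pdropout`) is disabled, and optionally sanity-check the load.

```python
# Not in the original snippet: prepare for inference and sanity-check the load.
model.eval()  # disables dropout
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Loaded Transformer with {n_params:,} trainable parameters on {device}")
```
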
### Source code

The source code used to train the model is available in this [GitHub repository](https://github.com/m-np/pytorch-transformer).

## Resources

The code in this project is derived from the pytorch-original-transformer repository:

```
@misc{Gordić2020PyTorchOriginalTransformer,
  author = {Gordić, Aleksa},
  title = {pytorch-original-transformer},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/gordicaleksa/pytorch-original-transformer}},
}
```

It also draws on the following [blog post](https://medium.com/@hunter-j-phillips/putting-it-all-together-the-implemented-transformer-bfb11ac1ddfe).