---
license: mit
language:
- de
- en
pipeline_tag: translation
tags:
- transformers
- PyTorch
- kaggle-dataset
- Multi30K
---
# Model card for Transformer_de_en_multi30K
## Model Description
This project contains my work on building a transformer from scratch for German-to-English translation. <br>
It draws on the <a href="https://github.com/gordicaleksa/pytorch-original-transformer/tree/main">pytorch-original-transformer</a>
project to understand the inner workings of the transformer and how to build one from scratch.
Alongside the implementation, the <a href="https://arxiv.org/abs/1706.03762">original paper</a> is used as the primary reference for studying transformers.<br>
## Model Details
The model takes the following arguments, named as in the paper:
```
'dk': key dimensions -> 32,
'dv': value dimensions -> 32,
'h': Number of parallel attention heads -> 8,
'src_vocab_size': source vocabulary size (German) -> 8500,
'target_vocab_size': target vocabulary size (English) -> 6500,
'src_pad_idx': Source pad index -> 2,
'target_pad_idx': Target pad index -> 2,
'num_encoders': Number of encoder modules -> 3,
'num_decoders': Number of decoder modules -> 3,
'dim_multiplier': Dimension multiplier for inner dimensions in pointwise FFN (dff = dk*h*dim_multiplier) -> 4,
'pdropout': Dropout probability in the network -> 0.1,
'lr': learning rate used to train the model -> 0.0003,
'N_EPOCHS': Number of Epochs -> 50,
'CLIP': 1,
'patience': 5
```
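For reference, a `params.json` matching these settings could be generated as shown below (a sketch based on the values above; the file shipped with the repository is the authoritative version). Note that with these settings the pointwise FFN inner dimension works out to dff = 32 * 8 * 4 = 1024.

```python
import json

# Hyperparameters as listed above (sketch; the repo's params.json is authoritative)
config = {
    "dk": 32,                   # key dimension per head
    "dv": 32,                   # value dimension per head
    "h": 8,                     # number of parallel attention heads
    "src_vocab_size": 8500,     # German vocabulary size
    "target_vocab_size": 6500,  # English vocabulary size
    "src_pad_idx": 2,           # source pad token index
    "target_pad_idx": 2,        # target pad token index
    "num_encoders": 3,
    "num_decoders": 3,
    "dim_multiplier": 4,        # dff = dk * h * dim_multiplier = 1024
    "pdropout": 0.1,
    "lr": 0.0003,
    "N_EPOCHS": 50,
    "CLIP": 1,
    "patience": 5,
}

with open("params.json", "w") as f:
    json.dump(config, f, indent=4)
```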
We train the model with the Adam optimizer and cross-entropy loss.
On a held-out test set of 1000 sentences, the model achieves a BLEU score of 30.8.
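For context, the optimizer and loss could be wired up roughly as follows. This is a minimal sketch, not the repository's exact training script; in particular, it assumes the model's forward pass takes source and target index tensors and returns per-token logits.

```python
import torch
import torch.nn as nn

def build_optimizer_and_loss(model: nn.Module, config: dict):
    """Sketch of the training setup described above."""
    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
    # Ignore padded target positions when computing the loss
    criterion = nn.CrossEntropyLoss(ignore_index=config["target_pad_idx"])
    return optimizer, criterion

def training_step(model, optimizer, criterion, clip, src, target_in, target_out):
    """One gradient step. Assumes model(src, target_in) returns logits of shape
    [batch, tgt_len, target_vocab_size] (an assumption, not the repository's
    exact interface)."""
    optimizer.zero_grad()
    logits = model(src, target_in)
    loss = criterion(logits.reshape(-1, logits.size(-1)), target_out.reshape(-1))
    loss.backward()
    # Gradient clipping with CLIP = 1, as listed in the hyperparameters
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()
    return loss.item()
```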
## Usage
Make sure to clone the repo, then use the following code snippet to load the transformer model:
```python
# torch packages
import torch
from model.transformer import Transformer
import json

if __name__ == "__main__":
    """
    Following parameters are for the Multi30K dataset
    """
    # Load config containing model input parameters
    with open('params.json') as json_data:
        config = json.load(json_data)
    print(config)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Instantiate model
    model = Transformer(
        config["dk"],
        config["dv"],
        config["h"],
        config["src_vocab_size"],
        config["target_vocab_size"],
        config["num_encoders"],
        config["num_decoders"],
        config["dim_multiplier"],
        config["pdropout"],
        device=device)

    # Load model weights
    model.load_state_dict(torch.load('pytorch_transformer_model.pt',
                                     map_location=device))
    print(model)
```
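Once the model is loaded, translating a sentence requires a decoding loop. The snippet below is a hedged sketch of greedy decoding: it assumes the model's forward pass takes source and target index tensors and returns per-token logits, and it takes hypothetical `sos_idx`/`eos_idx` special-token indices as arguments. Check `model/transformer.py` and the tokenization code in the linked repository for the real interface before using it.

```python
import torch

def greedy_translate(model, src_ids, sos_idx, eos_idx, max_len=50, device="cpu"):
    """Greedy decoding sketch. Assumes model(src, tgt) -> logits of shape
    [batch, tgt_len, target_vocab_size]; this is an assumption, not the
    repository's documented API."""
    model.eval()
    src = torch.tensor([src_ids], dtype=torch.long, device=device)
    tgt = torch.tensor([[sos_idx]], dtype=torch.long, device=device)
    with torch.no_grad():
        for _ in range(max_len):
            logits = model(src, tgt)                      # [1, tgt_len, vocab]
            next_token = logits[:, -1, :].argmax(dim=-1)  # most likely next token
            tgt = torch.cat([tgt, next_token.unsqueeze(0)], dim=1)
            if next_token.item() == eos_idx:
                break
    # Returns target-side token indices; map them back to words with the
    # target vocabulary used during training.
    return tgt.squeeze(0).tolist()
```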
### Source code
The source code used to train the model is available in this [GitHub repository](https://github.com/m-np/pytorch-transformer).
## Resources
The code in this project is derived from the pytorch-original-transformer repository:
```
@misc{Gordić2020PyTorchOriginalTransformer,
author = {Gordić, Aleksa},
title = {pytorch-original-transformer},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/gordicaleksa/pytorch-original-transformer}},
}
```
along with the following [blog](https://medium.com/@hunter-j-phillips/putting-it-all-together-the-implemented-transformer-bfb11ac1ddfe).