---
license: mit
language:
- de
- en
pipeline_tag: translation
tags:
- transformers
- PyTorch
- kaggle-dataset
- Multi30K
---

# Model card for Transformer_de_en_multi30K

## Model Description

This project contains my work on building a Transformer from scratch for German-to-English translation. <br>
This project uses the <a href="https://github.com/gordicaleksa/pytorch-original-transformer/tree/main">pytorch-original-transformer</a> work to understand the inner workings of the Transformer and how to build it from scratch. <br>
Along with the implementation, we refer to the <a href="https://arxiv.org/abs/1706.03762">original paper</a> to study the Transformer architecture.

## Model Details

The model and its training setup use the hyperparameters below, with names following the notation of the paper:

```
'dk': key dimension -> 32,
'dv': value dimension -> 32,
'h': number of parallel attention heads -> 8,
'src_vocab_size': source vocabulary size (German) -> 8500,
'target_vocab_size': target vocabulary size (English) -> 6500,
'src_pad_idx': source pad index -> 2,
'target_pad_idx': target pad index -> 2,
'num_encoders': number of encoder modules -> 3,
'num_decoders': number of decoder modules -> 3,
'dim_multiplier': dimension multiplier for the inner dimension of the position-wise FFN (dff = dk * h * dim_multiplier) -> 4,
'pdropout': dropout probability used throughout the network -> 0.1,
'lr': learning rate used to train the model -> 0.0003,
'N_EPOCHS': number of training epochs -> 50,
'CLIP': gradient clipping threshold -> 1,
'patience': early-stopping patience -> 5
```
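
These values are also what the `params.json` file read in the Usage section below is expected to contain. A minimal sketch of writing such a file, assuming the keys simply mirror the argument names above (the file shipped with the repository may differ), could look like this:

```python
import json

# Hypothetical sketch of params.json; keys mirror the hyperparameter names above.
config = {
    "dk": 32, "dv": 32, "h": 8,
    "src_vocab_size": 8500, "target_vocab_size": 6500,
    "src_pad_idx": 2, "target_pad_idx": 2,
    "num_encoders": 3, "num_decoders": 3,
    "dim_multiplier": 4, "pdropout": 0.1,
    "lr": 0.0003, "N_EPOCHS": 50, "CLIP": 1, "patience": 5,
}

with open("params.json", "w") as f:
    json.dump(config, f, indent=2)
```
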
We use the Adam optimizer along with CrossEntropyLoss to train the model.
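
A rough sketch of that setup is shown below; the actual training loop (batching, validation, early stopping) lives in the linked repository, and the forward-call signature and the use of `ignore_index` are assumptions on my part:

```python
import torch
import torch.nn as nn

# Sketch only: `model` and `config` are the objects built in the Usage snippet below.
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
# Assumption: padding positions are excluded from the loss via ignore_index.
criterion = nn.CrossEntropyLoss(ignore_index=config["target_pad_idx"])

def train_step(src, target):
    """One training step on a batch of source/target token-id tensors."""
    optimizer.zero_grad()
    # Assumed forward interface: logits over the target vocabulary,
    # with the target shifted by one position for teacher forcing.
    logits = model(src, target[:, :-1])
    loss = criterion(logits.reshape(-1, logits.size(-1)), target[:, 1:].reshape(-1))
    loss.backward()
    # CLIP from the hyperparameters above, applied as a gradient-clipping threshold.
    torch.nn.utils.clip_grad_norm_(model.parameters(), config["CLIP"])
    optimizer.step()
    return loss.item()
```
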
We evaluated the model on 1,000 held-out test sentences and observed a BLEU score of 30.8.
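
For reference, corpus-level BLEU can be computed along the following lines (a sketch using torchtext's `bleu_score`, not necessarily the exact evaluation code used here):

```python
from torchtext.data.metrics import bleu_score

# `hypotheses` is a list of predicted token lists; `references` holds one or more
# reference token lists per sentence, as bleu_score expects.
hypotheses = [["a", "man", "is", "riding", "a", "bike"]]
references = [[["a", "man", "is", "riding", "a", "bicycle"]]]

# torchtext returns BLEU in [0, 1]; multiply by 100 for the conventional scale.
print(bleu_score(hypotheses, references) * 100)
```
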
## Usage

Make sure to clone the repo, then use the following code snippet to load the Transformer model:

```python
# torch packages
import torch
from model.transformer import Transformer
import json

if __name__ == "__main__":
    """
    The following parameters are for the Multi30K dataset.
    """
    # Load the config containing the model input parameters
    with open('params.json') as json_data:
        config = json.load(json_data)
    print(config)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Instantiate the model
    model = Transformer(
        config["dk"],
        config["dv"],
        config["h"],
        config["src_vocab_size"],
        config["target_vocab_size"],
        config["num_encoders"],
        config["num_decoders"],
        config["dim_multiplier"],
        config["pdropout"],
        device=device)

    # Load the trained weights
    model.load_state_dict(torch.load('pytorch_transformer_model.pt',
                                     map_location=device))
    print(model)
```
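
A small follow-up that is not part of the snippet above: put the loaded model into evaluation mode before translating so that dropout (`pdropout`) is disabled, and optionally sanity-check the load.

```python
# Not in the original snippet: prepare for inference and sanity-check the load.
model.eval()  # disables dropout
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Loaded Transformer with {n_params:,} trainable parameters on {device}")
```
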
### Source code

The source code used to train the model is available in this [GitHub repository](https://github.com/m-np/pytorch-transformer).

## Resources

The code in this project is derived from the pytorch-original-transformer repository:

```
@misc{Gordić2020PyTorchOriginalTransformer,
  author = {Gordić, Aleksa},
  title = {pytorch-original-transformer},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/gordicaleksa/pytorch-original-transformer}},
}
```

It also draws on the following [blog post](https://medium.com/@hunter-j-phillips/putting-it-all-together-the-implemented-transformer-bfb11ac1ddfe).