BERT bert-uncased_L-4_H-128_A-2
This model is a PyTorch conversion of the original TensorFlow BERT checkpoint.
Model Details
- Model Type: BERT (Bidirectional Encoder Representations from Transformers)
- Language: English (uncased)
- Architecture:
  - Layers: 4
  - Hidden Size: 128
  - Attention Heads: 2
  - Vocabulary Size: 30522
  - Max Position Embeddings: 512
Model Configuration
{
  "hidden_size": 128,
  "hidden_act": "gelu",
  "initializer_range": 0.02,
  "vocab_size": 30522,
  "hidden_dropout_prob": 0.1,
  "num_attention_heads": 2,
  "type_vocab_size": 2,
  "max_position_embeddings": 512,
  "num_hidden_layers": 4,
  "intermediate_size": 512,
  "attention_probs_dropout_prob": 0.1
}
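The configuration above fixes the model's size. As a rough sketch, the snippet below derives the per-head dimension and an approximate parameter count from those values. It assumes the standard BERT layout (embeddings, encoder layers, pooler) and excludes the pre-training heads; the helper name `count_bert_parameters` is hypothetical and not part of any library.

```python
import json

# Size-related configuration values copied from this card.
config = json.loads("""
{
  "hidden_size": 128,
  "vocab_size": 30522,
  "num_attention_heads": 2,
  "type_vocab_size": 2,
  "max_position_embeddings": 512,
  "num_hidden_layers": 4,
  "intermediate_size": 512
}
""")


def count_bert_parameters(cfg):
    """Count weights and biases, assuming the standard BERT encoder with pooler."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    layer_norm = 2 * h  # scale and bias vectors
    # Token, position, and segment embedding tables plus one LayerNorm.
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + layer_norm
    # Query, key, value, and output projections plus one LayerNorm.
    attention = 4 * (h * h + h) + layer_norm
    # Two dense layers (h -> i -> h) plus one LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    pooler = h * h + h
    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler


head_size = config["hidden_size"] // config["num_attention_heads"]
total = count_bert_parameters(config)
print(f"per-head dimension: {head_size}")
print(f"encoder parameters: {total:,}")
```

Under these assumptions each attention head works on 64 dimensions and the encoder has about 4.8 million parameters, most of which sit in the token embedding table.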
Usage
import torch
from transformers import BertForPreTraining, BertTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model_id = 'bansalaman18/bert-uncased_L-4_H-128_A-2'
model = BertForPreTraining.from_pretrained(model_id)
tokenizer = BertTokenizer.from_pretrained(model_id)

# Tokenize a sample sentence and run a forward pass without tracking gradients
text = "Hello, this is a sample text for BERT."
inputs = tokenizer(text, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
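Because `BertForPreTraining` carries both pre-training heads, its output holds two logit tensors. As a sketch of what to expect, the snippet below builds a randomly initialized model from the configuration values in this card, so it needs no download and its predictions are meaningless, and then inspects the output shapes. The dummy token IDs and the sequence length of 8 are arbitrary assumptions for illustration.

```python
import torch
from transformers import BertConfig, BertForPreTraining

# Configuration values copied from this card; the weights are random,
# not the pretrained checkpoint.
config = BertConfig(
    vocab_size=30522,
    hidden_size=128,
    num_hidden_layers=4,
    num_attention_heads=2,
    intermediate_size=512,
    max_position_embeddings=512,
    type_vocab_size=2,
)
model = BertForPreTraining(config)
model.eval()  # a freshly constructed module starts in training mode

# Dummy batch: one sequence of 8 arbitrary token IDs (illustrative only).
input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    outputs = model(input_ids=input_ids)

# Masked-language-model head: one score per vocabulary entry per token.
mlm_shape = tuple(outputs.prediction_logits.shape)
# Next-sentence-prediction head: two scores per sequence.
nsp_shape = tuple(outputs.seq_relationship_logits.shape)
print(mlm_shape, nsp_shape)
```

The masked-language-model logits have shape `(batch, sequence_length, vocab_size)` and the next-sentence logits have shape `(batch, 2)`. The same attribute names apply to the `outputs` object returned by the pretrained model.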
Training Data
This model was originally trained on the same data as the standard BERT models:
- English Wikipedia (2500M words)
- BookCorpus (800M words)
Conversion Details
This model was converted from the original TensorFlow checkpoint to PyTorch format using a custom conversion script with the Hugging Face Transformers library.
Citation
@article{devlin2018bert,
title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
journal={arXiv preprint arXiv:1810.04805},
year={2018}
}