Model Card for lzun/spanish-social-media-boxing-text

This model is a Spanish text classification model for sentiment analysis of social media publications (mainly tweets) about boxing events. It is the text component of a larger multimodal sentiment analysis model.

Model Details

Model Description

  • Developed by: Luis Zúñiga
  • Model type: Text Classification
  • Language(s) (NLP): Spanish
  • License: Apache 2.0
  • Finetuned from model: dccuchile/bert-base-spanish-wwm-cased

Model Sources

  • Repository: [More Information Needed]
  • Paper: [More Information Needed]
  • Demo: [More Information Needed]

Uses

The main purpose of this model is to serve as a text representation tool for the classification of social media publications (mainly tweets), written in Spanish, that are related to boxing events.

However, this is only the text component of a multimodal model: an image model is used to represent the images and, through a fusion method, the text and image representations are combined to better represent publications that contain both text and images.

Direct Use

This model can be used directly to assess the sentiment polarity of the text in social media publications. Its main intended use, however, is in combination with the image model for multimodal sentiment analysis.
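
As an illustration only, the sketch below shows one way the text representation from this model could be combined with image features through simple concatenation. The image embedding, its dimensionality, and the fusion step are placeholders; the card does not specify the actual image model or fusion method.

import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder fusion sketch: the image embedding is random noise standing in for
# the output of the (unspecified) image model.
tokenizer = AutoTokenizer.from_pretrained('dccuchile/bert-base-spanish-wwm-uncased')
text_encoder = AutoModel.from_pretrained('lzun/spanish-social-media-boxing-text')

inputs = tokenizer('Texto de la publicación', return_tensors='pt')
with torch.no_grad():
    text_embedding = text_encoder(**inputs).last_hidden_state[:, 0]  # [CLS] representation

image_embedding = torch.randn(1, 768)                         # placeholder image features
fused = torch.cat([text_embedding, image_embedding], dim=-1)  # simple concatenation fusion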

Bias, Risks, and Limitations

The dataset used to train this model contains tweets related to boxing events; it does not specialize in a particular topic around the sport and instead covers news from before, during, and after each event.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, pipeline

# Tokenizer from the base BETO checkpoint; the pipeline loads the fine-tuned model
model_ckpt = 'dccuchile/bert-base-spanish-wwm-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
text_feature_extractor = pipeline(task='text-classification', model='lzun/spanish-social-media-boxing-text', tokenizer=tokenizer)
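
Once the pipeline is created, it can be called directly on raw text. A minimal usage sketch (the example tweet is illustrative, and the returned label names depend on the model's id2label configuration):

prediction = text_feature_extractor('¡Gran pelea, el campeón retuvo el título!')
print(prediction)  # e.g. [{'label': 'LABEL_1', 'score': 0.93}]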

Training Details

Training Data

The model is trained on the MSSAID dataset, specifically the text of the tweets. The classes are positive (1), negative (-1), neutral (0), and spam (2); the spam and neutral classes can be merged to obtain a three-class classification problem. The training data is not publicly available due to its sensitive content, but may be shared upon reasonable request.
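
As a sketch of the class merge mentioned above (the helper below is hypothetical and only illustrates folding spam into neutral):

# Fold spam (2) into neutral (0) for a three-class problem; positive (1) and
# negative (-1) are left unchanged.
def to_three_classes(label: int) -> int:
    return 0 if label == 2 else label

print([to_three_classes(y) for y in [1, -1, 0, 2]])  # [1, -1, 0, 0]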

Evaluation

Metrics

Because the dataset is imbalanced, we use balanced accuracy as the main evaluation metric, as is recommended for imbalanced data. We also track accuracy, the Matthews correlation coefficient (MCC), and weighted F1.
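
A minimal evaluation sketch with scikit-learn, assuming y_true and y_pred hold the gold and predicted labels (the values below are placeholders):

from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, matthews_corrcoef)

y_true = [1, 0, -1, 1, 0, 2]   # placeholder gold labels
y_pred = [1, 0, 1, 1, -1, 2]   # placeholder model predictions

print('Balanced accuracy:', balanced_accuracy_score(y_true, y_pred))
print('Accuracy:', accuracy_score(y_true, y_pred))
print('MCC:', matthews_corrcoef(y_true, y_pred))
print('Weighted F1:', f1_score(y_true, y_pred, average='weighted'))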

Results

Citation

Please cite our paper if you find this model useful:

BibTeX:

@inproceedings{Zuniga2022,
  author    = {Luis N. Zúñiga-Morales and Jorge Ángel González-Ordiano and J. Emilio Quiroz-Ibarra and Steven J. Simske},
  editor    = {Obdulia Pichardo Lagunas and Juan Martínez-Miranda and Bella Martínez Seis},
  title     = {Impact Evaluation of Multimodal Information on Sentiment Analysis},
  booktitle = {Advances in Computational Intelligence},
  year      = {2022},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  pages     = {18--29},
  isbn      = {978-3-031-19496-2}
}
