Emotion Recognition in Conversations is a multimodal classification task that identifies speakers' emotions in the individual utterances of a conversation. One challenge in this task lies in using various data modalities, as they can conflict with each other or have intricate relationships that are hard to extract, and existing complex fusion networks can find it difficult to refine the representation of each modality. This paper explores the compatibility of the state-of-the-art solution, the self-distillation transformer, with three fusion strategies: the Hierarchical Feature Fusion Network, the AutoEncoder, and the Variational AutoEncoder. Experimental results show that the models perform within an error margin of less than 1%. Additionally, another challenge in the Emotion Recognition in Conversations task is the emotion shift, which occurs when a speaker's emotion changes from one utterance to the next in a conversation. Adding an emotion shift modality that uses the Sentence-BERT module to detect these shifts produced results similar to those without the module, suggesting that the self-distillation transformer already extracts this information on its own.
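The abstract mentions a Sentence-BERT module for detecting emotion shifts between consecutive utterances but does not describe the mechanism. The sketch below is one plausible illustration, not the thesis implementation: it assumes a pretrained Sentence-BERT encoder from the sentence-transformers library, an illustrative checkpoint name, and a hypothetical similarity threshold, and flags a shift whenever consecutive utterance embeddings are dissimilar.

```python
# Illustrative sketch only: model name, threshold, and function are assumptions,
# not the method published in the thesis.
from sentence_transformers import SentenceTransformer
import numpy as np

# Any pretrained Sentence-BERT checkpoint could stand in here (assumed name).
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def emotion_shift_flags(utterances, threshold=0.5):
    """Flag a potential emotion shift between consecutive utterances.

    A shift is assumed whenever the cosine similarity of the Sentence-BERT
    embeddings of two consecutive utterances falls below `threshold`.
    Returns a list of booleans of length len(utterances) - 1.
    """
    embeddings = encoder.encode(utterances, normalize_embeddings=True)
    # With normalized embeddings, the dot product equals cosine similarity.
    sims = np.sum(embeddings[:-1] * embeddings[1:], axis=1)
    return (sims < threshold).tolist()

# Usage example with a toy dialogue.
dialogue = [
    "I can't believe we won the match!",
    "Yeah, it was an amazing game.",
    "But my phone got stolen on the way home...",
]
print(emotion_shift_flags(dialogue))  # e.g. [False, True]
```

In this framing, the shift flags could be fed to the model as an extra binary modality alongside the audio, visual, and text features; the threshold would need to be tuned on the target dataset.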
| Date of Award | 20 Jul 2024 |
|---|---|
| Original language | American English |
| Supervisor | Ernesto Damiani (Supervisor) |
- Emotion Recognition in Conversations
- Multimodal Classification
- Data Fusion
- Deep Learning
- Machine Learning
- Self-Distillation
Fusion Modules and the Self-Distillation Transformer in Emotion Recognition in Conversations
Alzaabi, H. (Author). 20 Jul 2024
Student thesis: Master's Thesis