Emotion recognition is an emerging field of study that aims to develop intelligent systems capable of identifying, interpreting, and responding to human emotions. This thesis focuses on emotion recognition in peers’ conversations, specifically exploring the concepts of emotional climate (EC) and affect dynamics (AD) and their role in improving the accuracy and effectiveness of speech-based emotion recognition in real-world scenarios. EC refers to the joint emotional atmosphere created during a conversation between peers in a given context, while AD captures the subtle nuances of emotional expression in social interactions. Identifying EC and AD is particularly beneficial in contexts where emotional regulation and understanding are critical, such as therapeutic settings. To this end, this thesis integrates machine learning (ML) and deep learning (DL) techniques to develop an efficient approach for recognizing EC, in terms of emotional valence and arousal, from peers’ conversational speech signals. The speech signals are pre-processed and represented in the Mel Frequency Cepstral Coefficients (MFCC) spectrum and third-order spectrum (bispectrum, BS) domains, which provide the basis for applying the ML/DL techniques. Two distinct analysis paths are introduced. In the first, the MFCC spectrum and bispectrum are represented as images for feature extraction, and the extracted features are fused with AD into an enriched feature vector that is fed to ML classifiers. In the second, the audio, MFCC spectrum, and bispectrum image representations are fed directly to a DL network, and the extracted deep features are combined with AD for the final EC classification. This thesis contributes by further introducing EC combined with AD in peers’ conversations as a means of understanding the emotional dynamics of peers’ interactions.
It extends the representation of the emotional content to the MFCC spectrum and bispectrum domains to reveal alterations in spectral distributions and to capture the nonlinear/non-Gaussian characteristics of the speech signal, which are modulated by the emotional load. Furthermore, it shows that ML/DL techniques, when combined with these new representations, achieve high classification performance that surpasses the current state of the art. The proposed approaches have many potential practical uses, as they enhance our understanding of EC and AD in social interactions, with significant implications for applications including mental health therapy, education, and human-robot interaction.
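As an illustration of the bispectrum representation mentioned above, the following is a minimal sketch of the standard direct (FFT-based) bispectrum estimate, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], averaged over windowed frames. The abstract does not specify the estimator, frame length, hop size, or window used in the thesis, so the function name `bispectrum_direct` and all parameter values here are assumptions chosen for illustration only.

```python
import numpy as np


def bispectrum_direct(signal, frame_len=128, hop=64):
    """Direct (FFT-based) bispectrum estimate, averaged over frames.

    Hypothetical helper for illustration; frame_len, hop, and the Hann
    window are assumed values, not taken from the thesis.
    Returns a complex (frame_len // 2, frame_len // 2) array indexed by
    the frequency bins (f1, f2).
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_freq = frame_len // 2
    idx = np.arange(n_freq)
    # Bin index of f1 + f2 for every (f1, f2) pair; stays below frame_len.
    sum_idx = idx[:, None] + idx[None, :]

    acc = np.zeros((n_freq, n_freq), dtype=complex)
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frame = (frame - frame.mean()) * window  # remove DC, then window
        spec = np.fft.fft(frame)
        # Triple product X(f1) * X(f2) * conj(X(f1 + f2)).
        acc += spec[idx][:, None] * spec[idx][None, :] * np.conj(spec[sum_idx])
        n_frames += 1

    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return acc / n_frames


# Synthetic test signal with quadratic phase coupling between
# bins 10 and 23 (their sum, bin 33, is phase-locked to both).
n = np.arange(128 * 16)
demo = (np.cos(2 * np.pi * 10 * n / 128)
        + np.cos(2 * np.pi * 23 * n / 128)
        + np.cos(2 * np.pi * 33 * n / 128))
demo_bispectrum = bispectrum_direct(demo)
# Magnitude image of the kind that could be passed to a feature extractor.
demo_image = np.abs(demo_bispectrum)
```

The magnitude of the result can be rendered as an image, which is the general form of input the two analysis paths rely on. Because the triple product is nonzero on average only when the phases at f1, f2, and f1 + f2 are coupled, the estimate responds to nonlinear interactions that the power spectrum cannot show.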
| Date of Award | Aug 2023 |
|---|---|
| Original language | American English |
| Supervisor | Leontios Hadjileontiadis (Supervisor) |
- Emotion recognition in conversation
- Emotional climate
- Affect dynamics
- Machine learning
- Deep learning
- MFCC
- Bispectrum
Emotion Climate Recognition during Peers’ Conversation using Machine, Deep Learning, and Affect Dynamics
Alhussein, G. (Author). Aug 2023
Student thesis: Doctoral Thesis