Multimodal Speech Emotion Recognition in Patient-Clinician Interactions: Sentiment Analysis Leveraging Transformer Models
Conference proceeding   Peer reviewed

Md Jobair Hossain Faruk, Md Kamrul Siam, Rafia Akter Romana, Sharif Ullah and Hossain Shahriar
2025 IEEE 7th International Conference on Sustainable Technologies For Industry 5.0 (STI)
International Conference on Sustainable Technologies For Industry 5.0 (STI), 7th (Dhaka, Bangladesh, 12/11/2025–12/12/2025)
12/11/2025

Abstract

In recent years, understanding the emotional dynamics of patient-clinician interactions has emerged as a critical topic in healthcare research. Speech Emotion Recognition (SER) provides insights that can enhance patient care, diagnostic precision, and therapeutic effectiveness. In this paper, we present a text-based framework for Speech Emotion Recognition designed specifically for healthcare scenarios, integrating transformer-based models including T5, BERT, and XLNet. The framework analyzes transcribed text to identify the emotions expressed by patients and healthcare providers. Audio recordings of interactions between patients and clinicians (including doctors and psychiatrists) are transcribed using the Whisper model, ensuring high transcription quality. We evaluated the framework on a dataset of clinical conversations capturing a variety of emotional expressions relevant to healthcare contexts. Our experimental results demonstrate that the framework predicts six primary emotional states, including Happiness, Anger, Fear, Sadness, and Surprise, and distinguishes between positive and negative sentiments. Among the evaluated models, T5 exhibited the highest mean confidence score at 89.12%, significantly outperforming RoBERTa (78.44%) and XLNet (36.02%) in capturing emotional content from clinical dialogues. These findings highlight the potential of SER to aid healthcare professionals by providing deeper insights into patients' emotional states, supporting communication, and improving understanding of patients' sentiment.
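
The abstract does not say how the mean confidence score is computed. One plausible reading, sketched below as an assumption rather than as the authors' method, is to take the probability of the top-ranked emotion for each transcribed utterance and average those values over the dataset. The function names `top_prediction` and `mean_confidence` are hypothetical, and the probabilities are invented purely for illustration; they are not results from the paper.

```python
from statistics import mean


def top_prediction(scores):
    """Return the (label, confidence) pair with the highest probability."""
    label = max(scores, key=scores.get)
    return label, scores[label]


def mean_confidence(all_scores):
    """Average the top-label confidence over every utterance."""
    return mean(conf for _, conf in map(top_prediction, all_scores))


# Made-up classifier outputs for two transcribed utterances (label -> probability).
# Only the five emotions named in the abstract are used as labels here.
utterances = [
    {"Happiness": 0.10, "Anger": 0.05, "Fear": 0.70, "Sadness": 0.10, "Surprise": 0.05},
    {"Happiness": 0.80, "Anger": 0.05, "Fear": 0.05, "Sadness": 0.05, "Surprise": 0.05},
]

for scores in utterances:
    print(top_prediction(scores))
print(f"mean confidence: {mean_confidence(utterances):.2%}")
```

Under this reading, a high mean confidence indicates that a model assigns most of its probability mass to a single emotion per utterance; it does not by itself measure accuracy against labeled ground truth.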
