Our research mainly involves understanding and automatically recognizing human affective and socio-relational constructs, such as emotions, empathy, and rapport. Such technologies can be used to animate virtual humans or social robots. We aim to work towards technologies that augment, rather than replace, human-human interaction. We are also interested in studying the verbal and nonverbal behavior associated with mental health disorders and in identifying successful psychotherapy strategies through computational approaches.
Emotions play a central role in shaping our behavior and guiding our decisions. Affective Computing strives to enable machines to recognize, understand, and express emotions. Advancing automatic understanding of human emotions involves technical challenges in signal processing, computer vision, and machine learning; chief among them are the lack of reliable labels and of sufficiently large databases. We have studied and evaluated machine-based emotion recognition methods using EEG signals, facial expressions, pupil dilation, and psychophysiological signals. Our past work demonstrated how behaviors associated with cognitive appraisals can be used in an appraisal-driven framework for emotion recognition. We are also interested in understanding the motivational components of emotions, including effort and action tendencies.
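To illustrate the signal-processing side of such work, here is a minimal, hypothetical sketch of extracting EEG band-power features, a common first step in emotion recognition from physiological signals (the synthetic signal, band definitions, and function names are illustrative, not taken from our systems):

```python
import numpy as np

def band_power(signal, fs, band):
    """Average spectral power of `signal` within a frequency band (Hz)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].mean()

# Hypothetical single-channel EEG segment: a 10 Hz (alpha-range)
# oscillation plus noise, sampled at 128 Hz for 4 seconds.
fs = 128
t = np.arange(0, 4, 1.0 / fs)
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(len(t))

bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
features = {name: band_power(eeg, fs, b) for name, b in bands.items()}
# The dominant 10 Hz component makes alpha power the largest feature.
print(max(features, key=features.get))  # → alpha
```

In practice, feature vectors like this one (computed per channel and per time window) would feed a trained classifier rather than a hand-inspected comparison.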
The emerging field of computational mental health aspires to leverage recent advances in human-centered artificial intelligence to assist human caregivers in delivering mental health support. For example, automatic human behavior tracking can be used to train psychotherapists and to detect behavioral markers associated with mental health. Identifying behavioral biomarkers that track the severity of mental health disorders, as well as the therapist behaviors that make treatment effective, can support care and help track outcomes. We study and develop machine-based methods for the automatic assessment of mental health disorders such as PTSD and depression.
We also research and develop computational frameworks that jointly analyze verbal, nonverbal, and dyadic behavior to better predict therapeutic outcomes in motivational interviewing.
MultiSense is a framework developed to automatically recognize, track, and understand nonverbal behavior from both audiovisual data and interaction contexts (Stratou and Morency, 2017). MultiSense can automatically analyze human behavior at different levels of granularity and abstraction, e.g., head pose, head nods, and voice activity, and communicate the resulting behavioral cues via a perception markup language called PML. We are developing a new generation of MultiSense powered by the Microsoft Research Platform for Situated Intelligence (PSI) and open-source components. MultiSense can be used for real-time behavior tracking, synchronized data collection, and offline processing, and is being deployed to create interactive experiences with virtual humans.
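The idea of communicating time-stamped behavioral cues through a markup layer can be sketched as follows. This is a hypothetical illustration of the kind of message a behavior tracker might emit; the field names are invented for this example and are not the actual PML schema:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class BehaviorCue:
    """One time-stamped nonverbal cue, as a tracker might emit it."""
    timestamp: float   # seconds into the interaction
    modality: str      # e.g. "head", "voice"
    cue: str           # e.g. "nod", "voice_activity"
    confidence: float  # tracker confidence in [0, 1]

# Hypothetical frame-by-frame output of a behavior tracker.
cues = [
    BehaviorCue(1.20, "head", "nod", 0.91),
    BehaviorCue(1.25, "voice", "voice_activity", 0.84),
]

# Serialize to a JSON message that downstream components (e.g. a
# virtual human) could consume, in the spirit of a perception
# markup language.
message = json.dumps([asdict(c) for c in cues])
print(message)
```

Decoupling perception from response generation in this way lets the same cue stream drive real-time interaction, synchronized logging, or offline analysis.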
Understanding social and relational constructs from verbal and nonverbal behavior is another topic of interest. We are conducting research to understand the behaviors associated with empathy, self-disclosure, and rapport, and to build nonverbal behavior generation models for interactive virtual humans.
Understanding subjective multimedia attributes can help improve the user experience in multimedia retrieval and recommendation systems. We have been active in developing new methods for the computational understanding of subjective multimedia attributes, e.g., mood and persuasiveness. Developing such methods requires a thorough understanding of these subjective attributes in order to pursue the best strategies for creating databases and solid evaluation methodologies. We have developed multimedia databases and evaluation strategies through user-centered data labeling and large-scale evaluation via crowdsourcing. We are also interested in attributes related to human perception, e.g., melody in music and comprehensiveness in visual content. Recent examples of this line of research include acoustic analysis for music mood recognition and a micro-video retrieval method based on implicit sentiment associations between visual content and language.
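The retrieval idea above can be sketched as nearest-neighbor search in a shared embedding space. The embeddings below are random stand-ins; in a real system they would come from learned visual and language encoders trained so that associated content lies close together:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(42)
# Stand-in embeddings for three micro-videos in a shared space.
video_embeddings = rng.standard_normal((3, 8))
# A text query embedded into the same space; here it is constructed
# to lie near video 1 so the retrieval result is easy to verify.
query = video_embeddings[1] + 0.05 * rng.standard_normal(8)

scores = [cosine_sim(query, v) for v in video_embeddings]
best = int(np.argmax(scores))
print(best)  # → 1
```

The hard part in practice is not the search itself but learning embeddings in which implicit sentiment associations between visual content and language are preserved.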