Title (generally no more than 80 characters)
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotion
(75 characters)
Keywords (generally no more than 100 characters)
EMOVA, omni-modal large language models, vision-language models, speech models, emotions, end-to-end speech
(107 characters)
Description (generally no more than 200 characters)
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotion
(75 characters)