AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Description

In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. Our methodology is divided into two stages. First, we extract 3D intermediate representations from audio and project them into a sequence of 2D facial landmarks. Second, we employ a robust diffusion model, coupled with a motion module, to convert the landmark sequence into photorealistic and temporally consistent portrait animation. Experimental results demonstrate the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality, thereby offering an enhanced perceptual experience. Moreover, our methodology shows considerable potential in terms of flexibility and controllability, and it can be applied effectively to areas such as facial motion editing and face reenactment. We release code and model weights at https://github.com/scutzzj/AniPortrait

2024: Huawei Wei, Zejun Yang, Zhisheng Wang

https://arxiv.org/pdf/2403.17694v1.pdf
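The first stage projects 3D intermediate representations into 2D facial landmarks. The episode description does not say which camera model is used, so the sketch below assumes a simple pinhole camera with a per-frame head pose (a rotation plus a translation). The names rotation_y, project_frame and project_sequence, as well as the focal length and image center values, are hypothetical illustrations and not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]
Mat3 = List[List[float]]


def rotation_y(yaw: float) -> Mat3:
    """3x3 rotation matrix about the vertical (y) axis; yaw is in radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project_frame(points: Sequence[Vec3], rot: Mat3, trans: Vec3,
                  focal: float, center: Vec2) -> List[Vec2]:
    """Pinhole projection of one frame's 3D landmarks to 2D pixel coordinates.

    Assumption (not from the paper): camera-space point = rot @ p + trans,
    then u = focal * x / z + cx and v = focal * y / z + cy.
    """
    projected = []
    for x, y, z in points:
        # Rotate by the head pose, then translate into camera space.
        xc = rot[0][0] * x + rot[0][1] * y + rot[0][2] * z + trans[0]
        yc = rot[1][0] * x + rot[1][1] * y + rot[1][2] * z + trans[1]
        zc = rot[2][0] * x + rot[2][1] * y + rot[2][2] * z + trans[2]
        if zc <= 0.0:
            raise ValueError("landmark lies behind the camera")
        # Perspective divide, then shift to the image center.
        projected.append((focal * xc / zc + center[0], focal * yc / zc + center[1]))
    return projected


def project_sequence(frames: Sequence[Sequence[Vec3]],
                     poses: Sequence[Tuple[Mat3, Vec3]],
                     focal: float, center: Vec2) -> List[List[Vec2]]:
    """Project every frame with its own (rotation, translation) pose."""
    if len(frames) != len(poses):
        raise ValueError("need exactly one pose per frame")
    return [project_frame(pts, rot, trans, focal, center)
            for pts, (rot, trans) in zip(frames, poses)]


if __name__ == "__main__":
    # Two toy landmarks viewed from 10 units away, with no rotation and then a 30-degree yaw.
    frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]] * 2
    poses = [(rotation_y(0.0), (0.0, 0.0, 10.0)),
             (rotation_y(math.radians(30)), (0.0, 0.0, 10.0))]
    for frame in project_sequence(frames, poses, focal=500.0, center=(256.0, 256.0)):
        print(frame)
```

In an actual audio-driven pipeline, the per-frame 3D landmarks and poses would be predicted from the audio rather than hand-written as they are here. Keeping projection as a separate, pose-parameterized step is one way to let pose and expression be edited independently before the 2D landmarks reach the diffusion stage.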

More Episodes

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text...

Published 05/16/24

Papers Read on AI

A decoder-only foundation model for time-series forecasting

Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting...

Published 05/14/24