Researchers from Meta and the University of Waterloo have introduced MoCha, an artificial intelligence system that generates fully animated characters with synchronized speech and natural movements. The system is built on a 30-billion-parameter diffusion transformer and produces HD video clips of roughly five seconds at 24 frames per second.

For precise lip synchronization, MoCha uses a "Speech and Video Window Attention" mechanism and was trained on 300 hours of carefully filtered video content. The system can also generate multi-character scenes via a simplified prompt system that lets users reference characters through simple tags, and it focuses on close-ups and medium shots.

Independent experts have judged the generated videos to be realistic, highlighting the quality of the natural movements and lip synchronization. Potential applications include digital assistants, virtual avatars, advertising, and educational content.
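The core idea behind a window attention mechanism for lip sync is that each video frame attends only to the audio tokens near its own moment in time, rather than to the whole speech track. The sketch below is a minimal illustration of that idea, not MoCha's actual implementation: the function name, the linear frame-to-audio alignment, and the window size are all assumptions made for the example.

```python
import numpy as np

def speech_video_window_mask(num_frames, num_audio_tokens, window=2):
    """Boolean mask: frame f may attend to audio tokens within
    +/- `window` of its aligned position (illustrative sketch only)."""
    mask = np.zeros((num_frames, num_audio_tokens), dtype=bool)
    for f in range(num_frames):
        # Assume a simple linear alignment between frames and audio tokens.
        center = round(f * (num_audio_tokens - 1) / max(num_frames - 1, 1))
        lo = max(0, center - window)
        hi = min(num_audio_tokens, center + window + 1)
        mask[f, lo:hi] = True
    return mask

mask = speech_video_window_mask(num_frames=6, num_audio_tokens=12, window=2)
print(mask.shape)  # → (6, 12)
```

In a cross-attention layer, such a mask would zero out (or set to -inf before softmax) all audio positions outside each frame's window, keeping mouth movements tied to the locally spoken sounds.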