WildActor

Unconstrained Identity-Preserving Video Generation

¹HKUST    ²Meituan    ³NUS
†Corresponding author
WildActor Teaser

WildActor: unconstrained human video generation under any-view condition.

Reference Identity

Prompt Sequence

Stage 1: A blond male mountain biker poised on a narrow trail in a misty forest.
Stage 2: Crouches inspecting the front tire of his bike with a determined expression.
Stage 3: Begins to walk forward into the shadows, disappearing into the dense foliage.

Synthesized Video

Reference Identity

Prompt Sequence

Stage 1: A young woman dressed in a black crop top... begins a slow 180-degree turn.
Stage 2: Mid-turn, she completes the rotation to show her back, showcasing the knot of her yellow sarong.
Stage 3: Returning fully to face camera, demonstrating perfect 3D consistency and character fidelity.

Synthesized Video

Reference Identity

Prompt Sequence

Stage 1: A male skier wearing an orange hooded jacket stands atop a snowy peak.
Stage 2: Pushes off, carving smooth S-turns down the powdery slope.
Stage 3: Reaching a flatter section, the skier slows to a stop with a satisfied smile.

Synthesized Video

Reference Identity

Prompt Sequence

Stage 1: A male hiker adjusting the straps of his olive green backpack in grasslands.
Stage 2: Begins to walk steadily up a gentle incline while the camera follows.
Stage 3: Reaching a scenic overlook, he unclips his backpack and retrieves a camera.

Synthesized Video

Reference Identity

Prompt Sequence

Stage 1: In a dimly lit jazz club, the man walks onto the stage adjusting the stand.
Stage 2: Stands center stage tapping his brown boot rhythmically before performing.
Stage 3: Takes a deep breath and closes his eyes to soak in the atmosphere.

Synthesized Video

Reference Identity

Prompt Sequence

Stage 1: A werewolf girl walks down a garden path admiring the flowers.
Stage 2: Crouches to inspect a rose, leopard print pants contrasting with greenery.
Stage 3: Stands and turns smiling softly as a gentle breeze blows through her hair.

Synthesized Video

Reference Identity

Prompt Sequence

Stage 1: A bearded man leans over a workbench tracing a red wire.
Stage 2: Tests an electrical connection with a multimeter; sparks illuminate his face.
Stage 3: Steps back as a recliner chair begins to hum and recline automatically.

Synthesized Video

WildActor consistently preserves body identity (including facial features, body shape, and clothing details) under diverse shot compositions, large viewpoint transitions, and substantial motions.

Abstract

Production-ready human video generation requires digital actors to maintain strictly consistent full-body identities across dynamic shots, viewpoints, and motions, a setting that remains challenging for existing methods. Prior methods often suffer from face-centric behavior that neglects body-level consistency, or produce copy-paste artifacts where subjects appear rigid due to pose locking. We present Actor-18M, a large-scale human video dataset designed to capture identity consistency under unconstrained viewpoints and environments. Actor-18M comprises 1.6M videos with 18M corresponding human images, covering both arbitrary views and canonical three-view representations. Leveraging Actor-18M, we propose WildActor, a framework for any-view conditioned human video generation. We introduce an Asymmetric Identity-Preserving Attention (AIPA) mechanism coupled with a Viewpoint-Adaptive Monte Carlo Sampling strategy. Evaluated on the proposed Actor-Bench, WildActor consistently preserves full-body identity under diverse shot compositions, large viewpoint transitions, and substantial motions, surpassing existing methods in these challenging settings.

Methodology

WildActor Method Architecture
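
The abstract names a Viewpoint-Adaptive Monte Carlo Sampling strategy but this page does not describe how it works. Purely as an illustration of the general idea of randomly drawing reference images so that rare viewpoints are not crowded out by frontal ones, the sketch below uses inverse-frequency weights over view labels. The function name `sample_references`, the view labels, and the weighting rule are all assumptions made for this example and should not be read as the method used in WildActor.

```python
import random
from collections import Counter


def sample_references(view_labels, k, rng):
    """Draw up to k distinct reference indices, favoring rare view bins.

    Hypothetical helper: weights are inverse to how often a view label
    occurs, so each view bin carries the same total probability mass.
    """
    counts = Counter(view_labels)
    weights = [1.0 / counts[v] for v in view_labels]
    remaining = list(range(len(view_labels)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        # Weighted draw without replacement among the remaining references.
        w = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


# Toy reference pool: mostly frontal images, a few back views.
labels = ["front"] * 8 + ["back"] * 2
picks = sample_references(labels, k=3, rng=random.Random(0))
print([labels[i] for i in picks])
```

With these weights, a single draw from the toy pool returns a back view about half the time even though back views make up only a fifth of the pool.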

Actor-18M Dataset

Actor-18M Pipeline

Actor-18M is the largest human-centric video dataset (1.6M videos / 18M images) constructed to capture view-invariant identity:

  • Core Extraction: Frames are filtered from identity-consistent videos to extract facial and body ground-truth references using Identity Stability and Motion Consistency checks.
  • Subset A (Viewpoint Augmentation): Augments viewpoint diversity using a multi-angle editing model, shifting the body distribution substantially toward non-frontal views.
  • Subset B (Attribute Augmentation): Produces references under diverse environments, lighting, and motions to prevent overfitting to background cues.
  • Subset C (Canonical Anchors): Generates front, side, and back views using visibility-guided selection to serve as complete identity anchors.
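
The Core Extraction step above can be pictured as a two-condition frame filter. The sketch below is a minimal illustration under stated assumptions, not the Actor-18M pipeline itself: the `Frame` fields, the use of cosine similarity to a mean identity embedding as the "Identity Stability" check, the per-frame motion score standing in for the "Motion Consistency" check, and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class Frame:
    index: int
    identity_embedding: List[float]  # hypothetical per-frame identity feature
    motion_score: float              # hypothetical per-frame motion measure


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_embedding(frames):
    """Element-wise mean of the identity embeddings of all frames."""
    dim = len(frames[0].identity_embedding)
    n = len(frames)
    return [sum(f.identity_embedding[i] for f in frames) / n for i in range(dim)]


def select_reference_frames(frames, min_identity_sim=0.8, max_motion=0.5):
    """Keep frames that pass both assumed checks."""
    if not frames:
        return []
    center = mean_embedding(frames)
    kept = []
    for f in frames:
        # Assumed identity check: frame agrees with the video's mean identity.
        stable = cosine(f.identity_embedding, center) >= min_identity_sim
        # Assumed motion check: frame's motion score stays under a cap.
        calm = f.motion_score <= max_motion
        if stable and calm:
            kept.append(f)
    return kept


frames = [
    Frame(0, [1.0, 0.0], 0.1),
    Frame(1, [0.9, 0.1], 0.2),
    Frame(2, [0.0, 1.0], 0.1),   # identity drifts away from the rest
    Frame(3, [1.0, 0.05], 0.9),  # motion score above the cap
]
print([f.index for f in select_reference_frames(frames)])
```

In this toy input, frame 2 fails the identity check and frame 3 fails the motion check, so only frames 0 and 1 survive as candidate references.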

BibTeX

@article{guo2026wildactor,
  title={WildActor: Unconstrained Identity-Preserving Video Generation},
  author={Guo, Qin and Yang, Tianyu and He, Xuanhua and Shen, Fei and Zhang, Yong and Kang, Zhuoliang and Wei, Xiaoming and Xu, Dan},
  year={2026},
  journal={arXiv preprint}
}