AI filmmaking is not an experiment; it is already a production reality

AI filmmaking is entering its studio era. Once defined by hallucinations, physics errors, and patchy lip-syncing, generative tools are maturing into controllable systems with digital consistency and creative direction at their core.

The film industry is poised for its most significant reconfiguration in a century. The advent of sound, the rise of television, and the digital transition each reshaped production, distribution, and economics. According to Haohong Wang, General Manager at TCL, the next evolution will be led by artificial intelligence, and its impact will be structural, not superficial.

TCL may be best known as one of the world’s largest consumer electronics brands, producing televisions, smartphones, and smart appliances on a global scale. But behind the hardware is a deep and expanding R&D capability, with TCL Research America increasingly focused on emerging technologies at the intersection of AI, vision systems, and immersive media. Through MineStudio, TCL is not simply experimenting with generative video. It is building a scalable infrastructure for AI-native filmmaking that restores directorial control and reshapes the economics of the production process.

“If you look at traditional filmmaking, most of the budget, sometimes 70 to 80 per cent, is spent on the production team,” Wang explains. “There are hundreds of people involved just in the physical shooting process. With AI, that structure inverts. The production team shrinks, efficiency rises, and we expect that soon, the cost of making AI-generated content will be less than ten per cent of what it is today.”

This inversion fundamentally changes who can afford to make films and what kinds of stories they can tell. Traditional budgets for episodic television range from $30,000 to $70,000 per minute of content. Wang expects that AI-native production could bring that figure below $5,000. That differential does not simply make content cheaper; it changes the creative calculus. Independent filmmakers, niche creators, and underrepresented voices gain access to studio-level production quality, bypassing the traditional gatekeepers.

More importantly, it enables a new business model that rethinks how content is monetised.

“We are not just replacing expensive shoots with cheap outputs,” Wang says. “AI enables a flywheel: creators make more content, audiences grow, advertisers follow the eyeballs, and their money flows back to support more original work. Once that loop becomes self-sustaining, most premium content may become free.”

This model aligns with broader shifts in media consumption. Streaming saturation, subscriber fatigue, and the rise of ad-supported platforms have already put pressure on the subscription video-on-demand (SVOD) model. AI-generated content, produced at scale and cost-effectively, could offer a viable alternative that preserves audience quality expectations while alleviating economic strain.

Why directability still matters

However, a scalable production model is only useful if it can deliver work that audiences want to watch. Generative video models have made remarkable progress in recent years, but they remain plagued by inconsistencies. A character might age between shots. A prop might change position. Faces warp, hands multiply, and eye-lines drift. These are not minor quirks; they are immersion-breaking flaws.

“Directors told us that even when the AI output looked good, they felt powerless,” Wang says. “They could not control camera angles, lighting, or even whether a character’s face stayed the same between shots. So we asked a different question: what if we rebuilt the entire process in a way that makes AI as controllable as a real film set?”

The result is MineStudio, TCL’s end-to-end AI filmmaking platform designed to return creative control to the director. At the heart of this workflow is a fundamental rethinking of what AI filmmaking should be. Rather than generating a film frame by frame based on vague textual prompts, MineStudio builds from a structured 3D environment using three core techniques: digitisation, decomposition, and compositing.

Digitisation is the process of converting all relevant creative assets (characters, objects, and backdrops) into fixed, manipulable digital entities. This enables spatial and visual consistency. A digitised background remains identical across scenes, regardless of camera angle. A digital actor can be reused with variations in age, emotion, and lighting without requiring the entire shot to be re-rendered.

Decomposition breaks these digital assets into their controllable components. Facial expressions, motion cycles, limb positioning, environmental effects, camera rigging, and lighting angles can all be adjusted independently using AI tools. These parameters can be edited either manually or via learned behaviours extracted from real-world captures.

Compositing then brings these elements together. By layering the refined digital assets within a cinematic timeline and integrating audio, stylisation, and visual effects, the director effectively edits the film from a library of highly responsive components. This method does not replace traditional filmmaking so much as mimic its structure using digital-first tools.
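
TCL has not published MineStudio's internals, but the digitise, decompose, and composite flow maps naturally onto a layered scene description. The Python sketch below is a hypothetical illustration of that idea: digitised assets stay fixed, their decomposed controls (camera, lights, expression) are edited independently, and shots are composited from the same asset library. All class and field names are invented for the example, not taken from TCL's platform.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalAsset:
    """A digitised entity (character, prop, or backdrop) that stays fixed across shots."""
    name: str
    mesh_path: str                                  # geometry captured once, reused everywhere
    controls: dict = field(default_factory=dict)    # decomposed parameters: pose, expression, age...

@dataclass
class Camera:
    position: tuple
    look_at: tuple
    focal_length_mm: float = 35.0

@dataclass
class Light:
    position: tuple
    colour_temperature_k: float = 5600.0
    intensity: float = 1.0

@dataclass
class Shot:
    """A composited shot: stable assets plus independently editable camera and lighting."""
    assets: list
    camera: Camera
    lights: list
    duration_s: float

# Compositing: the same digitised actor and backdrop appear in two shots, with only the
# decomposed controls (camera placement, lens, light warmth, expression) changed between them.
actor = DigitalAsset("lead_actor", "assets/lead_actor.glb", {"expression": "neutral"})
alley = DigitalAsset("alley_backdrop", "assets/alley.glb")

shot_01 = Shot([actor, alley], Camera((0, 1.6, 4), (0, 1.6, 0)), [Light((2, 3, 1))], 4.0)
shot_02 = Shot([actor, alley], Camera((1.5, 1.7, 2), (0, 1.6, 0), focal_length_mm=85.0),
               [Light((2, 3, 1), colour_temperature_k=3200.0)], 2.5)

actor.controls["expression"] = "worried"   # adjust one component without re-rendering the rest
```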

“Once everything is digitised and decomposed, the creative team regains control,” Wang explains. “If a camera angle needs adjusting, it is just a virtual placement. If the lighting feels off, we can modify the source at the pixel level. If a character’s movement does not feel right, we extract motion data from a real actor and apply it to the digital version.”

From inconsistent frames to cinematic logic

This shift from 2D prompt-based generation to a spatially aware 3D framework is not cosmetic; it is transformational. Frame-by-frame models, like those that dominated early generative video tools, suffer because they lack context. Each image is generated in isolation, resulting in flicker, drift, and unpredictable visual noise. A spatially grounded approach ensures continuity across shots, enables logical blocking and choreography, and makes cinematic grammar possible again.

“Artists have always thought in three dimensions,” Wang adds. “The AI tools until now have worked in two. We needed to align the technology with how creators already visualise their stories.”

One of the enabling technologies for this spatial consistency is Gaussian splatting, a technique that reconstructs 3D scenes as dense clouds of blended, view-dependent Gaussian primitives. It allows filmmakers to generate digital environments from real-world footage or static images. Combined with monocular depth estimation, it lets directors reconstruct entire sets from a few reference images and then manipulate them in real time.
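
The article does not describe TCL's implementation, but the geometry behind reconstructing a set from reference images is well understood: a monocular depth estimate per pixel can be back-projected through the camera intrinsics into a 3D point cloud, which a splatting stage can then fit Gaussians to. A minimal NumPy sketch of that back-projection step, assuming a simple pinhole camera, looks like this:

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a per-pixel depth map (metres) into a 3D point cloud using a pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx            # horizontal offset from the optical axis
    y = (v - cy) * depth / fy            # vertical offset
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[depth.reshape(-1) > 0]  # drop invalid (zero-depth) pixels

# Example: a synthetic 4x4 depth map standing in for a monocular depth estimate.
depth = np.full((4, 4), 2.0)
cloud = backproject_depth(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(cloud.shape)   # (16, 3) points, ready to seed a Gaussian-splatting reconstruction
```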

MineStudio integrates this capability with support for LoRA (Low-Rank Adaptation) models, which allow for fine-tuning of style and behaviour on top of foundational generative models. These lightweight networks offer domain-specific conditioning, such as making a crowd move naturally, preserving the architectural style of a city, or ensuring a digital actor walks with the same gait across scenes.

“LoRA models are one of the ways we make the AI obey,” Wang continues. “They take general capabilities and make them specific to our film. That means you do not need to retrain a large model for every project. You just teach it how you want it to behave.”
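
LoRA itself is a published technique (Hu et al., 2021): rather than retraining a large model, a small low-rank update is learned on top of frozen weights. A minimal PyTorch sketch of a LoRA-adapted linear layer, independent of any MineStudio specifics, looks roughly like this:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # foundation weights stay frozen
        self.A = nn.Linear(base.in_features, rank, bias=False)    # down-projection
        self.B = nn.Linear(rank, base.out_features, bias=False)   # up-projection
        nn.init.zeros_(self.B.weight)                     # adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.B(self.A(x))

# Only the tiny A and B matrices are trained per project ("teach it how you want it to behave").
layer = LoRALinear(nn.Linear(1024, 1024))
out = layer(torch.randn(2, 1024))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))   # adapter parameters only
```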

Solving problems AI does not understand

Not all challenges are artistic. Some are physical. Generative models still struggle with physics: speed, weight, gravity, and cause-and-effect realism. AI can hallucinate a person running fast, but the shadows will not match. A character might dive off a cliff with no sense of inertia or collision.

Wang’s solution is pragmatic. Rather than expecting AI to solve every problem, directors can hybridise the process, combining real captures with synthetic rendering to create accurate digital behaviours. “AI does not always understand what speed looks like or how gravity behaves,” Wang admits. “But if we show it a real clip of a person jumping into a pool, it learns what that motion should be. Then, we apply it to a digital double. That is how we bridge the gap between realism and imagination.”

This approach extends to lighting, a notoriously difficult aspect for AI models. Humans are highly attuned to small changes in lighting, especially on skin tones. MineStudio lets directors define lighting sources within the 3D scene and re-render it algorithmically: AI tools recalculate lighting at the pixel level based on the orientation, colour temperature, and diffusion characteristics of each source. “There is a big difference between lighting that looks cinematic and lighting that looks wrong,” Wang explains. “By making light a controllable attribute, we eliminate one of the most immersion-breaking errors in AI film.”
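
As a toy illustration of what recalculating lighting at the pixel level can mean, the NumPy sketch below applies simple Lambertian shading from a movable light source. It is an illustrative stand-in, not MineStudio's renderer; in practice a colour temperature would be mapped to an RGB tint, so warm and neutral tints are simply given directly here.

```python
import numpy as np

def relight(albedo, normals, light_pos, light_rgb, points):
    """Recompute per-pixel shading for a movable light: Lambertian N.L times the light colour."""
    to_light = light_pos - points                        # vector from each surface point to the light
    to_light /= np.linalg.norm(to_light, axis=-1, keepdims=True)
    ndotl = np.clip(np.sum(normals * to_light, axis=-1, keepdims=True), 0.0, 1.0)
    return albedo * ndotl * light_rgb                    # shaded colour per pixel

# Toy 2x2 "scene": a flat surface facing up, lit from directly above, then re-lit with a warmer,
# repositioned source. Only the light attributes change; the digitised surface stays fixed.
albedo  = np.full((2, 2, 3), 0.8)
normals = np.tile([0.0, 1.0, 0.0], (2, 2, 1))
points  = np.zeros((2, 2, 3))
neutral = relight(albedo, normals, np.array([0.0, 3.0, 0.0]), np.array([1.0, 1.0, 1.0]), points)
warm    = relight(albedo, normals, np.array([2.0, 3.0, 1.0]), np.array([1.0, 0.85, 0.7]), points)
```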

Intimacy, interaction, and emotion

Another facet that generative tools continue to struggle with is emotion. Complex human scenes, especially those involving close dialogue, intimacy, or conflict, reveal the brittleness of AI models. Gaze drift, micro-expression failures, and subtle timing mismatches accumulate and rupture the viewer’s trust.

“Our eyes are highly sensitive to mistakes in human behaviour,” Wang continues. “If a character blinks too slowly, or their smile looks artificial, we notice. That is why drama is so hard. It mirrors everyday life, so we expect perfection.”

Despite this, TCL has already begun producing short-form dramas using the MineStudio pipeline. Animations and stylised genres such as sci-fi and horror, where audiences are more forgiving of abstraction, have already achieved viability. The real frontier lies in long-form live-action drama, particularly in scenes with overlapping dialogue, close-up camera work, and complex social cues.

Wang believes this gap will close within 12 to 24 months. Improvements in multimodal pretraining, reinforcement learning with human feedback, and diffusion model fidelity are all converging to raise the emotional intelligence of generative models. In the meantime, TCL continues to experiment with human-in-the-loop workflows, where actors’ performances are captured and fed into AI-driven characters via pose estimation and facial landmark tracking. “Give us 12 to 24 months,” Wang says. “We are already close on lip sync, facial detail, and body mechanics. As those areas mature, we will be able to create drama that feels emotionally real.”
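
Wang points to pose estimation and facial landmark tracking as the bridge from live performance to digital characters. As one possible stand-in for that capture step (the article does not name TCL's tooling), the sketch below uses the open-source MediaPipe Pose solution to extract joint positions from a single performance frame; mapping those landmarks onto a specific character rig is an assumed downstream step.

```python
import cv2
import mediapipe as mp

# Extract body landmarks from one frame of a live performance (human-in-the-loop capture).
frame = cv2.imread("performance_frame.jpg")               # hypothetical reference frame
with mp.solutions.pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    # Normalised (x, y, z) joints that a retargeting step would map onto the digital actor's rig.
    joints = [(lm.x, lm.y, lm.z) for lm in results.pose_landmarks.landmark]
    print(f"captured {len(joints)} joints")                # 33 landmarks for MediaPipe Pose
```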

From concept to cinema at laptop scale

This is not speculation. TCL has already produced eight short films using MineStudio, spanning genres from animation and horror to documentary and romance. These were created in weeks, not months, by small teams using hybrid techniques that combined AI, live-action reference footage, and digital cinematography. One of the latest, Next Stop, Paris, premiered in Los Angeles and is available online, providing a tangible example of the pipeline’s capabilities. “These were not just technical demos,” Wang says. “They were real stories, with real characters, screened to real audiences. And they worked.”

What MineStudio offers is not a replacement for creativity but an infrastructure for it. A digital production environment where assets are stable, workflows are modular, and directors can choose their blend of efficiency and fidelity. “There is no right answer,” Wang says. “It is about choice. If quality is your top priority, opt for traditional methods. If time or budget is limited, use the AI pipeline. Most likely, the future is hybrid, real and artificial working together.”

AI filmmaking is not an anomaly or a trend. It is a fundamental rewrite of the tools, timelines, and economics of production. Its most powerful feature is not automation; it is control. By providing directors with a virtual stage that offers tangible logic and flexible fidelity, MineStudio redefines what it means to make a film.

“Once advertisers see the engagement, they will pay to be there,” Wang concludes. “That is the final piece. When the flywheel spins fast enough, content becomes free, creators get paid, and audiences benefit. That is not science fiction. That is the model we are building.”
