How WePlay Studios delivered the five-hour VTuber Awards with a virtual broadcast blending physical production facilities and equipment with extensive virtual production engineering and design.
This is an interesting case study we had in from our friends at AJA. It's hard to imagine even large network broadcasters or giant organisations such as Eurovision pulling something like this off a few years ago.
We'll let AJA tell the story from here on in too, with a few light edits. Partly that's because there's some genuinely interesting tech detail in the way the project was set up, partly that's because we had a quick Google around the subject of VTubers and ended up in some extremely deep rabbit holes about subjects such as why the Japanese word kusa ('grass') is used by VTubers to denote laughter.*
WePlay Studios has dual headquarters in Kyiv, Ukraine, and Los Angeles, California, and cut its teeth creating hugely popular esports shows. Those skills have proved eminently portable, and it has found there is also huge demand for its talents on live event productions such as the inaugural VTuber Awards, for which its team completed a full virtual production last year.
Hosted by VTuber Filian in partnership with talent management agency Mythic Talent, the goal of the five-hour event was to celebrate the top virtual creators online. That was going to require a blend of real world production facilities and equipment with extensive virtual production engineering and design.
“Storytelling and technological innovation drive every show we do, and we pride ourselves on creating iconic content that leaves a lasting viewer impression; the VTuber Awards were no exception,” shared Head of Virtual Production Aleksii Gutiantov. “While we’d previously incorporated AR into live esports productions, this show marked our first foray into a fully virtual event managed with virtual cues; it’s the most challenging technological endeavor we’ve ever taken on.”
To pull off the event, Gutiantov managed and coordinated the Los Angeles production remotely from his laptop in Europe, using intercom communication with over 16 team members and orchestrating eight days of non-stop pre-production to deliver the broadcast. His team first created a real-time rendering of an entirely virtual Filian to incorporate into the live production using motion capture (mocap) technology. They tapped 20 witness cameras for comprehensive, full-body performance capture, including precise finger movements, and combined it with additional technology to stream facial mocap data.
The live event stream included a vast virtual arena, but Filian's character was located on a smaller stage, encircled by a digitally reconstructed version of WePlay's physical LA arena. To ensure every physical pan, tilt, and focus pull translated directly into the virtual render environment, WePlay Studios’ camera operators managed three cameras that were synced to virtual cameras. Camera operators in the practical/physical set were then able to switch among various angles within the virtual stadium using iPads connected to virtual cameras, creating the illusion of using a dozen cameras instead of three.
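To illustrate how three tracked cameras can look like a dozen, here is a minimal Python sketch. It assumes, purely for illustration, that each virtual angle is a fixed pan and tilt offset applied to the physical camera's tracking data. The names `Pose`, `CameraRig` and `select` are hypothetical and are not taken from WePlay Studios' pipeline or from Unreal Engine.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    """Tracking data from one camera (units are illustrative)."""
    pan: float    # degrees
    tilt: float   # degrees
    focus: float  # focus distance in metres


class CameraRig:
    """One physical camera driving whichever virtual camera is selected."""

    def __init__(self, virtual_angles):
        # virtual_angles maps an angle name to a (pan_offset, tilt_offset) pair.
        self.virtual_angles = dict(virtual_angles)
        # Start on the first angle that was listed.
        self.active = next(iter(self.virtual_angles))

    def select(self, name):
        """Switch virtual angle, as an operator might do from a tablet."""
        if name not in self.virtual_angles:
            raise KeyError(f"unknown virtual angle: {name}")
        self.active = name

    def virtual_pose(self, physical):
        """Apply the operator's physical move to the active virtual camera."""
        pan_offset, tilt_offset = self.virtual_angles[self.active]
        return Pose(
            pan=physical.pan + pan_offset,
            tilt=physical.tilt + tilt_offset,
            focus=physical.focus,  # focus pulls pass straight through
        )


# One physical camera with four hypothetical virtual angles.
rig = CameraRig({
    "wide": (0.0, 0.0),
    "stage_left": (-45.0, 5.0),
    "stage_right": (45.0, 5.0),
    "crowd": (180.0, 0.0),
})
rig.select("stage_left")
print(rig.virtual_pose(Pose(pan=10.0, tilt=-2.0, focus=3.5)))
```

With four angles per rig, three physical cameras would give twelve selectable views, which matches the "dozen cameras" effect described above. A real system would use full 3D transforms and lens data rather than simple offsets.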
To make the production look more authentic, WePlay Studios linked the physical stage lights to their corresponding virtual lights, which allowed the team to control the virtual stadium's lighting from a real lighting control console. Video playback was also integrated into the virtual world: live event visuals software connected to the virtual venue was used to launch and control the graphics displayed on the virtual stage's screens. AJA KONA 5 video I/O boards played a crucial role in the 12G-SDI signal chain, and the final SDI feed was forwarded to an AJA KUMO 3232-12G video router for availability across the entire broadcast pipeline.
“Our KONA 5 cards were instrumental in allowing us to receive 12G-SDI signals, integrate them into an Unreal Engine 5 environment, and composite the final in SDI. It’s the best product on the market,” explained Gutiantov. “And, our KUMO routers let us build infrastructure for large remote and on-site productions like this one and manage everything from a single, convenient web interface thousands of kilometers away. We also love that we can save pre-programmed salvo routing configurations for SDI signals, and we never have to worry about them going down; I've been working with them since 2017 on various projects, and they've never failed me.”
KONA 5 enabled WePlay Studios’ team to leverage the power of Unreal Engine to create a comprehensive virtual production hub capable of handling 12G-SDI workflows. This allowed them to fully harness the potential of AR technology, from camera tracking to motion capture and data-driven graphics, while ensuring flawless live virtual production broadcasts without any technical mishaps in compositing. It also allowed them to produce UltraHD fill and key signals from one card in all known formats, using Pixotope as a keyer for 4K with the failover features known from FHD workflows.
“The KONA 5 user interface is simple enough to understand and control, even amidst the pressures of live production, and we love that we can preview last-minute changes in real time. It also offers up to four reconfigurable I/Os, from SD to 4K, along with support for AES/EBU, LTC, RS-422/GPI, which is key for transforming video from interlaced to progressive formats if we are working in Saudi Arabia or China,” added Gutiantov. “KONA 5 really helps accelerate operations on projects like this, which requires a lot of compute power for motion-adaptive deinterlacing. Furthermore, the card’s multi-channel hardware processing accelerated compute-intensive operations so that we could combine multiple video sources into a single output in Unreal Engine 5, up/down/cross-scale, and mix/composite for all resolutions. These processes are essential for handling video content of any resolution, ensuring that the final output meets the broadcast quality standards.”
Due to the unique setup of WePlay Studios’ Los Angeles facility, the team developed a preview infrastructure comprising a series of Mini-Converters to down-convert 12G-SDI signals and forward 3G-SDI signals to their AJA KUMO video router. Using AJA HD5DA SDI distribution amplifiers, the team was then able to spread preview signals across all arena monitors, making them more straightforward to manage. The setup, which also used salvo routing configurations for SDI signals regardless of the nature of the data source, enabled precise control over the view of the production that WePlay Studios provided to its partners, talent, camera operators, motion capture team, and the entire production crew at any moment. AJA ROI-DP DisplayPort to SDI Mini-Converters proved a key part of this preview infrastructure design, allowing the team to duplicate computer monitors into the broadcast pipeline and manage the conversion with region-of-interest scaling.
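For readers unfamiliar with salvos, the idea can be sketched in a few lines of Python: a salvo is a saved set of destination-to-source routes that is recalled in one action. This is an in-memory model only, assuming a 32x32 router to match the KUMO 3232-12G's port count. The `Router` class, its methods and the port assignments are hypothetical, and the sketch does not use AJA's actual control interface.

```python
class Router:
    """Toy model of an SDI router: each destination shows one source."""

    def __init__(self, size=32):
        self.size = size
        self.routes = {}  # destination port -> source port

    def _check(self, dest, src):
        """Reject port numbers outside the router's range."""
        for label, port in (("destination", dest), ("source", src)):
            if not 1 <= port <= self.size:
                raise ValueError(f"{label} {port} is outside 1..{self.size}")

    def take(self, dest, src):
        """Route a single source to a single destination."""
        self._check(dest, src)
        self.routes[dest] = src

    def apply_salvo(self, salvo):
        """Recall a saved set of routes in one action.

        Every route is validated before any is applied, so a bad entry
        leaves the current routing untouched.
        """
        for dest, src in salvo.items():
            self._check(dest, src)
        for dest, src in salvo.items():
            self.take(dest, src)


# Hypothetical salvos: what each group of arena monitors sees.
REHEARSAL = {1: 5, 2: 5, 3: 7}   # monitors 1-2 show camera 5, monitor 3 shows mocap preview 7
LIVE_SHOW = {1: 9, 2: 9, 3: 9}   # every monitor shows programme output 9

router = Router()
router.apply_salvo(REHEARSAL)
router.apply_salvo(LIVE_SHOW)
print(router.routes)
```

Validating the whole salvo before applying it is a design choice of this sketch, made so that a typo in one route cannot leave the monitors in a half-switched state during a live show.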
It's interesting stuff and it's a field that's expanding all the time. WePlay Studios plans to open a new virtual production studio in Los Angeles this year that will feature a screen area larger than 2,500 sq ft with a 1.8mm pitch and the first-ever Pantone-certified LED color pipeline, utilizing advanced flip-chip technology. That will let it take its expertise into new areas such as film and entertainment projects beyond gaming and esports.
According to Gutiantov, the level of interactivity all this provides unlocks exciting new possibilities for live entertainment genres, blurring the lines between viewers and the virtual worlds the studio creates. “WePlay is not just staying within the confines of the gaming industry; we're branching out to music and broader entertainment directions. We're currently in the early stages of planning and discussions for projects that straddle these new frontiers.”
*From The official Rice Digital totally seiso guide to VTuber lingo.
Literally, [Kusa means] grass, but is typically used to express laughter. There’s a fairly convoluted linguistic process to get to this point, which goes something like this.
In Japan, many Internet users do not use “lol” as English-speaking territories do; they use the letter “w”, which stands for “warai” (笑い, literally, “laughter”). Intense laughter is expressed by a long string of the letter — “wwwwwwwww” — which is regarded as looking like a row of grass. Hence, intense laughter is abbreviated as “kusa” (草, “grass”).