I spent the past week making visuals for an interactive musical piece that was performed at the ASU+GSV Summit in Salt Lake City on May 8th, 2017. The production timeline was extremely tight and I lost a lot of sleep; nonetheless, the final render made its way onto the screen at the opening keynote.
Check it out here (skip to 56:16):
The live performance is actually only one half of what this project entailed. The live show uses video renders of the virtual environments; however, I was also tasked with creating a VR environment for the Vive that places you in the middle of the scene, complete with surround audio and 3D video of the virtual performers.
Here’s a breakdown of all the elements that went into this project.
The Virtual Environment
I made the environment visuals in Unity, using purely procedural objects and animations. Most of the behavior is driven by some kind of noise function (Simplex, I believe): the cube pillars rise and fall based on noise, and the streamers follow a velocity field generated the same way. The ocean generation is a little less intuitive; it works by taking the frequency content of actual observed ocean waves and synthesizing a height field from it through an inverse Fourier transform. The fog effects in the first piece are done with a ray-marching algorithm over a voxel field, which can simulate light scattering from the various light sources. A good Vive experience requires rendering at 90+ fps to avoid nausea, so performance had to be taken into heavy consideration. I was running this on a beefy GTX 1080, though.
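To make "rise and fall based on noise" concrete, here's a minimal sketch of the idea using Unity's built-in Mathf.PerlinNoise (the actual piece uses Simplex noise, and none of the names or parameters below come from the project code):

```csharp
using UnityEngine;

// Sketch only: drive a grid of cube "pillars" up and down with 2D noise.
// The real environment uses Simplex noise; Mathf.PerlinNoise keeps this
// example self-contained.
public class NoisePillars : MonoBehaviour
{
    public GameObject pillarPrefab;   // a unit cube
    public int gridSize = 32;
    public float spacing = 1.5f;
    public float maxHeight = 6f;
    public float noiseScale = 0.15f;  // spatial frequency of the noise field
    public float scrollSpeed = 0.5f;  // how fast the field animates over time

    private Transform[,] pillars;

    void Start()
    {
        pillars = new Transform[gridSize, gridSize];
        for (int x = 0; x < gridSize; x++)
            for (int z = 0; z < gridSize; z++)
            {
                var p = Instantiate(pillarPrefab, transform);
                p.transform.localPosition = new Vector3(x * spacing, 0f, z * spacing);
                pillars[x, z] = p.transform;
            }
    }

    void Update()
    {
        float t = Time.time * scrollSpeed;
        for (int x = 0; x < gridSize; x++)
            for (int z = 0; z < gridSize; z++)
            {
                // Sample the noise field, scrolled over time, and map it to a height.
                float n = Mathf.PerlinNoise(x * noiseScale + t, z * noiseScale);
                float h = Mathf.Lerp(0.5f, maxHeight, n);
                var tr = pillars[x, z];
                tr.localScale = new Vector3(1f, h, 1f);
                tr.localPosition = new Vector3(x * spacing, h * 0.5f, z * spacing);
            }
    }
}
```

The streamers work on the same principle, except the noise feeds a velocity field that the streamer points are advected through, rather than a height.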
3D Video
The performers are captured in a 360° camera rig covered in green screen, with all cameras facing inwards. The footage is keyed out and stitched together by an algorithm that approximates the depth of each pixel from stereo correspondences. From this, a volumetric field of color voxels is computed for each frame, which is then converted into a triangle mesh, cleaned up, and streamed from disk by a custom native Unity plugin that loads the data into a Mesh Renderer. The video data for the 3 performers totals around 30 GB on disk. Technical detail: a fairly capable SSD is required to stream this data in real time without lag.
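As a rough illustration of the playback side, here's a simplified C# sketch that loads one mesh per video frame from disk and swaps it into a MeshFilter. The real system does this through the custom native plugin with its own frame format; the folder name and binary layout below are invented for the example:

```csharp
using System.IO;
using UnityEngine;

// Illustration only -- not the actual plugin. Plays back a sequence of
// per-frame meshes by reading a hypothetical binary format from disk:
// [int vertexCount][int triangleCount][vertexCount * 3 floats][triangleCount * 3 ints]
[RequireComponent(typeof(MeshFilter))]
public class MeshSequencePlayer : MonoBehaviour
{
    public string folder = "PerformerFrames"; // hypothetical folder of .bin frames
    public float framesPerSecond = 30f;

    private Mesh mesh;
    private string[] framePaths;

    void Start()
    {
        mesh = new Mesh();
        // Allow more than 65k vertices per frame.
        mesh.indexFormat = UnityEngine.Rendering.IndexFormat.UInt32;
        GetComponent<MeshFilter>().sharedMesh = mesh;

        framePaths = Directory.GetFiles(folder, "*.bin");
        System.Array.Sort(framePaths);
    }

    void Update()
    {
        if (framePaths.Length == 0) return;
        int frame = (int)(Time.time * framesPerSecond) % framePaths.Length;
        LoadFrame(framePaths[frame]);
    }

    void LoadFrame(string path)
    {
        using (var r = new BinaryReader(File.OpenRead(path)))
        {
            int vertCount = r.ReadInt32();
            int triCount  = r.ReadInt32();

            var verts = new Vector3[vertCount];
            for (int i = 0; i < vertCount; i++)
                verts[i] = new Vector3(r.ReadSingle(), r.ReadSingle(), r.ReadSingle());

            var tris = new int[triCount * 3];
            for (int i = 0; i < tris.Length; i++)
                tris[i] = r.ReadInt32();

            mesh.Clear();
            mesh.vertices = verts;
            mesh.triangles = tris;
            mesh.RecalculateNormals();
        }
    }
}
```

Note that this sketch reads each frame synchronously on the main thread, which is exactly the kind of stall you can't afford at 90 fps; the point of a native plugin is to shovel that data in far more efficiently.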
Spatial Audio
The virtual environments were designed with spatial audio in mind, so each environment had an accompanying 3D audio mix, complete with moving point sources and ambisonic audio providing reverb effects coming from the "horizon." (Ambisonic audio is basically a way to store a representation of sound arriving from all directions, using something called spherical harmonics. YouTube 360 videos now support first-order ambisonic audio; "first-order" here is roughly analogous to a linear function versus a second-order, quadratic one. Tangentially, spherical harmonics are also used in real-time global illumination techniques, to store the light arriving at a single point from all directions.) The spatial audio was imported and then animated in position inside Unity, using the Google Spatial Audio SDK.
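For a sense of what "spherical harmonics" means here, this is what encoding a single mono sample into first-order ambisonics looks like (traditional B-format/FuMa convention). This is just an illustration of the math, not how the Google SDK is driven:

```csharp
using UnityEngine;

// Illustration only: encode a mono sample into first-order ambisonics
// (traditional B-format / FuMa convention). W is the omnidirectional
// component; X, Y, Z are the first-order spherical-harmonic components
// along the listener's axes. Decoding to speakers or binaural output is
// a separate step handled by the spatializer.
public static class FirstOrderAmbisonics
{
    // azimuth: angle in the horizontal plane (radians, 0 = front)
    // elevation: angle above the horizontal plane (radians)
    public static Vector4 Encode(float sample, float azimuth, float elevation)
    {
        float cosEl = Mathf.Cos(elevation);
        float w = sample * 0.7071f;                     // omni, attenuated ~3 dB
        float x = sample * Mathf.Cos(azimuth) * cosEl;  // front/back
        float y = sample * Mathf.Sin(azimuth) * cosEl;  // left/right
        float z = sample * Mathf.Sin(elevation);        // up/down
        return new Vector4(w, x, y, z);                 // (W, X, Y, Z)
    }
}
```

Higher orders just add more spherical-harmonic channels, which is why a second-order mix carries more directional detail, in the same way a quadratic carries more shape than a line.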
Live Performance and Interaction
In the live show, the singer wears a motion sensor on her hand, which tracks changes in pitch/yaw/roll. This data is fed into the post-processing effects on the dancer/violinist videos, in the form of video trails (a motion-blur-like effect) and a general vertical blur (the stuff that looks a bit like a J.J. Abrams anamorphic lens flare). We also used the pitch/roll data to pan the sound left/right/front/back around the venue. That's done through a fairly complex Max/MSP patch, which sends values via OSC to both the audio set in Ableton and the video set in Resolume.
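OSC itself is simple enough to pack by hand. Here's a minimal C# sketch of sending one float value over UDP, say a pan position derived from the sensor's roll; the address and port are made up for the example, the real routing lives in the Max/MSP patch, and in practice you'd reach for an existing OSC library:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;
using System.Text;

// Minimal sketch of a single-float OSC message over UDP, roughly what the
// Max/MSP patch does when it forwards sensor values to Ableton and Resolume.
// The address "/pan/x" and port 9000 are hypothetical.
public static class TinyOsc
{
    public static void SendFloat(UdpClient udp, string host, int port,
                                 string address, float value)
    {
        var bytes = new List<byte>();
        WritePaddedString(bytes, address);   // OSC address pattern, e.g. "/pan/x"
        WritePaddedString(bytes, ",f");      // type tag string: one float argument

        // OSC floats are 32-bit big-endian.
        byte[] f = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian) Array.Reverse(f);
        bytes.AddRange(f);

        byte[] packet = bytes.ToArray();
        udp.Send(packet, packet.Length, host, port);
    }

    // OSC strings are null-terminated and padded to a multiple of 4 bytes.
    static void WritePaddedString(List<byte> bytes, string s)
    {
        byte[] ascii = Encoding.ASCII.GetBytes(s);
        bytes.AddRange(ascii);
        int pad = 4 - (ascii.Length % 4);
        for (int i = 0; i < pad; i++) bytes.Add(0);
    }
}
```

With something like this, mapping the sensor's roll from degrees into a 0..1 pan value and calling SendFloat once per update is all it takes on the sending side.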
I think that's it. Overall, a ton of cool effects and techniques, and a healthy venture into the possibilities of VR-designed music experiences. I wish I'd had longer to work on it.