Set Up Background Removal Using Chroma Keying
Hey, searching for a quick and easy way to add background removal to your Agora-based video calling application? Well, here you go!
Introduction
In this tutorial, we explore a way to remove the background from a video stream using the chroma keying technique with the Agora SDK, which helps deliver a more seamless presentation experience in a video calling application.
So what are we going to do?
Prerequisites
- Visual Studio Code or another IDE of your choice
- A simple web server (I like to use Live Server)
- An understanding of HTML/CSS/JS
- An Agora developer account (How to Get Started with Agora)
Project Setup
Begin by creating an HTML file and sourcing the Agora Web SDK, using the CDN link:
Then create a JS file and initialize a variable to store the App ID generated from the Agora Console:
Building a Basic User Interface
We need a simple form to get the channel and username with two buttons: a Stream button for the presenter and a Join button for attendees:
In addition, we need a div where the streams will be displayed:
Let’s also give the user an input field for setting tolerance, and a color picker:
Basic Terms
Since we will be interfacing with the underlying browser API for getting media feeds, it’s important to distinguish between a couple of similar terms to avoid confusion:
Track / MediaStreamTrack: A wrapper object around a byte stream generated by either hardware or software, which can contain, for example, video, audio, and screen share data
Stream / MediaStream: An object that can contain various tracks and a few event listeners and helper functions
You can read more at https://developer.mozilla.org/en-US/docs/Web/API/MediaStream.
Setting up AgoraRTC
We can now start working on initializing the Agora client.
If you are already familiar with the process and want to jump to the next section, here is the short version: we initialize the client in live mode and bind the stream initializations to the buttons. The Stream button creates an audio-only stream, the Join button creates a stream with both video and audio, and the Leave button gets the necessary bindings.
Let’s start by creating the Agora RTC client object with the mode set to live. You are free to use any codec you like, but for this project I will be using H264:
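As a minimal sketch, assuming the Agora Web SDK 3.x script loaded from the CDN exposes the global AgoraRTC object, creating the client could look like this. The helper name createLiveClient is hypothetical and exists only so the SDK object can be passed in explicitly:

```javascript
// `sdk` is expected to be the global AgoraRTC object from the Agora Web SDK.
// createLiveClient is a hypothetical helper name, not part of the SDK.
function createLiveClient(sdk) {
  // "live" mode with the H264 codec, as described in the text above.
  return sdk.createClient({ mode: "live", codec: "h264" });
}

// In the browser (not run here):
// const client = createLiveClient(AgoraRTC);
```

Passing the SDK object as an argument is only a convenience for trying the snippet in isolation; calling AgoraRTC.createClient() directly works just as well.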
Since we will be working on streams in more than one function, we initialize some global variables into which we will store the initialized streams:
We initialize the client object and declare the error handling function, which will be reused multiple times in the project:
Now we add the event listeners and accompanying handler functions to define a basic behavior for the app:
Finally, we define the join and leave behaviors and bind them to the relevant buttons:
createStream() initializes a Stream object. However, this is in the context of the Agora SDK and so differs slightly from the browser API. But logically it is the same entity with a similar function. The media stream is initialized with video and audio tracks when the Join button is clicked because that is intended for the audience. When the Stream button is clicked, a Stream is initialized without any video track associated with it. We will be adding a video track to this stream in the next section.
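The difference between the two buttons comes down to the options passed to createStream(). The sketch below assumes the Agora Web SDK 3.x option names (streamID, audio, video, screen); the helper name buildStreamSpec and its isPresenter flag are hypothetical:

```javascript
// Build the options object passed to AgoraRTC.createStream().
// buildStreamSpec and isPresenter are hypothetical names for illustration.
// Presenters start without camera video (the keyed track is added later);
// attendees publish both camera video and audio.
function buildStreamSpec(uid, isPresenter) {
  return {
    streamID: uid,
    audio: true,
    video: !isPresenter,
    screen: false,
  };
}

// In the browser (not run here):
// const localStream = AgoraRTC.createStream(buildStreamSpec(uid, true));
```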
Note: Since the focus of this tutorial is on background removal, I will not be touching on the fundamentals. You can read Building a Group Video Chat Web-App to gain a better understanding of the basic Agora Web SDK workflow and what exactly the above code snippet is doing.
With our basic setup complete, we can move on to the fun part:
Implementing Background Removal
Let’s first describe the behaviors of our tolerance (offset) and color picker input fields inside an onchange listener for each, like so:
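One possible shape for these listeners is sketched below. The names keySettings, hexToRgb, and bindKeyingInputs, as well as the default values, are assumptions for illustration. The only browser fact relied on is that a color input reports its value as a "#rrggbb" string:

```javascript
// Hypothetical shared state read by the per-frame keying code.
const keySettings = { color: { r: 0, g: 255, b: 0 }, tolerance: 100 };

// Convert the "#rrggbb" string produced by <input type="color"> to RGB.
function hexToRgb(hex) {
  const value = parseInt(hex.slice(1), 16);
  return {
    r: (value >> 16) & 0xff,
    g: (value >> 8) & 0xff,
    b: value & 0xff,
  };
}

// Attach onchange handlers; the two arguments are the input elements.
function bindKeyingInputs(toleranceInput, colorInput) {
  toleranceInput.onchange = (event) => {
    keySettings.tolerance = Number(event.target.value);
  };
  colorInput.onchange = (event) => {
    keySettings.color = hexToRgb(event.target.value);
  };
}
```

In the page, bindKeyingInputs would be called once with the two input elements, for example the results of document.getElementById() for whatever IDs the form uses.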
To get the user video stream we first have to initialize it. We declare a new function that will handle keying for us. Inside it we call the necessary browser API functions to get the user video stream:
getUserVideo() does what its name suggests: it gives us the media stream with the required video track. However, this track is raw data, and we need a way to decode it. The HTML5 canvas that we will use to apply the keying filter cannot decode such video streams directly. Hence, we will use an off-screen video element to convert the data stream into video frames. These are initialized toward the beginning of the file as follows:
Let’s also initialize the canvas. This will be off-screen as we will be using the Agora SDK to display the final stream:
Returning to our keying function, now that we have the user video track, we can pass it to our video element and begin defining the necessary parameters. We also draw a base shape on our canvas and append it to the DOM:
We have a drawInterval here, which is the time in milliseconds between successive calls to the drawVideo function. That function is responsible for drawing each frame of the video, and we will get to it shortly. The interval is based on the highest frame rate (frames per second, FPS), so it is simply the duration of one frame: 1000 milliseconds divided by the FPS.
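As a worked example of that arithmetic, the sketch below converts a frame rate to an interval. The function name frameIntervalMs is hypothetical, and how the frame rate is read from the video track is left to the caller:

```javascript
// Milliseconds between frames for a given frame rate.
// frameIntervalMs is a hypothetical helper name.
function frameIntervalMs(fps) {
  if (!(fps > 0)) {
    throw new RangeError("fps must be a positive number");
  }
  return 1000 / fps;
}

// 25 FPS gives one frame every 40 ms; 50 FPS gives one every 20 ms.
const exampleInterval = frameIntervalMs(25);
```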
Now let’s turn to the core of our implementation, which is one function to rule them all: the drawVideo function. Here’s how it works.
First, we take the current frame of the userStream from the video element and draw it on the canvas:
We then use the getImageData() function to get the image data from the canvas. It returns an ImageData object whose data property is an array holding the color values of every pixel. We iterate over this array in steps of 4 values, each group holding the R, G, B, and A values of one pixel, in that order. We read the color values, check them against our filter, and assign an alpha value of 0 (fully transparent) to any pixel the filter identifies as background. Finally, we use putImageData() to refill the canvas with our updated frame array:
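The per-pixel loop can be sketched as a plain function over the RGBA array. The matching rule used here (each channel within the tolerance of the key color) is an assumption and may differ from the filter in the demo project. The name applyChromaKey is hypothetical:

```javascript
// Make every pixel whose R, G, and B values are all within `tolerance` of
// the key color fully transparent. `data` is an RGBA array such as
// ImageData.data. The per-channel distance test is an assumed filter.
function applyChromaKey(data, key, tolerance) {
  for (let i = 0; i < data.length; i += 4) {
    const dr = Math.abs(data[i] - key.r);
    const dg = Math.abs(data[i + 1] - key.g);
    const db = Math.abs(data[i + 2] - key.b);
    if (dr <= tolerance && dg <= tolerance && db <= tolerance) {
      data[i + 3] = 0; // alpha channel: 0 means fully transparent
    }
  }
  return data;
}

// In the browser (not run here), with a 2D canvas context `ctx`:
// const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
// applyChromaKey(frame.data, { r: 0, g: 255, b: 0 }, 100);
// ctx.putImageData(frame, 0, 0);
```

Because the function mutates the array in place, the same ImageData object can be handed straight back to putImageData().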
Back in our keying function, we call our drawVideo function at a fixed interval, extract the resulting stream from the canvas using captureStream(), and add it to our global stream:
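A rough sketch of this step follows, assuming the global stream is an Agora Web SDK 3.x Stream object (which provides addTrack()) and that the canvas supports the standard captureStream() method. The function and parameter names are hypothetical:

```javascript
// Start the draw loop, capture the canvas as a MediaStream, and hand its
// video track to the audio-only Agora stream created earlier.
// startKeyedStream and its parameter names are hypothetical.
function startKeyedStream(canvas, agoraStream, drawFrame, intervalMs) {
  const timerId = setInterval(drawFrame, intervalMs);
  const keyedVideoTrack = canvas.captureStream().getVideoTracks()[0];
  agoraStream.addTrack(keyedVideoTrack);
  return timerId; // keep this so the loop can be stopped with clearInterval
}
```

Returning the timer ID is a design choice for cleanup: the Leave button handler can pass it to clearInterval() so the canvas stops redrawing once the presenter leaves.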
Finally, we call the function after client init under the Stream button behavior function:
Now let’s see if it actually works!
Testing
To test, we can start a web server. I will be using the Live Server npm package, for which the command is:
cd ./project-directory
live-server .
Once we have the site open, we can enter the channel name and username and click the Stream button, granting any permissions the browser asks for. We should now be able to see our stream. We can then adjust the tolerance and color values to achieve a desirable keying effect for the video and lighting conditions.
Conclusion
That’s all it takes to integrate a chroma keying filter into the Agora SDK. I feel the simplicity of the procedure highlights the flexibility and freedom of use that comes with the Agora Web SDK. Some optimizations can still be achieved. For example, for higher-resolution video feeds, we can sample the frame on a grid of 3x3 pixel blocks rather than testing every single pixel, which reduces the loop iterations significantly at the cost of a less accurately keyed result.
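One way to read the grid idea is to test a single pixel per 3x3 tile and apply the result to the whole tile. The sketch below takes that interpretation, which is an assumption, along with the per-channel tolerance test and the name applyChromaKeyGrid:

```javascript
// Key a frame by testing one pixel per block x block tile and applying the
// result to every pixel in that tile. The sampling strategy (tile center)
// and the per-channel tolerance test are assumptions for illustration.
function applyChromaKeyGrid(data, width, height, key, tolerance, block = 3) {
  const matches = (i) =>
    Math.abs(data[i] - key.r) <= tolerance &&
    Math.abs(data[i + 1] - key.g) <= tolerance &&
    Math.abs(data[i + 2] - key.b) <= tolerance;

  for (let by = 0; by < height; by += block) {
    for (let bx = 0; bx < width; bx += block) {
      // Sample the tile's center pixel, clamped to the frame edges.
      const sy = Math.min(by + (block >> 1), height - 1);
      const sx = Math.min(bx + (block >> 1), width - 1);
      if (!matches((sy * width + sx) * 4)) continue;
      // Make the whole tile transparent.
      for (let y = by; y < Math.min(by + block, height); y++) {
        for (let x = bx; x < Math.min(bx + block, width); x++) {
          data[(y * width + x) * 4 + 3] = 0;
        }
      }
    }
  }
  return data;
}
```

With this approach the color comparison runs once per tile instead of once per pixel, but a foreground pixel that shares a tile with a matching center pixel is removed along with it, which is where the loss of accuracy comes from.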
Other Resources
- See the Agora API Reference docs for more information.
- You can find the demo project source code here
- And you can test a live demo here
I also invite you to join the Agora.io Developer Slack community.