Set Up Background Removal Using Chroma Keying

Hey, searching for a quick and easy way to add background removal to your Agora-based video calling application? Well, here you go!

Introduction

So what are we going to do?

Prerequisites

Project Setup

Then create a JS file and initialize a variable to store the App ID generated from the Agora Console:

Building a Basic User Interface

In addition, we need one div where the streams will be displayed:

Note: transform: rotateY(180deg) rotates the element around its vertical axis, which flips the remote stream horizontally (a mirror image).

Let’s also give the user an input field for setting tolerance, and a color picker:
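Both controls hand their values back as strings, so the keying code needs them converted to numbers before it can compare pixels. Here is a minimal sketch of that conversion. The helper names hexToRgb and parseTolerance, the default tolerance of 50, and the element IDs in the usage comment are assumptions made for illustration, not part of the original project:

```javascript
// Hypothetical helper: convert the "#rrggbb" string reported by
// <input type="color"> into numeric red, green, and blue channels.
function hexToRgb(hex) {
  const match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) {
    throw new Error(`Expected a color like "#00ff00", got "${hex}"`);
  }
  return {
    r: parseInt(match[1], 16),
    g: parseInt(match[2], 16),
    b: parseInt(match[3], 16),
  };
}

// Hypothetical helper: read the tolerance field as a number clamped to 0-255.
// An empty or non-numeric field falls back to an assumed default of 50.
function parseTolerance(raw, fallback = 50) {
  const value = Number(raw);
  if (raw === "" || Number.isNaN(value)) return fallback;
  return Math.min(255, Math.max(0, value));
}

// Usage in the browser (element IDs are assumed):
// const key = hexToRgb(document.getElementById("colorPicker").value);
// const tolerance = parseTolerance(document.getElementById("tolerance").value);
```

Clamping the tolerance keeps a stray value from matching every pixel or none at all.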

Basic Terms

Track / MediaStreamTrack: A wrapper object around a byte stream generated by either hardware or software, which can contain, for example, video, audio, and screen share data

Stream / MediaStream: An object that can contain various tracks and a few event listeners and helper functions

You can read more at https://developer.mozilla.org/en-US/docs/Web/API/MediaStream.

Setting up AgoraRTC

If you are already familiar with the process, feel free to jump to the next section. In short, we initialize the client in live mode and bind stream initialization to the buttons: audio only for the Stream button, video and audio for the Join button, plus the necessary Leave button bindings.

Let’s start by creating the Agora RTC client object with the mode set to live. You are free to use any codec you like, but for this project I will be using H264:

Since we will be working on streams in more than one function, we initialize some global variables into which we will store the initialized streams:

We initialize the client object and declare the error handling function, which will be reused multiple times in the project:

Now we add the event listeners and accompanying handler functions to define a basic behavior for the app:

Finally, we define the join and leave behaviors and bind them to the relevant buttons:

createStream() initializes a Stream object. This is a Stream in the context of the Agora SDK, so it differs slightly from the browser API's MediaStream, but logically it is the same entity with a similar function. When the Join button is clicked, the stream is initialized with both video and audio tracks because that path is intended for the audience. When the Stream button is clicked, a Stream is initialized without any video track associated with it. We will be adding a video track to this stream in the next section.
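The difference between the two buttons comes down to the options passed to createStream(). Here is a rough sketch of that choice, assuming the callback-based Agora Web SDK (3.x) whose createStream() takes a spec with streamID, audio, video, and screen fields. The helper name streamSpecFor and the "join"/"stream" labels are hypothetical:

```javascript
// Hypothetical helper: build the options object for AgoraRTC.createStream().
// "join" gets audio and video; "stream" gets audio only, because its video
// track is added later from the keyed canvas.
function streamSpecFor(button, uid) {
  if (button !== "join" && button !== "stream") {
    throw new Error(`Unknown button "${button}"`);
  }
  return {
    streamID: uid,
    audio: true,
    video: button === "join",
    screen: false,
  };
}

// Usage in the browser, with the Agora SDK loaded (assumed):
// const localStream = AgoraRTC.createStream(streamSpecFor("stream", uid));
```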

Note: Since the focus of this tutorial is on background removal, I will not be touching on the fundamentals. You can read Building a Group Video Chat Web-App to gain a better understanding of the basic Agora Web SDK workflow and what exactly the above code snippet is doing.

With our basic setup complete, we can move on to the fun part:

Implementing Background Removal

To get the user video stream we first have to initialize it. We declare a new function that will handle keying for us. Inside it we call the necessary browser API functions to get the user video stream:

getUserVideo() functions as the name suggests: It will give us the necessary media stream with the required video track. However, these tracks are raw data, and we need a way to decode them. At the moment, the HTML5 canvas that we will be using to apply the background keying filter cannot decode such video streams. Hence, we will be using an off-screen video element to convert the data streams to video. These are initialized toward the beginning of the file as follows:

Let’s also initialize the canvas. This will be off-screen as we will be using the Agora SDK to display the final stream:

Returning to our keying function, now that we have the user video track, we can pass it to our video element and begin defining the necessary parameters. We also draw a base shape on our canvas and append it to the DOM:

The drawInterval here is simply the time in milliseconds between calls to the drawVideo function, which draws each frame of the video and is covered in the next section. We take the highest frames per second (FPS) value, and since the interval is the time per frame, we get it by dividing 1000 milliseconds by that FPS value.
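The interval arithmetic is small enough to show on its own. This sketch assumes the FPS value is already known; the function name frameIntervalMs is made up for illustration:

```javascript
// Hypothetical helper: milliseconds between frames for a given frame rate.
function frameIntervalMs(fps) {
  if (!(fps > 0)) {
    throw new RangeError("fps must be a positive number");
  }
  return 1000 / fps;
}

// For example, 25 FPS means one frame every 40 ms.
const drawInterval = frameIntervalMs(25);
```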

Now let’s turn to the core of our implementation, which is one function to rule them all: the drawVideo function. Here’s how it works.

First, we take a frame from the userStream from the video element and draw it on the canvas:

We then use the getImageData() function to get the image data from the canvas. The returned object has a data property, a flat array holding the color values of every pixel. We iterate over this array in chunks of 4, each chunk holding the R, G, B, and A values of one pixel, in that order. We read the value of each color channel, check it against our filter, and set the alpha value to 0 if the pixel exceeds our threshold. Finally, we use putImageData() to refill the canvas with our updated frame array:
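The pixel loop can be sketched as a plain function over the RGBA array. The exact comparison depends on how you define the threshold. This version assumes a pixel counts as background when each of its channels lies within the tolerance of the key color. The name applyChromaKey and the { r, g, b } shape of the key are assumptions for illustration:

```javascript
// Hypothetical helper: make every pixel close to the key color transparent.
// `data` is a flat RGBA array such as ImageData.data, modified in place.
function applyChromaKey(data, key, tolerance) {
  for (let i = 0; i < data.length; i += 4) {
    const dr = Math.abs(data[i] - key.r);
    const dg = Math.abs(data[i + 1] - key.g);
    const db = Math.abs(data[i + 2] - key.b);
    // Assumed rule: all three channels must be within the tolerance.
    if (dr <= tolerance && dg <= tolerance && db <= tolerance) {
      data[i + 3] = 0; // alpha 0 hides the pixel
    }
  }
  return data;
}

// Usage in the browser, with ctx as the canvas 2D context:
// const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
// applyChromaKey(frame.data, { r: 0, g: 255, b: 0 }, 40);
// ctx.putImageData(frame, 0, 0);
```

Per-channel comparison is the simplest option. Measuring distance in a color space that separates brightness from hue would usually cope better with uneven lighting, at the cost of more work per pixel.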

Back in our keying function, we call our drawVideo function at a fixed interval, extract the resulting stream from the canvas using captureStream(), and add its video track to our global stream:
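Here is one way the wiring could look, written so the canvas, the Agora stream, and the draw function are passed in rather than read from globals. It assumes the Agora Web SDK 3.x Stream object, which exposes addTrack(), and the browser's canvas.captureStream(). The function name startKeying and its parameter names are hypothetical:

```javascript
// Hypothetical helper: redraw the canvas on a timer and feed the canvas
// output into the Agora stream as its video track.
function startKeying({ canvas, agoraStream, drawFrame, fps }) {
  // Redraw (and re-key) the canvas once per frame interval.
  const timer = setInterval(drawFrame, 1000 / fps);

  // captureStream() returns a MediaStream that mirrors the canvas contents.
  const keyedTrack = canvas.captureStream(fps).getVideoTracks()[0];
  agoraStream.addTrack(keyedTrack);

  // Return a cleanup function for the Leave button to call.
  return function stopKeying() {
    clearInterval(timer);
  };
}
```

Returning the cleanup function keeps the timer from running after the user leaves the channel.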

Finally, we call the function after client init under the Stream button behavior function:

Now let’s see if it actually works!

Testing

cd ./project-directory
live-server .

Once we have the site open, we can enter the channel name and username and click the Stream button, granting any permissions the browser asks for. We should now be able to see our stream. We can then adjust the tolerance and color values until the keying looks right for the video and lighting conditions.

Conclusion

Other Resources

I also invite you to join the Agora.io Developer Slack community.

I am a passionate web and Golang developer with a keen interest in real-time web communications.