Combining Video Streams Using the Agora Web SDK

May 19, 2021

Hello there, weary web surfer! Have you spent countless hours trying to find a way to combine multiple video streams? Or do you just want to add a cool new feature to your Agora-based video calling app? Or perhaps you were brought here thanks to my SEO skills. No matter what brought you here, I hope this tutorial provides a solution to your problem.

Introduction

In this tutorial, we explore a method to add a camera overlay to a screen-share feed and stream that as a single video track via the Agora SDK, which helps deliver a more seamless and reliable presentation experience in a video calling application.

So what are we going to do?

Prerequisites

VS Code or any other IDE of your choice
A simple web server — I like to use Live Server
An understanding of HTML/CSS/JS
Agora developer account (see How to Get Started with Agora)

Project Setup

Begin by creating an HTML file and sourcing the Agora Web SDK using the CDN link:

Then, create a JS file and initialize a variable to store the App ID generated from Agora console:

Building a Basic User Interface

We need a simple form to get the channel and username with two buttons: a Stream button for the presenter and a Join button for attendees:

In addition, we need two divs where the streams will be displayed:

Note: transform: rotateY(180deg) is used to mirror the remote stream horizontally (it flips the video around its vertical axis).

Basic Terms

Since we will be interfacing with the underlying browser API for getting media feeds, it’s important to distinguish between a couple of similar terms to avoid confusion:

Track / MediaStreamTrack — A wrapper object around a byte stream generated by either hardware or software which can contain, for example, video, audio, and screen share data

Stream / MediaStream — An object that can contain various tracks and a few event listeners and helper functions
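To make the distinction concrete, here is a minimal sketch that counts the tracks inside a stream by kind. The helper name summarizeTracks is my own invention for illustration; it relies only on the standard getTracks() method and the kind property of each track.

```javascript
// Hypothetical helper: count the tracks in a MediaStream-like object by kind.
// It works with any object that exposes the standard getTracks() method.
function summarizeTracks(stream) {
  const summary = { audio: 0, video: 0 };
  for (const track of stream.getTracks()) {
    // For a MediaStreamTrack, kind is either "audio" or "video".
    if (track.kind === "audio" || track.kind === "video") {
      summary[track.kind] += 1;
    }
  }
  return summary;
}

// In the browser, inside an async function:
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
//   console.log(summarizeTracks(stream));
```

A stream is just the container here; the tracks inside it are what we will eventually hand over to the video elements and to the Agora SDK.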

Setting up AgoraRTC

We can now start working on initializing the Agora client.

If you are already familiar with the process, feel free to jump to the next section. In short, we initialize the client in live mode and bind the stream initializations to the buttons: audio only for the Stream button, and both video and audio for the Join button. We also add the necessary leave button bindings.

Let’s start by creating the Agora RTC client object with the mode set to live. You are free to use any codec you like, but for this project I will be using H264:
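A minimal sketch of this step, assuming the Agora Web SDK 3.x, which exposes a global AgoraRTC object with a createClient method. The wrapper function createLiveClient is a hypothetical name; it takes the SDK as a parameter only so that it can be exercised outside the browser.

```javascript
// Sketch assuming the Agora Web SDK 3.x (global AgoraRTC object).
// createLiveClient is an illustrative wrapper, not part of the SDK.
function createLiveClient(sdk) {
  // mode "live" selects live-broadcast mode; codec "h264" selects H.264.
  return sdk.createClient({ mode: "live", codec: "h264" });
}

// In the page, once the SDK script has loaded:
//   const client = createLiveClient(AgoraRTC);
```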

Since we will be working with streams in more than one function, we declare some global variables in which we will store the initialized streams:

We initialize the client object and declare the error-handling function, which will be reused multiple times in the project:
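As a rough sketch of that step, assuming the callback-style client.init(appId, onSuccess, onFailure) of the Agora Web SDK 3.x. The names handleError and initClient are illustrative choices, not necessarily the ones used in the demo project.

```javascript
// Reusable error handler (the name is illustrative).
function handleError(err) {
  console.error("Agora error:", err);
}

// Thin wrapper around the 3.x callback-style client.init call.
// onReady runs once the client has been initialized with the App ID.
function initClient(client, appId, onReady) {
  client.init(appId, onReady, handleError);
}

// In the page:
//   initClient(client, appId, () => console.log("client initialized"));
```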

Now we add the event listeners and accompanying handler functions to define a basic behavior for the app:
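Here is one way this could look, assuming the event names of the Agora Web SDK 3.x ("stream-added", "stream-subscribed", "stream-removed", and "peer-leave"). The function name bindClientEvents and the remoteContainerId parameter are hypothetical; the container ID should match whichever div you created for the remote stream.

```javascript
// Sketch of the basic remote-stream behavior, assuming Agora Web SDK 3.x events.
function bindClientEvents(client, remoteContainerId, onError) {
  // A remote user published a stream: subscribe to it.
  client.on("stream-added", (evt) => {
    client.subscribe(evt.stream, onError);
  });

  // The subscription succeeded: play the stream inside the remote container.
  client.on("stream-subscribed", (evt) => {
    evt.stream.play(remoteContainerId);
  });

  // The remote stream went away: stop playback if it is still running.
  const stopRemote = (evt) => {
    if (evt.stream && evt.stream.isPlaying()) {
      evt.stream.stop();
    }
  };
  client.on("stream-removed", stopRemote);
  client.on("peer-leave", stopRemote);
}
```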

Finally, we define the join and leave behaviors and bind them to the relevant buttons:
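The sketch below shows the general shape of the join and leave logic, assuming the Agora Web SDK 3.x methods client.join, AgoraRTC.createStream, stream.init, and client.leave. The function names, the options object, and its fields (channel, uid, token, withCamera) are all hypothetical. Publishing is left to the caller so that the presenter can attach the multiplexed video track first.

```javascript
// Join a channel and initialize a local stream (Agora Web SDK 3.x assumed).
// options.withCamera is true for attendees (Join) and false for the presenter (Stream).
function joinAndInitStream(sdk, client, options, onReady, onError) {
  // A null token assumes the project uses App ID authentication only.
  client.join(options.token || null, options.channel, options.uid, (uid) => {
    const stream = sdk.createStream({
      streamID: uid,
      audio: true,
      video: options.withCamera,
      screen: false,
    });
    // Hand the initialized stream to the caller, who can then call
    // client.publish(stream, onError) when it is ready to go live.
    stream.init(() => onReady(stream), onError);
  }, onError);
}

// Stop and release the local stream, then leave the channel.
function leaveChannel(client, stream, onError) {
  if (stream) {
    if (stream.isPlaying()) stream.stop();
    stream.close();
  }
  client.leave(() => console.log("Left the channel"), onError);
}
```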

createStream() initializes a Stream object. However, this is in the context of the Agora SDK and so differs slightly from the browser API. But logically it is the same entity with a similar function. The media stream is initialized with video and audio tracks when the Join button is clicked because that is intended for the audience. When the Stream button is clicked, a Stream is initialized without any video track, because that is what we will be adding in the next section.

Note: Since the focus of this tutorial is on stream multiplexing, I will not be touching on the fundamentals. You can read https://www.agora.io/en/blog/building-a-group-video-chat-web-app/ to gain a better understanding of the basic Agora Web SDK workflow and what exactly the above code snippet is doing.

With our basic setup complete, we can move on to the fun part:

Implementing Video Multiplexing

To multiplex the two streams, we first have to initialize them. We declare a new function that will handle the multiplexing for us. Inside it, we call the necessary browser API functions to get the user video and screen streams:
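A possible shape for these two helpers, built on the standard getUserMedia and getDisplayMedia methods. The mediaDevices parameter stands for navigator.mediaDevices in the browser, and getSourceStreams is a hypothetical wrapper name. Requesting the screen first is my own choice, since some browsers only allow screen capture shortly after a user gesture such as the button click.

```javascript
// mediaDevices is navigator.mediaDevices when running in the browser.
async function getUserVideo(mediaDevices) {
  // Camera only; audio is handled separately by the Agora stream.
  return mediaDevices.getUserMedia({ video: true, audio: false });
}

async function getUserScreen(mediaDevices) {
  // Prompts the user to pick a screen, window, or tab to share.
  return mediaDevices.getDisplayMedia({ video: true });
}

// Hypothetical wrapper that fetches both streams one after the other.
async function getSourceStreams(mediaDevices) {
  const screenStream = await getUserScreen(mediaDevices);
  const cameraStream = await getUserVideo(mediaDevices);
  return { cameraStream, screenStream };
}

// In the browser, inside the Stream button handler:
//   const { cameraStream, screenStream } = await getSourceStreams(navigator.mediaDevices);
```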

The getUserVideo() and getUserScreen() functions, as their names suggest, give us the necessary media streams with the required video tracks. However, these tracks are raw data, and we need a way to decode them. At the moment, the HTML5 canvas that we will be using to merge the two streams cannot decode video streams. Hence we will be using two off-screen video elements to convert the data streams to video. These are initialized toward the beginning of the file as follows:

Let’s also initialize the canvas we will be using to merge the two videos. This will also be off-screen because we will be using the Agora SDK to display the final stream:
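The off-screen elements described above could be set up roughly like this. The function name createOffscreenElements and the technique of parking the elements far outside the viewport with a fixed position are assumptions for illustration; the doc parameter stands for the global document object.

```javascript
// Creates two hidden video elements (decoders) and one hidden canvas (mixer).
// doc is the global document in the browser.
function createOffscreenElements(doc) {
  // Hypothetical helper: move an element outside the visible viewport.
  const hide = (el) => {
    el.style.position = "fixed";
    el.style.top = "-10000px";
    return el;
  };

  const makeVideo = () => {
    const video = hide(doc.createElement("video"));
    video.muted = true;       // muted videos are allowed to autoplay
    video.autoplay = true;    // start decoding as soon as a stream is attached
    video.playsInline = true; // avoid fullscreen playback on mobile browsers
    return video;
  };

  const canvas = hide(doc.createElement("canvas"));
  return {
    cameraVideo: makeVideo(),
    screenVideo: makeVideo(),
    canvas,
    ctx: canvas.getContext("2d"),
  };
}

// In the browser:
//   const els = createOffscreenElements(document);
//   document.body.append(els.cameraVideo, els.screenVideo, els.canvas);
//   els.cameraVideo.srcObject = cameraStream;
//   els.screenVideo.srcObject = screenStream;
```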

At this point, you might wonder about the performance implications of having three off-screen elements. The answer is that making them off-screen does not magically remove the load. However, this practice is not completely redundant, because the majority of the workload consists of decoding the two video tracks, which would have to be done anyway to display them. And since they are off-screen with fixed positions, they don’t cause DOM repaints and effectively act as background video decoders. Some optimization can still be achieved, however, which we will touch on later in this tutorial.

Returning to our multiplexing function: now that we have the tracks, we can pass them to our video elements and begin defining the necessary parameters. We can also draw a base shape on our canvas and append it to the DOM:

Yes, there are a lot of parameters. And yes, all of them are necessary. This diagram should help you understand what exactly is going on:

Besides the parameters, we have drawInterval: the time in milliseconds between calls to the drawVideo function, which is responsible for drawing each frame of the video and which we will cover in the next section. It is based on the highest frames per second (FPS) value, so to get the time interval we simply compute seconds per frame, that is, 1000 / FPS in milliseconds.
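As a worked example of that calculation, the sketch below reads the frame rate of each source track through the standard getSettings() method and converts the higher one into an interval. The function name, the idea of comparing exactly the camera and screen tracks, and the 30 FPS fallback are assumptions on my part.

```javascript
// Milliseconds between draws, based on the faster of the two source tracks.
// fallbackFps is used when a track does not report a frame rate.
function computeDrawInterval(cameraTrack, screenTrack, fallbackFps = 30) {
  const fpsOf = (track) => track.getSettings().frameRate || fallbackFps;
  const maxFps = Math.max(fpsOf(cameraTrack), fpsOf(screenTrack));
  // Seconds per frame is 1 / fps, so milliseconds per frame is 1000 / fps.
  return 1000 / maxFps;
}

// In the browser:
//   const drawInterval = computeDrawInterval(
//     cameraStream.getVideoTracks()[0],
//     screenStream.getVideoTracks()[0]
//   );
```

With a 30 FPS camera and a 60 FPS screen share, for instance, the faster track wins and the interval comes out at 1000 / 60 milliseconds.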

The scale factor determines the percentage of screen width the camera circle radius should take.
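To illustrate how the scale factor feeds into the other parameters, here is a small sketch that derives the circle geometry from the canvas size. The function name, the margin parameter, and the bottom-right placement of the circle are assumptions; the demo project may position the overlay differently.

```javascript
// Derive the camera circle geometry from the canvas size.
// scaleFactor is the fraction of the canvas width used as the circle radius.
function computeOverlayGeometry(canvasWidth, canvasHeight, scaleFactor, margin) {
  const radius = canvasWidth * scaleFactor;
  return {
    radius,
    // Assumed placement: bottom-right corner, inset by the margin.
    centerX: canvasWidth - radius - margin,
    centerY: canvasHeight - radius - margin,
  };
}

// Example call for a 1280x720 canvas:
//   const geometry = computeOverlayGeometry(1280, 720, 0.125, 20);
```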

Now let’s turn to the core of our implementation, which is one function to rule them all, the drawVideo function. Here’s how it works.

First, we take a frame of the screen share from its video element and draw it across the whole canvas. We then save the canvas drawing state, which at this point has no clipping region:

Next, we draw an arc (a full circle) and use it as a clipping path for the camera frame that follows. Because of the clip function, the camera frame is painted only inside the circle, and the screen frame already on the canvas stays untouched everywhere else:

We then restore the saved state, which removes the clipping region so that the next screen frame can again be drawn over the entire canvas, with the circular camera feed layered on top of it:

Here is the code for that:
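What follows is a sketch of how these steps could fit together, not necessarily the exact code from the demo project. It uses only standard Canvas 2D calls (drawImage, save, beginPath, arc, clip, restore). The parameter object p and its fields are hypothetical, and cropping a centered square out of the camera frame is my own addition to keep the face from being stretched.

```javascript
// Draw one combined frame: screen share as background, camera in a circle.
// p = { width, height, centerX, centerY, radius } (illustrative parameter names).
function drawVideo(ctx, screenVideo, cameraVideo, p) {
  // 1. Screen frame fills the whole canvas.
  ctx.drawImage(screenVideo, 0, 0, p.width, p.height);

  // 2. Save the drawing state while there is no clipping region.
  ctx.save();

  // 3. Define a circular path and restrict further drawing to it.
  ctx.beginPath();
  ctx.arc(p.centerX, p.centerY, p.radius, 0, Math.PI * 2);
  ctx.clip();

  // 4. Draw a centered square crop of the camera frame over the circle.
  const side = Math.min(cameraVideo.videoWidth, cameraVideo.videoHeight);
  const sx = (cameraVideo.videoWidth - side) / 2;
  const sy = (cameraVideo.videoHeight - side) / 2;
  ctx.drawImage(
    cameraVideo, sx, sy, side, side,
    p.centerX - p.radius, p.centerY - p.radius, p.radius * 2, p.radius * 2
  );

  // 5. Restore the saved state, which removes the clipping region.
  ctx.restore();
}
```

Calling beginPath() before arc() matters when the function runs repeatedly: without it, the circles from earlier frames would accumulate in the current path.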

Now, we call this function in the StreamMultiplexer function with the calculated interval to draw frames. The browser API offers a function to capture the canvas data and turn it into a video stream. Using captureStream(), we pass the video track from the generated stream to the globalStream:
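A rough sketch of this last step is shown below. It uses the standard canvas.captureStream() method and assumes that the Agora Web SDK 3.x Stream object accepts the resulting track through its addTrack method. The function name startMultiplexing, its parameters, and the returned stop function are hypothetical.

```javascript
// Start drawing frames on a timer and feed the canvas into the Agora stream.
// drawFrame is a zero-argument function that draws one combined frame.
function startMultiplexing(canvas, agoraStream, drawFrame, intervalMs) {
  const timerId = setInterval(drawFrame, intervalMs);

  // Capture the canvas at roughly the same rate at which we draw.
  const fps = Math.round(1000 / intervalMs);
  const canvasStream = canvas.captureStream(fps);
  const videoTrack = canvasStream.getVideoTracks()[0];

  // Assumption: the Agora 3.x Stream object exposes addTrack(track).
  agoraStream.addTrack(videoTrack);

  // Return a cleanup function for the leave button.
  return function stopMultiplexing() {
    clearInterval(timerId);
    videoTrack.stop();
  };
}
```

Keeping the cleanup in a returned function means the leave handler can stop the timer and the captured track without needing extra global variables.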

Finally, we call the function after the client init, inside the Stream button's behavior function:

Now let's see if it actually works.

Testing

To test, we start a web server. I will be using the Live Server npm package, for which the command is:

cd ./project-directory
live-server .

Once we have the site open, we can input the channel name and username and click the Stream button, granting any permissions the browser asks for and selecting which screen we want to share. We should now be able to see our stream with the webcam overlay. We can duplicate our tab and click the Join button with a different username to see that the stream is successfully being shared across the channel with the Agora SDK.

Conclusion

And that’s all it takes to build a presentation mode and integrate it into the Agora SDK. I feel it highlights the flexibility and freedom of use that comes with the SDK. Some optimizations can still be achieved. For example, in a more web app-like scenario, an OffscreenCanvas can be used in a worker to move rendering off the main thread, which might help with performance. For more, see https://developers.google.com/web/updates/2018/08/offscreen-canvas.

Offloading video streams to a canvas opens doors to all sorts of video manipulations. The same concepts can be used to add chroma keying or custom backgrounds to your video calling app. The possibilities are endless and your power unlimited!

Other Resources

See the Agora API Reference Docs for more information on the Agora SDK
You can find a live demo of the project here
You can find source code for the demo project here

I also invite you to join the Agora.io Developer Slack community.