Maximizing WebRTC Potential: Unleashing Power in Android Apps with STUN, TURN Servers, ICE Candidates, and Signalling Server Integration
A Comprehensive Guide to WebRTC Voice over IP with STUN, TURN Servers, ICE Candidates, and Signalling Server Integration for a Calling Application

Are you looking to develop a communication app similar to WhatsApp for seamless interaction within your inner circle? Look no further!
In this comprehensive guide, we’ll build a cutting-edge calling app that relies solely on internet connectivity. We will focus on WebRTC and its integration into an Android application, empowering you to facilitate high-quality voice and video calls over the web.
After completing this article, you will be able to:
- Understand what WebRTC is.
- Understand how WebRTC enables communication between peers.
- Create an Android application that fulfils this purpose.
What is WebRTC?
WebRTC (Web Real-Time Communication) is a peer-to-peer protocol that supports real-time communication. Of course, we have already seen gRPC bi-directional streams and other tools like WebSockets and Socket.IO that enable us to communicate in real time.
Then why are we discussing a new protocol this time?
They all support real-time communication, e.g., text messages and file sharing. But they are not built to transport audio and video over the Internet. The reason is that raw audio and video are huge. Hence, we need a compression mechanism to move them over the network, commonly known as codecs (coder-decoders). Codecs manage the size, quality, and efficiency of audio and video data transfer and storage. WebRTC hides the complexity of compressing and decompressing audio and video from the developer, so they can focus on the high-level requirements without worrying about the low-level implementations.
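To get a rough feel for the numbers: a single uncompressed 1080p frame is about 1920 × 1080 pixels × 3 bytes ≈ 6.2 MB, so at 30 frames per second raw video alone works out to roughly 187 MB/s (≈ 1.5 Gbps). A codec such as H.264 or VP8 brings the same stream down to a few Mbps, which is why codecs are non-negotiable for a calling app.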
We have discussed why we need to familiarize ourselves with WebRTC and where to use it.
“WebRTC was open-sourced by a team at Google in 2011, and in 2021 WebRTC 1.0 finally became a stable, official W3C and IETF standard.”
Now that we have covered the “Why?”, let’s move on to the “How?”. Before we start on the “How?”, we need to cover some prerequisites.
STUN and TURN Servers
Are you stunned? Is your inner developer mode turned on to create an app for yourself?
STUN stands for Session Traversal Utilities for NAT [Network Address Translation]. It is an integral part of WebRTC, as it addresses the challenges posed by NAT and helps establish a connection between the peers. A STUN server performs a few steps to ensure a connection between the peers, including:
- Discovery of the peers, i.e., it identifies the presence of a peer on the Internet and responds with its public IP and port.
- Helping traverse the NAT, which is quite challenging: STUN identifies the type of NAT the peer is behind and whether any restrictions apply.
With the received public IP addresses, the peers try to establish a connection. If the peers are behind a symmetric NAT, they must follow a different method, e.g., relaying through a TURN server while exchanging ICE candidates. Otherwise, the peers connect directly and share their audio and video.
A symmetric NAT is a type of NAT that hinders identifying the public IP and port of the peer, as each outgoing connection from a device is assigned a unique external IP and port combination by the NAT.
TURN stands for Traversal Using Relays around NAT. It plays a vital role in WebRTC when a peer is behind a restrictive NAT configuration such as a symmetric NAT. Where the STUN server fails to enable direct peer-to-peer communication, TURN provides a relay to facilitate it.
How does a TURN server make this possible?
- It enables the peers to connect through a relay.
- Each peer makes a connection to the TURN server.
- The TURN server receives requests from one peer and relays them to the destination.
- It introduces increased latency, as the peers now have an intermediary server between them.
- It offers authentication and encryption mechanisms, ensuring only authorized devices connect (a configuration sketch follows this list).
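To make this concrete, here is a minimal sketch of how STUN and TURN servers are declared with the WebRTC Android library used later in this article. The URLs and credentials are placeholders, not real servers; the actual list this app uses appears in the implementation section:
import org.webrtc.PeerConnection;

import java.util.ArrayList;
import java.util.List;

List<PeerConnection.IceServer> iceServers = new ArrayList<>();

// STUN: no credentials needed; the server only reports your public address.
iceServers.add(PeerConnection.IceServer
        .builder("stun:stun.example.org:3478")
        .createIceServer());

// TURN: relaying consumes server bandwidth, so credentials are required.
iceServers.add(PeerConnection.IceServer
        .builder("turn:turn.example.org:3478")
        .setUsername("demo-user")
        .setPassword("demo-password")
        .createIceServer());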
The connection is finally established by exchanging iceCandidates between the peers.
What are iceCandidates?
They are network endpoints used by WebRTC to establish a connection between peers. An iceCandidate contains information about the IP address, port number, and transport protocol that can be used to establish a peer connection.
IceCandidates fall into two classes: local and remote. Local candidate information is generated on the device and shared with the remote peer, which records it as a remote iceCandidate. This sharing of iceCandidates continues until a connection is established.
Note: A check called the ICE connectivity check runs during the process of sharing the iceCandidates. It verifies connectivity and determines the best available network path for communication.
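For illustration, here is what a candidate can look like on the wire and how the Android library represents it. Every concrete value below is made up for the example:
import org.webrtc.IceCandidate;

// A serialized server-reflexive (srflx) candidate, i.e. the address a STUN
// server observed for this peer. All values here are illustrative:
//   candidate:842163049 1 udp 1677729535 203.0.113.7 41234 typ srflx
IceCandidate candidate = new IceCandidate(
        "0",   // sdpMid: which media section of the SDP this candidate belongs to
        0,     // sdpMLineIndex: index of that m= line within the SDP
        "candidate:842163049 1 udp 1677729535 203.0.113.7 41234 typ srflx"
);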
Both STUN and TURN may sound complex to implement, given all the networking concepts involved. No need to worry! Open-source STUN and TURN server implementations exist, and public instances are freely available.
Enough of the theory; let us look into the “How?” part and start implementing the calling app.
All you need is a basic understanding of Node.js for the server-side implementation and some knowledge of Android [Java or Kotlin].
Android Implementation:
To start, we need some UI design. As I am not good at designing, I have prepared some basic layouts for you:
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <EditText
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:hint="username"
        android:layout_alignParentTop="true"
        android:layout_marginHorizontal="30dp"
        android:layout_marginTop="30dp"
        android:id="@+id/username" />

    <androidx.appcompat.widget.AppCompatButton
        android:layout_width="match_parent"
        android:layout_height="40dp"
        android:text="enter"
        android:id="@+id/enterBtn"
        android:layout_below="@id/username"
        android:layout_marginHorizontal="30dp" />

</RelativeLayout>
This renders a single layout with a text input box for entering your username and an enter button to proceed to the next screen.
Next, we have our calling layout, which is a multi-layout structure:
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:keepScreenOn="true">

    <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="80dp"
        android:id="@+id/incomingCallLayout"
        android:visibility="gone"
        android:orientation="horizontal"
        android:layout_alignParentTop="true">

        <TextView
            android:layout_width="0dp"
            android:layout_height="match_parent"
            android:gravity="center"
            android:layout_weight="6"
            android:textSize="20sp"
            android:textColor="@color/black"
            android:text="someone is calling"
            android:id="@+id/incomingNameTV" />

        <ImageView
            android:layout_width="0dp"
            android:layout_height="match_parent"
            android:layout_weight="1.5"
            android:src="@drawable/ic_accept"
            android:padding="15dp"
            android:id="@+id/acceptButton" />

        <ImageView
            android:layout_width="0dp"
            android:layout_height="match_parent"
            android:layout_weight="1.5"
            android:src="@drawable/ic_reject"
            android:padding="15dp"
            android:id="@+id/rejectButton" />
    </LinearLayout>

    <RelativeLayout
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:layout_below="@+id/incomingCallLayout"
        android:id="@+id/whoToCallLayout">

        <EditText
            android:layout_width="match_parent"
            android:layout_height="40dp"
            android:hint="who to call ?"
            android:layout_margin="30dp"
            android:id="@+id/targetUserNameEt" />

        <androidx.appcompat.widget.AppCompatButton
            android:layout_width="match_parent"
            android:layout_height="40dp"
            android:layout_below="@id/targetUserNameEt"
            android:layout_marginHorizontal="30dp"
            android:text="call"
            android:id="@+id/callBtn" />
    </RelativeLayout>

    <RelativeLayout
        android:layout_width="match_parent"
        android:id="@+id/callLayout"
        android:visibility="gone"
        tools:visibility="visible"
        android:layout_height="match_parent">

        <org.webrtc.SurfaceViewRenderer
            android:id="@+id/remote_view"
            android:layout_width="match_parent"
            android:layout_height="match_parent" />

        <org.webrtc.SurfaceViewRenderer
            android:id="@+id/local_view"
            android:layout_width="120dp"
            android:layout_height="150dp"
            android:layout_above="@+id/controls"
            android:layout_marginStart="8dp"
            android:layout_marginTop="8dp"
            android:layout_marginEnd="8dp"
            android:layout_marginBottom="8dp"
            android:elevation="16dp" />

        <ProgressBar
            android:layout_centerInParent="true"
            android:id="@+id/remote_view_loading"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:indeterminate="true" />

        <LinearLayout
            android:gravity="center"
            android:background="@drawable/curve_background"
            android:backgroundTint="@android:color/secondary_text_light"
            android:id="@+id/controls"
            android:orientation="horizontal"
            android:layout_centerHorizontal="true"
            android:layout_alignParentBottom="true"
            android:layout_width="match_parent"
            android:layout_height="wrap_content">

            <ImageView
                android:id="@+id/mic_button"
                android:layout_width="wrap_content"
                android:layout_height="wrap_content"
                android:layout_margin="16dp"
                android:clickable="true"
                android:focusable="true"
                android:padding="12dp"
                android:background="@drawable/circle_background"
                app:backgroundTint="@color/cardview_dark_background"
                app:srcCompat="@drawable/ic_baseline_mic_24" />

            <ImageView
                android:id="@+id/video_button"
                android:layout_width="wrap_content"
                android:layout_height="wrap_content"
                android:layout_margin="16dp"
                android:clickable="true"
                android:focusable="true"
                android:padding="12dp"
                android:background="@drawable/circle_background"
                app:backgroundTint="@color/cardview_dark_background"
                app:srcCompat="@drawable/ic_baseline_videocam_24" />

            <ImageView
                android:id="@+id/end_call_button"
                android:layout_width="wrap_content"
                android:layout_height="wrap_content"
                android:layout_margin="16dp"
                android:clickable="true"
                android:focusable="true"
                android:padding="12dp"
                android:background="@drawable/circle_background"
                app:backgroundTint="@android:color/holo_red_dark"
                app:srcCompat="@drawable/ic_baseline_call_end_24" />

            <ImageView
                android:id="@+id/switch_camera_button"
                android:layout_width="wrap_content"
                android:layout_height="wrap_content"
                android:layout_margin="16dp"
                android:clickable="true"
                android:focusable="true"
                android:padding="12dp"
                android:background="@drawable/circle_background"
                app:backgroundTint="@color/cardview_dark_background"
                app:srcCompat="@drawable/ic_baseline_cameraswitch_24" />

            <ImageView
                android:id="@+id/audio_output_button"
                android:layout_width="wrap_content"
                android:layout_height="wrap_content"
                android:clickable="true"
                android:focusable="true"
                android:layout_margin="16dp"
                android:padding="12dp"
                android:background="@drawable/circle_background"
                app:backgroundTint="@color/cardview_dark_background"
                app:srcCompat="@drawable/ic_baseline_speaker_up_24" />
        </LinearLayout>
    </RelativeLayout>
</RelativeLayout>
It includes a default layout that shows a text input for entering the name of the target user you want to call, represented by the id:
android:id="@+id/whoToCallLayout"
Next, we have an incoming call layout, which contains the caller’s name, a button to accept the call, and another to reject it, represented by the id:
android:id="@+id/incomingCallLayout"
Last, we have the call layout, which contains the remote video from the peer, a local video preview, and controls for the call, e.g., mic and camera, represented by the id:
android:id="@+id/callLayout"
With the UI design done, we can now work on connecting the ends.
First, we need to implement the logic for the main page, where we take the username.
binding = ActivityMainBinding.inflate(getLayoutInflater());
setContentView(binding.getRoot());
binding.enterBtn.setOnClickListener(view ->
        // ask the user for audio and video recording permissions
        PermissionX.init(this).permissions(
                Manifest.permission.RECORD_AUDIO,
                Manifest.permission.CAMERA
        ).request((allGranted, grantedList, deniedList) -> {
            if (allGranted) {
                // if the user grants permission for both the mic and
                // the camera, move to the next activity
                startActivity(
                        new Intent(this, CallActivity.class)
                                .putExtra("username", binding.username.getText().toString()));
            } else {
                // if permission is not granted, prompt the user to do so
                Toast.makeText(this, "Please Grant All Permissions",
                        Toast.LENGTH_LONG).show();
            }
        }));
Before building the next activity, we need some basic backend. It should accept a username and store it for future use.
// we need to start a small server that opens a socket and listens on it
const http = require("http")
const Socket = require("websocket").server

const server = http.createServer(() => { })
const PORT = 3000
server.listen(PORT, () => {
    console.log(`Server Running At ${PORT}`)
})

const webSocket = new Socket({ httpServer: server })

// to store information about the users
const users = []

/*
 * Incoming data will be of the format:
 * { type: String, name: String, target: String, data: any }
 */
webSocket.on('request', (req) => {
    const connection = req.accept()
    connection.on('message', (message) => {
        const data = JSON.parse(message.utf8Data)
        const user = findUser(data.name)
        switch (data.type) {
            case "store_user":
                if (user != null) {
                    // if the username already exists
                    connection.send(JSON.stringify({
                        type: 'store_user_response',
                        data: 'user already exists'
                    }))
                    return
                }
                // if the username is unique, create a new user and
                // push it to the users array
                const newUser = {
                    name: data.name, conn: connection
                }
                users.push(newUser)
                connection.send(JSON.stringify({
                    type: 'store_user_response',
                    data: 'user added successfully'
                }))
                break
        }
    })
})

const findUser = username => {
    for (let i = 0; i < users.length; i++) {
        if (users[i].name === username)
            return users[i]
    }
}
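If you want to try the server as you follow along: assuming the code above is saved as `server.js`, its only dependency is the `websocket` npm package, so `npm install websocket` followed by `node server.js` should start it listening on port 3000.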
This piece of code handles a new user connection. When a new user enters the application, our backend system stores their information.
Now, how do we connect to our backend web socket server?
To fulfil this, we will create a class on the Android side that manages the socket, i.e., reading from and writing to it.
package com.app.callingapp;

import android.util.Log;

import com.app.callingapp.models.MessageModels;
import com.app.callingapp.utils.NewMessageInterface;
import com.google.gson.Gson;

import org.java_websocket.client.WebSocketClient;
import org.java_websocket.handshake.ServerHandshake;

import java.net.URI;

public class SocketRepository implements NewMessageInterface {

    // socket client
    private WebSocketClient webSocketClient;
    private String userName;
    private final String logTag = "SocketRepository";

    public SocketRepository(String username) {
        this.userName = username;
        // if you are using an Android emulator, your local websocket
        // address is "ws://10.0.2.2:{PORT of backend}"
        // if you are testing on a physical phone, run `ipconfig` in cmd
        // and use your machine's Ethernet IPv4 address instead
        // if your websocket server is deployed, add its address here
        // here the backend PORT is 3000
        try {
            this.webSocketClient = new
                    WebSocketClient(new URI("ws://10.0.2.2:3000")) {

                // as soon as we have a new connection, ask the server
                // to store the user's information
                @Override
                public void onOpen(ServerHandshake handshake) {
                    sendMessageToSocket(
                            new MessageModels(
                                    "store_user", username, null, null
                            )
                    );
                }

                // handle a new message arriving on the socket
                @Override
                public void onMessage(String message) {
                    try {
                        onNewMessage(new Gson().fromJson(message, MessageModels.class));
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }

                // log as soon as the connection closes
                @Override
                public void onClose(int code, String reason, boolean remote) {
                    Log.d(logTag, "onClose: " + reason);
                }

                // case: we receive an error
                @Override
                public void onError(Exception ex) {
                    Log.d(logTag, "onError: " + ex.toString());
                }
            };
            // connect to the socket
            webSocketClient.connect();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // write a message on the socket for the server
    public void sendMessageToSocket(MessageModels message) {
        try {
            if (webSocketClient.getConnection() != null) {
                Log.d(logTag, "sendMessageToSocket: " + message);
                webSocketClient.send(new Gson().toJson(message));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // how to handle new messages; we will implement this later
    @Override
    public void onNewMessage(MessageModels message) {
    }
}
Did you notice that we used a helper class defining the structure of our message models, plus a helper interface? Let us peek at their definitions.
public interface NewMessageInterface {
    void onNewMessage(MessageModels message);
}

public class MessageModels {
    public String type, name, target;
    // in Kotlin you could use the 'Any' type;
    // Object replicates it in Java
    public Object data;

    public MessageModels(String type, String name, String target, Object data) {
        this.type = type;
        this.name = name;
        this.target = target;
        this.data = data;
    }
}
Now we need to implement the onNewMessage function. It must be implemented in the class hosting the calling UI, as we manipulate the UI based on the messages we receive.
private void newMessage(MessageModels message) {
    if ("store_user_response".equals(message.type)) {
        if (message.data.toString().equals("user already exists")) {
            // username is taken
            runOnUiThread(() -> {
                Toast.makeText(CallActivity.this,
                        "Username already in use",
                        Toast.LENGTH_LONG).show();
                // call finish to go back to the previous activity
                finish();
            });
        }
        // otherwise continue with this activity
    }
}
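One wiring detail the snippets leave implicit: newMessage lives in CallActivity, while messages arrive in SocketRepository.onNewMessage. A simple way to connect the two is to pass the activity's callback into the repository; the extra constructor parameter below is an assumption for illustration, not code from the original repository:
// Hypothetical variation of SocketRepository that forwards messages to a
// listener instead of leaving onNewMessage empty.
private final NewMessageInterface listener;   // assumed new field

public SocketRepository(String username, NewMessageInterface listener) {
    this.userName = username;
    this.listener = listener;
    // ... same WebSocketClient setup as shown above ...
}

// inside WebSocketClient.onMessage, forward to the listener:
// listener.onNewMessage(new Gson().fromJson(message, MessageModels.class));

// and in CallActivity, since NewMessageInterface is a functional interface:
// socketRepository = new SocketRepository(username, this::newMessage);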
Now we have successfully implemented the first part of the calling system. Next, we start working on the WebRTC logic for the application, and simultaneously we will expand our backend.
First, we need to initialize our peer connection.
PS: The libraries we are using on the client side are:
// WebRTC library
implementation 'com.mesibo.api:webrtc:1.0.5'
// WebSocket Implementation
implementation 'org.java-websocket:Java-WebSocket:1.5.3'
// JSON parser
implementation 'com.google.code.gson:gson:2.9.1'
// Permission management
implementation 'com.guolindev.permissionx:permissionx:1.6.1'
// The value "WebRTC-H264HighProfile/Enabled/" indicates that the
// "H.264 High Profile" video codec is enabled as part of a field trial.
// H.264 is a widely used video compression standard, and "High Profile"
// refers to a specific set of encoding parameters that offer better
// video quality at the expense of higher computational requirements.
PeerConnectionFactory.InitializationOptions peerConnectionOptions =
        PeerConnectionFactory
                .InitializationOptions
                // application is the getApplication() value from the UI
                // class, i.e. CallActivity
                .builder(application)
                .setEnableInternalTracer(true)
                .setFieldTrials("WebRTC-H264HighProfile/Enabled/")
                .createInitializationOptions();
PeerConnectionFactory.initialize(peerConnectionOptions);
Here we have utilized a design pattern called the Factory Pattern. The Factory pattern is used to obtain an instance of a class without invoking its constructor directly. Next, we will create a PeerConnectionFactory:
// here we initialize the factory object with its Options
PeerConnectionFactory.Options peerConnectionFactoryOptions = new
        PeerConnectionFactory.Options();
peerConnectionFactoryOptions.disableEncryption = true;
peerConnectionFactoryOptions.disableNetworkMonitor = true;

// we set the video encoder and decoder to the default implementations
return PeerConnectionFactory
        .builder()
        .setVideoEncoderFactory(new DefaultVideoEncoderFactory(
                eglContext.getEglBaseContext(), true, true))
        .setVideoDecoderFactory(new DefaultVideoDecoderFactory(
                eglContext.getEglBaseContext()))
        .setOptions(peerConnectionFactoryOptions)
        .createPeerConnectionFactory();

// finally, we create the PeerConnection from the PeerConnectionFactory
PeerConnection peerConnection = createPeerConnectionFactory()
        .createPeerConnection(iceServer, observer);
While finally initializing the PeerConnection from the PeerConnectionFactory, we are required to pass in an iceServer list.
What are iceServers?
IceServers are a collection of links to STUN and TURN servers, as discussed earlier; these are the special-purpose servers required while building a peer connection. We have used some public STUN and TURN servers. WebRTC always tries to connect to the peer directly and falls back to TURN relays only when a direct connection cannot be established.
// create a list of servers
iceServer = new ArrayList<>();

// STUN servers are open to the public, as they only discover and return
// the public address of the peer
iceServer.add(PeerConnection
        .IceServer.builder("stun:stun.l.google.com:19302").createIceServer());
iceServer.add(new PeerConnection
        .IceServer("stun:openrelay.metered.ca:80"));

// TURN servers need restricted access, as they consume a lot of
// resources while relaying data between the peers;
// the first extra parameter is the username and the second is the password
iceServer.add(new PeerConnection
        .IceServer("turn:openrelay.metered.ca:80", "openrelayproject",
        "openrelayproject"));
iceServer.add(new PeerConnection
        .IceServer("turn:openrelay.metered.ca:443", "openrelayproject",
        "openrelayproject"));
iceServer.add(new PeerConnection
        .IceServer("turn:openrelay.metered.ca:443?transport=tcp",
        "openrelayproject", "openrelayproject"));
With our iceServers added, the groundwork for a WebRTC connection is in place. Performing calls has never been easier!
To establish a call, we need a peer [another user]. How do we get a peer?
Remember the Node.js server we wrote: until now it had just one function, i.e. to store any incoming user in an array. This array holds information about the connected users, and hence the peers that are available and online. We will add another case to our existing switch to handle a call request, i.e. to check whether the target peer is available and online.
case "start_call":
let userToCall = findUser(data.target)
if (userToCall) {
connection.send(JSON.stringify({
type: "call_response", data: "user is ready for call"
}))
} else {
connection.send(JSON.stringify({
type: "call_response", data: "user is not online"
}))
}
break
This will inform the caller that “Yes, the person you are calling is active”. Now we need to handle these cases on the client side, so we add another condition to our onNewMessage function:
else if ("call_response".equals(message.type)) {
if (message.data.toString().equals("user is not online")) {
// user offline
runOnUiThread(() ->
Toast.makeText(CallActivity.this, "User not Reachable", Toast.LENGTH_LONG).show()
);
} else {
// user ready, we start the call
runOnUiThread(
() -> {
// these functions refer to the update of layout for calling,
// i.e. show the calling layout
setWhoToCallLayoutGone();
setCallLayoutVisible();
rtcClient.initializeSurfaceView(binding.localView);
rtcClient.initializeSurfaceView(binding.remoteView);
rtcClient.startLocalVideo(binding.localView);
rtcClient.call(binding.targetUserNameEt.getText().toString());
}
);
}
}
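The layout helpers used above are simply visibility toggles on the three layouts we defined in the calling XML; a minimal sketch, assuming view binding as in the rest of the article:
// Simple visibility toggles for the three screens in the calling layout.
private void setCallLayoutVisible() {
    binding.callLayout.setVisibility(View.VISIBLE);
}

private void setWhoToCallLayoutGone() {
    binding.whoToCallLayout.setVisibility(View.GONE);
}

private void setIncomingCallLayoutVisible() {
    binding.incomingCallLayout.setVisibility(View.VISIBLE);
}

private void setIncomingCallLayoutGone() {
    binding.incomingCallLayout.setVisibility(View.GONE);
}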
What is the call function?
It will create an offer for our peer and then send it to the peer via the web socket server we created earlier.
// to call a peer in WebRTC, we need to present them with an offer;
// compare this with the scenario of getting a call on WhatsApp or any
// other platform, where you have options to accept and reject -
// the offer works in more or less the same way
public void call(String target) {
    MediaConstraints mediaConstraints = new MediaConstraints();
    mediaConstraints.mandatory.add(
            new MediaConstraints.KeyValuePair("OfferToReceiveAudio", "true"));
    mediaConstraints.mandatory.add(
            new MediaConstraints.KeyValuePair("OfferToReceiveVideo", "true"));
    peerConnection.createOffer(new SdpObserver() {
        @Override
        public void onCreateSuccess(SessionDescription sessionDescription) {
            peerConnection.setLocalDescription(new SdpObserver() {
                @Override
                public void onCreateSuccess(SessionDescription sdp) {
                }

                // when our offer has been created on the caller's side, we
                // need to share it with the other user; here we ask our web
                // socket server to pass it to the person we want to call
                @Override
                public void onSetSuccess() {
                    HashMap<String, Object> offer = new HashMap<>();
                    offer.put("sdp", sessionDescription.description);
                    offer.put("type", sessionDescription.type);
                    // send the server a new case, create_offer
                    socketRepository.sendMessageToSocket(new
                            MessageModels("create_offer", username, target, offer));
                }

                @Override
                public void onCreateFailure(String s) {
                }

                @Override
                public void onSetFailure(String s) {
                }
            }, sessionDescription);
        }

        @Override
        public void onSetSuccess() {
        }

        @Override
        public void onCreateFailure(String s) {
        }

        @Override
        public void onSetFailure(String s) {
        }
    }, mediaConstraints);
}
Now we will handle the case of “create_offer”:
// we search for the connection of the target user using the
// findUser method and send them the offer with the
// "offer_received" type in the message
case "create_offer":
    let userToReceiveOffer = findUser(data.target)
    if (userToReceiveOffer) {
        userToReceiveOffer.conn.send(JSON.stringify({
            type: "offer_received",
            name: data.name,
            data: data.data.sdp
        }))
    }
    break
When the offer reaches the peer, we show them that someone is calling, i.e., options to accept or reject the incoming call request. We expand our onNewMessage function to handle this new condition:
else if ("offer_received".equals(message.type)) {
runOnUiThread(() -> {
// showing the new layout for accpeting or reject the call
setIncomingCallLayoutVisible();
binding.incomingNameTV.setText(new
StringBuffer(message.name + " is Calling You"));
binding.acceptButton.setOnClickListener(view -> {
// case: user accepts the incoming call
setIncomingCallLayoutGone();
setCallLayoutVisible();
setWhoToCallLayoutGone();
rtcClient.initializeSurfaceView(binding.localView);
rtcClient.initializeSurfaceView(binding.remoteView);
rtcClient.startLocalVideo(binding.localView);
// create a session description for the remote peer
SessionDescription sessionDescription = new
SessionDescription(
SessionDescription.Type.OFFER,
message.data.toString()
);
// set the remote description and inform the rmeote peer for the accepted call
rtcClient.onRemoteSessionReceived(sessionDescription);
rtcClient.answer(message.name);
target = message.name;
binding.remoteViewLoading.setVisibility(View.GONE);
});
// in case of reject you can implement some logic which we will skip for now
// e.g. inform the remote peer via websocket about call being rejected
binding.rejectButton.setOnClickListener(view ->
setIncomingCallLayoutGone());
});
}
Let us look at the steps for answering the call:
1. First, we will set the remote session description on the peer connection:
public void onRemoteSessionReceived(SessionDescription session) {
    peerConnection.setRemoteDescription(new SdpObserver() {
        @Override
        public void onCreateSuccess(SessionDescription sessionDescription) {
        }

        @Override
        public void onSetSuccess() {
            Log.d("TAGRTCCLient", "onRemoteSessionReceived: " + session);
        }

        @Override
        public void onCreateFailure(String s) {
        }

        @Override
        public void onSetFailure(String s) {
        }
    }, session);
}
2. Second, we will answer the call:
public void answer(String target) {
    MediaConstraints mediaConstraints = new MediaConstraints();
    mediaConstraints.mandatory.add(
            new MediaConstraints.KeyValuePair("OfferToReceiveAudio", "true"));
    mediaConstraints.mandatory.add(
            new MediaConstraints.KeyValuePair("OfferToReceiveVideo", "true"));
    peerConnection.createAnswer(new SdpObserver() {
        @Override
        public void onCreateSuccess(SessionDescription sessionDescription) {
            peerConnection.setLocalDescription(new SdpObserver() {
                @Override
                public void onCreateSuccess(SessionDescription sdp) {
                }

                @Override
                public void onSetSuccess() {
                    HashMap<String, Object> answer = new HashMap<>();
                    answer.put("sdp", sessionDescription.description);
                    answer.put("type", sessionDescription.type);
                    socketRepository.sendMessageToSocket(new
                            MessageModels("create_answer",
                                    username,
                                    target,
                                    answer));
                }

                @Override
                public void onCreateFailure(String s) {
                }

                @Override
                public void onSetFailure(String s) {
                }
            }, sessionDescription);
        }

        @Override
        public void onSetSuccess() {
        }

        @Override
        public void onCreateFailure(String s) {
        }

        @Override
        public void onSetFailure(String s) {
        }
    }, mediaConstraints);
}
Next, let us look at how to start the video, along with the remaining pieces of the answering flow:
// this function initializes the camera video stream on the local
// device, i.e. the small preview video
public void startLocalVideo(SurfaceViewRenderer surfaceViewRenderer) {
    this.localVideoSource = this.createPeerConnectionFactory()
            .createVideoSource(false);
    this.localAudioSource = this.createPeerConnectionFactory()
            .createAudioSource(new MediaConstraints());
    SurfaceTextureHelper surfaceTextureHelper =
            SurfaceTextureHelper.create(Thread.currentThread().getName(),
                    eglContext.getEglBaseContext());
    videoCapturer = getVideoCapture(application);
    videoCapturer.initialize(
            surfaceTextureHelper,
            surfaceViewRenderer.getContext(),
            localVideoSource.getCapturerObserver()
    );
    videoCapturer.startCapture(320, 240, 30);
    localVideoTrack = createPeerConnectionFactory()
            .createVideoTrack("local_track", localVideoSource);
    localVideoTrack.addSink(surfaceViewRenderer);
    localAudioTrack = createPeerConnectionFactory()
            .createAudioTrack("local_track_audio", localAudioSource);
    MediaStream localMediaStream = createPeerConnectionFactory()
            .createLocalMediaStream("local_stream");
    localMediaStream.addTrack(localAudioTrack);
    localMediaStream.addTrack(localVideoTrack);
    peerConnection.addStream(localMediaStream);
}

// find a camera device; here we prioritize the front-facing camera
private CameraVideoCapturer getVideoCapture(Application application) {
    Camera2Enumerator enumerator = new Camera2Enumerator(application);
    String[] deviceNames = enumerator.getDeviceNames();
    for (String deviceName : deviceNames) {
        if (enumerator.isFrontFacing(deviceName)) {
            return enumerator.createCapturer(deviceName, null);
        }
    }
    throw new IllegalStateException("no front-facing camera found");
}
Next, we have a new case for our web socket server, “create_answer”:
// we search for the target user's connection and forward the answer
case "create_answer":
    let userToReceiveAnswer = findUser(data.target)
    if (userToReceiveAnswer) {
        userToReceiveAnswer.conn.send(JSON.stringify({
            type: "answer_received",
            name: data.name,
            data: data.data.sdp
        }))
    }
    break
Previously, we set the remote peer's details on the receiving end; now that we have received the answer to our call, we have the receiver's details on the caller's end, which are used to set the remote description there.
By expanding our onNewMessage functionality, we handle the “answer_received” case:
else if ("answer_received".equals(message.type)) {
SessionDescription sessionDescription = new SessionDescription(
SessionDescription.Type.ANSWER,
message.data.toString()
);
rtcClient.onRemoteSessionReceived(sessionDescription);
runOnUiThread(
() ->
binding.remoteViewLoading.setVisibility(View.GONE)
);
}
We reuse the same onRemoteSessionReceived(sessionDescription) as defined earlier. From here on, each side saves the iceCandidates it receives from the remote peer, and a sequence of iceCandidate exchanges continues until the connection is successfully established. We handle this candidate sharing in our web socket server and in the app by expanding the respective functions:
Server Side:
case "ice_candidate":
let userToReceiveIceCandidate = findUser(data.target)
if (userToReceiveIceCandidate) {
userToReceiveIceCandidate.conn.send(JSON.stringify({
type: "ice_candidate",
name: data.name,
data: {
sdpMLineIndex: data.data.sdpMLineIndex,
sdpMid: data.data.sdpMid,
sdpCandidate: data.data.sdpCandidate
}
}))
}
break
App Side:
else if ("ice_candidate".equals(message.type)) {
IceCandidateModel receivingCandidate = gson.fromJson(
gson.toJson(message.data),
IceCandidateModel.class
);
rtcClient.addIceCandidate(
new IceCandidate(
receivingCandidate.sdpMid,
Math.toIntExact(
receivingCandidate.sdpMLineIndex.longValue()),
receivingCandidate.sdpCandidate
)
);
}
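Two small pieces referenced above are not shown in the article's snippets: the IceCandidateModel that Gson deserializes into, and the addIceCandidate pass-through on the RTC client. Minimal sketches of both, with field types inferred from how they are used above:
import org.webrtc.IceCandidate;

// Model matching the JSON relayed by the "ice_candidate" case on the server.
// sdpMLineIndex arrives as a JSON number, which Gson maps to a numeric
// wrapper, hence the Double here.
public class IceCandidateModel {
    public String sdpMid;
    public Double sdpMLineIndex;
    public String sdpCandidate;
}

// On the RTC client, adding the remote candidate is a one-line pass-through
// to the underlying PeerConnection.
public void addIceCandidate(IceCandidate candidate) {
    peerConnection.addIceCandidate(candidate);
}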
Finally, we have performed the last steps to establish a calling system by leveraging the WebRTC protocol.
Summary
To summarize the complete article, there are a few compulsory components when creating any such application for any platform:
- We need a server where we can manage all the clients of our application. This server is called a signalling server, as it signals one peer towards another.
- We need STUN and TURN servers. We can use public servers, or set up a private STUN and TURN server with coturn, an open-source project built for exactly that.
- We need to share the iceCandidates between the peers to establish the connection.
- We need everything to run over a secure medium, i.e. HTTPS/WSS, in production; browsers will not grant WebRTC media access on insecure origins.
- PS: for better management of your code base, you can create a separate class that implements all the WebRTC-related functionality, i.e. handling the call, creating the peer connection instances, etc.
Conclusion:
You can find the implementation on my GitHub:
- Client Application: https://github.com/AbdullahJanKhan/webrtc-android
- Signalling Server: https://github.com/AbdullahJanKhan/webrtc-signalling-server