Author: Li Chao, senior audio and video engineer, with many years of audio and video related development experience.

This article was first published in the RTC developer community. If you run into any development problems, please click here to leave a comment for the author.

Preface

I’ve written before about how to use WebRTC on Android. In that article, I showed you how to make audio and video calls with WebRTC on Android. Today, we will look at how to implement a 1-to-1 real-time audio and video call on iOS.

The implementation logic on iOS is basically the same as on Android; the biggest difference is the language. So I’ll follow the same process I used for Android to introduce the implementation on iOS. The specific steps are as follows:

  • Applying for permissions
  • Introducing the WebRTC library
  • Capturing and displaying local video
  • Signaling
  • Creating the audio and video data channel
  • Media negotiation
  • Rendering remote video

This section describes how to use WebRTC on iOS.

Applying for permissions

First, let’s take a look at how iOS gets access to audio and video devices. It is much easier to obtain permissions on iOS than on Android. The steps are as follows:

  • Open the project and click the project entry in the left-hand directory.
  • Locate Info.plist in the left-hand directory and open it.
  • Click the + sign on the right.
  • Add the Camera and Microphone entries, i.e. the "Privacy - Camera Usage Description" and "Privacy - Microphone Usage Description" keys.

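Besides the Info.plist entries, you can also check or request device access in code before starting a capture. This is not part of the original project, just a minimal sketch using the AVFoundation API:

#import <AVFoundation/AVFoundation.h>

// Ask for camera access up front; the same pattern works for AVMediaTypeAudio.
[AVCaptureDevice requestAccessForMediaType:AVMediaTypeVideo
                         completionHandler:^(BOOL granted) {
    if (!granted) {
        NSLog(@"Camera access was denied by the user");
    }
}];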

Introducing the WebRTC library

There are two ways to introduce the WebRTC library on iOS:

  • The first is to compile the WebRTC library from the WebRTC source code and then import it into the project manually.
  • The second is to use the precompiled WebRTC library that the WebRTC team publishes regularly, installed via CocoaPods.

In this project, we use the second approach.

Importing the WebRTC library in the second way is as simple as writing a Podfile. In your Podfile, you can specify where to download the WebRTC library and the name of the library you want to install.

The Podfile format is as follows:

source 'https://github.com/CocoaPods/Specs.git'
  
platform :ios,'11.0'

target 'WebRTC4iOS2' do

pod 'GoogleWebRTC'

end

  • source specifies where the library files are downloaded from.
  • platform specifies the target platform and minimum platform version.
  • target specifies the name of the project.
  • pod specifies the library to install.

Once you have your Podfile, run the pod install command in the current directory so that CocoaPods can download and install the WebRTC library from the specified source.

After pod install finishes, in addition to downloading the library files, CocoaPods generates a new workspace file for us, **{project}.xcworkspace**. Open this workspace, and both the project file and the Pod dependencies you just installed are loaded and associated.

The WebRTC library is then successfully introduced. Now we can start writing our own code.

Obtaining local video

With the WebRTC library successfully introduced, we can begin the real WebRTC journey. Now, let’s take a look at how to get local video and display it.

Before obtaining video, we first need to choose which video device to use for capture. In WebRTC, we can get all the video devices through the RTCCameraVideoCapturer class, as follows:

NSArray<AVCaptureDevice*>* devices = [RTCCameraVideoCapturer captureDevices];
AVCaptureDevice* device = devices[0];

With the above two lines of code, we have our first video device. Simple!

Of course, equipment alone is not enough. We also need to know where the data we collect from the device goes, so we can display it.

WebRTC provides us with a specialized class called RTCVideoSource. It plays two roles:

  • On the one hand, it is a video source: when we display video, this is where the data comes from.
  • On the other hand, it is also a destination: video data captured from the device is temporarily stored in it.

In addition, to make it easier to control the video device, WebRTC provides a special class for operating the device, namely RTCCameraVideoCapturer. Through it, we can freely control the video device.

With the two classes described above, and the AVCaptureDevice introduced earlier, we can easily capture video data. Now let’s take a look at the code.

In this code, we first bind the RTCVideoSource to the RTCCameraVideoCapturer, and then start the device, so that video data is continuously delivered to the RTCVideoSource.

...
RTCVideoSource* videoSource = [factory videoSource];
capture = [[RTCCameraVideoCapturer alloc] initWithDelegate:videoSource];
...
[capture startCaptureWithDevice:device format:format fps:fps];
...

A few lines of code above capture video data from the camera.
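The snippet above assumes that device, format and fps have already been chosen. The earlier code simply took devices[0]; a possible way to pick the front camera and a supported format and frame rate instead looks like this (an illustrative sketch, not code from the original project):

// Prefer the front camera, falling back to the first available device.
AVCaptureDevice* device = [RTCCameraVideoCapturer captureDevices].firstObject;
for (AVCaptureDevice* d in [RTCCameraVideoCapturer captureDevices]) {
    if (d.position == AVCaptureDevicePositionFront) {
        device = d;
        break;
    }
}

// Take the first format the capturer supports for this device and its maximum frame rate.
AVCaptureDeviceFormat* format = [RTCCameraVideoCapturer supportedFormatsForDevice:device].firstObject;
NSInteger fps = (NSInteger)format.videoSupportedFrameRateRanges.firstObject.maxFrameRate;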

One thing that needs to be emphasized here is the Factory object. In the WebRTC Native layer, factory can be said to be the “root of all things”. Objects such as RTCVideoSource, RTCVideoTrack and RTCPeerConnection need to be created through Factory. So how is the Factory object created?

You can find out with the following code:

...
[RTCPeerConnectionFactory initialize];

// If the peer connection factory is empty, create it
if (!factory) {
    RTCDefaultVideoDecoderFactory* decoderFactory = [[RTCDefaultVideoDecoderFactory alloc] init];
    RTCDefaultVideoEncoderFactory* encoderFactory = [[RTCDefaultVideoEncoderFactory alloc] init];
    NSArray* codecs = [encoderFactory supportedCodecs];
    [encoderFactory setPreferredCodec:codecs[2]];

    factory = [[RTCPeerConnectionFactory alloc] initWithEncoderFactory:encoderFactory
                                                        decoderFactory:decoderFactory];
}
...

In the code above,

  • First, the initialize method of the RTCPeerConnectionFactory class is called.
  • Then the factory object is created. Note that when you create the factory object, you pass in two parameters: one is the default encoder factory, the other is the default decoder factory. We can modify these two parameters to use different codecs.
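As a small example of creating other objects through the factory, a local video track could be created from the videoSource we obtained earlier like this (a sketch; the track id string is arbitrary):

RTCVideoTrack* localVideoTrack = [factory videoTrackWithSource:videoSource trackId:@"video0"];
// The track can later be added to the RTCPeerConnection so the captured video is sent to the remote side.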

Now that we have the factory object, creating the remaining objects is straightforward. The next question, then, is how to present the captured video.

There is a big difference between displaying local video on iOS and on Android, mainly because the underlying implementations of the two systems differ; each uses the approach that displays local video most efficiently.

Displaying local video on iOS is as simple as executing the following statement before calling the capturer's startCaptureWithDevice method:

self.localVideoView.captureSession = capture.captureSession;

Of course, during page initialization on iOS, be sure to define localVideoView, whose type is RTCCameraPreviewView.
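For reference, a minimal sketch of how localVideoView might be declared and created in code (the names match the snippet above; whether you build it programmatically or in a storyboard is up to you):

// In the view controller's interface:
@property (nonatomic, strong) RTCCameraPreviewView* localVideoView;

// During page initialization:
self.localVideoView = [[RTCCameraPreviewView alloc] initWithFrame:self.view.bounds];
[self.view addSubview:self.localVideoView];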

Through the above steps, we can see the video images collected by the video equipment.

Signaling

Above, we covered applying for permissions on iOS, introducing the WebRTC library, and capturing and displaying local video. Those functions are all quite simple to implement. The signaling we are going to look at next is a bit more complicated.

In any system, signaling is the soul of the system. For example, who initiates the call, and when each SDP is sent during media negotiation, are all controlled by signaling.

For this project, its signaling is relatively simple, including the following signaling:

Client commands

  • join: a user joins a room
  • leave: a user leaves a room
  • message: an end-to-end message (offer, answer, candidate)

Server commands

  • joined: the user has joined
  • leaved: the user has left
  • other_joined: another user has joined
  • bye: another user has left
  • full: the room is full

What is the relationship between these messages? What signaling should be sent under what circumstances? To answer these questions, we need to look at the signaling state machine.

Signaling state machine

The signaling on the iOS side is managed by a state machine, just like the JS and Android sides introduced earlier. Different signaling needs to be sent in different states, and the state changes when signaling is received from the server or from the peer. Let's take a look at the states:

  • In the init/leaved state, the client can only send the join message. After receiving join, the server returns joined, and the client switches to the joined state.
  • In the joined state, the client has several options and will switch to a different state depending on the message it receives:
    • If the user leaves the room, the client returns to its initial state, init/leaved.
    • If the client receives a message that a second user has joined (other_joined), it switches to joined_conn. In this state, the two users can talk.
    • If the client receives a message that the second user has left (bye), it switches to joined_unbind. The joined_unbind state is essentially the same as joined.
  • If the client is in the joined_conn state and receives a message that the second user has left (bye), it also switches to joined_unbind.
  • If the client is in the joined_unbind state and receives a message that a second user has joined (other_joined), it switches back to joined_conn.

From the state diagram above, we know clearly what signaling should be sent in each state; or, put the other way around, how the state changes when each kind of signaling is sent or received.
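To make this concrete, the states could be represented with a simple enum (this is just an illustration of the state machine, not code from the original project):

typedef NS_ENUM(NSInteger, RoomState) {
    RoomStateInit,          // init/leaved: not in a room
    RoomStateJoined,        // joined: we are in the room, alone
    RoomStateJoinedConn,    // joined_conn: a second user is present, the call can proceed
    RoomStateJoinedUnbind   // joined_unbind: the second user has left
};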

Introducing the socket.io library

Those of you who have read my previous articles know that I used the socket.io library as the base for signaling both on the JS side and in the Android real-time call. I chose socket.io for two reasons:

  • Partly because it is cross-platform, so we can keep the same logic on all platforms;
  • Partly because socket.io is simple to use and very powerful.

However, socket.io on iOS is implemented in Swift, while our 1-to-1 system is implemented in Objective-C. This raises a question: can a library written in Swift be used directly from Objective-C (OC)?

The answer is yes. We just need to add the use_frameworks! directive to our Podfile. So, our Podfile now looks like this:

source 'https://github.com/CocoaPods/Specs.git'
  
platform :ios,'11.0'

use_frameworks!

target 'WebRTC4iOS2' do

pod 'Socket.IO-Client-Swift', '~> 13.3.0'
pod 'GoogleWebRTC'

end

The meaning of each line in your Podfile is pretty clear, so I’m not going to go into it too much here.

Use of signaling

With the socket.io library successfully introduced, let's see how to use it. Using socket.io on iOS takes the following steps:

  • Obtain the socket from the URL. With the socket, we can establish a connection to the server.
  • Register the messages to listen for, and bind a handler function to each one. When a message arrives from the server, the bound function is fired.
  • Establish the connection through the socket.
  • Send messages.

Let’s take a look at each of them one by one.

Get the socket

Getting a socket in iOS is actually very simple. Let’s look at the code:

NSURL* url = [[NSURL alloc] initWithString:addr];
manager = [[SocketManager alloc] initWithSocketURL:url
                                            config:@{
                                                @"log": @YES,
                                                @"forcePolling": @YES,
                                                @"forceWebsockets": @YES
                                            }];
socket = manager.defaultSocket;

Yes, just a few lines of code. I won't explain why it is written this way; you can simply note it down, as this is the standard pattern for socket.io.

Register to listen for messages

It is also very easy to register a message listener with socket.io, as shown below:

[socket on:@"joined" callback:^(NSArray * data, SocketAckEmitter * ack) {
    NSString* room = [data objectAtIndex:0];
    
    NSLog(@"joined room(%@)", room);
    
    [self.delegate joined:room];
    
}];

The above registers a listener for the joined message and binds an anonymous handler to it. If the message carries parameters, we can get them from the data array.

In the same way, if we want to listen for another message, we can use the format above and just replace joined with the message name.
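For example, a listener for the bye message might look like this (the handler body here is illustrative, not taken from the original project):

[socket on:@"bye" callback:^(NSArray * data, SocketAckEmitter * ack) {
    NSString* room = [data objectAtIndex:0];
    NSLog(@"peer left room(%@)", room);
    // Here you would close the RTCPeerConnection and update the UI.
}];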

Setting up a connection is even easier, with the following code:

[socket connect];

That’s right, that’s all it takes!

Sending messages

Next, let's look at how to send messages using socket.io.

...
if (socket.status == SocketIOStatusConnected) {
    [socket emit:@"join" with:@[room]];
}
...

socket.io uses the emit method to send messages. A message can carry several parameters, which are all placed into a single array. In the code above, we first check whether the socket is in the connected state; only when it is connected can the message actually be sent.

That is how socket.io is used. Isn't it simple?
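When a command needs to carry data, the extra values simply go into the same array. For example, the SignalClient sendMessage:withMsg: helper used later in this article presumably wraps an emit of the message command; a sketch of what that might look like (the parameter layout is an assumption, not the project's actual code):

// Hypothetical body of SignalClient's sendMessage:withMsg:
- (void)sendMessage:(NSString*)room withMsg:(NSDictionary*)msg {
    if (socket.status == SocketIOStatusConnected) {
        [socket emit:@"message" with:@[room, msg]];
    }
}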

Create an RTCPeerConnection

Once the signaling system is in place, the rest of the logic is built around it, and creating the RTCPeerConnection object is no exception.

On the client side, to talk to the remote end, the user must first send a join message, which means entering the room first. If the server determines that the user is valid, it sends a joined message to the client.

After receiving the joined message, the client needs to create an RTCPeerConnection. That is, it needs to establish an audio and video data transmission channel to communicate with the remote end.

The following code shows how the RTCPeerConnection is created:

...
if (!ICEServers) {
    ICEServers = [NSMutableArray array];
    [ICEServers addObject:[self defaultSTUNServer]];
}

RTCConfiguration* configuration = [[RTCConfiguration alloc] init];
[configuration setIceServers:ICEServers];

RTCPeerConnection* conn = [factory peerConnectionWithConfiguration:configuration
                                                        constraints:[self defaultPeerConnContraints]
                                                           delegate:self];
...

The RTCPeerConnection object on iOS takes three parameters:

  • The first is an object of type RTCConfiguration. Its most important field is iceServers, which holds the STUN/TURN server addresses used for NAT traversal. You can read up on NAT traversal on your own.
  • The second is an RTCMediaConstraints object, which places constraints on the RTCPeerConnection: for example, whether to receive video data, whether to receive audio data, and whether to turn on the DtlsSrtpKeyAgreement option needed to communicate with browsers.
  • The third parameter is the delegate. It effectively acts as an observer that the RTCPeerConnection uses to notify us of state changes and events, although strictly speaking this is not the observer pattern; I want you to be aware of that.
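The code above also calls two helpers, defaultSTUNServer and defaultPeerConnContraints, that are not shown in this article. They might look roughly like this (a sketch; the STUN address is a placeholder you would replace with your own server):

- (RTCIceServer*)defaultSTUNServer {
    // Placeholder address; use your own STUN/TURN server in a real deployment.
    return [[RTCIceServer alloc] initWithURLStrings:@[@"stun:stun.example.com:3478"]];
}

- (RTCMediaConstraints*)defaultPeerConnContraints {
    // Enable DTLS-SRTP so we can interoperate with browsers.
    NSDictionary* optional = @{@"DtlsSrtpKeyAgreement": @"true"};
    return [[RTCMediaConstraints alloc] initWithMandatoryConstraints:nil
                                                 optionalConstraints:optional];
}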

Now that the RTCPeerConnection object is created, we can move on to the most important part of the whole real-time call process, namely media negotiation.

Media negotiation

First of all, we need to know that the content exchanged during media negotiation is SDP; if you are not familiar with SDP, you can learn about it on your own. Secondly, we should be clear about the overall media negotiation process.

The media negotiation process on iOS is the same as on Android/JS. The calling party first creates an Offer SDP and calls the setLocalDescription method of its RTCPeerConnection object to save it locally.

Next, the Offer is sent to the signaling server, which forwards it to the called party. When the called party receives the Offer, it calls the setRemoteDescription method of its RTCPeerConnection object to save the remote Offer.

After that, the called party creates the SDP content of type Answer and calls the setLocalDescription method of RTCPeerConnection object to store it locally.

Again, it sends the Answer to the server. After receiving the message, the server directly forwards the message to the calling party without processing it. After receiving the Answer, the caller calls setRemoteDescription to save it.

Through the above steps, the entire media negotiation part is complete.

Let’s look at how this logic is implemented on iOS:

...
[peerConnection offerForConstraints:[self defaultPeerConnContraints]
                  completionHandler:^(RTCSessionDescription * _Nullable sdp,
                                      NSError * _Nullable error) {
    if (error) {
        NSLog(@"Failed to create offer SDP, err=%@", error);
    } else {
        __weak RTCPeerConnection* weakPeerConnction = self->peerConnection;
        [self setLocalOffer:weakPeerConnction withSdp:sdp];
    }
}];
...

On iOS, an Offer SDP is created with the offerForConstraints method of the RTCPeerConnection object. It takes two arguments:

  • One is an RTCMediaConstraints parameter, which we introduced earlier when creating the RTCPeerConnection object, so we won't repeat it here.
  • The other is an anonymous callback. You can tell whether offerForConstraints succeeded by checking whether error is nil; if it succeeded, the sdp parameter holds the content of the created SDP.

If the SDP is obtained successfully, then according to the process described earlier, we first save it locally and then send it to the signaling server, which forwards it to the other end.

Our code follows exactly this process. The setLocalOffer method in the code above does just that. The specific code is as follows:

...
[pc setLocalDescription:sdp completionHandler:^(NSError * _Nullable error) {
    if (!error) {
        NSLog(@"Succeeded in setting local offer sdp!");
    } else {
        NSLog(@"Failed to set local offer sdp, err=%@", error);
    }
}];

__weak NSString* weakMyRoom = myRoom;
dispatch_async(dispatch_get_main_queue(), ^{
    NSDictionary* dict = [[NSDictionary alloc] initWithObjects:@[@"offer", sdp.sdp]
                                                       forKeys:@[@"type", @"sdp"]];
    [[SignalClient getInstance] sendMessage:weakMyRoom withMsg:dict];
});
...

As you can see clearly from the code above, it does two things: first, it calls setLocalDescription to save the SDP locally; second, it sends the SDP to the signaling server as a message.

The rest of the negotiation follows exactly the same pattern as the description above, so we won't go through every step here.
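For completeness, the called party's side would look roughly like this: save the received Offer, then create and save an Answer. This is only a sketch under the same assumptions as the code above; offerSdp is the received Offer wrapped in an RTCSessionDescription, and setLocalAnswer is a hypothetical helper mirroring setLocalOffer:

[peerConnection setRemoteDescription:offerSdp completionHandler:^(NSError * _Nullable error) {
    if (error) {
        NSLog(@"Failed to set remote offer sdp, err=%@", error);
        return;
    }
    // Once the remote Offer is stored, create the Answer.
    [self->peerConnection answerForConstraints:[self defaultPeerConnContraints]
                             completionHandler:^(RTCSessionDescription * _Nullable sdp,
                                                 NSError * _Nullable error) {
        if (!error) {
            // Save the Answer locally and send it via the signaling server.
            [self setLocalAnswer:self->peerConnection withSdp:sdp];  // hypothetical helper
        }
    }];
}];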

After the negotiation, the WebRTC stack underneath starts transmitting audio and video data. When remote video data arrives locally, we need to display it on the interface. How does that work?

Render remote video

If you remember, when we created the RTCPeerConnection object, we also set a delegate for it, which in our case is the CallViewController object. In this object we implement the RTCPeerConnection delegate methods. The key ones are as follows:

  • - (void)peerConnection:(RTCPeerConnection *)peerConnection didGenerateIceCandidate:(RTCIceCandidate *)candidate; — called when a local ICE candidate has been gathered.

  • - (void)peerConnection:(RTCPeerConnection *)peerConnection didChangeIceConnectionState:(RTCIceConnectionState)newState; — triggered when the ICE connection state changes.

  • - (void)peerConnection:(RTCPeerConnection *)peerConnection didAddReceiver:(RTCRtpReceiver *)rtpReceiver streams:(NSArray<RTCMediaStream *> *)mediaStreams; — fired when a remote track is received.
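Of these, didGenerateIceCandidate is where we pick up local candidates and forward them to the peer through the signaling server. A minimal sketch (the dictionary keys and the use of SignalClient follow the pattern of the sending code shown earlier; they are assumptions, not the project's exact code):

- (void)peerConnection:(RTCPeerConnection *)peerConnection
    didGenerateIceCandidate:(RTCIceCandidate *)candidate {
    dispatch_async(dispatch_get_main_queue(), ^{
        NSDictionary* dict = @{@"type": @"candidate",
                               @"label": @(candidate.sdpMLineIndex),
                               @"id": candidate.sdpMid,
                               @"candidate": candidate.sdp};
        [[SignalClient getInstance] sendMessage:self->myRoom withMsg:dict];
    });
}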

So when do we start rendering the remote video? When the remote video stream arrives, the didAddReceiver:streams: method above is triggered, so we just need to add our logic to that method.

When that method is called, we can get the track from the rtpReceiver argument. The track may be an audio track or a video track, so we should first check whether it is video or audio.

If it is a video track, we add remoteVideoView as a renderer to the track, which essentially registers it as an observer so that remoteVideoView receives the video data from the track. remoteVideoView implements the render method and renders the data as soon as it arrives. Finally, we can see the remote video.

The specific code is as follows:

...
RTCMediaStreamTrack* track = rtpReceiver.track;
if ([track.kind isEqualToString:kRTCMediaStreamTrackKindVideo]) {

    if (!self.remoteVideoView) {
        NSLog(@"error: remoteVideoView has not been created!");
        return;
    }

    remoteVideoTrack = (RTCVideoTrack*)track;
    [remoteVideoTrack addRenderer:self.remoteVideoView];
}
...

With the above code, we can display the video from the remote end.

Summary

Above, I explained the overall logic of implementing a 1-to-1 real-time call on iOS. Overall, the process is basically the same as on the JS/Android side.

In this article, I explained how to implement a real-time audio and video call program on iOS by covering the following topics:

  • Applying for permissions
  • Introducing the WebRTC library
  • Capturing and displaying local video
  • Signaling
  • Creating the audio and video data channel
  • Media negotiation
  • Rendering remote video

A developer familiar with iOS should be able to quickly write such a real-time call program after reading this article.

Thank you very much!


Related:
  • WebRTC introductory tutorial (3) | How to use WebRTC on Android
  • WebRTC introductory tutorial (2) | WebRTC signaling control and STUN/TURN server setup
  • WebRTC introductory tutorial (1) | Building a WebRTC signaling server