RTSP (Real Time Streaming Protocol) is an Internet protocol specified by the IETF. It is an application-layer protocol in the TCP/IP protocol suite, designed for entertainment (such as audio and video) and communication systems that need to control streaming media servers. The protocol establishes and controls media sessions between endpoints. Clients of a media server issue VCR-style commands such as PLAY, PAUSE, SETUP, DESCRIBE and RECORD to control, in real time, media streams flowing from server to client or from client to server.

RTSP transmission process

  • When a user or application tries to stream video from a remote source, the client device sends an RTSP OPTIONS request to the server to find out which request types are available, such as PLAY, PAUSE, SETUP…
  • The server then returns the list of request types it accepts over RTSP.
  • Once the client knows which requests it can make, it sends a media description request (DESCRIBE) to the streaming server.
  • The server responds with a media description.
  • The client then sends a SETUP request, and the server responds with information about the transport mechanism.
  • Once setup is complete, the client starts the stream with a PLAY request, which tells the server to send the bitstream (binary sequence) using the transport mechanism agreed on in the SETUP exchange.

Client -> Server: DESCRIBE

Server -> Client: 200 OK (SDP)

Client -> Server: SETUP

Server -> Client: 200 OK

Client -> Server: PAUSE

...
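The request sequence described above can be sketched as a small message builder. This is a minimal sketch rather than a complete client: `build_request` and `build_handshake` are hypothetical helpers, and the track name, the port pair and the session ID are placeholder values (a real client takes the session ID from the SETUP reply).

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: formats one RTSP/1.0 request.
// Every line ends with CRLF and an empty line closes the header block.
std::string build_request(const std::string &method, const std::string &url, unsigned cseq,
                          const std::vector<std::pair<std::string, std::string>> &headers = {})
{
    std::string msg = method + " " + url + " RTSP/1.0\r\n";
    msg += "CSeq: " + std::to_string(cseq) + "\r\n";
    for (const auto &h : headers)
        msg += h.first + ": " + h.second + "\r\n";
    msg += "\r\n";
    return msg;
}

// The OPTIONS / DESCRIBE / SETUP / PLAY sequence, with placeholder values.
std::vector<std::string> build_handshake(const std::string &url)
{
    unsigned cseq = 1;
    std::vector<std::string> out;
    out.push_back(build_request("OPTIONS", url, cseq++));
    out.push_back(build_request("DESCRIBE", url, cseq++, {{"Accept", "application/sdp"}}));
    out.push_back(build_request("SETUP", url + "/track1", cseq++,
                                {{"Transport", "RTP/AVP;unicast;client_port=5000-5001"}}));
    // Placeholder session ID; normally copied from the SETUP reply.
    out.push_back(build_request("PLAY", url, cseq++, {{"Session", "12345678"}}));
    return out;
}
```

Each request carries an increasing CSeq number, which the server echoes in its reply so that the client can match replies to requests.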

Packet capture is indispensable when analyzing and learning a protocol. [Screenshot: RTSP packet capture]

Why is the RTSP protocol important

  • RTSP originated as a way to let users play audio and video directly from the Internet without first downloading media files to their devices. The protocol has since been used for a variety of purposes, including IP camera sites, online education and Internet broadcasting.
  • RTSP uses the same concepts as basic HTTP, largely to stay compatible with existing Web infrastructure. Because of this, most of HTTP's extension mechanisms can be carried over directly to RTSP.
  • RTSP is also very flexible. Clients can ask whether the media server supports the features they want to use. Similarly, anyone who owns media can deliver media streams from multiple servers. The protocol also aims to adapt to future developments in media, so it can be extended when necessary.

RTSP protocol commands

RTSP resembles HTTP in some respects, but it defines control sequences for multimedia playback. And whereas HTTP is stateless, RTSP is stateful.

A session identifier is used to keep track of concurrent sessions. Like HTTP, RTSP typically uses TCP to maintain an end-to-end connection; its default port is 554.
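As an illustration of the session identifier, the sketch below parses the value of a `Session` header as defined in RFC 2326 (a session ID optionally followed by `;timeout=<seconds>`, with a default of 60 seconds). `SessionInfo` and `parse_session_value` are hypothetical names, and the input is assumed to be the header value with surrounding whitespace already removed.

```cpp
#include <cstdlib>
#include <string>

struct SessionInfo {
    std::string id;        // opaque identifier chosen by the server
    unsigned timeout = 60; // seconds; RFC 2326 default when the parameter is omitted
};

// Hypothetical helper: parses a Session header value such as "12345678;timeout=30".
SessionInfo parse_session_value(const std::string &value)
{
    SessionInfo info;
    const std::string::size_type semi = value.find(';');
    info.id = value.substr(0, semi); // whole value when there is no ';'

    const std::string key = "timeout=";
    const std::string::size_type pos = value.find(key);
    if (semi != std::string::npos && pos != std::string::npos && pos > semi)
        info.timeout = static_cast<unsigned>(
            std::strtoul(value.c_str() + pos + key.size(), nullptr, 10));
    return info;
}
```

The client echoes the ID in the `Session` header of every later request that belongs to the same session.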

Although most RTSP control messages are sent from the client to the server, some commands are delivered in the other direction, from the server to the client.

Let’s take a look at the basic RTSP requests:

SETUP

The SETUP request specifies how a single media stream must be transmitted. This must be done before sending the PLAY request.

The request contains the media stream URL and transport specifier.

This specifier typically includes a local port for receiving RTP data (audio or video) and another for RTCP data (meta information).

The server reply typically confirms the selected parameters and fills in the missing parts, such as the selected port of the server. Each media stream must be configured using SETUP before an aggregate playback request can be sent.
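To make the port negotiation concrete, here is a minimal sketch that extracts the RTP/RTCP port pair from a `Transport` header value. `PortPair` and `find_port_pair` are hypothetical names; the fallback of "RTCP port = RTP port + 1" when only one port is given is an assumption of this sketch.

```cpp
#include <cstdio>
#include <string>

struct PortPair {
    unsigned rtp = 0;   // port carrying RTP media data
    unsigned rtcp = 0;  // port carrying RTCP control data
    bool found = false;
};

// Hypothetical helper: looks for "<key>a-b" (or "<key>a") in a Transport value,
// where key is for example "client_port=" or "server_port=".
PortPair find_port_pair(const std::string &transport, const std::string &key)
{
    PortPair p;
    const std::string::size_type pos = transport.find(key);
    if (pos == std::string::npos)
        return p;
    const char *s = transport.c_str() + pos + key.size();
    const int n = std::sscanf(s, "%u-%u", &p.rtp, &p.rtcp);
    if (n >= 1) {
        p.found = true;
        if (n == 1)
            p.rtcp = p.rtp + 1; // assumption: RTCP uses the next port
    }
    return p;
}
```

For a reply containing `Transport: RTP/AVP;unicast;client_port=4588-4589;server_port=6256-6257`, the client's own ports are confirmed by `client_port=` and the server's chosen ports are read from `server_port=`.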

PLAY

The PLAY request causes one or all of the media streams to be played. PLAY requests can be queued by sending several of them in a row. The URL can be an aggregate URL (to play all media streams) or a single media stream URL (to play only that stream).

A range can be specified. If no range is specified, playback starts at the beginning and runs to the end, or resumes at the pause point if the stream has been paused.
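A range is carried in the `Range` header. The sketch below formats one using normal play time (NPT) in seconds; `npt_range_header` is a hypothetical helper, and using a negative end value to mean "open-ended" is a convention of this sketch only.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a Range header in normal play time (NPT).
// A negative end_sec produces an open-ended range ("play to the end").
std::string npt_range_header(double start_sec, double end_sec = -1.0)
{
    char buf[64];
    if (end_sec < 0)
        std::snprintf(buf, sizeof buf, "Range: npt=%.3f-", start_sec);
    else
        std::snprintf(buf, sizeof buf, "Range: npt=%.3f-%.3f", start_sec, end_sec);
    return buf;
}
```

For example, `npt_range_header(10, 25)` asks for seconds 10 to 25 of the stream, and `npt_range_header(0)` asks for playback from the start to the end.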

PAUSE

A PAUSE request temporarily suspends one or all of the media streams, so they can be resumed later with a PLAY request. The request contains an aggregate or media stream URL.

The range parameter of the PAUSE request specifies when to pause. If the range parameter is omitted, the pause takes effect immediately and lasts indefinitely.

RECORD

This method starts recording a range of media data according to the presentation description. The timestamps reflect the start and end time (UTC). If no time range is given, the start or end time provided in the presentation description is used.

If the session has already started, recording begins immediately. The server decides whether to store the recorded data under the request URI or under another URI.

If the server does not use the request URI, the response should be 201 (Created) and contain an entity that describes the status of the request and refers to the new resource, together with a Location header.

ANNOUNCE

When sent from the client to the server, ANNOUNCE posts to the server the description of the presentation or media object identified by the request URL. When sent from the server to the client, ANNOUNCE updates the session description in real time.

If a new media stream is added to a presentation (for example, during a live presentation), the entire presentation description should be sent again, rather than just the additional components, so that components can also be removed.

TEARDOWN

The TEARDOWN request is used to terminate the session. It stops all media streams and frees all session-related data on the server.
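Because RTSP is stateful, the effect of SETUP, PLAY, RECORD, PAUSE and TEARDOWN can be summarized as a small state machine. The sketch below follows the client state table in RFC 2326 (Appendix A) in simplified form: it assumes a single stream, so TEARDOWN always returns to the initial state. `RtspState` and `next_state` are hypothetical names.

```cpp
#include <string>

enum class RtspState { Init, Ready, Playing, Recording };

// Returns the state reached after the given request succeeds.
// Requests that are invalid in, or do not affect, the current state leave it
// unchanged (a real server answers invalid ones with
// 455 Method Not Valid in This State).
RtspState next_state(RtspState s, const std::string &method)
{
    if (method == "TEARDOWN")
        return RtspState::Init;
    switch (s) {
    case RtspState::Init:
        if (method == "SETUP") return RtspState::Ready;
        break;
    case RtspState::Ready:
        if (method == "PLAY") return RtspState::Playing;
        if (method == "RECORD") return RtspState::Recording;
        break;
    case RtspState::Playing:
    case RtspState::Recording:
        if (method == "PAUSE") return RtspState::Ready;
        break;
    }
    return s;
}
```

This is why SETUP must come before PLAY: in the initial state, PLAY does not lead anywhere.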

GET_PARAMETER

The GET_PARAMETER request retrieves the value of a parameter of the presentation or stream specified in the URI. The content of the reply and response is left to the implementation.

SET_PARAMETER

This method requests that the value of a parameter be set for the presentation or stream specified by the URI.
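A GET_PARAMETER request can be sketched as follows. The parameter names go into the message body, one per line, so the request needs `Content-Type` and `Content-Length` headers. `build_get_parameter` is a hypothetical helper and the parameter names in the usage note are only examples; as noted above, the actual content is left to the implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds a GET_PARAMETER request that asks for the
// named parameters in a text/parameters body (one name per line).
std::string build_get_parameter(const std::string &url, unsigned cseq,
                                const std::string &session,
                                const std::vector<std::string> &names)
{
    std::string body;
    for (const std::string &n : names)
        body += n + "\r\n";

    std::string msg = "GET_PARAMETER " + url + " RTSP/1.0\r\n";
    msg += "CSeq: " + std::to_string(cseq) + "\r\n";
    msg += "Session: " + session + "\r\n";
    if (!body.empty()) {
        msg += "Content-Type: text/parameters\r\n";
        // Content-Length counts the bytes of the body, CRLFs included.
        msg += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    }
    msg += "\r\n" + body;
    return msg;
}
```

Calling it with `{"packets_received", "jitter"}` yields a 26-byte body. Calling it with an empty list yields a request without a body, which RFC 2326 allows as a liveness check ("ping").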

Parsing the RTSP protocol, Wireshark style

With a general understanding of how RTSP is used, let’s implement a parser for the protocol.

#include <sys/stat.h>
#include <sys/types.h>
#include <arpa/inet.h>      /* ntohs() */
#include <netinet/tcp.h>
#include <netinet/udp.h>
#include <netinet/ip.h>
#include <netinet/ip6.h>
#include <net/ethernet.h>
#include <pcap.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>        /* strncasecmp() */
#include <ctype.h>

/* RTSP port */
#define RTSP_TCP_PORT_RANGE      554

typedef enum {
    RTSP_REQUEST,
    RTSP_REPLY,
    RTSP_NOT_FIRST_LINE
} rtsp_type_t;

static const char *rtsp_methods[] = {
    "DESCRIBE",
    "ANNOUNCE",
    "GET_PARAMETER",
    "OPTIONS",
    "PAUSE",
    "PLAY",
    "RECORD",
    "REDIRECT",
    "SETUP",
    "SET_PARAMETER",
    "TEARDOWN"
};

/* Used for RTSP statistics */
struct rtsp_info_value_t {
  char  *request_method;
  unsigned long int  response_code;
};

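/*
  Takes a byte array (assumed to contain a null-terminated string) as its
  argument and returns the length of the string, i.e. the size of the array
  minus 1 for the null terminator.
 */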
#define STRLEN_CONST(str)   (sizeof (str) - 1)

   
static const char rtsp_content_type[]      = "Content-Type:";
static const char rtsp_transport[]         = "Transport:";
static const char rtsp_sps_server_port[]   = "server_port=";
static const char rtsp_cps_server_port[]   = "client_port=";
static const char rtsp_sps_dest_addr[]     = "dest_addr=";
static const char rtsp_cps_src_addr[]      = "src_addr=";
static const char rtsp_rtp_udp_default[]   = "rtp/avp";
static const char rtsp_rtp_udp[]           = "rtp/avp/udp";
static const char rtsp_rtp_tcp[]           = "rtp/avp/tcp";
static const char rtsp_rdt_feature_level[] = "RDTFeatureLevel";
static const char rtsp_real_rdt[]          = "x-real-rdt/";
static const char rtsp_real_tng[]          = "x-pn-tng/"; /* synonym for x-real-rdt */
static const char rtsp_inter[]             = "interleaved=";
static const char rtsp_content_length[] = "Content-Length:";



static void rtsp_create_conversation(u_char *line_begin, size_t line_len, rtsp_type_t rtsp_type_packet)
{
    char          buf[256];
    char         *transport;
    char         *tmp;
    unsigned int  c_data_port;
    unsigned int  s_data_port, s_mon_port;
    unsigned int  ipv4_1, ipv4_2, ipv4_3, ipv4_4;

    /* Transport parameters are only extracted from replies */
    if (rtsp_type_packet != RTSP_REPLY) {
        return;
    }

    /* Copy the data into buf, truncating it to avoid overflowing the buffer */
    if (line_len > sizeof(buf) - 1) {
        line_len = sizeof(buf) - 1;
    }
    memcpy(buf, line_begin, line_len);
    buf[line_len] = '\0';
    printf("%s\n", buf);

    /* Find the "Transport:" header and get past it and the spaces after it */
    transport = strstr(buf, rtsp_transport);
    if (transport == NULL) {
        return;
    }
    transport += STRLEN_CONST(rtsp_transport);
    while (*transport && isspace((unsigned char)*transport))
        transport++;

    /* src_addr="a.b.c.d:port" */
    if ((tmp = strstr(transport, rtsp_cps_src_addr))) {
        tmp += strlen(rtsp_cps_src_addr);
        if (sscanf(tmp, "\"%u.%u.%u.%u:%u\"", &ipv4_1, &ipv4_2, &ipv4_3, &ipv4_4, &c_data_port) == 5) {
            char *colon;

            printf("c_data_port %u\n", c_data_port);
            tmp++;                      /* skip the leading quote */
            colon = strchr(tmp, ':');
            if (colon != NULL) {
                char *addr = strndup(tmp, colon - tmp);
                if (addr != NULL) {
                    printf("src_addr  %s\n", addr);
                    free(addr);
                }
            }
        }
    }

    /* dest_addr=":port" */
    if ((tmp = strstr(transport, rtsp_sps_dest_addr))) {
        tmp += strlen(rtsp_sps_dest_addr);
        if (sscanf(tmp, "\":%u\"", &s_data_port) == 1) {
            /* :9 means ignore */
            if (s_data_port == 9) {
                s_data_port = 0;
            }
            printf("s_data_port %u\n", s_data_port);
        }
    }

    /* server_port=port[-port] */
    if ((tmp = strstr(transport, rtsp_sps_server_port))) {
        tmp += strlen(rtsp_sps_server_port);
        if (sscanf(tmp, "%u", &s_mon_port) == 1) {
            printf("s_mon_port %u\n", s_mon_port);
        }
    }
}


static bool is_rtsp_request_or_reply(unsigned char *line, int offset, rtsp_type_t *type)
{
    unsigned int  ii;
    /* Assumes line is NUL-terminated */
    char         *data = reinterpret_cast<char *>(line) + offset;
    struct rtsp_info_value_t rtsp_info;   /* filled in for statistics; not yet handed to the caller */

    /* Is this an RTSP reply? */
    if (strncasecmp("RTSP/", data, 5) == 0) {
        const char *p = data + 5;

        *type = RTSP_REPLY;

        /* The first token is the version: skip it and the spaces after it */
        while (*p && !isspace((unsigned char)*p))
            p++;
        while (*p == ' ')
            p++;

        /* The next token is the status code */
        rtsp_info.response_code = strtoul(p, NULL, 10);
        //printf("rtsp_info.response_code %lu\n", rtsp_info.response_code);

        return true;
    }

    /*
     * Is this an RTSP request?
     * Check whether the line begins with one of the RTSP request methods.
     */
    for (ii = 0; ii < sizeof rtsp_methods / sizeof rtsp_methods[0]; ii++) {
        size_t len = strlen(rtsp_methods[ii]);
        if (strncasecmp(rtsp_methods[ii], data, len) == 0 && isspace((unsigned char)data[len]))
        {
            *type = RTSP_REQUEST;
            /* Point at the static method name rather than making a stack copy */
            rtsp_info.request_method = const_cast<char *>(rtsp_methods[ii]);
            //printf("request_method: %s\n", rtsp_info.request_method);

            return true;
        }
    }

    /* Neither a request nor a reply */
    *type = RTSP_NOT_FIRST_LINE;
    return false;
}


/* Read the first line of a reply message */
static void process_rtsp_reply(u_char *rtsp_data, int offset, rtsp_type_t rtsp_type_packet)
{
    /* Assumes rtsp_data is NUL-terminated */
    char *status  = reinterpret_cast<char *>(rtsp_data + offset);
    char *lineend = status + strcspn(status, "\r\n");   /* end of the status line */
    unsigned int  status_i;

    /* status code */

    /* Skip protocol/version */
    while (status < lineend && !isspace((unsigned char)*status))
        status++;
    /* Skip spaces */
    while (status < lineend && isspace((unsigned char)*status))
        status++;

    /* Actual code number now */
    status_i = 0;
    while (status < lineend && isdigit((unsigned char)*status))
        status_i = status_i * 10 + (*status++ - '0');

    //printf("status_i %u\n", status_i);
    (void)status_i;   /* parsed but not used further yet */

    /* Pass the whole message on so the Transport header can be examined */
    offset += static_cast<int>(strlen(reinterpret_cast<char *>(rtsp_data + offset)));
    rtsp_create_conversation(rtsp_data, offset, rtsp_type_packet);
}

static void process_rtsp_request(u_char *rtsp_data, int offset, rtsp_type_t rtsp_type_packet)
{
    /* Assumes rtsp_data is NUL-terminated */
    char *line    = reinterpret_cast<char *>(rtsp_data + offset);
    char *lineend = line + strcspn(line, "\r\n");   /* end of the request line */
    unsigned int  ii;
    char *url;
    char *url_start;
    char *tmp;
    unsigned int content_length = 0;
    char content_type[256];

    (void)rtsp_type_packet;

    /* Request method: find which known method the line starts with */
    for (ii = 0; ii < sizeof rtsp_methods / sizeof rtsp_methods[0]; ii++) {
        size_t len = strlen(rtsp_methods[ii]);
        if (strncasecmp(rtsp_methods[ii], line, len) == 0 && isspace((unsigned char)line[len]))
            break;
    }

    /* URL */
    url = line;

    /* Skip method name again */
    while (url < lineend && !isspace((unsigned char)*url))
        url++;
    /* Skip spaces */
    while (url < lineend && isspace((unsigned char)*url))
        url++;
    /* URL starts here */
    url_start = url;

    /* Scan to end of URL */
    while (url < lineend && !isspace((unsigned char)*url))
        url++;

    /* Print only the URL, not the rest of the message */
    printf("%.*s\n", (int)(url - url_start), url_start);

    /* Content-Type, searched for in the headers that follow the URL */
    if ((tmp = strstr(url_start, rtsp_content_type)))
    {
        tmp += strlen(rtsp_content_type);
        /* Bound the conversion so it cannot overflow content_type */
        if (sscanf(tmp, "%255s", content_type) == 1)
        {
            //printf("content_type %s\n", content_type);
        }
    }

    /* Content-Length */
    if ((tmp = strstr(url_start, rtsp_content_length)))
    {
        tmp += strlen(rtsp_content_length);
        if (sscanf(tmp, "%u", &content_length) == 1)
        {
            //printf("content_length %u\n", content_length);
        }
    }
}



void dissect_rtsp(u_char *rtsp_data)
{
    int          offset = 0;
    rtsp_type_t  rtsp_type_packet;

    /* Sets rtsp_type_packet in every case, including RTSP_NOT_FIRST_LINE */
    is_rtsp_request_or_reply(rtsp_data, offset, &rtsp_type_packet);

    switch (rtsp_type_packet)
    {
        case RTSP_REQUEST:
            process_rtsp_request(rtsp_data, offset, rtsp_type_packet);
            break;

        case RTSP_REPLY:
            process_rtsp_reply(rtsp_data, offset, rtsp_type_packet);
            break;

        case RTSP_NOT_FIRST_LINE:
            /* Drop through, it may well be a header line */
            break;

        default:
            break;
    }
}

static void dissect_rtsp_tcp(struct ip *pIp)
{
    int iHeadLen = pIp->ip_hl * 4;
    int iPacketLen = ntohs(pIp->ip_len) - iHeadLen;

    /* Only handle TCP */
    if (pIp->ip_p != IPPROTO_TCP)
        return;

    struct tcphdr *pTcpHdr = (struct tcphdr *)(((char *)pIp) + iHeadLen);

    /* Only handle traffic to or from the RTSP port */
    if (ntohs(pTcpHdr->dest) != RTSP_TCP_PORT_RANGE &&
        ntohs(pTcpHdr->source) != RTSP_TCP_PORT_RANGE)
        return;

    /* doff is the TCP header length in 32-bit words, options included */
    int iTcpHdrLen = pTcpHdr->doff * 4;
    int iPayloadLen = iPacketLen - iTcpHdrLen;
    //printf("TCP Payload Len %d\n", iPayloadLen);
    if (iPayloadLen <= 0)
        return;   /* no RTSP data, e.g. a bare ACK */

    /* The parsers use string functions, so give them a NUL-terminated copy.
       This assumes the whole IP packet was captured. */
    u_char *RtspData = (u_char *)malloc(iPayloadLen + 1);
    if (RtspData == NULL)
        return;
    memcpy(RtspData, ((u_char *)pTcpHdr) + iTcpHdrLen, iPayloadLen);
    RtspData[iPayloadLen] = '\0';

    dissect_rtsp(RtspData);
    free(RtspData);
}




Compile and run

RTSP is a text-based protocol that uses carriage return plus line feed (\r\n) to terminate each line. The advantage of a text format is that custom parameters are easy to add and packet captures are easy to analyze.

RTSP messages are classified by transmission direction into requests and responses. In most cases a request is sent from the client to the server, and the response is sent from the server back to the client.
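Because every line ends with \r\n, a message can be split into header fields with plain string operations. The following is a minimal sketch, assuming the complete header block is available in one string; `parse_headers` is a hypothetical helper and is not part of the parser shown earlier.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: splits the header block of an RTSP message into
// name/value pairs. The start line (first line) is skipped, and header names
// are lower-cased because they are case-insensitive.
std::map<std::string, std::string> parse_headers(const std::string &msg)
{
    std::map<std::string, std::string> headers;
    std::string::size_type pos = msg.find("\r\n"); // end of the start line
    while (pos != std::string::npos) {
        pos += 2; // step over the CRLF
        const std::string::size_type end = msg.find("\r\n", pos);
        if (end == std::string::npos || end == pos)
            break; // truncated message, or the empty line that ends the headers
        const std::string line = msg.substr(pos, end - pos);
        const std::string::size_type colon = line.find(':');
        if (colon != std::string::npos) {
            std::string name = line.substr(0, colon);
            for (char &c : name)
                c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
            std::string::size_type v = colon + 1;
            while (v < line.size() && (line[v] == ' ' || line[v] == '\t'))
                ++v; // skip whitespace after the colon
            headers[name] = line.substr(v);
        }
        pos = end;
    }
    return headers;
}
```

A custom parameter added by a vendor shows up as just another entry in the returned map, which is what makes the text format easy to extend.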
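The two message types can be told apart by their first line: a request starts with a method name, while a response starts with the protocol version. The sketch below is a simplified, standalone illustration of that rule; `StartLine` and `parse_start_line` are hypothetical names, and the input is assumed to be the first line with its trailing \r\n removed.

```cpp
#include <cstdlib>
#include <string>

struct StartLine {
    bool is_response = false;
    std::string method;   // requests only
    std::string url;      // requests only
    unsigned status = 0;  // responses only
    std::string reason;   // responses only
};

// Hypothetical helper: parses the first line of an RTSP message.
//   Request:  METHOD SP URL SP RTSP/1.0
//   Response: RTSP/1.0 SP STATUS SP REASON
// Returns false if the line does not have three space-separated parts.
bool parse_start_line(const std::string &line, StartLine &out)
{
    const std::string::size_type sp1 = line.find(' ');
    if (sp1 == std::string::npos)
        return false;
    const std::string::size_type sp2 = line.find(' ', sp1 + 1);
    if (sp2 == std::string::npos)
        return false;

    const std::string first = line.substr(0, sp1);
    const std::string second = line.substr(sp1 + 1, sp2 - sp1 - 1);
    const std::string rest = line.substr(sp2 + 1);

    if (first.compare(0, 5, "RTSP/") == 0) {
        out.is_response = true;
        out.status = static_cast<unsigned>(std::strtoul(second.c_str(), nullptr, 10));
        out.reason = rest;
    } else {
        out.is_response = false;
        out.method = first;
        out.url = second;
    }
    return true;
}
```

Unlike the parser shown earlier, this sketch does not check the method against a list of known RTSP methods; it only separates the fields.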

Conclusion

RTSP provides controls for streaming media such as PLAY, PAUSE, and SETUP, but does not transmit the media data itself. It acts as a remote control for streaming media servers.

The server can stream the content over either TCP or UDP. RTSP itself is similar to HTTP in syntax and operation. For more information, refer to the official RFC documents (RFC 2326 for RTSP 1.0), which are the most authoritative reference.