The full source code can be downloaded from GitHub: source link

1. Instant messaging, the milestone of the age

A few days ago I watched the TV drama “Entrepreneurial Age”. The plot goes roughly like this: IT engineer Guo Xinnian, his friend Luo Wei and investment banking elite Na Lan set out together on the road of Internet entrepreneurship, and their startup develops an IM product called “Magic Crystal”. After his first venture failed, Guo Xinnian got divorced and was left heavily in debt. A life-and-death bicycle trip to Tibet gave him the inspiration to build “Magic Crystal”, an IM product designed to bring people closer together. The story is purely fictional, but it makes one wonder: was this also the hope behind QQ in its early days?

One thing is certain: instant messaging is truly a milestone. Tencent owes its strength to two products, QQ and WeChat, which follow different design ideas. QQ is built on an IM system and aims at personal space and mass entertainment: it is popular with middle and high school students, and QQ accounts are often bound to music and games. WeChat drew its initial traffic from QQ and then focused on business. It launched WeChat Pay early on to compete with Alipay and won a place in commercial payments (WeChat Pay is mainly used for small payments, while Alipay is mainly used for large enterprise transfers and personal financial management). WeChat then launched official accounts and mini programs, clearly gained the upper hand over Alipay in the commercial field, and became the dominant commercial app. Later came the farce of three apps, Liaotianbao, Duoshan and Matong MT, laying siege to WeChat, and we all know how that ended.

Even so, Alipay's value in the business field should not be underestimated. Unlike WeChat, Alipay prefers to build and integrate new functions itself, which makes the product relatively polished. In terms of payment security Alipay is much better than WeChat, and the whole application runs more smoothly. Alipay also has its own mini programs, but they are usually built in cooperation with partner companies or through open interfaces, such as utility bill payment, Ele.me takeout and Didi Chuxing. WeChat, by contrast, opens more application-building permissions to developers: enterprise developers create mini programs and maintain official accounts to realize their own business value. Facts have proved that WeChat's approach is very successful!

Alibaba used its IM system to push into the office space and built “Dingding”, a rather polished product that handles attendance clock-in, leave approval and meeting management very well. Unlike WeChat, when a company exchanges messages through Dingding, the sender can see whether a message has been “read” (a necessary function in an office setting, after all). Not to be outdone, Tencent created “Enterprise WeChat” and began to compete head-on with Dingding. Although it lagged behind Dingding in market share, its user base grew rapidly.

Dingding was officially launched in January 2015. Tencent released version 1.0 of Enterprise WeChat in April 2016, with only simple functions such as attendance, leave requests and reimbursement, and it felt somewhat dull as a product. Dingding, by contrast, used its first-mover advantage and a product line that set out early on to “please” the boss: it had 1 million enterprises in 2016, and the number rose to 7 million in 2018. Enterprise WeChat's indecisive early play left it trailing Dingding in enterprise OA. However, after the release of version 3.0 the situation began to reverse: Dingding's user numbers seemed to saturate and new breakthroughs became difficult, while Enterprise WeChat started to gradually take over the market.

Other IM-based products include Momo and Tantan, which focus more on making friends and dating than WeChat does. (I wonder if this is the reason everyone has an iPhone at the end of every year, just kidding.)

I attended a Gopher conference this year and had the honor of listening to Tantan's architects share their migration to microservices. The IM system built in this article is also implemented in Go. First, here is the architecture diagram of the Tantan app:

That covers some thoughts on IM product design. Now let's return to the topic and look at how this article is organized.

2. Chapter Overview

The purpose of this article is to help readers understand the WebSocket protocol in depth and quickly build a highly available, scalable IM system (the article title is purely eye-catching and not to be taken literally). It also shows where the IM system can be further optimized and improved. Small as it is, the IM system contains basic functions such as registration, login and adding friends, as well as one-to-one chat and group chat, and it supports sending text, emoticons and pictures. On top of this system, readers can easily add voice and video chat, red packets and so on. To explain the principles of the IM system more clearly, Section 3 gives an in-depth explanation of the WebSocket protocol, which is commonly used for long connections. Section 4 explains the techniques and main code for quickly building the IM system. Section 5 offers suggestions and ideas for upgrading and optimizing the IM system architecture. The last section reviews the article.

3. Thoroughly understand the WebSocket protocol

The goal of WebSocket is to provide full-duplex, bidirectional communication over a single persistent connection. After JavaScript creates a WebSocket, an HTTP request is sent to the server to initiate the connection. Once the server responds, the connection is upgraded from HTTP to the WebSocket protocol. Since WebSocket uses a custom protocol, the URL scheme is also slightly different: an unencrypted connection uses ws:// instead of http://, and an encrypted connection uses wss:// instead of https://. You must include the scheme when using WebSocket URLs, because other schemes may be supported in the future. The advantage of a custom protocol over HTTP is that very small amounts of data can be sent between client and server without the per-request header overhead of HTTP. Because the packets are small, WebSocket is well suited to mobile applications. The following sections explore how WebSocket works in detail. The next four subsections do not involve many code fragments; they analyze the relevant APIs and technical principles instead. I believe things will feel much clearer after reading them.

3.1 WebSocket uses HTTP handshake channels

A handshake channel is the communication channel that an HTTP client and server establish through the TCP three-way handshake. Every interaction between a client and a server over HTTP needs such a “channel” before communication can take place. The familiar Ajax interaction runs over such a channel, except that the interaction is short: in the simple model described here, the “channel” is torn down after one request -> response. The following is a schematic diagram of establishing a “handshake channel” in HTTP:

As mentioned above, after JavaScript creates a WebSocket, an HTTP request is sent to the server to initiate a connection, and the server responds. This is the “handshake” process. During this handshake, the client and server do two main things:

  • Establish a “handshake channel” for communication. This is the same as HTTP, except that HTTP releases the handshake channel once the data exchange is complete. That is called a “short connection”, and its lifetime is the duration of one data exchange, usually measured in milliseconds.
  • Upgrade the HTTP protocol to the WebSocket protocol and reuse the HTTP handshake channel, so that a persistent connection is established.

Why doesn't HTTP keep reusing its own “handshake channel” instead of re-establishing it through the TCP three-way handshake each time data is exchanged? The answer is this: a “long connection” does spare the client and server the trouble of building a “handshake channel” for every interaction, but maintaining it consumes server resources, and in most cases that is unnecessary. The HTTP standard can be said to have made this trade-off deliberately. By the time we get to WebSocket data frames, you will see how much work is needed to maintain a “long connection” between server and client.

With handshake channels out of the way, let’s look at how HTTP was upgraded to WebSocket.

3.2 Upgrading HTTP to WebSocket

Upgrading the protocol requires the client and the server to cooperate. How does the server know that it should upgrade HTTP to WebSocket? It must have received some kind of signal from the client. Below is the “client-initiated protocol upgrade request message” that I captured from Google Chrome. By analyzing this message we can learn more details about the protocol upgrade in WebSocket.

First, the client initiates a protocol upgrade request. It uses the standard HTTP message format, and only the GET method is supported. Here is what the key request headers mean:

  • Connection: Upgrade: indicates that the client wants to upgrade the protocol
  • Upgrade: websocket: indicates that the upgrade target is the WebSocket protocol
  • Sec-WebSocket-Version: 13: indicates the WebSocket version
  • Sec-WebSocket-Key: UdTUf90CC561cQXn4n5XRg==: pairs with Sec-WebSocket-Accept: GZk41FJZSYY0CmsrZPGpUGRQzkY= in the response headers and provides basic protection against malicious or accidental connections.

The Connection header is the signal that the client sends to the server; the server upgrades the HTTP protocol after receiving it. How is this exchange validated? Each time the client initiates a protocol upgrade request, it generates a unique code, Sec-WebSocket-Key. The server takes this code, runs it through an algorithm, and sends the result to the client as Sec-WebSocket-Accept, which the client then validates. The algorithm is simple:
  1. Concatenate Sec-WebSocket-Key with the globally unique identifier (GUID, [RFC4122]) 258EAFA5-E914-47DA-95CA-C5AB0DC85B11
  2. Compute the SHA-1 digest of the result and encode it as a Base64 string

The string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 is also called the “magic string”. There is no need to worry about why this particular value is used in the WebSocket handshake; it is simply specified in the RFC. The official explanation only states that this value is unlikely to be used by network endpoints that do not understand the WebSocket protocol. Let's describe this algorithm in the best language in the world.

public function dohandshake($sock, $data, $key) {
    // Extract the Sec-WebSocket-Key header from the raw HTTP request
    if (preg_match("/Sec-WebSocket-Key: (.*)\r\n/", $data, $match)) {
        // base64(sha1(key + magic string)); true makes sha1() return raw binary
        $response = base64_encode(sha1($match[1] . '258EAFA5-E914-47DA-95CA-C5AB0DC85B11', true));
        $upgrade  = "HTTP/1.1 101 Switching Protocols\r\n" .
            "Upgrade: websocket\r\n" .
            "Connection: Upgrade\r\n" .
            "Sec-WebSocket-Accept: " . $response . "\r\n\r\n";
        socket_write($sock, $upgrade, strlen($upgrade));
        // Remember that this connection has completed the handshake
        $this->isHand[$key] = true;
    }
}

HTTP/1.1 header lines are separated by a line break (\r\n), so we can extract the Sec-WebSocket-Key value with a regular expression. This is the same as using curl to simulate a GET request. With the Sec-WebSocket-Key and the handshake algorithm we can compute the Sec-WebSocket-Accept returned by the server, as shown in the figure:

As you can see from the figure, the Base64 string computed by the algorithm is the same as the Sec-WebSocket-Accept value. So what happens if the server returns an incorrect Sec-WebSocket-Accept string during the handshake, for example because the GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 was changed to 258EAFA5-E914-47DA-95CA-C5AB0DC85B12? The client's validation then fails and the WebSocket connection is not established.
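The same calculation can be sketched in Go, the language used for the rest of this article. This is only an illustration: the function name computeAccept is made up for the example, and the sample key is the one given in RFC 6455 rather than the key captured from Chrome above.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// websocketGUID is the fixed "magic string" defined in RFC 6455.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// computeAccept (a name chosen for this sketch) derives the
// Sec-WebSocket-Accept value: base64(sha1(key + guid)).
func computeAccept(key, guid string) string {
	sum := sha1.Sum([]byte(key + guid))
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	key := "dGhlIHNhbXBsZSBub25jZQ==" // sample key from RFC 6455

	good := computeAccept(key, websocketGUID)
	fmt.Println(good)

	// With a tampered GUID the server would answer with a different value,
	// which the client cannot match against its own calculation.
	bad := computeAccept(key, "258EAFA5-E914-47DA-95CA-C5AB0DC85B12")
	fmt.Println(bad == good)
}
```

For the RFC sample key the first line printed is s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, the value listed in the RFC, and the comparison with the tampered GUID prints false.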

3.3 WebSocket frames and fragmented data transmission

Below is a test I ran: I copied the first chapter of the novel “Gone with the Wind” as text data and sent it from the client to the server, and the server then replied with the same message to complete one round of communication.

As you can see, nearly 15,000 bytes of data made the round trip between client and server in only 150 ms. In the Frames tab of the browser console we can also see the text data sent by the client and the response from the server. You may be surprised by how capable WebSocket is at transferring data. But is it really what the Frames tab suggests, that the client sends one big chunk of text straight to the server and the server sends one big chunk straight back? Of course not. We all know that HTTP runs on top of TCP, and data sent over HTTP is split into packets for delivery: large data is divided into small pieces that are sent to the server one by one. You can look up the relevant material if you want to study how that splitting works. The WebSocket protocol also transfers data in fragments, but its strategy differs from HTTP's. The frame is the basic unit of data that WebSocket sends. Here is its format:

The frame format specifies flag bits, an opcode, a mask, the payload length and the payload data. It doesn't matter if this is not entirely clear yet; for now you only need to understand what the important flag bits do. First, WebSocket messaging between client and server looks like this:

  • Client: The message is cut into multiple frames and sent to the server.
  • Server: Receives message frames and reassembles the associated frames into complete messages.

When the server receives frames from the client, it assembles them. How does it know when the message is complete? That is what the FIN bit (one bit) in the top left corner of the frame format tells it: 1 means this frame is the last fragment of the message, and 0 means it is not. In WebSocket communication the fragments of a message are sent in order, so a single FIN bit is enough to tell the receiver that the message is complete. This differs from the packet level underneath HTTP, where packets may arrive out of order and each one has to carry information about its position in the data. RSV1, RSV2 and RSV3 are three flag bits reserved for extensions that the client and server negotiate. They default to 0. How an extension uses them must be agreed during the handshake phase, which is itself a client-server negotiation.
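To make the role of the FIN bit concrete, here is a minimal sketch of frame parsing and message reassembly in Go. It is a simplification under stated assumptions: only short frames (payload under 126 bytes) are handled, control frames are not treated specially, and the names frame, parseFrame and reassemble are invented for this example. The sample bytes are the "Hello" examples from RFC 6455.

```go
package main

import (
	"errors"
	"fmt"
)

// frame holds the fields of one short WebSocket frame.
type frame struct {
	fin     bool
	rsv     byte // RSV1-RSV3 packed into the low three bits
	opcode  byte
	payload []byte
}

// parseFrame decodes one frame from the start of buf and returns it with the
// number of bytes consumed. Extended payload lengths (126/127) are left out.
func parseFrame(buf []byte) (frame, int, error) {
	if len(buf) < 2 {
		return frame{}, 0, errors.New("short header")
	}
	f := frame{
		fin:    buf[0]&0x80 != 0,
		rsv:    (buf[0] >> 4) & 0x07,
		opcode: buf[0] & 0x0F,
	}
	masked := buf[1]&0x80 != 0
	length := int(buf[1] & 0x7F)
	if length > 125 {
		return frame{}, 0, errors.New("extended lengths not handled in this sketch")
	}
	pos := 2
	var key []byte
	if masked {
		if len(buf) < pos+4 {
			return frame{}, 0, errors.New("short masking key")
		}
		key = buf[pos : pos+4]
		pos += 4
	}
	if len(buf) < pos+length {
		return frame{}, 0, errors.New("short payload")
	}
	f.payload = make([]byte, length)
	copy(f.payload, buf[pos:pos+length])
	if masked {
		// Client-to-server payloads are XOR-ed with the 4-byte masking key.
		for i := range f.payload {
			f.payload[i] ^= key[i%4]
		}
	}
	return f, pos + length, nil
}

// reassemble appends frame payloads in arrival order until FIN=1 is seen.
func reassemble(stream []byte) (string, error) {
	var msg []byte
	for len(stream) > 0 {
		f, n, err := parseFrame(stream)
		if err != nil {
			return "", err
		}
		msg = append(msg, f.payload...)
		stream = stream[n:]
		if f.fin {
			return string(msg), nil
		}
	}
	return "", errors.New("stream ended before the final fragment")
}

func main() {
	// Unmasked text message "Hello" split into two fragments: FIN=0, then FIN=1.
	fragments := []byte{0x01, 0x03, 'H', 'e', 'l', 0x80, 0x02, 'l', 'o'}
	fmt.Println(reassemble(fragments))

	// The same text as a single masked frame, as a client would send it.
	masked := []byte{0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58}
	fmt.Println(reassemble(masked))
}
```

Because fragments arrive in order, the receiver only appends payloads and stops at the first frame whose FIN bit is set; no per-fragment position information is needed.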

3.4 WebSocket connection keep-alive and heartbeat detection

WebSocket is a long connection. To maintain real-time bidirectional communication, the TCP channel between client and server must stay connected. On the other hand, a connection that carries no data for a long time may waste server resources. Still, there are scenarios in which the client and server need to keep the connection even though no data has been exchanged for a long time. For example, you may not have talked with a QQ friend for months, yet when he suddenly sends a message to tell you he is getting married, you still receive it immediately. That is because the client and server keep using heartbeats to check the connection. Heartbeat detection between client and server is like playing ping-pong:

  • Sender -> Receiver: ping
  • Receiver -> Sender: pong

If a ping goes unanswered by a pong, there must be a problem with the connection. With that said, I'm going to implement heartbeat detection in Go. Implementing WebSocket communication by hand is tedious, so using an open source library directly is a good choice. I use gorilla/websocket, which already encapsulates the implementation details of WebSocket (handshake, data decoding). I'll paste the code directly below:

package main

import (
	"net/http"
	"sync"
	"time"

	"github.com/gorilla/websocket"
)

var (
	// upgrade completes the handshake that switches HTTP to WebSocket
	upgrade = websocket.Upgrader{
		// Allow cross-origin requests (WebSocket services are usually deployed independently)
		CheckOrigin: func(r *http.Request) bool {
			return true
		},
	}
)

func wsHandler(w http.ResponseWriter, r *http.Request) {
	var (
		conn *websocket.Conn
		err  error
		data []byte
		// gorilla/websocket allows only one concurrent writer per connection,
		// so writes from the two goroutines are serialized with a mutex
		mu sync.Mutex
	)
	// The server answers the client's HTTP request by upgrading it to WebSocket.
	// After the upgrade, the TCP connection established for the HTTP request is kept.
	if conn, err = upgrade.Upgrade(w, r, nil); err != nil {
		return
	}
	write := func(msg []byte) error {
		mu.Lock()
		defer mu.Unlock()
		return conn.WriteMessage(websocket.TextMessage, msg)
	}

	// Start a goroutine that sends a heartbeat message to the client every second
	go func() {
		for {
			if err := write([]byte("heartbeat")); err != nil {
				return
			}
			time.Sleep(1 * time.Second)
		}
	}()

	for {
		// Data read through WebSocket can be text or binary
		if _, data, err = conn.ReadMessage(); err != nil {
			goto ERR
		}
		// Echo the message back to the client
		if err = write(data); err != nil {
			goto ERR
		}
	}
ERR:
	// After an error, close the socket connection
	conn.Close()
}

func main() {
	http.HandleFunc("/ws", wsHandler)
	http.ListenAndServe("0.0.0.0:7777", nil)
}

Taking advantage of how easy it is to start goroutines in Go, I start a dedicated goroutine that sends one message per second to the client. Open the client in a browser and you can see the heartbeat data ticking away in the Frames tab. When the long connection is broken, the heartbeat stops, just like a person without a heartbeat.

Now that we have a good understanding of the WebSocket protocol, let's quickly build a high-performance, scalable IM system.

4. Quickly build a high-performance and scalable IM system

4.1 System Architecture and code file directory structure

The following diagram shows a fairly complete IM system architecture: the client (C side), the access layer (which accepts connections over a protocol), the server (S side) that handles logic and distributes messages, and the storage layer that persists data.

In this section the client is a web app, quickly implemented by rendering Vue templates from Go. The access layer uses the WebSocket protocol, which was introduced in depth above. The server is the focus of our implementation: authentication, login, relationship management, one-to-one chat and group chat are already implemented, and readers can add other business modules on top of them, such as voice and video chat, red packets and Moments. The storage layer is kept relatively simple: MySQL persists users and their relationships, and image resources from chats are stored as local files. Although this implementation is simplified, readers can refine and extend it on this basis and still arrive at a highly available, enterprise-grade product.

Our service is written in Go, with a relatively simple code structure and excellent performance (which Java and other languages find hard to match). It supports tens of thousands of users chatting online at the same time.

Here is the directory structure of the code file:

├── app
│   ├── controller      // API entry
│   │   ├── chat.go
│   │   ├── contract.go
│   │   ├── upload.go
│   │   └── user.go
│   ├── main.go         // program entry
│   ├── model           // data definition and storage
│   │   ├── community.go
│   │   ├── contract.go
│   │   ├── init.go
│   │   └── user.go
│   ├── service         // logic implementation
│   │   ├── contract.go
│   │   └── user.go
│   ├── util            // helper functions
│   │   ├── md5.go
│   │   ├── parse.go
│   │   ├── resp.go
│   │   └── string.go
│   └── view            // template resources
│       └── ...
├── asset               // JS and CSS file resources
└── resource            // uploaded resources; uploaded images are placed here

Starting from the entry point main.go, we define the controller layer, which is the entry for client-facing APIs. The service layer handles the main user logic, including message distribution and user management. The model layer defines the data tables, mainly user registration, friend relationships and groups, which are stored in MySQL. util contains helper functions for encryption, request responses and so on. view holds the template resources. All of the above live in the app folder. Outside it, asset holds the CSS and JS files and the emoticons used in chat, and resource holds files such as pictures or videos sent in user chats. Overall, the directory structure is clean and concise.

Now that we know the IM system architecture we are going to build, let’s take a look at the key features of the architecture.

4.2 Universal template rendering in 10 lines of code

Go provides powerful HTML rendering capabilities, and building web applications with it is very simple. Here is the code that implements template rendering; it is simple enough to live directly in main.go:

func registerView() {
	// Parse every template file under app/view
	tpl, err := template.ParseGlob("./app/view/**/*")
	if err != nil {
		log.Fatal(err.Error())
	}
	// Register one route per template, using the template name as the path
	for _, v := range tpl.Templates() {
		tplName := v.Name()
		http.HandleFunc(tplName, func(writer http.ResponseWriter, request *http.Request) {
			tpl.ExecuteTemplate(writer, tplName, nil)
		})
	}
}
...
func main() {
	...
	// Serve static files under /asset/ and /resource/ from the working directory
	http.Handle("/asset/", http.FileServer(http.Dir(".")))
	http.Handle("/resource/", http.FileServer(http.Dir(".")))
	registerView()
	log.Fatal(http.ListenAndServe(":8081", nil))
}

Static file serving is just as easy in Go: call http.FileServer and the HTML files can load the JS, CSS and icon files they depend on. With ParseGlob and ExecuteTemplate from the html/template package, web pages can be rendered without relying on Nginx. With that, the client-side pages for login, registration and chat are done.
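The route-per-template idea can be tried out without any files on disk. The sketch below parses two templates from a string and exercises them with net/http/httptest. The template names /user/login.shtml and /chat/index.shtml and the helpers newViewMux and fetch are hypothetical and exist only for this example; the real project loads its templates with ParseGlob as shown above.

```go
package main

import (
	"fmt"
	"html/template"
	"net/http"
	"net/http/httptest"
)

// newViewMux registers one route per named template, mirroring registerView.
// The template names used here are placeholders, not the project's real views.
func newViewMux() (*http.ServeMux, error) {
	const views = `{{define "/user/login.shtml"}}<h1>Login</h1>{{end}}` +
		`{{define "/chat/index.shtml"}}<h1>Chat</h1>{{end}}`

	tpl, err := template.New("views").Parse(views)
	if err != nil {
		return nil, err
	}
	mux := http.NewServeMux()
	for _, t := range tpl.Templates() {
		name := t.Name()
		if len(name) == 0 || name[0] != '/' {
			continue // names that are not paths cannot serve as routes
		}
		mux.HandleFunc(name, func(w http.ResponseWriter, r *http.Request) {
			if err := tpl.ExecuteTemplate(w, name, nil); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
			}
		})
	}
	return mux, nil
}

// fetch issues an in-memory GET request against the mux and returns the body.
func fetch(mux *http.ServeMux, path string) string {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	mux, err := newViewMux()
	if err != nil {
		panic(err)
	}
	fmt.Println(fetch(mux, "/user/login.shtml"))
	fmt.Println(fetch(mux, "/chat/index.shtml"))
}
```

Naming each template after the URL path it should answer is what lets a single loop wire up every page.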


4.3 Registration, Login, and Authentication

As mentioned earlier, registration, login and friend management need a user table to store user information. We use github.com/go-xorm/xorm to work with MySQL. Let's first look at the design of the user table:

app/model/user.go

package model

import "time"

const (
	SexWomen   = "W"
	SexMan     = "M"
	SexUnknown = "U"
)

type User struct {
	Id       int64     `xorm:"pk autoincr bigint(64)" form:"id" json:"id"`
	Mobile   string    `xorm:"varchar(20)" form:"mobile" json:"mobile"`
	Passwd   string    `xorm:"varchar(40)" form:"passwd" json:"-"` // user password: MD5(passwd + salt)
	Avatar   string    `xorm:"varchar(150)" form:"avatar" json:"avatar"`
	Sex      string    `xorm:"varchar(2)" form:"sex" json:"sex"`
	Nickname string    `xorm:"varchar(20)" form:"nickname" json:"nickname"`
	Salt     string    `xorm:"varchar(10)" form:"salt" json:"-"`
	Online   int       `xorm:"int(10)" form:"online" json:"online"`   // whether the user is online
	Token    string    `xorm:"varchar(40)" form:"token" json:"token"` // user authentication
	Memo     string    `xorm:"varchar(140)" form:"memo" json:"memo"`
	Createat time.Time `xorm:"datetime" form:"createat" json:"createat"` // creation time, used to count new users
}

Our user table stores the important information: user name, password, avatar, gender, mobile phone number and so on. More importantly, we also store a token: after the user logs in, it is used for authentication when the HTTP protocol is upgraded to WebSocket. We mentioned this detail before, and the code is shown later. Next, let's look at what model initialization does:

app/model/init.go

package model

import (
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
	"github.com/go-xorm/xorm"
)

var DbEngine *xorm.Engine

func init() {
	driverName := "mysql"
	dsnName := "root:root@(127.0.0.1:3306)/chat?charset=utf8"
	var err error
	DbEngine, err = xorm.NewEngine(driverName, dsnName)
	if err != nil {
		log.Fatal(err)
	}
	// Print the SQL statements that are executed (useful for debugging)
	DbEngine.ShowSQL(true)
	// Limit the connection pool to 10 open connections
	DbEngine.SetMaxOpenConns(10)
	// Automatically create or synchronize the table structures
	DbEngine.Sync(new(User), new(Community), new(Contact))
	fmt.Println("init database ok!")
}

We create a global MySQL engine object, DbEngine, with a connection pool of size 10. As those familiar with Go know, the init function of the model package runs first when the program loads. We also set some extra options that help with debugging, such as printing the SQL being executed and automatically synchronizing the tables; these can be turned off in production. Our model initialization is very rudimentary. In a real project, settings such as the database user name, password and connection count should be placed in a configuration file and read from there rather than hard-coded as they are here.

Registration is an ordinary API, and it is very simple to implement in Go. Let's look at the code:
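As a rough illustration of moving these settings out of the code, the sketch below decodes the database settings from JSON and builds the DSN from them. The DBConfig type, its field names, the JSON keys and the parseDBConfig helper are all assumptions made for this example; the original project does not define them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DBConfig holds the values that init.go hard-codes.
// The field names and JSON keys are assumptions made for this sketch.
type DBConfig struct {
	User     string `json:"user"`
	Password string `json:"password"`
	Host     string `json:"host"`
	Port     int    `json:"port"`
	Database string `json:"database"`
	MaxOpen  int    `json:"max_open_conns"`
}

// DSN builds a data source name in the same shape as the one used in init.go.
func (c DBConfig) DSN() string {
	return fmt.Sprintf("%s:%s@(%s:%d)/%s?charset=utf8",
		c.User, c.Password, c.Host, c.Port, c.Database)
}

// parseDBConfig decodes the configuration from JSON text,
// for example the contents of a config file read at startup.
func parseDBConfig(raw string) (DBConfig, error) {
	var cfg DBConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	raw := `{"user":"root","password":"root","host":"127.0.0.1",` +
		`"port":3306,"database":"chat","max_open_conns":10}`

	cfg, err := parseDBConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.DSN())   // would be passed to xorm.NewEngine
	fmt.Println(cfg.MaxOpen) // would be passed to SetMaxOpenConns
}
```

With something like this in place, init would only read the file, call parseDBConfig and pass the resulting values to the engine.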

############################
//app/controller/user.go
############################
......
func UserRegister(writer http.ResponseWriter, request *http.Request) {
	var user model.User
	// Bind the request parameters to the user object
	util.Bind(request, &user)
	user, err := UserService.UserRegister(user.Mobile, user.Passwd, user.Nickname, user.Avatar, user.Sex)
	if err != nil {
		util.RespFail(writer, err.Error())
	} else {
		util.RespOk(writer, user, "")
	}
}
......
############################
//app/service/user.go
############################
......
type UserService struct{}

func (s *UserService) UserRegister(mobile, plainPwd, nickname, avatar, sex string) (user model.User, err error) {
	registerUser := model.User{}
	_, err = model.DbEngine.Where("mobile=? ", mobile).Get(&registerUser)
	if err != nil {
		return registerUser, err
	}
	// If the user is already registered, return an error
	if registerUser.Id > 0 {
		return registerUser, errors.New("The phone number is registered.")
	}

	registerUser.Mobile = mobile
	registerUser.Avatar = avatar
	registerUser.Nickname = nickname
	registerUser.Sex = sex
	registerUser.Salt = fmt.Sprintf("%06d", rand.Int31n(10000))
	registerUser.Passwd = util.MakePasswd(plainPwd, registerUser.Salt)
	registerUser.Createat = time.Now()
	// Insert the user record
	_, err = model.DbEngine.InsertOne(&registerUser)
	return registerUser, err
}
......
############################
//main.go
############################
......
func main() {
	http.HandleFunc("/user/register", controller.UserRegister)
}

First, util.Bind(request, &user) binds the request parameters to the user object. Bind is a function in the util package; readers can study its implementation on their own. It mainly imitates the parameter binding of the Gin framework and is convenient to reuse. Then we look up the database by the user's mobile number to see whether the user already exists. If not, we insert the user into the database and return a success response. The logic is very simple. The login logic is even simpler:

############################
//app/controller/user.go
############################
......
func UserLogin(writer http.ResponseWriter, request *http.Request) {
	request.ParseForm()

	mobile := request.PostForm.Get("mobile")
	plainpwd := request.PostForm.Get("passwd")

	// Check parameters
	if len(mobile) == 0 || len(plainpwd) == 0 {
		util.RespFail(writer, "Incorrect username or password")
		return
	}

	loginUser, err := UserService.Login(mobile, plainpwd)
	if err != nil {
		util.RespFail(writer, err.Error())
	} else {
		util.RespOk(writer, loginUser, "")
	}
}
......
############################
//app/service/user.go
############################
......
func (s *UserService) Login(mobile, plainpwd string) (user model.User, err error) {
	// Look up the user by mobile number
	loginUser := model.User{}
	model.DbEngine.Where("mobile = ?", mobile).Get(&loginUser)
	if loginUser.Id == 0 {
		return loginUser, errors.New("User does not exist")
	}
	// Check whether the password is correct
	if !util.ValidatePasswd(plainpwd, loginUser.Salt, loginUser.Passwd) {
		return loginUser, errors.New("Incorrect password")
	}
	// Refresh the user's login token
	token := util.GenRandomStr(32)
	loginUser.Token = token
	model.DbEngine.ID(loginUser.Id).Cols("token").Update(&loginUser)

	// Return the updated user information
	return loginUser, nil
}
...
############################
//main.go
############################
......
func main() {
	http.HandleFunc("/user/login", controller.UserLogin)
}
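The registration and login code relies on util.MakePasswd and util.ValidatePasswd, whose bodies are not shown in the article. Going by the comment on the Passwd field (MD5 of password plus salt), they might look roughly like the sketch below. The lowercase function names and the hex encoding are assumptions, and the real helpers may differ.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// makePasswd returns the hex-encoded MD5 digest of the plain password
// followed by the salt (assumed to match what util.MakePasswd does).
func makePasswd(plainPwd, salt string) string {
	sum := md5.Sum([]byte(plainPwd + salt))
	return hex.EncodeToString(sum[:])
}

// validatePasswd recomputes the digest and compares it with the stored value.
func validatePasswd(plainPwd, salt, stored string) bool {
	return makePasswd(plainPwd, salt) == stored
}

func main() {
	salt := "004321" // UserRegister generates a random salt per user
	stored := makePasswd("secret", salt)

	fmt.Println(len(stored))                            // hex digest length, fits varchar(40)
	fmt.Println(validatePasswd("secret", salt, stored)) // correct password
	fmt.Println(validatePasswd("guess", salt, stored))  // wrong password
}
```

The per-user salt means two users with the same password end up with different stored digests. MD5 is kept here only because the article's schema uses it; a dedicated password hashing function would be the usual choice in a new system.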

With the login logic in place, we come to the user home page, which lists the user's contacts; clicking one opens the chat page. Users can also use the tab bar at the bottom to view their groups and enter a group chat page. Features such as the user list, adding friends, creating groups and joining groups are ordinary API development work. Our code already implements them, and readers can take it and adapt it, so they are not demonstrated here. Let's focus instead on user authentication. When the user clicks into the chat page, the client sends a GET request to the server asking to establish a WebSocket connection. On receiving the request, the server validates it to decide whether to establish the long connection, and then adds the handle of the long connection to a map for maintenance (the server does not serve just one client; there may be thousands of long connections). Let's look at the code:

############################
//app/controller/chat.go
############################
......
// The core of this file is the mapping between userId and Node
type Node struct {
	Conn      *websocket.Conn
	DataQueue chan []byte
	GroupSets set.Interface
}
......
// Mapping between userId and Node
var clientMap map[int64]*Node = make(map[int64]*Node, 0)
......
func Chat(writer http.ResponseWriter, request *http.Request) {
	query := request.URL.Query()
	id := query.Get("id")
	token := query.Get("token")
	userId, _ := strconv.ParseInt(id, 10, 64)
	// Check whether the token is valid
	islegal := checkToken(userId, token)

	conn, err := (&websocket.Upgrader{
		CheckOrigin: func(r *http.Request) bool {
			return islegal
		},
	}).Upgrade(writer, request, nil)

	if err != nil {
		log.Println(err.Error())
		return
	}
	// Wrap the websocket connection in a Node
	node := &Node{
		Conn:      conn,
		DataQueue: make(chan []byte, 50),
		GroupSets: set.New(set.ThreadSafe),
	}

	// Fetch all the group IDs of the user
	comIds := concatService.SearchComunityIds(userId)
	for _, v := range comIds {
		node.GroupSets.Add(v)
	}

	rwlocker.Lock()
	clientMap[userId] = node
	rwlocker.Unlock()

	// Start a goroutine to handle sending
	go sendproc(node)
	// Start a goroutine to handle receiving
	go recvproc(node)

	sendMsg(userId, []byte("welcome!"))
}
......
// Check whether the token is valid
func checkToken(userId int64, token string) bool {
	user := userService.Find(userId)
	return user.Token == token
}
......
############################
//main.go
############################
......
func main() {
	http.HandleFunc("/chat", controller.Chat)
}
......

When a user enters a chat room, the client issues a GET request to /chat. The server first creates a Node structure to hold the WebSocket connection handle established with that client. Each Node has a channel, DataQueue, which queues the messages waiting to be sent to that client, and GroupSets, the set of groups the client belongs to, which we will come back to later.

type Node struct {
	Conn      *websocket.Conn
	DataQueue chan []byte
	GroupSets set.Interface
}

The server creates a map that associates the client user ID with its Node:

var clientMap map[int64]*Node = make(map[int64]*Node, 0)

After receiving the parameters from the client, the server checks the token to decide whether to upgrade the HTTP request to WebSocket and establish a long-lived connection. This step is called authentication.

islegal := checkToken(userId, token)

conn, err := (&websocket.Upgrader{
	CheckOrigin: func(r *http.Request) bool {
		return islegal
	},
}).Upgrade(writer, request, nil)
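The snippet above folds the token result into CheckOrigin, so a bad token reaches the client as a rejected origin. As a hedged alternative, the check could run before the upgrade and answer with an explicit HTTP status. The sketch below uses only the standard library. The names `tokenStore`, `authorize`, and `precheck` are hypothetical and not part of the project; `tokenStore` merely stands in for the `userService.Find` lookup.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"strconv"
)

// tokenStore is a hypothetical in-memory stand-in for userService.Find:
// it maps a user ID to the token issued at login.
type tokenStore map[int64]string

// authorize validates the raw id and token query values. It returns the
// parsed user ID and the HTTP status the handler should answer with.
func authorize(s tokenStore, id, token string) (int64, int) {
	userID, err := strconv.ParseInt(id, 10, 64)
	if err != nil {
		return 0, http.StatusBadRequest
	}
	want, ok := s[userID]
	// Compare in constant time so the check does not reveal how many
	// leading bytes of the token matched.
	if !ok || token == "" || subtle.ConstantTimeCompare([]byte(want), []byte(token)) != 1 {
		return 0, http.StatusUnauthorized
	}
	return userID, http.StatusOK
}

// precheck runs authorize on the request and writes the error response
// itself. Only when it returns true should the caller upgrade to WebSocket.
func precheck(s tokenStore, w http.ResponseWriter, r *http.Request) (int64, bool) {
	q := r.URL.Query()
	userID, status := authorize(s, q.Get("id"), q.Get("token"))
	if status != http.StatusOK {
		http.Error(w, http.StatusText(status), status)
		return 0, false
	}
	return userID, true
}
```

In a handler like Chat, one would call `precheck` first and return early when it reports false, leaving CheckOrigin free to validate the request origin only.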

After authentication succeeds, the server initializes a Node, looks up the IDs of the groups the user belongs to, and adds each ID to the Node's GroupSets. It then adds the Node to clientMap for bookkeeping. Every access to clientMap must be locked, because Go maps are not safe for concurrent use.

// Wrap the websocket connection conn in a Node
node := &Node{
	Conn:      conn,
	DataQueue: make(chan []byte, 50),
	GroupSets: set.New(set.ThreadSafe),
}

// Get all group IDs of the user
comIds := concatService.SearchComunityIds(userId)
for _, v := range comIds {
	node.GroupSets.Add(v)
}

rwlocker.Lock()
clientMap[userId] = node
rwlocker.Unlock()

Once the long-lived connection between server and client is established, the server starts two goroutines for it: one sends messages to the client and the other receives messages from it. Goroutines are very cheap to keep around in Go, so even this unoptimized single-machine program can easily support thousands of users chatting.

......
// Start a goroutine to handle the sending logic
go sendproc(node)

// Start a goroutine to handle the receiving logic
go recvproc(node)

sendMsg(userId, []byte("welcome!"))
......

At this point authentication is complete and the connection between client and server is established. Next we implement the chat functionality itself.

4.4 Implementing single chat and group chat

When implementing chat, the design of the message body matters a great deal. A well-designed message body makes new features easy to add and keeps later maintenance and optimization simple. Let's first look at our message body:

# # # # # # # # # # # # # # # # # # # # # # # # # # # #
//app/controller/chat.go
# # # # # # # # # # # # # # # # # # # # # # # # # # # #
type Message struct {
	Id      int64  `json:"id,omitempty" form:"id"`           // Message ID
	Userid  int64  `json:"userid,omitempty" form:"userid"`   // Who sent it
	Cmd     int    `json:"cmd,omitempty" form:"cmd"`         // Group chat or private chat
	Dstid   int64  `json:"dstid,omitempty" form:"dstid"`     // Peer user ID / group ID
	Media   int    `json:"media,omitempty" form:"media"`     // In what style the message is displayed
	Content string `json:"content,omitempty" form:"content"` // Message content
	Pic     string `json:"pic,omitempty" form:"pic"`         // Preview image
	Url     string `json:"url,omitempty" form:"url"`         // URL of the resource
	Memo    string `json:"memo,omitempty" form:"memo"`       // Short description
	Amount  int    `json:"amount,omitempty" form:"amount"`   // Other number-related data
}

Each message has a unique ID, which would let us persist messages later. Our system does not do this; readers can add it as needed. Userid is the user who sent the message, and Dstid is its counterpart: the user or group the message is addressed to. Another important field is Cmd, which says whether the message belongs to a group chat or a private chat. The two are handled by different code paths, so we define a few Cmd constants:

const (
	CmdSingleMsg = 10
	CmdRoomMsg   = 11
	CmdHeart     = 0
)

Media is the media type. WeChat, as we all know, supports voice, video, and various other file transfers; with this field in place, readers can add such features themselves. Content is the message text, the most common form of chat. Pic and Url are for images and other linked resources. Memo is a short description, and Amount carries number-related information; a red envelope feature, for example, might use this field.
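To make the wire format concrete, here is a minimal sketch of encoding and decoding this message body with encoding/json. The struct repeats the fields above without the form tags, and the helper names `encodeSingle`, `encodeHeartbeat`, and `decodeMessage` are assumptions introduced for illustration only.

```go
package main

import "encoding/json"

// Message mirrors the message body above (form tags omitted for brevity).
type Message struct {
	Id      int64  `json:"id,omitempty"`
	Userid  int64  `json:"userid,omitempty"`
	Cmd     int    `json:"cmd,omitempty"`
	Dstid   int64  `json:"dstid,omitempty"`
	Media   int    `json:"media,omitempty"`
	Content string `json:"content,omitempty"`
	Pic     string `json:"pic,omitempty"`
	Url     string `json:"url,omitempty"`
	Memo    string `json:"memo,omitempty"`
	Amount  int    `json:"amount,omitempty"`
}

const (
	CmdSingleMsg = 10
	CmdRoomMsg   = 11
	CmdHeart     = 0
)

// encodeSingle builds the JSON for a private text message.
func encodeSingle(from, to int64, text string) ([]byte, error) {
	return json.Marshal(Message{Userid: from, Cmd: CmdSingleMsg, Dstid: to, Content: text})
}

// encodeHeartbeat builds the JSON for a heartbeat. Because CmdHeart is 0
// and the tag says omitempty, the cmd key is absent from the output.
func encodeHeartbeat(from int64) ([]byte, error) {
	return json.Marshal(Message{Userid: from, Cmd: CmdHeart})
}

// decodeMessage parses a raw message body; missing keys keep zero values.
func decodeMessage(data []byte) (Message, error) {
	var msg Message
	err := json.Unmarshal(data, &msg)
	return msg, err
}
```

With these helpers, `encodeSingle(2, 1, "hello")` yields `{"userid":2,"cmd":10,"dstid":1,"content":"hello"}`. One consequence of omitempty worth noticing is that a heartbeat carries no cmd key at all, and the receiver recovers CmdHeart only because a missing key decodes to 0.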

That is the message body. With it in hand, let's see how the server sends and receives messages to implement single chat and group chat. As described in the previous section, we start two goroutines for each client connection, one for sending and one for receiving, and the chat logic lives in these two goroutines.

# # # # # # # # # # # # # # # # # # # # # # # # # # # #
//app/controller/chat.go
# # # # # # # # # # # # # # # # # # # # # # # # # # # #
......
// Sending logic
func sendproc(node *Node) {
	for {
		select {
		case data := <-node.DataQueue:
			err := node.Conn.WriteMessage(websocket.TextMessage, data)
			if err != nil {
				log.Println(err.Error())
				return
			}
		}
	}
}

// Receiving logic
func recvproc(node *Node) {
	for {
		_, data, err := node.Conn.ReadMessage()
		if err != nil {
			log.Println(err.Error())
			return
		}

		dispatch(data)
		// TODO: further processing of data
		fmt.Printf("recv<=%s", data)
	}
}
......
// Dispatch a received message according to its type
func dispatch(data []byte) {
	msg := Message{}
	err := json.Unmarshal(data, &msg)
	if err != nil {
		log.Println(err.Error())
		return
	}
	switch msg.Cmd {
	case CmdSingleMsg:
		sendMsg(msg.Dstid, data)
	case CmdRoomMsg:
		// Collect the recipients under the read lock (Chat writes to
		// clientMap concurrently), then send after releasing it.
		rwlocker.RLock()
		var targets []*Node
		for _, v := range clientMap {
			if v.GroupSets.Has(msg.Dstid) {
				targets = append(targets, v)
			}
		}
		rwlocker.RUnlock()
		for _, v := range targets {
			v.DataQueue <- data
		}
	case CmdHeart:
		// Detect client heartbeat
	}
}
......
// Send a message to one user
func sendMsg(userId int64, msg []byte) {
	rwlocker.RLock()
	node, ok := clientMap[userId]
	rwlocker.RUnlock()
	if ok {
		node.DataQueue <- msg
	}
}
......

Sending from the server to a client is relatively simple: a message that arrived from one client is placed on the channel of the target user's Node, and that Node's sending goroutine writes it to the connection with the WebSocket WriteMessage method:

func sendproc(node *Node) {
	for {
		select {
		case data := <-node.DataQueue:
			err := node.Conn.WriteMessage(websocket.TextMessage, data)
			if err != nil {
				log.Println(err.Error())
				return
			}
		}
	}
}
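One thing to keep in mind with this design is that a plain channel send blocks once the 50-slot buffer is full, for example when a client reads slowly or its sending goroutine has already exited. A possible variation, sketched here as an assumption rather than as what the project does, is a non-blocking enqueue. The helper name `tryEnqueue` is hypothetical.

```go
package main

// tryEnqueue offers msg to queue without blocking. It returns false when
// the buffer is full, leaving the caller to drop the message, retry later,
// or disconnect the slow client.
func tryEnqueue(queue chan<- []byte, msg []byte) bool {
	select {
	case queue <- msg:
		return true
	default:
		return false
	}
}
```

A caller would write `tryEnqueue(node.DataQueue, data)` in place of `node.DataQueue <- data` and decide what a false result should mean for its use case.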

The receiving logic is as follows: the server reads each client message with the WebSocket connection's ReadMessage method and hands it to dispatch for routing:

func recvproc(node *Node) {
	for {
		_, data, err := node.Conn.ReadMessage()
		if err != nil {
			log.Println(err.Error())
			return
		}

		dispatch(data)
		// TODO: further processing of data
		fmt.Printf("recv<=%s", data)
	}
}

The dispatch function does two things:

  • Parse the raw message body into a Message
  • Put the message body on the channel of the target user, or of each user in the target group, according to the message type

Channels are Go's main tool for communication between goroutines. Once dispatch puts a message on a channel, the sending goroutine picks it up and writes it to the client, and that is all the chat feature needs. Single chat and group chat differ only in whether the server delivers the message to one user or to a group. For a group message, the program walks the entire clientMap to find which connected users belong to the group and then sends to each of them. A better practice would be to maintain a second map from group to members, so that a group message looks up its recipients directly, which costs far less than traversing the whole clientMap.

func dispatch(data []byte) {
	msg := Message{}
	err := json.Unmarshal(data, &msg)
	if err != nil {
		log.Println(err.Error())
		return
	}
	switch msg.Cmd {
	case CmdSingleMsg:
		sendMsg(msg.Dstid, data)
	case CmdRoomMsg:
		// Collect the recipients under the read lock (Chat writes to
		// clientMap concurrently), then send after releasing it.
		rwlocker.RLock()
		var targets []*Node
		for _, v := range clientMap {
			if v.GroupSets.Has(msg.Dstid) {
				targets = append(targets, v)
			}
		}
		rwlocker.RUnlock()
		for _, v := range targets {
			v.DataQueue <- data
		}
	case CmdHeart:
		// Detect client heartbeat
	}
}
......
func sendMsg(userId int64, msg []byte) {
	rwlocker.RLock()
	node, ok := clientMap[userId]
	rwlocker.RUnlock()
	if ok {
		node.DataQueue <- msg
	}
}
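The group-to-members map suggested above could look roughly like the following. This is a minimal sketch under assumptions: the type `groupIndex` and its methods are hypothetical names, and the index would have to be updated wherever users connect, disconnect, or change groups.

```go
package main

import (
	"sort"
	"sync"
)

// groupIndex maps a group ID to the IDs of its connected members.
type groupIndex struct {
	mu      sync.RWMutex
	members map[int64]map[int64]struct{}
}

func newGroupIndex() *groupIndex {
	return &groupIndex{members: make(map[int64]map[int64]struct{})}
}

// join records that userID is connected and belongs to groupID.
func (g *groupIndex) join(groupID, userID int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	set, ok := g.members[groupID]
	if !ok {
		set = make(map[int64]struct{})
		g.members[groupID] = set
	}
	set[userID] = struct{}{}
}

// leave removes userID from groupID and drops the group once it is empty.
func (g *groupIndex) leave(groupID, userID int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	set := g.members[groupID]
	delete(set, userID)
	if len(set) == 0 {
		delete(g.members, groupID)
	}
}

// membersOf returns a sorted snapshot of the user IDs in groupID, so the
// caller can send to each of them without holding the lock.
func (g *groupIndex) membersOf(groupID int64) []int64 {
	g.mu.RLock()
	defer g.mu.RUnlock()
	ids := make([]int64, 0, len(g.members[groupID]))
	for id := range g.members[groupID] {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}
```

With such an index, the CmdRoomMsg branch could call `membersOf(msg.Dstid)` and pass each ID to `sendMsg`, so the work is proportional to the size of the group rather than to the number of connected users.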

As you can see, channels make the chat feature very convenient to implement, and the code stays readable. Here's a sketch of my local chat:


4.5 Sending emoticons and pictures

Let's look at how emoticons and pictures, both common in chat, are implemented. Emoticons are really just small pictures, but unlike pictures sent in a chat they are small enough to be cached on the client or shipped directly with the client code (although some emoticons in WeChat are transmitted over the network). Here is the message body for an emoticon sent in a chat:

{
	"dstid": 1,
	"cmd": 10,
	"userid": 2,
	"media": 4,
	"url": "/asset/plugins/doutu//emoj/2.gif"
}

When the client receives the URL, it loads the small icon locally. Pictures sent by users work on the same principle, except that the picture must first be uploaded to the server; the server returns its URL, and the client then loads it. Our IM system supports this as well. Let's look at the upload code:

# # # # # # # # # # # # # # # # # # # # # # # # # # # #
//app/controller/upload.go
# # # # # # # # # # # # # # # # # # # # # # # # # # # #
func init() {
	os.MkdirAll("./resource", os.ModePerm)
}

func FileUpload(writer http.ResponseWriter, request *http.Request) {
	UploadLocal(writer, request)
}

// Store the uploaded file in the local ./resource directory
func UploadLocal(writer http.ResponseWriter, request *http.Request) {
	// Get the uploaded file
	srcFile, head, err := request.FormFile("file")
	if err != nil {
		util.RespFail(writer, err.Error())
		return
	}
	defer srcFile.Close()

	// Work out the suffix of the new file
	suffix := ".png"
	srcFilename := head.Filename
	splitMsg := strings.Split(srcFilename, ".")
	if len(splitMsg) > 1 {
		suffix = "." + splitMsg[len(splitMsg)-1]
	}
	filetype := request.FormValue("filetype")
	if len(filetype) > 0 {
		suffix = filetype
	}
	filename := fmt.Sprintf("%d%s%s", time.Now().Unix(), util.GenRandomStr(32), suffix)

	// Create the file
	filepath := "./resource/" + filename
	dstfile, err := os.Create(filepath)
	if err != nil {
		util.RespFail(writer, err.Error())
		return
	}
	defer dstfile.Close()

	// Copy the source file to the new file
	_, err = io.Copy(dstfile, srcFile)
	if err != nil {
		util.RespFail(writer, err.Error())
		return
	}

	util.RespOk(writer, filepath, "")
}
......

# # # # # # # # # # # # # # # # # # # # # # # # # # # #
//main.go
# # # # # # # # # # # # # # # # # # # # # # # # # # # #
func main() {
    http.HandleFunc("/attach/upload", controller.FileUpload)
}

We store the file in a folder on the local disk and return its path to the client, which then loads the image from that path.
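For the client to load the returned path, the server also has to serve the ./resource directory over HTTP. The excerpt does not show how the project does this, so the following is only a sketch using the standard library's file server. The names `publicURL` and `newResourceMux`, and the /resource/ URL prefix, are assumptions.

```go
package main

import (
	"net/http"
	"strings"
)

// publicURL turns the stored path returned by the upload handler, such as
// "./resource/a.png", into the URL path the client should request.
func publicURL(stored string) string {
	return "/" + strings.TrimPrefix(stored, "./")
}

// newResourceMux returns a mux that serves the ./resource directory under
// the /resource/ URL prefix.
func newResourceMux() *http.ServeMux {
	mux := http.NewServeMux()
	files := http.FileServer(http.Dir("./resource"))
	mux.Handle("/resource/", http.StripPrefix("/resource/", files))
	return mux
}
```

Passing the returned mux to `http.ListenAndServe` would make a file saved as ./resource/a.png reachable at /resource/a.png.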

Picture sending works, but the implementation is very basic. In the next chapter we discuss optimization options in detail and how to make the system hold up better in production.

5. Program optimization and system architecture upgrade plan

We now have a working IM system. To run it in an enterprise production environment, both the code and the system architecture need to be optimized before it can be truly highly available. This section offers some personal views on code optimization and architecture upgrades. My ability is limited and I cannot cover everything, so I hope readers will share better suggestions in the comments section.

5.1 Code Optimization

Our code does not use a framework, and the functions and APIs are written in a fairly plain way. The code has some basic structure, but much of the logic is still coupled. We therefore suggest refactoring it on top of a mature framework from the industry; Gin is a good choice.

The program uses clientMap to store the clients' long-lived connections. In Go, concurrent reads and writes of a map must be locked, and with a large map a single lock limits performance. When the number of users is large, readers can split clientMap into several maps, hashed by user ID or partitioned by some other strategy. Information about these connections, such as which server holds a given user's connection, can also be kept in Redis; the connection handle itself lives only inside the process that accepted it.
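Splitting clientMap by user ID could be sketched as below. Each shard has its own lock, so operations on users in different shards do not contend. The type names, the shard count of 16, and the trimmed `Node` are assumptions made to keep the example self-contained.

```go
package main

import "sync"

// Node is a trimmed stand-in for the article's Node (connection omitted).
type Node struct {
	DataQueue chan []byte
}

// shardCount is an assumed value; tune it to the expected contention.
const shardCount = 16

// shard is one slice of the client table with its own lock.
type shard struct {
	mu    sync.RWMutex
	nodes map[int64]*Node
}

// shardedClients replaces a single locked map with shardCount smaller ones.
type shardedClients struct {
	shards [shardCount]shard
}

func newShardedClients() *shardedClients {
	s := &shardedClients{}
	for i := range s.shards {
		s.shards[i].nodes = make(map[int64]*Node)
	}
	return s
}

// shardFor picks the shard by user ID modulo the shard count.
func (s *shardedClients) shardFor(userID int64) *shard {
	return &s.shards[uint64(userID)%shardCount]
}

func (s *shardedClients) set(userID int64, n *Node) {
	sh := s.shardFor(userID)
	sh.mu.Lock()
	sh.nodes[userID] = n
	sh.mu.Unlock()
}

func (s *shardedClients) get(userID int64) (*Node, bool) {
	sh := s.shardFor(userID)
	sh.mu.RLock()
	n, ok := sh.nodes[userID]
	sh.mu.RUnlock()
	return n, ok
}

func (s *shardedClients) remove(userID int64) {
	sh := s.shardFor(userID)
	sh.mu.Lock()
	delete(sh.nodes, userID)
	sh.mu.Unlock()
}
```

The modulo works well when user IDs are roughly sequential; if IDs cluster on particular residues, hashing the ID first would spread them more evenly.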

As mentioned above, picture uploading can be optimized in many ways. The first is image compression (WeChat does this too): compressing picture resources both speeds up transmission and reduces the storage used on the server. In addition, the server only needs to keep one copy of any given resource. Readers can run a hash check when a picture is uploaded; if the file already exists, there is no need to upload it again and the existing URL is returned to the client directly (the instant-upload feature of the major cloud drive vendors works this way).
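The hash check could be sketched as follows. The example keeps an in-memory index from the SHA-256 digest of the content to the URL it was stored under, and calls the storage function only for content it has not seen before. The names `uploadIndex` and `storeOnce` and the `save` callback are hypothetical; a production version would persist the index and could let the client send the digest before uploading anything.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

// uploadIndex remembers the URL under which each distinct content digest
// was stored.
type uploadIndex struct {
	mu   sync.Mutex
	urls map[string]string // hex SHA-256 digest -> URL
}

func newUploadIndex() *uploadIndex {
	return &uploadIndex{urls: make(map[string]string)}
}

// storeOnce returns the URL for data. If identical bytes were stored
// before, it returns the earlier URL with reused set to true and does not
// call save. Otherwise it calls save with the digest and records the URL.
func (u *uploadIndex) storeOnce(data []byte, save func(digest string, data []byte) (string, error)) (url string, reused bool, err error) {
	sum := sha256.Sum256(data)
	digest := hex.EncodeToString(sum[:])

	// The lock is held across save for simplicity, which serializes uploads.
	u.mu.Lock()
	defer u.mu.Unlock()
	if existing, ok := u.urls[digest]; ok {
		return existing, true, nil
	}
	url, err = save(digest, data)
	if err != nil {
		return "", false, err
	}
	u.urls[digest] = url
	return url, false, nil
}
```

In the upload handler, `save` would be the code that writes the file to ./resource, with the digest serving as a collision-resistant file name.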

There is plenty more to improve in the code: stronger authentication; wss:// instead of ws://; encrypting the message body in security-sensitive scenarios and compressing it in high-concurrency scenarios; tuning the MySQL connection pool; persisting messages to MongoDB to avoid frequent writes to the relational database, and turning single-row writes into batched writes; and, to make the program use less CPU, reducing the number of JSON encodings of the message body by encoding once and reusing the result many times……

5.2 Upgrading the System Architecture

Our system is very simple, and an architecture upgrade involves a great deal of work. I will mention only a few important points here:

  • Application/resource service separation

By resources we mean pictures, videos, and other files. You can use a cloud object storage service (COS) from an established vendor or build your own file server. If the resources are large and the users widely distributed, a CDN is a good choice.

  • Break through the single-machine connection limit and build a distributed environment

Servers generally run Linux. On Linux everything is a file, and long-lived connections are no exception. The number of connections a single machine can hold is limited; generally speaking, reaching 100,000 is already very good, so once the number of users grows to a certain scale a distributed system becomes necessary. Going distributed requires changes to the program: because the connection handles are spread across different machines, broadcasting and routing messages between them is the first problem to solve. I will not go into it here, partly because I lack the practical experience and partly because the solution involves too many details. Building a distributed environment also raises other questions, such as how to scale capacity elastically and how to deal with emergencies.

  • Separation of business functions

We put user registration, adding friends, and similar functions together with the chat function. In real business scenarios they can be separated: user registration, adding friends, and creating groups can run on one server and the chat function on another. Separating services not only makes the functional logic clearer but also uses server resources more efficiently.

  • Reduce database I/O and use caching wisely

Our system does not persist messages; only user information is persisted to MySQL. If your business needs messages stored persistently, database I/O has to be considered. Put simply: merge database writes, optimize database reads, and make sensible use of caching.
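Merging database writes can be illustrated with a small batching buffer. This is a sketch under assumptions, not part of the project: the `batcher` type is hypothetical, and the `write` callback stands in for whatever multi-row insert the chosen database offers.

```go
package main

import "sync"

// batcher buffers messages and passes them to write in groups, so that
// many messages cost one database round trip instead of one each.
type batcher struct {
	mu    sync.Mutex
	limit int
	buf   [][]byte
	write func(batch [][]byte) error
}

func newBatcher(limit int, write func(batch [][]byte) error) *batcher {
	return &batcher{limit: limit, write: write}
}

// Add buffers msg and writes the whole batch once limit messages are held.
func (b *batcher) Add(msg []byte) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.buf = append(b.buf, msg)
	if len(b.buf) < b.limit {
		return nil
	}
	return b.flushLocked()
}

// Flush writes whatever is buffered; call it on a timer and at shutdown.
func (b *batcher) Flush() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.flushLocked()
}

// flushLocked must be called with b.mu held. On a write error the batch
// is dropped here; a real system would retry or spill it somewhere durable.
func (b *batcher) flushLocked() error {
	if len(b.buf) == 0 {
		return nil
	}
	batch := b.buf
	b.buf = nil
	return b.write(batch)
}
```

With a limit of 3, five calls to Add trigger one write of three messages, and a later Flush writes the remaining two. The trade-off is that buffered messages are lost if the process crashes before they are flushed.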

These are the code optimizations and architecture upgrades I have thought of so far.

6. Conclusion

You may have noticed that building an IM system in Go is much easier than in many other languages, and that the result scales and performs well (not to brag about Go). 5G will soon be everywhere, mobile data is no longer expensive, and IM systems have worked their way deep into users' daily lives. For programmers, building an IM system is no longer difficult: a reader who follows the ideas in this article, understands WebSocket, copies the code, and runs the program should be able to get such a system going in less than half a day. IM defines an era. From QQ and WeChat to today's artificial intelligence products, instant messaging is used everywhere, and many more products can be built around it. The purpose of this article is to help more people understand IM, to help developers build an application quickly, and to kindle interest in network programming. I hope readers take something away from it and go on to apply IM systems in more products.