The problem

Perhaps because I lack experience, I often run into problems at work, and I always want to record how I explored and solved them. That is why my blog is always problem-driven. First, let me introduce the problems to be solved today:

Service coupling

We may encounter situations like this during development:

  • A process depends on a service, so you couple the service into the process code.
  • The service takes a long time to initialize, which slows down process startup.
  • The service uses a lot of memory, and with multiple processes each holding its own copy, the memory waste is serious.

Take the text matching service introduced in my previous article, “Hours to minutes: step-by-step optimization of matching a huge number of keywords”. It is one stage of message processing and is initialized separately by each of several message-processing processes. Every initialization spends about 6 seconds building the Trie, and because the service reads a large keyword file and builds the Trie out of arrays, it uses a lot of memory (the limit is currently set to 256 MB).

I have written the processes as long-running daemons, so initialization time is not much of a concern, but I can do nothing about the huge memory footprint. As the number of keywords grows, a machine running a dozen or so message-processing processes has no room to do anything else.

Also, what if I needed to wrap the text matching service in an interface for external callers? As we know, in a web service the process that handles a request lives only from receiving the request to finishing the response. If every request spent that much memory and time initializing the service, the interface response time and the load on the server would be easy to imagine.

Extracting the service

So the form of the service has to change. We want the text matching service to be:

  • Available on demand and independent, no longer coupled to the “message processing service”;
  • Initialized once, then able to keep serving for as long as the process runs;
  • Synchronous in its responses, efficient and accurate, preferably without locks of any kind to guard resources.

The solution is simply to extract the text matching service and run it as a single daemon, like a special server, that multiple “message processing services” can call when needed.

Now we need to consider how the text matching process communicates with the outside world: how it accepts match requests and returns match results. Once again, the problem comes back to interprocess communication.


Unix Domain Sockets

Interprocess communication

Interprocess communication (IPC) refers to techniques or methods for transmitting data or signals between at least two processes or threads. A process is the smallest unit to which a computer system allocates resources (the thread, strictly speaking, is the smallest unit of scheduling). Each process has its own set of system resources, isolated from those of other processes. Interprocess communication exists so that different processes can access each other's resources and coordinate their work.

There are many ways to communicate between processes, and there are also many online introductions to this. The following is an analysis of these ways according to the requirements of the article:

  • Pipes: the original form of IPC in Unix, but they only work between processes that share a common ancestor, not between unrelated processes. To use one, the “message processing service” would have to start the text matching service itself, which is hardly different from the original setup.
  • Named pipes: called FIFOs in Unix, they exchange data between processes through a file. When serving multiple processes, however, locks are needed to keep writes and reads atomic so that writes and reads do not get mismatched.
  • Signals and semaphores: used for event-level communication between processes or threads, but they carry too little information.
  • Message queues and shared memory: both communicate through a shared in-memory medium, and I have previously written about using them for communication between PHP processes. However, both are asynchronous, and with multiple processes there is no way to tell which response belongs to which request.
  • Sockets: communication goes through the network APIs that Unix wraps; databases and servers, for example, can also serve local clients this way. Network sockets would work, but they are not the ideal choice because of the overhead of packing data and making network calls.

A brief introduction

A Unix domain socket can be understood as a special kind of socket. It does not go through the network protocol stack, so there is no packing and unpacking, no checksum calculation, no sequence numbers, no acknowledgements, and so on. Application-layer data is simply copied from one process to another, which makes communication within one system more efficient. And with no network in the way, message integrity is better too: nothing is lost or reordered.

As a special socket, it is created and used in the same way as a network socket. For a complete interaction, the server goes through create, bind, listen, accept, read, and write, and the client goes through create, connect, write, and read. Unlike an ordinary socket, it is bound to a file on the system rather than to an IP address and port.

I will not walk through the creation code here. An earlier article, “Writing a web server in C (1): basic functions”, describes the steps of socket communication in detail in its implementation section. The steps are much the same as in C and easy to follow.

Applicable scenarios

Unix domain sockets are really a heavy weapon for interprocess communication. With them, processes can exchange data and information quickly, with no complicated locking and no need to worry about efficiency: simple and efficient.

Of course, “heavy weapons” are not suitable for every situation. Unix domain sockets fit the following scenarios:

  • The service is long-lived. The server side of a Unix domain socket sits in a daemon, much like a server, blocking while it waits for client connections; only a long-running service makes full use of that.
  • One server, multiple clients. Clients can be told apart by their socket file descriptors, which avoids locking between them.
  • Within the same system. It can only copy data between processes on the same system; across systems, use traditional sockets.

Code implementation

The next step is to show some code. As anyone who has studied PHP knows, PHP is not well suited to CPU-intensive tasks. I happened to be learning a little Go and had just implemented the Trie in Go, which meant PHP and Go had to communicate; hence today's article. Of course, the method described here is not limited to PHP and Go. Other languages can use it as well; it works the same way in C, at least.

The complete code is in “IPC - GitHub - Pillow Book”, which also includes a handy PHP version of a Unix domain socket server.

Go implementation of the Trie tree

I will not go into Trie trees here; below are just the data structure and a few points to note.

    type Node struct {
        depth    int
        children map[int32]Node
    }

Note:

  • When the append() function of a slice is used to store the growing list of matches, the slice may be reallocated at a new address if its capacity is insufficient. So pass the address of the slice and store the results with *result = append(*result, word), so that the grown slice is passed back to the caller through that address.
  • Because Go uses UTF-8 uniformly, there is no need to work out character boundaries as in PHP. When splitting keywords and messages, both are converted directly into slices whose members are of type int32 using the int32() conversion. During matching, an int32 number stands for each Chinese character, and after matching, fmt.Printf("%c", int32) converts it back to the character.

Go Server

Creating and using a socket in Go takes very few steps, but Go has no exceptions, so checking errors is rather tedious; I do not know whether there is a better way to write it. For simplicity, errors are ignored here.

    // Create a Unix domain socket
    socket, _ := net.Listen("unix", "/tmp/keyword_match.sock")
    // Remove the bound socket file on shutdown
    defer syscall.Unlink("/tmp/keyword_match.sock")
    // Loop forever, accepting and serving client requests
    for {
        client, _ := socket.Accept()

        // Read one request of up to 1024 bytes
        buf := make([]byte, 1024)
        dataLen, _ := client.Read(buf)
        msg := string(buf[:dataLen])

        matched := trie.Match(tree, msg)
        response := []byte("[]") // default response when nothing matches
        if len(matched) > 0 {
            response, _ = json.Marshal(matched)
        }
        _, _ = client.Write(response)
        // Close the connection so that file descriptors do not leak
        client.Close()
    }

PHP Client

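Here is the client, implemented in PHP:

    $msg = "msg";
    // Create a Unix domain stream socket and connect to the server's socket file
    $socket = socket_create(AF_UNIX, SOCK_STREAM, 0);
    socket_connect($socket, '/tmp/keyword_match.sock');
    // Send the message and read back up to 1024 bytes of JSON
    socket_send($socket, $msg, strlen($msg), 0);
    $response = socket_read($socket, 1024);
    socket_close($socket);
    // "[]" means no match; only longer responses are printed
    if (strlen($response) > 3) {
        var_dump($response);
    }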

Summary

Efficiency

Here’s a summary of the efficiency of the design:

When Go alone does the keyword matching, a thousand messages take a little over a second, almost twice as fast as PHP. But what happened to the promised 8x efficiency? Sure enough, benchmarks all lie. Of course, it could be that I wrote something badly, or that a Trie is simply not where Go shines. Then there is PHP calling the Go service over a Unix domain socket: whether because copying data between processes takes time or because PHP is holding things back, it takes a little over 3 seconds, about the same as the pure PHP script.

Digression

As anyone who uses PHP knows, because it is an interpreted language with a high degree of encapsulation, PHP is fast to develop in but performs somewhat worse than other languages. To make up for this congenital weakness, Facebook has HHVM, PHP 7 has OPcache, and a JIT is said to be coming in PHP 8.

However, developers, especially those like me who are obsessed with efficiency, might do better to keep up with PHP's new features and, at the same time, write computation-heavy, single-purpose code in another, more efficient language.

So, after thinking about it for a long time and watching the various arguments between Go’s supporters and opponents, I decided to put my trust in “Google Dad”. After all, there was no other language I felt I could choose. PS: Please do not argue about this paragraph in the comments, thank you :)

As for C, although I do not use it in development for now, it is after all the ancestor of many contemporary languages, so I occasionally write data structures and algorithms in it to keep from getting rusty. And having learned some C, moving from PHP to Go feels a little easier.

If you have any questions about this article, please leave a comment below. If you found it helpful, you can click the recommend button below to support me. The blog is updated regularly, and you are welcome to follow it.