Analysis of wechat high-performance online log system Xlog

Students who are engaged in mobile development often encounter a headache problem, that is, when users give feedback on some problems, which are relatively rare and difficult to repeat (not Crash), they will often be at a loss. Therefore, many people have developed related monitoring systems, such as some well-known APM to monitor frame rate, memory, power and so on, and collect, merge and report these data to a special platform for students to check. However, these APMs tend to be coarse-grained monitoring. The reason is that if the monitoring is very detailed, the online performance will suffer, and some monitoring will affect the normal use of users.

Having said that, despite the difficulty of getting the data, the essence of online monitoring is information (logging) recording, and online logging has a primary socialist supply-demand problem:

That is, real-time fine-grained logging cannot balance the performance gap with log integrity.

If you want high-performance, fine-grained logging, you’re going to use a lot of memory. And a lot of use of memory, in case the power runs out, the program suddenly crashed, these intermediate state of the log has not persisted, it is equivalent to a waste of energy; If you want to be reliable, you need to be constantly dropping in real time. We know that the behavior of writing to disk is designed to switch between user mode and kernel mode, which will definitely affect performance under the requirements of high fluency, and this is not a problem that you can solve with multi-threading.

Why is writing to disk very slow

Today, almost all operating systems in memory management, the basic use of page management strategy. Replace contiguous memory space (note space, not address) with page size. There are several advantages to this:

  1. Managing by page size can effectively reduce the granularity of memory fragmentation.
  2. By page loading, you can make full use of swap space on disk, so that the space used by the program can greatly exceed the memory limit.

Of course, there is no swap space on iOS devices, but memory is still managed as a page structure.

Back to why writing to disk is slow. We typically persist data in memory to disk. Data is flushed from memory back to disk in one of the following ways:

  1. Through the pages offlagMarked as changed, the operating system periodically writes the dirty pages back to disk at uncontrollable times.
  2. Call user-mode write interface -> trigger kernel-modesys_write-> The file system writes data back to the disk.

At first glance, this second approach may seem like a good fit for journaling, but it has two obvious problems:

  • File systems at efficiency do not immediately write data back to disk (for example, track addressing is relatively time-consuming due to mechanical reasons), but rather cache data as blocks in queues that are sorted, merged and then written back to disk.
  • This method requires two copies before writing data back to disk. One is to copy data from the user state to the kernel state, need to undergo a context switch; Another was a real copy of the data from the kernel space to the hard disk. When switching is too frequent, overall performance deteriorates.

Based on the above problems, Xlog adopts mMAP scheme to design the log system:

Mmap uses logical memory to map disk files without any copy operation, avoiding data copy of written files. Manipulating memory is like manipulating files, avoiding frequent switching between kernel space and user space.

In addition to the system capabilities, using MMAP can also ensure log integrity, as the following situations automatically write back to disk:

  • Out of memory
  • Process crash
  • Call msync or munmap
  • 30s-60s without MAP_NOSYNC (FreeBSD only)

Xlog source code analysis

The xlog code is split into two main pieces, and the upper-level use of the wrapper xLogger exposes a number of excuses. Core appenders and logs, etc.

log_buffer

The log_buffer is intended to encapsulate a data structure for MMAP/traditional memory operations. The core idea is to translate the upper level operation to read and write the actual log cache address (also encapsulate the encryption compression operation and so on). Let’s take a write operation as an example:

Bool LogBuffer::Write(const void* _data, size_t _length) {// Some exceptions, Don't say the if (NULL = = _data while forming | | 0 = = _length) {return false. } if (buff_.Length() == 0) { if (! __Reset()) return false; } size_t before_len = buff_.Length(); size_t write_len = _length; If (is_compress_) {cstream_avail_in = (uInt)_length; cstream_.next_in = (Bytef*)_data; uInt avail_out = (uInt)(buff_.MaxLength() - buff_.Length()); cstream_.next_out = (Bytef*)buff_.PosPtr(); cstream_.avail_out = avail_out; if (Z_OK ! = deflate(&cstream_, Z_SYNC_FLUSH)) { return false; } write_len = avail_out - cstream_.avail_out; } else {// 1. Write data to mmap file or memory buff_.Write(_data, _length); Before_len -= remain_nocrypt_len_;} // 2. Check length of failed data before attempted encryption. AutoBuffer out_buffer; size_t last_remain_len = remain_nocrypt_len_; // 3. Log_crypt_ ->CryptAsyncLog((char*) buff_.ptr () + before_len, write_len + remain_nocrypt_len_, out_buffer, remain_nocrypt_len_); // 4. Write the encrypted text back to the end of the last encrypted data buff_.write (out_buffer.ptr (), out_buffer.length (), before_len); // 5. update data before_len += out_buffer.length (); before_len += out_buffer.length (); buff_.Length(before_len, before_len); // 6. Add some auxiliary information such as the length of the encryption to supplement the real data. Log_crypt_ ->UpdateLogLen((char*) buff_.ptr (), (uint32_t)(out_buffer.length () - last_remain_len)); return true; }Copy the code

It is not hard to see that the whole is to encrypt the written data, if there is a need for compression at the same time. The modified data is stored in a real MMAP file/memory cache.

If you don’t understand, look at this picture I drew to illustrate:

appender

The real core of the Xlog solution is actually just an appender file, which is pretty clear in nature, separating the added log into synchronous and asynchronous writes. The asynchronous write method is commonly used, and the following analysis is based on this method.

The first is the initial configuration of the logging system

assert(_dir); assert(_nameprefix); if (! sg_log_close) { __writetips2file("appender has already been opened. _dir:%s _nameprefix:%s", _dir, _nameprefix); return; Xlogger_appender (&xlogger_appender); //mkdir(_dir, S_IRWXU|S_IRWXG|S_IRWXO); boost::filesystem::create_directories(_dir); tickcount_t tick; tick.gettickcount(); __del_timeout_file(_dir); tickcountdiff_t del_timeout_file_time = tickcount_t().gettickcount() - tick; tick.gettickcount(); char mmap_file_path[512] = {0}; snprintf(mmap_file_path, sizeof(mmap_file_path), "%s/%s.mmap2", sg_cache_logdir.empty()? _dir:sg_cache_logdir.c_str(), _nameprefix); bool use_mmap = false; // 2. Try mmap if (OpenMmapFile(mmap_file_path, kBufferBlockLength, sg_mmmap_file)) { sg_log_buff = new LogBuffer(sg_mmmap_file.data(), kBufferBlockLength, true, _pub_key); use_mmap = true; Char * buffer = new char[kBufferBlockLength]; sg_log_buff = new LogBuffer(buffer, kBufferBlockLength, true, _pub_key); use_mmap = false; } 4 if (NULL == sg_log_buff->GetData().Ptr()) { if (use_mmap && sg_mmmap_file.is_open()) CloseMmapFile(sg_mmmap_file); return; } 2 AutoBuffer buffer; sg_log_buff->Flush(buffer); ScopedLock lock(sg_mutex_log_file); sg_logdir = _dir; sg_logfileprefix = _nameprefix; sg_log_close = false; appender_setmode(_mode); lock.unlock(); char mark_info[512] = {0}; get_mark_info(mark_info, sizeof(mark_info)); if (buffer.Ptr()) { __writetips2file("~~~~~ begin of mmap ~~~~~\n"); __log2file(buffer.Ptr(), buffer.Length()); __writetips2file("~~~~~ end of mmap ~~~~~%s\n", mark_info); } 6. Add some information about xlog itself tickCountDiff_t get_mmAP_time = tickcount_t().gettickcount() -tick; char appender_info[728] = {0}; snprintf(appender_info, sizeof(appender_info), "^^^^^^^^^^" __DATE__ "^^^" __TIME__ "^^^^^^^^^^%s", mark_info); xlogger_appender(NULL, appender_info); char logmsg[64] = {0}; snprintf(logmsg, sizeof(logmsg), "del time out files time: %" PRIu64, (int64_t)del_timeout_file_time); xlogger_appender(NULL, logmsg); snprintf(logmsg, sizeof(logmsg), "get mmap time: %" PRIu64, (int64_t)get_mmap_time); xlogger_appender(NULL, logmsg); xlogger_appender(NULL, "MARS_URL: " MARS_URL); xlogger_appender(NULL, "MARS_PATH: " MARS_PATH); xlogger_appender(NULL, "MARS_REVISION: " MARS_REVISION); xlogger_appender(NULL, "MARS_BUILD_TIME: " MARS_BUILD_TIME); xlogger_appender(NULL, "MARS_BUILD_JOB: " MARS_TAG); snprintf(logmsg, sizeof(logmsg), "log appender mode:%d, use mmap:%d", (int)_mode, use_mmap); xlogger_appender(NULL, logmsg); BOOT_RUN_EXIT(appender_close);Copy the code

There are a few points to note:

  • Note 1: If we try to open mmap successfully, but the corresponding data address of Mmap is NULL, then we must stop mapping. Because the address represented by NULL is in kernel state, once mapped, it is bound to cause Crash.
  • Note 2: In the case of mMAP, if the last application power failure or Crash, the log information still exists, but it may not be timely converted into the desired log file. So we first check if there is any data in the Mmap file, and then convert it to a log.

Logs added by the upper layer are called by xlogger_appender and then logged by __appender_async.

__appender_async

__appender_async needs to be paired with its asynchronous dump thread and is two very interesting pieces of code that involve a strategy for writing MMAP/memory data back to disk.

The first is to add a log:

static void __appender_async(const XLoggerInfo* _info, const char* _log) { ScopedLock lock(sg_mutex_buffer_async); if (NULL == sg_log_buff) return; char temp[16*1024] = {0}; //tell perry,ray if you want modify size. PtrBuffer log_buff(temp, 0, sizeof(temp)); log_formater(_info, _log, log_buff); if (sg_log_buff->GetData().Length() >= kBufferBlockLength*4/5) { int ret = snprintf(temp, sizeof(temp), "[F][ sg_buffer_async.Length() >= BUFFER_BLOCK_LENTH*4/5, len: %d\n", (int)sg_log_buff->GetData().Length()); log_buff.Length(ret, ret); } if (!sg_log_buff->Write(log_buff.Ptr(), If (sg_log_buff->GetData().length ()) >= (sg_log_buff->GetData().length ()) >= (sg_log_buff->GetData() kBufferBlockLength*1/3 || (NULL!=_info && kLevelFatal == _info->level)) { sg_cond_buffer_async.notifyAll(); } }Copy the code

The second is asynchronous threads that Dump logs

static void __async_log_thread() {
    while (true) {

        ScopedLock lock_buffer(sg_mutex_buffer_async);

        if (NULL == sg_log_buff) break;

        AutoBuffer tmp;
        sg_log_buff->Flush(tmp);
        lock_buffer.unlock();

        if (NULL != tmp.Ptr())  __log2file(tmp.Ptr(), tmp.Length());

        if (sg_log_close) break;

        sg_cond_buffer_async.wait(15 * 60 *1000);
    }
}
Copy the code

As you can see, the main strategy for the entire log is to use Mmap to write the log to the disk map, notifying the asynchronous thread to write the log when more than a third of the time.

In this way, the real-time and integrity of MMAP is utilized to create a log with very clear logic and easy to understand. The overall architecture is shown as follows: