Segmented Log

Split a single large log file into smaller files that are easier to work with.

The problem

A single log file can grow so large that it becomes a performance bottleneck when it is read at startup. Old logs need to be cleaned up periodically, but performing cleanup on one huge file is expensive.

The solution

When the current log file reaches a configured size, writing rolls over to a new file.

public Long writeEntry(WALEntry entry) {
    // Decide whether to roll over to a new file
    maybeRoll();
    return openSegment.writeEntry(entry);
}

private void maybeRoll() {
    // If the current file size exceeds the maximum log file size
    if (openSegment.size() >= config.getMaxLogSize()) {
        // Force a flush to disk before sealing the current segment
        openSegment.flush();
        sortedSavedSegments.add(openSegment);
        // Get the id of the last entry in the sealed segment
        long lastId = openSegment.getLastLogEntryId();
        // Open a new segment that starts after that id
        openSegment = WALSegment.open(lastId, config.getWalDir());
    }
}
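For readability, here is the shape of the segment API the snippets in this article assume. The pattern does not prescribe exact signatures, so treat this as an illustrative sketch, not a fixed interface:

// Illustrative sketch of the WALSegment operations the snippets above use;
// names and signatures are inferred from the snippets, not a prescribed API.
interface WALSegment {
    Long writeEntry(WALEntry entry);   // append one entry and return its id
    long size();                       // current segment file size in bytes
    void flush();                      // force buffered writes to disk
    long getLastLogEntryId();          // id of the last appended entry
    long getBaseOffset();              // offset (id) at which this segment starts

    // Plus a static factory, WALSegment.open(lastId, walDir), that creates
    // the next segment file starting after lastId.
}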

Once the log is segmented, a mechanism is needed to quickly map a given log position (or log sequence number) to the file that contains it. This can be done in one of two ways:

  • The name of each log segment file contains its starting log position offset (or log sequence number).

  • Each log sequence number encodes the file name and the transaction offset within that file (a minimal sketch of this encoding follows the code below).

    // Approach 1: the file name encodes the start offset.
    // Format: <logPrefix>_<startIndex>_<logSuffix>
    public static String createFileName(Long startIndex) {
        return logPrefix + "_" + startIndex + "_" + logSuffix;
    }

    // Extract the base log offset back out of the file name
    public static Long getBaseOffsetFromFileName(String fileName) {
        String[] nameAndSuffix = fileName.split(logSuffix);
        String[] prefixAndOffset = nameAndSuffix[0].split("_");
        if (prefixAndOffset[0].equals(logPrefix))
            return Long.parseLong(prefixAndOffset[1]);

        return -1L;
    }
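The second approach can be sketched as packing a segment (file) id and an offset within that file into a single 64-bit sequence number. This encoding is illustrative, not code from the pattern:

// Hypothetical sketch of approach 2: pack a segment file id and an offset
// within that file into one 64-bit log sequence number.
final class LogSequenceNumber {
    static long encode(int fileId, int offsetInFile) {
        return ((long) fileId << 32) | (offsetInFile & 0xFFFFFFFFL);
    }

    static int fileId(long lsn) {
        return (int) (lsn >>> 32); // high 32 bits
    }

    static int offsetInFile(long lsn) {
        return (int) lsn; // low 32 bits
    }
}

BookKeeper, for instance, addresses an entry in the same spirit: a (ledgerId, entryId) pair names both the ledger and the position within it.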

Once the file name carries this information, a read operation becomes two steps:

  1. Given an offset (or transaction id), find all segment files containing log entries greater than that offset

  2. Read all log entries greater than the offset from those files

    // Given a start offset, read all log entries from it onwards
    public List<WALEntry> readFrom(Long startIndex) {
        List<WALSegment> segments = getAllSegmentsContainingLogGreaterThan(startIndex);
        return readWalEntriesFrom(startIndex, segments);
    }

    // Given a start offset, collect all segment files that contain
    // log entries greater than that offset
    private List<WALSegment> getAllSegmentsContainingLogGreaterThan(Long startIndex) {
        List<WALSegment> segments = new ArrayList<>();

        // Walk backwards from the newest saved segment to the first segment
        // whose base offset is <= startIndex; every segment visited on the
        // way can contain entries greater than startIndex.
        for (int i = sortedSavedSegments.size() - 1; i >= 0; i--) {
            WALSegment walSegment = sortedSavedSegments.get(i);
            segments.add(walSegment);

            if (walSegment.getBaseOffset() <= startIndex) {
                break; // stop at the first segment with base offset <= startIndex
            }
        }

        // The currently open segment may also contain the requested range
        if (openSegment.getBaseOffset() <= startIndex) {
            segments.add(openSegment);
        }

        return segments;
    }
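When the base offsets are recovered from file names (approach 1), the first step can also be a binary search over the sorted base offsets. A minimal, self-contained sketch; the SegmentLookup class and its names are illustrative, not part of the pattern's reference code:

import java.util.Arrays;

public class SegmentLookup {
    // Return the index of the segment containing `offset`: the last
    // segment whose base offset is <= offset, or -1 if the offset
    // precedes the first segment.
    static int segmentIndexFor(long[] sortedBaseOffsets, long offset) {
        int pos = Arrays.binarySearch(sortedBaseOffsets, offset);
        if (pos >= 0) {
            return pos;                // offset is exactly a segment's base offset
        }
        int insertionPoint = -pos - 1; // index of the first base offset > offset
        return insertionPoint - 1;
    }

    public static void main(String[] args) {
        long[] baseOffsets = {0L, 1000L, 2000L, 3000L};
        System.out.println(segmentIndexFor(baseOffsets, 1500L)); // prints 1
        System.out.println(segmentIndexFor(baseOffsets, 3000L)); // prints 3
    }
}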

Examples

Almost all major message-queue stores use segmented logs, including RocketMQ, Kafka, and BookKeeper, the storage layer underlying Pulsar.

RocketMQ: the CommitLog is a sequence of fixed-size files (1 GB by default), each named by the offset at which it starts.

Kafka: each partition is stored as a series of segment files, each named by the offset of the first record it contains.

Pulsar's storage layer, BookKeeper: entries are appended to journal and entry log files that are rotated once they reach a configured size.

In addition, segmented logs are commonly used in storage built on consensus protocols such as Paxos or Raft, for example in ZooKeeper and TiDB.
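One reason these systems segment their logs is the cleanup problem described at the start: discarding old entries (for example, everything already covered by a snapshot) reduces to deleting whole files. A minimal sketch, assuming the sortedSavedSegments list and WALSegment type from the snippets above plus a hypothetical delete() helper:

// Hypothetical sketch: remove every sealed segment whose entries all fall
// below a snapshot index. delete() is an assumed helper that closes the
// segment and removes its file.
private void removeSegmentsBefore(long snapshotIndex) {
    Iterator<WALSegment> it = sortedSavedSegments.iterator();
    while (it.hasNext()) {
        WALSegment segment = it.next();
        if (segment.getLastLogEntryId() < snapshotIndex) {
            segment.delete();
            it.remove();
        }
    }
}

Deleting whole files avoids the rewrite-in-place cost that motivated the pattern in the first place.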
