This article is adapted and reorganized from the third go-zero session of "Go Open Source Talk". The original video is long and split into two parts; the content here has been trimmed and restructured.
Hi everyone, I'm glad to join Go Open Source Talk to share some of the stories, design ideas, and usage of an open source project. Today's project is go-zero, a web and RPC framework that integrates various engineering practices. I'm Kevin, the author of go-zero, and my GitHub ID is kevwan.
An overview of go-zero
Although go-zero was only open-sourced on August 7, 2020, it had already been battle-tested online at scale, and it embodies nearly 20 years of my engineering experience. After being open-sourced, it received positive feedback from the community: more than 6K stars in a little over 5 months, repeatedly topping GitHub's Go trending list for the day, week, and month, plus Gitee's Most Valuable Project (GVP) award and OSChina's Most Popular Project of the Year. The WeChat community is also very active, with a group of more than 3,000 go-zero enthusiasts sharing their experience with go-zero and discussing problems they run into.
How does Go-Zero automatically manage caches?
Cache Design Principles
We only delete caches, we never update them: once the data in the DB changes, we directly delete the corresponding cache instead of updating it.
Let’s take a look at the correct order to delete the cache.
- Delete the cache first, then update the DB
Consider two concurrent requests: request A needs to update the data and deletes the cache first; request B then reads the data, misses the cache, loads the data from the DB and writes it back to the cache; only after that does A update the DB. At this point the cache holds dirty data, and it stays dirty until the cache expires or the data is updated again. As shown in the figure.
- Update DB first, then delete cache
Request A updates the DB first, and then request B reads the data and gets the old value. At this point, request A's update can simply be considered not yet finished, and eventual consistency is acceptable.
Let’s look at the normal request flow again:
- The first request updates the DB and removes the cache
- The second request reads the cache, no data, reads the data from DB, and writes it back to the cache
- Subsequent read requests can be read directly from the cache
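The flow above can be sketched as a minimal cache-aside pattern. This is only an illustration, not go-zero's actual implementation: the two maps stand in for redis and the DB, and names like `updateUser`/`getUser` are made up for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	db          = map[int]string{1: "alice"} // stand-in for the DB
	cache       = map[int]string{}           // stand-in for redis
	errNotFound = errors.New("not found")
)

// updateUser writes the DB first, then deletes the cache (never updates it).
func updateUser(id int, name string) {
	db[id] = name     // 1. update the DB
	delete(cache, id) // 2. delete the cached row
}

// getUser reads the cache first; on a miss it loads the row from the DB
// and writes it back to the cache.
func getUser(id int) (string, error) {
	if name, ok := cache[id]; ok {
		return name, nil // cache hit
	}
	name, ok := db[id]
	if !ok {
		return "", errNotFound
	}
	cache[id] = name // write back so subsequent reads hit the cache
	return name, nil
}

func main() {
	name, _ := getUser(1) // miss: loads from DB, fills cache
	fmt.Println(name)
	updateUser(1, "bob") // update DB, invalidate cache
	name, _ = getUser(1) // miss again: reads the new value
	fmt.Println(name)
}
```

Note that deleting (rather than updating) the cache on write is what makes the two steps safe to reason about: the next read repopulates the cache from the DB.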
Let’s take a look at the DB query. Suppose the row contains ABCDEFG columns:
- Query only some of the columns, such as ABC, CDE, or EFG, as shown in the figure
- Query a single complete row, as shown in the figure
- Query some or all columns of multiple rows, as shown in the figure
For the above three cases: first, we do not cache partial-column queries, because once a partial result is cached and the data is later updated, it is impossible to tell which cached entries need to be deleted. Second, for multi-row queries, depending on the actual scenario and needs, we build the mapping from query conditions to primary keys in the business layer. For single-row full-record queries, go-zero has complete cache management built in. So the core principle is: go-zero only caches complete row records.
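The business-layer mapping for multi-row queries can be sketched as follows. This is an illustrative pattern, not go-zero code: a query condition maps to a list of primary keys, while each row itself is still cached and invalidated only by primary key. The type names and key format are made up for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

type User struct {
	Id   int
	City string
}

// stand-in for the DB
var usersById = map[int]User{
	1: {1, "Shanghai"},
	2: {2, "Beijing"},
	3: {3, "Shanghai"},
}

// conditionCache maps a query condition ("city:Shanghai") to primary keys,
// so the full rows are still fetched through the per-row primary key cache.
var conditionCache = map[string][]int{}

func findUsersByCity(city string) []User {
	key := "city:" + city
	ids, ok := conditionCache[key]
	if !ok {
		// miss: scan the "DB" and cache the condition -> primary keys mapping
		for id, u := range usersById {
			if u.City == city {
				ids = append(ids, id)
			}
		}
		sort.Ints(ids)
		conditionCache[key] = ids
	}
	// fetch each complete row by primary key (in go-zero this step would
	// go through the primary key-based row cache)
	var out []User
	for _, id := range ids {
		out = append(out, usersById[id])
	}
	return out
}

func main() {
	fmt.Println(findUsersByCity("Shanghai"))
}
```

When a row changes, only its primary key cache and the affected condition keys need to be deleted, which is why this mapping is left to the business layer: only it knows which conditions a change invalidates.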
Let’s take a closer look at the cache handling for three scenarios built into Go-Zero:
- Primary key-based caching

```sql
PRIMARY KEY (`id`)
```

This kind of cache is relatively easy to handle: redis simply caches the row record with the primary key as the key.
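A minimal sketch of what that looks like: the complete row record is serialized (here to JSON) and stored under a key derived from the primary key. The key format and the in-memory store are assumptions for illustration, not go-zero's actual key scheme.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type User struct {
	Id   int64  `json:"id"`
	Name string `json:"name"`
}

// stand-in for redis
var redisLike = map[string]string{}

// primaryCacheKey builds the cache key from the primary key
// (the "cache:user:id:" prefix is made up for this sketch).
func primaryCacheKey(id int64) string {
	return fmt.Sprintf("cache:user:id:%d", id)
}

// cacheRow stores the complete row record as JSON under the primary key.
func cacheRow(u User) error {
	data, err := json.Marshal(u)
	if err != nil {
		return err
	}
	redisLike[primaryCacheKey(u.Id)] = string(data)
	return nil
}

// loadRow reads the row back from the cache; ok is false on a miss.
func loadRow(id int64) (User, bool) {
	var u User
	data, ok := redisLike[primaryCacheKey(id)]
	if !ok {
		return u, false
	}
	if err := json.Unmarshal([]byte(data), &u); err != nil {
		return u, false
	}
	return u, true
}

func main() {
	cacheRow(User{Id: 1, Name: "alice"})
	u, ok := loadRow(1)
	fmt.Println(u, ok)
}
```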
- Caching based on unique indexes

When designing index-based caching, I borrowed from the design of database indexes. When a database looks up data by index, the engine first finds the primary key in the index → primary key tree, and then fetches the row record by primary key; a layer of indirection resolves the index to the row record. go-zero's cache design follows the same principle.
Index-based caches are divided into single-column unique indexes and multi-column unique indexes:
- A single-column unique index looks like this:

```sql
UNIQUE KEY `product_idx` (`product`)
```
- A multi-column unique index looks like this:

```sql
UNIQUE KEY `vendor_product_idx` (`vendor`, `product`)
```
For go-zero, however, single-column and multi-column indexes differ only in how the cache key is generated; the control logic behind them is the same. On top of that, go-zero's built-in cache management gives better control over data consistency, and it has built-in protection against cache breakdown, penetration, and avalanche (these were discussed in detail in the GopherChina conference talk; see the GopherChina video).
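The "only the key generation differs" point can be sketched like this. The prefix and separator below are assumptions for illustration; go-zero's generated model code has its own key format.

```go
package main

import (
	"fmt"
	"strings"
)

// indexCacheKey builds a unique-index cache key: a single-column index
// passes one value, a multi-column index passes several, and everything
// else about the cache logic stays identical.
func indexCacheKey(table string, values ...interface{}) string {
	parts := make([]string, 0, len(values)+1)
	parts = append(parts, "cache:"+table)
	for _, v := range values {
		parts = append(parts, fmt.Sprint(v))
	}
	return strings.Join(parts, ":")
}

func main() {
	// single-column unique index: UNIQUE KEY (product)
	fmt.Println(indexCacheKey("goods", "apple"))
	// multi-column unique index: UNIQUE KEY (vendor, product)
	fmt.Println(indexCacheKey("goods", "acme", "apple"))
}
```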
In addition, Go-Zero has built-in cache traffic and access hit ratio statistics, as shown below:
```
dbcache(sqlc) - qpm: 5057, hit_ratio: 99.7%, hit: 5044, miss: 13, db_fails: 0
```
From such detailed statistics you can analyze cache usage; in cases where the hit ratio is very low or the request volume is very small, you can remove the cache, which also reduces cost.
Cache code interpretation
1. Cache logic based on primary keys
The implementation is as follows:

```go
func (cc CachedConn) QueryRow(v interface{}, key string, query QueryFn) error {
	return cc.cache.Take(v, key, func(v interface{}) error {
		return query(cc.db, v)
	})
}
```
The `QueryRow` logic: look the key up in the cache first and return the value on a hit; on a miss, load the complete row from the DB through `query`, write it back to the cache, and return it. The whole logic is fairly straightforward.
Let’s look at the implementation of Take in detail:
```go
func (c cacheNode) Take(v interface{}, key string, query func(v interface{}) error) error {
	return c.doTake(v, key, query, func(v interface{}) error {
		return c.SetCache(key, v)
	})
}
```
The logic of `Take` is:

- Look up the data in the cache with `key`
- If found, return the data
- If not found, read the data with the `query` method
- After reading, call `c.SetCache(key, v)` to set the cache
The `doTake` code, with explanations, is as follows:

```go
// v - the data object to read into
// key - the cache key
// query - the method used to read the complete data from the DB
// cacheVal - the method used to write data to the cache
func (c cacheNode) doTake(v interface{}, key string, query func(v interface{}) error,
	cacheVal func(v interface{}) error) error {
	// barrier is used to prevent cache breakdown: for a given key,
	// only one request actually loads the data
	val, fresh, err := c.barrier.DoEx(key, func() (interface{}, error) {
		// read the data from the cache
		if err := c.doGetCache(key, v); err != nil {
			// if a placeholder was previously set (to prevent cache
			// penetration), return the pre-defined errNotFound
			if err == errPlaceholder {
				return nil, c.errNotFound
			} else if err != c.errNotFound {
				// why we just return the error instead of querying the db:
				// on an unknown cache error we cannot send all requests
				// straight to the DB, which would break it under high
				// concurrency. fail fast, in case we bring down the dbs.
				return nil, err
			}
			// query the DB
			// if it returns errNotFound, set a placeholder in the cache
			// to prevent cache penetration
			if err = query(v); err == c.errNotFound {
				if err = c.setCacheWithNotFound(key); err != nil {
					logx.Error(err)
				}
				return nil, c.errNotFound
			} else if err != nil {
				// collect database failure statistics
				c.stat.IncrementDbFails()
				return nil, err
			}
			// write the data to the cache
			if err = cacheVal(v); err != nil {
				logx.Error(err)
			}
		}
		// return the json-serialized data
		return jsonx.Marshal(v)
	})
	if err != nil {
		return err
	}
	if fresh {
		return nil
	}
	// got the result from a previous in-flight query
	c.stat.IncrementTotal()
	c.stat.IncrementHit()
	// unmarshal the data into the caller's v object
	return jsonx.Unmarshal(val.([]byte), v)
}
```
2. Cache logic based on unique indexes
Because this part is a bit more complicated, I color-coded the code blocks and their corresponding logic (in the video). Block 2 is the same as primary key-based caching, so here I will focus on the logic of block 1.
Block 1 of the code covers two cases:

- The primary key can be found in the cache via the index
  - Use that primary key to run block 2's logic, i.e. the primary key-based caching flow above
- The primary key cannot be found in the cache via the index
  - Query the complete row record from the DB by index; on error, return it
  - Once the complete row record is found, write both the primary key → complete record cache and the index → primary key cache to redis
  - Return the desired row record data
```go
// v - the data object to read into
// key - the cache key generated from the index
// keyer - generates the primary key-based cache key from a primary key
// indexQuery - reads the complete data from the DB by index; returns the primary key
// primaryQuery - reads the complete data from the DB by primary key
func (cc CachedConn) QueryRowIndex(v interface{}, key string, keyer func(primary interface{}) string,
	indexQuery IndexQueryFn, primaryQuery PrimaryQueryFn) error {
	var primaryKey interface{}
	var found bool
	// look up the index -> primary key mapping in the cache
	if err := cc.cache.TakeWithExpire(&primaryKey, key, func(val interface{}, expire time.Duration) (err error) {
		// no index -> primary key cache, so query the complete data by index
		primaryKey, err = indexQuery(cc.db, v)
		if err != nil {
			return
		}
		// mark found as true so there is no need to read from the cache afterwards
		found = true
		// save the primary key -> complete data mapping in the cache;
		// TakeWithExpire already saves the index -> primary key mapping
		return cc.cache.SetCacheWithExpire(keyer(primaryKey), v, expire+cacheSafeGapBetweenIndexAndPrimary)
	}); err != nil {
		return err
	}
	// the data was already loaded via the index query
	if found {
		return nil
	}
	// read the data from the cache by primary key; on a miss, read it from
	// the DB with primaryQuery, write it back to the cache, and return it
	return cc.cache.Take(v, keyer(primaryKey), func(v interface{}) error {
		return primaryQuery(cc.db, v, primaryKey)
	})
}
```
Let's look at a practical example:
```go
func (m *defaultUserModel) FindOneByUser(user string) (*User, error) {
	var resp User
	// generate the index-based cache key
	indexKey := fmt.Sprintf("%s%v", cacheUserPrefix, user)
	err := m.QueryRowIndex(&resp, indexKey,
		// generate the complete-data cache key from the primary key
		func(primary interface{}) string {
			return fmt.Sprintf("user#%v", primary)
		},
		// index-based DB query method
		func(conn sqlx.SqlConn, v interface{}) (interface{}, error) {
			query := fmt.Sprintf("select %s from %s where user = ? limit 1", userRows, m.table)
			if err := conn.QueryRow(&resp, query, user); err != nil {
				return nil, err
			}
			return resp.Id, nil
		},
		// primary key-based DB query method
		func(conn sqlx.SqlConn, v, primary interface{}) error {
			query := fmt.Sprintf("select %s from %s where id = ?", userRows, m.table)
			return conn.QueryRow(&resp, query, primary)
		})
	// convert sqlc.ErrNotFound into the ErrNotFound defined in this package,
	// so callers don't perceive whether a cache is used, and the underlying
	// dependency stays isolated
	switch err {
	case nil:
		return &resp, nil
	case sqlc.ErrNotFound:
		return nil, ErrNotFound
	default:
		return nil, err
	}
}
```
All of the cache management code above can be generated automatically by goctl. Internally, our team's basic CRUD and cache code is all generated by goctl, which saves a lot of development time. Cache code is also very easy to get wrong: even with solid coding experience, it's hard to write it correctly every single time. So we recommend using the automatic cache code generation tool wherever possible to avoid mistakes.
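As a rough illustration, model code with caching can be generated from a DDL file with goctl; the exact flags may vary between goctl versions, so check `goctl model mysql --help` for your installation:

```bash
# generate model code from user.sql into ./model; -c includes the cache code
goctl model mysql ddl -src user.sql -dir ./model -c
```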
Need more?
If you want to get a better feel for the Go-Zero project, head over to the official website to learn about specific examples.
Video Playback Address
www.bilibili.com/video/BV1Jy…
The project address
Github.com/tal-tech/go…
You are welcome to use go-zero and give it a star to support us!