This article was first published on the author's Jianshu page: www.jianshu.com/u/204b8aaab…

version  date        note
1.0      2020.4.19   First published
1.1      2021.6.23   Title changed from "Zookeeper: What is BadVersionException" to "Zookeeper source code (5): What exactly is BadVersionException"

Preface

Recently, during development I occasionally saw ZK report a BadVersionException. While searching for the cause, I learned that it was an optimistic-locking-related problem, and the issue was quickly resolved. Whether in a single application or a distributed system, there must always be a mechanism that guarantees exclusive access to data. Let’s take a look at how ZK implements this mechanism.

Node properties

Before we can analyze the source code, we need to understand the three version attributes of the ZK node:

  1. version: the version number of the current data node's data
  2. cversion: the version number of the current data node's children
  3. aversion: the version number of the current data node's ACL changes

All of these properties can be found in the StatPersisted class.

Whenever the corresponding attribute changes, its version number is incremented by 1. A newly created node has a version number of 0, which means the node has been updated 0 times.
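
To see these three counters in action, here is a minimal sketch (the connection string localhost:2181 and the node /demo are hypothetical) that reads a node's Stat and prints all three versions:

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class StatVersions {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection string; any reachable ZK ensemble works
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });
            Stat stat = new Stat();
            zk.getData("/demo", false, stat); // getData fills the Stat in as a side effect
            System.out.println("version  = " + stat.getVersion());   // data version
            System.out.println("cversion = " + stat.getCversion());  // child list version
            System.out.println("aversion = " + stat.getAversion());  // ACL version
            zk.close();
        }
    }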

Source code analysis

Normally, when we call setData, the code looks like this:

// Curator version
// Asks for a version comparison (when version is -1, the server skips the check)
client.setData().withVersion(version).forPath(path, payload);
// No version comparison required (version defaults to -1)
client.setData().forPath(path, payload);

Zookeeper’s client code is very simple:

    /**
     * The asynchronous version of setData.
     *
     * @see #setData(String, byte[], int)
     */
    public void setData(final String path, byte data[], int version,
            StatCallback cb, Object ctx)
    {
        final String clientPath = path;
        PathUtils.validatePath(clientPath);

        final String serverPath = prependChroot(clientPath);

        RequestHeader h = new RequestHeader();
        h.setType(ZooDefs.OpCode.setData);
        SetDataRequest request = new SetDataRequest();
        request.setPath(serverPath);
        request.setData(data);
        request.setVersion(version);
        SetDataResponse response = new SetDataResponse();
        cnxn.queuePacket(h, new ReplyHeader(), request, response, cb,
                clientPath, serverPath, ctx, null);
    }

The version attribute is then serialized into the request and sent to the server.
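
For completeness, the synchronous variant surfaces the failure directly as an exception rather than through the callback. A minimal sketch (the connection string, node path, and expected version 3 are all hypothetical):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class SetDataSync {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });
            try {
                // Succeeds only if the node's current data version is 3
                Stat stat = zk.setData("/demo", "payload".getBytes(), 3);
                System.out.println("new version = " + stat.getVersion());
            } catch (KeeperException.BadVersionException e) {
                System.out.println("another client updated /demo first");
            } finally {
                zk.close();
            }
        }
    }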

Now take a look at the server-side code. From the name of the exception, we can easily locate the relevant code in PrepRequestProcessor.pRequest2Txn:

    case OpCode.setData:
        zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
        SetDataRequest setDataRequest = (SetDataRequest) record;
        if (deserialize)
            ByteBufferInputStream.byteBuffer2Record(request.request, setDataRequest);
        path = setDataRequest.getPath();
        validatePath(path, request.sessionId);
        // Look up the node's current change record and check WRITE permission
        nodeRecord = getRecordForPath(path);
        checkACL(zks, nodeRecord.acl, ZooDefs.Perms.WRITE, request.authInfo);
        // Compare the client's expected version with the current one (may throw BadVersionException)
        int newVersion = checkAndIncVersion(nodeRecord.stat.getVersion(), setDataRequest.getVersion(), path);
        request.setTxn(new SetDataTxn(path, setDataRequest.getData(), newVersion));
        nodeRecord = nodeRecord.duplicate(request.getHdr().getZxid());
        nodeRecord.stat.setVersion(newVersion);
        addChangeRecord(nodeRecord);
        break;

Let’s look at the checkAndIncVersion logic:

    private static int checkAndIncVersion(int currentVersion, int expectedVersion, String path)
            throws KeeperException.BadVersionException {
        if (expectedVersion != -1 && expectedVersion != currentVersion) {
            throw new KeeperException.BadVersionException(path);
        }
        return currentVersion + 1;
    }

The code is straightforward: if the request asks for a comparison (i.e., the client did not pass -1), the expected version carried in the request is compared with the node's current version held by ZK.

If no exception is thrown, the version number is incremented by 1 and the change is queued as a transaction, which is eventually applied to ZK's in-memory database.
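
This version check is exactly what lets a client build a safe read-modify-write loop: re-read and retry whenever BadVersionException signals that another writer got in first. A minimal Curator sketch (the casUpdate helper and its update function are hypothetical names):

    import java.util.function.UnaryOperator;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.data.Stat;

    public final class CasUpdate {
        // Hypothetical helper: retries the read-modify-write cycle until no
        // other writer slips in between our read and our write.
        static void casUpdate(CuratorFramework client, String path,
                              UnaryOperator<byte[]> update) throws Exception {
            while (true) {
                Stat stat = new Stat();
                byte[] current = client.getData().storingStatIn(stat).forPath(path);
                byte[] next = update.apply(current);
                try {
                    // The server rejects this write if the node changed since our read
                    client.setData().withVersion(stat.getVersion()).forPath(path, next);
                    return;
                } catch (KeeperException.BadVersionException e) {
                    // Lost the race: loop around, re-read, and try again
                }
            }
        }
    }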

Obviously, this is an implementation of CAS (compare-and-swap). So why implement the lock based on CAS? Before answering that, let's review the typical scenarios for optimistic and pessimistic locking:

  • Pessimistic locking: suitable for scenarios with heavy contention over data updates, because of its strongly exclusive nature.
  • Optimistic locking: suitable for scenarios with little concurrent contention and few transaction conflicts. It does not rely on exclusivity to implement a lock; the most common implementation is the CAS we just saw (see the sketch after this list).
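
The same compare-and-swap shape shows up in plain Java; a minimal, purely illustrative sketch with AtomicInteger (not ZK code):

    import java.util.concurrent.atomic.AtomicInteger;

    public class CasLoop {
        public static void main(String[] args) {
            AtomicInteger counter = new AtomicInteger(0);
            int current;
            do {
                current = counter.get();  // read the current value, like reading the node's version
            } while (!counter.compareAndSet(current, current + 1)); // retry if another thread changed it
            System.out.println(counter.get()); // prints 1
        }
    }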

As we all know, ZK is commonly used for configuration management, naming/DNS services, distributed coordination, and group membership management, all of which involve little concurrent contention over data; moreover, transactions are processed serially by the Leader server anyway. This clearly fits the optimistic locking scenario, which is why ZK does not use a "clunky" pessimistic lock to implement atomic operations on distributed data.

Summary

In this article, we learned that ZK implements its data exclusivity mechanism with optimistic locking. The reason for this design is that in typical ZK usage scenarios there is little concurrent contention over data (you can of course create contention, but the overall process then becomes more time consuming), and transactions are executed serially.