HTTP persistent connections and HttpClient connection pools

Background

HTTP is a stateless protocol: each request is independent of the others. Its original implementation therefore opened a new TCP socket connection for every HTTP request and closed it once the exchange was complete.

HTTP runs on top of TCP, a full-duplex transport, so setting up a connection takes a three-way handshake and tearing it down takes a four-way handshake. With one connection per request, every HTTP request clearly pays a large extra cost for establishing and destroying the connection.

The HTTP protocol therefore evolved to reuse socket connections through persistent connections.

As can be seen from the figure:

In a serial connection, each interaction opens and then closes the connection.
In a persistent connection, the first interaction opens the connection and it is not closed afterwards, so the next interaction does not need to establish a new one.

There are two implementations of persistent connections: HTTP/1.0+ keep-alive and HTTP/1.1 persistent connections.

HTTP/1.0+ keep-alive

Since 1996, many HTTP/1.0 browsers and servers have extended the protocol with what is known as the “keep-alive” extension.

Note that this extension was added on top of 1.0 as an “experimental persistent connection”. Keep-alive is now deprecated and is not described in the latest HTTP/1.1 specification, although many applications still use it.

A client using HTTP/1.0 adds “Connection: keep-alive” to its request headers to ask the server to keep the connection open. If the server is willing to do so, it includes the same header in its response. If the response does not contain a “Connection: keep-alive” header, the client assumes that the server does not support keep-alive and that the server will close the connection once it has sent the response.

The keep-alive extension adds persistent connections to HTTP/1.0 through negotiation between client and server. It still has some problems:

Keep-alive is not part of the HTTP/1.0 standard, so the client must send Connection: keep-alive to activate it.
Proxy servers may not support keep-alive. Some proxies are “blind relays” that do not understand the Connection header and forward it unchanged, even though it is a hop-by-hop header meant only for the next hop. The client and the server may then both believe the connection is persistent while the proxy does not expect any further data on it.

HTTP/1.1 persistent connection

HTTP/1.1 replaces keep-alive with persistent connections.

HTTP/1.1 connections are persistent by default. To close a connection explicitly, add the Connection: close header to the message. In other words, in HTTP/1.1 every connection is reused unless one side says otherwise.

However, as with keep-alive, an idle persistent connection can be closed by either the client or the server at any time. Not sending Connection: close does not mean that the server promises to keep the connection open forever.
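
The difference between the two schemes shows up in what the client has to send. The following is a minimal sketch, not taken from any library: RequestText and its get method are hypothetical names, and only the two version strings "1.0" and "1.1" are handled.

```java
/** Hypothetical helper that builds the text of a minimal GET request. */
final class RequestText {

    private RequestText() {}

    /**
     * @param version    "1.0" or "1.1" (assumed string form of the HTTP version)
     * @param persistent whether the client wants the connection kept open
     */
    static String get(String host, String path, String version, boolean persistent) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/").append(version).append("\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        if (version.equals("1.0") && persistent) {
            // HTTP/1.0+: persistence is opt-in through the keep-alive extension
            sb.append("Connection: keep-alive\r\n");
        } else if (version.equals("1.1") && !persistent) {
            // HTTP/1.1: persistence is the default, so only closing is announced
            sb.append("Connection: close\r\n");
        }
        sb.append("\r\n"); // a blank line ends the header section
        return sb.toString();
    }
}
```

An HTTP/1.1 request that wants a persistent connection therefore carries no Connection header at all.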

How does HttpClient create persistent connections

HttpClient uses a connection pool to manage the connections it holds, so that requests to the same destination can reuse the same TCP connection. In short, HttpClient implements persistent connections through connection pooling.

Pooling is in fact a general-purpose technique, and the idea behind it is not complicated:

Establish a connection the first time it is needed
When the exchange ends, do not close the connection; return it to the pool
The next request to the same destination takes an available connection from the pool
Periodically clean up expired connections
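
The four steps above can be sketched in a few dozen lines. This is a single-threaded illustration under stated assumptions, not HttpClient's design: PooledConn and SimplePool are hypothetical names, a connection is just a record with a fixed lifetime counted from creation, and time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical pooled item: a stand-in for a real socket connection. */
final class PooledConn {
    final String destination;
    final long expiresAtMillis;
    boolean closed;

    PooledConn(String destination, long expiresAtMillis) {
        this.destination = destination;
        this.expiresAtMillis = expiresAtMillis;
    }
}

/** Minimal single-threaded pool keyed by destination (illustration only). */
final class SimplePool {
    private final Map<String, Deque<PooledConn>> idle = new HashMap<>();
    private final long ttlMillis;

    SimplePool(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Steps 1 and 3: reuse an idle connection if one exists, otherwise create one. */
    PooledConn acquire(String destination, long nowMillis) {
        Deque<PooledConn> queue = idle.get(destination);
        if (queue != null) {
            PooledConn conn;
            while ((conn = queue.pollFirst()) != null) {
                if (conn.expiresAtMillis > nowMillis) {
                    return conn;
                }
                conn.closed = true; // expired while idle, so discard it
            }
        }
        return new PooledConn(destination, nowMillis + ttlMillis);
    }

    /** Step 2: keep the connection open and put it back into the pool. */
    void release(PooledConn conn) {
        idle.computeIfAbsent(conn.destination, k -> new ArrayDeque<>()).addFirst(conn);
    }

    /** Step 4: close idle connections that have expired; returns how many were closed. */
    int evictExpired(long nowMillis) {
        int closed = 0;
        for (Deque<PooledConn> queue : idle.values()) {
            for (Iterator<PooledConn> it = queue.iterator(); it.hasNext();) {
                PooledConn conn = it.next();
                if (conn.expiresAtMillis <= nowMillis) {
                    conn.closed = true;
                    it.remove();
                    closed++;
                }
            }
        }
        return closed;
    }
}
```

A real pool also needs size limits and thread safety, which this sketch leaves out on purpose.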

Every connection pool follows this idea, but when reading the HttpClient source code we focus on two points:

The concrete design of the connection pool, as a reference for designing pools of our own later
How it maps onto the HTTP protocol, that is, how the theoretical abstraction is turned into code

4.1 Implementation of HttpClient connection pool

HttpClient’s handling of persistent connections can be summarized in the following code, which extracts the connection pool-related parts from MainClientExec and removes the rest:

public class MainClientExec implements ClientExecChain {

    @Override
    public CloseableHttpResponse execute(
            final HttpRoute route,
            final HttpRequestWrapper request,
            final HttpClientContext context,
            final HttpExecutionAware execAware) throws IOException, HttpException {
        // Obtain a ConnectionRequest from the connection manager (HttpClientConnectionManager)
        final ConnectionRequest connRequest = connManager.requestConnection(route, userToken);
        final int timeout = config.getConnectionRequestTimeout();
        // Obtain a managed connection (HttpClientConnection) from the ConnectionRequest
        // (handling of InterruptedException and ExecutionException is omitted from this extract)
        final HttpClientConnection managedConn =
                connRequest.get(timeout > 0 ? timeout : 0, TimeUnit.MILLISECONDS);
        // Hand the connection manager and the managed connection to a ConnectionHolder
        final ConnectionHolder connHolder = new ConnectionHolder(this.log, this.connManager, managedConn);
        try {
            HttpResponse response;
            if (!managedConn.isOpen()) {
                // If the managed connection is not open yet, the route must be established first
                establishRoute(proxyAuthState, managedConn, route, request, context);
            }
            // Send the request over the connection
            response = requestExecutor.execute(request, managedConn, context);
            // Decide through the connection reuse strategy whether the connection can be reused
            if (reuseStrategy.keepAlive(response, context)) {
                // Get the period for which the connection stays valid
                final long duration = keepAliveStrategy.getKeepAliveDuration(response, context);
                // Set the validity period of the connection
                connHolder.setValidFor(duration, TimeUnit.MILLISECONDS);
                // Mark the connection as reusable
                connHolder.markReusable();
            } else {
                connHolder.markNonReusable();
            }
            final HttpEntity entity = response.getEntity();
            if (entity == null || !entity.isStreaming()) {
                // Release the connection back to the pool so the next call can use it
                connHolder.releaseConnection();
                return new HttpResponseProxy(response, null);
            } else {
                return new HttpResponseProxy(response, connHolder);
            }
        } catch (final HttpException | IOException | RuntimeException ex) {
            // Condensed from the original, which has one catch block per exception type
            connHolder.abortConnection();
            throw ex;
        }
    }
}

Here we can see that connection handling during an HTTP request follows the protocol specification. The rest of this section expands on the implementation.

PoolingHttpClientConnectionManager is HttpClient’s default connection manager. It first uses requestConnection() to obtain a connection request; note that this is a request for a connection, not the connection itself.

public ConnectionRequest requestConnection(
        final HttpRoute route,
        final Object state) {
    // Ask the pool for an entry; the result is a Future, not a connection yet
    final Future<CPoolEntry> future = this.pool.lease(route, state, null);
    return new ConnectionRequest() {

        @Override
        public boolean cancel() {
            return future.cancel(true);
        }

        @Override
        public HttpClientConnection get(
                final long timeout,
                final TimeUnit tunit) throws InterruptedException, ExecutionException, ConnectionPoolTimeoutException {
            // Wait for the Future and turn the pool entry into a connection
            final HttpClientConnection conn = leaseConnection(future, timeout, tunit);
            if (conn.isOpen()) {
                // A reused connection gets the socket timeout configured for its host
                final HttpHost host;
                if (route.getProxyHost() != null) {
                    host = route.getProxyHost();
                } else {
                    host = route.getTargetHost();
                }
                final SocketConfig socketConfig = resolveSocketConfig(host);
                conn.setSocketTimeout(socketConfig.getSoTimeout());
            }
            return conn;
        }
    };
}

You can see that the returned ConnectionRequest object actually holds a Future<CPoolEntry>, where CPoolEntry is the real connection instance managed by the connection pool.

From the code above, we should focus on two calls:

Future<CPoolEntry> future = this.pool.lease(route, state, null)
- How an asynchronous connection, a Future<CPoolEntry>, is obtained from the connection pool CPool
HttpClientConnection conn = leaseConnection(future, timeout, tunit)
- How the real connection, an HttpClientConnection, is obtained from the asynchronous Future<CPoolEntry>

4.2 Future<CPoolEntry>

To see how CPool hands out a Future<CPoolEntry>, look at the core code of AbstractConnPool:

private E getPoolEntryBlocking(
        final T route, final Object state,
        final long timeout, final TimeUnit tunit,
        final Future<E> future) throws IOException, InterruptedException, TimeoutException {
    // Absolute deadline for the wait; null means wait without a time limit
    Date deadline = null;
    if (timeout > 0) {
        deadline = new Date(System.currentTimeMillis() + tunit.toMillis(timeout));
    }
    // Lock the whole pool first; the lock is a reentrant lock (ReentrantLock)
    this.lock.lock();
    try {
        // Get the pool for the current HttpRoute. In HttpClient the overall pool has a size
        // limit and each route has its own sub-pool inside it, a "pool within a pool"
        final RouteSpecificPool<T, C, E> pool = getPool(route);
        E entry;
        for (;;) {
            Asserts.check(!this.isShutDown, "Connection pool shut down");
            // Loop until a usable connection is found or none is left
            for (;;) {
                // Take a connection from the route's pool: either null or a candidate
                entry = pool.getFree(state);
                // Null means no free connection, so leave the inner loop
                if (entry == null) {
                    break;
                }
                // If the connection has expired, close it
                if (entry.isExpired(System.currentTimeMillis())) {
                    entry.close();
                }
                // If the connection is closed, remove it and try the next one
                if (entry.isClosed()) {
                    this.available.remove(entry);
                    pool.free(entry, false);
                } else {
                    // A valid connection was found
                    break;
                }
            }
            // A valid connection was found: mark it as leased and return it
            if (entry != null) {
                this.available.remove(entry);
                this.leased.add(entry);
                onReuse(entry);
                return entry;
            }

            // No valid connection, so a new one is needed.
            // The maximum number of connections per route is configurable
            final int maxPerRoute = getMax(route);
            // Shrink the route's pool before allocating: close least recently used
            // connections that would exceed the per-route maximum
            final int excess = Math.max(0, pool.getAllocatedCount() + 1 - maxPerRoute);
            if (excess > 0) {
                for (int i = 0; i < excess; i++) {
                    final E lastUsed = pool.getLastUsed();
                    if (lastUsed == null) {
                        break;
                    }
                    lastUsed.close();
                    this.available.remove(lastUsed);
                    pool.remove(lastUsed);
                }
            }

            // The route's pool is still below its maximum
            if (pool.getAllocatedCount() < maxPerRoute) {
                final int totalUsed = this.leased.size();
                final int freeCapacity = Math.max(this.maxTotal - totalUsed, 0);
                // The overall pool also has room for another leased connection
                if (freeCapacity > 0) {
                    final int totalAvailable = this.available.size();
                    // If idle connections fill the remaining capacity, close the least
                    // recently used idle connection, whichever route it belongs to
                    if (totalAvailable > freeCapacity - 1) {
                        if (!this.available.isEmpty()) {
                            final E lastUsed = this.available.removeLast();
                            lastUsed.close();
                            final RouteSpecificPool<T, C, E> otherpool = getPool(lastUsed.getRoute());
                            otherpool.remove(lastUsed);
                        }
                    }
                    // Create a new connection
                    final C conn = this.connFactory.create(route);
                    // Add the connection to the pool of its route
                    entry = pool.add(conn);
                    // Record it as leased in the overall pool
                    this.leased.add(entry);
                    return entry;
                }
            }

            // A limit has been reached: queue up and wait for a connection to be released
            boolean success = false;
            try {
                if (future.isCancelled()) {
                    throw new InterruptedException("Operation interrupted");
                }
                // Queue the future in the route's pool
                pool.queue(future);
                // Queue the future in the overall pool
                this.pending.add(future);
                // Wait on the condition; success is true if it was signalled in time
                if (deadline != null) {
                    success = this.condition.awaitUntil(deadline);
                } else {
                    this.condition.await();
                    success = true;
                }
                if (future.isCancelled()) {
                    throw new InterruptedException("Operation interrupted");
                }
            } finally {
                // Remove the future from both queues
                pool.unqueue(future);
                this.pending.remove(future);
            }
            // Not signalled and the deadline has passed: leave the loop.
            // Otherwise go round again and retry
            if (!success && (deadline != null && deadline.getTime() <= System.currentTimeMillis())) {
                break;
            }
        }
        // No connection was obtained before the deadline
        throw new TimeoutException("Timeout waiting for connection");
    } finally {
        // Release the lock on the overall pool
        this.lock.unlock();
    }
}

The logic above has several important points:

The connection pool has a maximum number of connections, and each route has its own small pool with its own maximum
When the number of connections in the overall pool or in a route’s pool exceeds its limit, some connections are closed in least-recently-used (LRU) order
If a connection is available, it is returned to the caller for use
If no connection is available, HttpClient checks whether the route’s pool has reached its maximum; if not, it creates a new connection and adds it to the pool
If the limit has been reached, the request queues up and waits. When the condition is signalled, it tries again to obtain a connection; if the wait times out, a timeout exception is thrown
A ReentrantLock guards the acquisition of connections from the connection pool to ensure thread safety
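
The interplay of the two limits, the lock and the timed wait can be tried out in isolation. The sketch below is a simplification under stated assumptions: RouteLimiter is a hypothetical class that only counts leased slots instead of holding connections, routes are plain strings, and it uses Condition.awaitNanos rather than the awaitUntil(deadline) call seen above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical slot counter with a total limit and a per-route limit. */
final class RouteLimiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private final Map<String, Integer> leasedPerRoute = new HashMap<>();
    private final int maxTotal;
    private final int maxPerRoute;
    private int leasedTotal;

    RouteLimiter(int maxTotal, int maxPerRoute) {
        this.maxTotal = maxTotal;
        this.maxPerRoute = maxPerRoute;
    }

    /** Blocks until a slot for the route is free or the timeout elapses. */
    void lease(String route, long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        long nanosLeft = unit.toNanos(timeout);
        lock.lock();
        try {
            // Re-check both limits after every wake-up, which also covers spurious ones
            while (leasedTotal >= maxTotal
                    || leasedPerRoute.getOrDefault(route, 0) >= maxPerRoute) {
                if (nanosLeft <= 0L) {
                    throw new TimeoutException("Timeout waiting for connection");
                }
                // Releases the lock while waiting and returns the remaining time
                nanosLeft = released.awaitNanos(nanosLeft);
            }
            leasedTotal++;
            leasedPerRoute.merge(route, 1, Integer::sum);
        } finally {
            lock.unlock();
        }
    }

    /** Gives a slot back and wakes up waiting threads so they re-check the limits. */
    void release(String route) {
        lock.lock();
        try {
            leasedTotal--;
            leasedPerRoute.merge(route, -1, Integer::sum);
            released.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

With new RouteLimiter(2, 1), a second lease for the same route times out while a lease for a different route still succeeds, which is the "pool within a pool" behaviour described above.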

At this point, the program has either obtained a usable CPoolEntry instance or thrown an exception that aborts the request.

4.3 HttpClientConnection

protected HttpClientConnection leaseConnection(
        final Future<CPoolEntry> future,
        final long timeout,
        final TimeUnit tunit) throws InterruptedException, ExecutionException, ConnectionPoolTimeoutException {
    final CPoolEntry entry;
    try {
        // Get the CPoolEntry from the asynchronous Future<CPoolEntry>
        entry = future.get(timeout, tunit);
        if (entry == null || future.isCancelled()) {
            throw new InterruptedException();
        }
        Asserts.check(entry.getConnection() != null, "Pool entry with no connection");
        if (this.log.isDebugEnabled()) {
            this.log.debug("Connection leased: " + format(entry) + formatStats(entry.getRoute()));
        }
        // Return a proxy for the CPoolEntry; the proxy delegates to the underlying HttpClientConnection
        return CPoolProxy.newProxy(entry);
    } catch (final TimeoutException ex) {
        throw new ConnectionPoolTimeoutException("Timeout waiting for connection from pool");
    }
}
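
The same "request now, wait later" pattern can be reproduced with nothing but java.util.concurrent. This is a rough sketch, not HttpClient code: LeaseSketch and PoolTimeoutException are hypothetical stand-ins, and the entry type is left generic.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical exception standing in for a pool-specific timeout type. */
final class PoolTimeoutException extends Exception {
    PoolTimeoutException(String message) {
        super(message);
    }
}

final class LeaseSketch {

    private LeaseSketch() {}

    /** Waits for the pooled entry and maps a generic timeout to a pool-specific one. */
    static <E> E lease(Future<E> future, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, PoolTimeoutException {
        try {
            E entry = future.get(timeout, unit);
            if (entry == null || future.isCancelled()) {
                throw new InterruptedException();
            }
            return entry;
        } catch (TimeoutException ex) {
            // Callers see one exception type that clearly means "the pool was exhausted"
            throw new PoolTimeoutException("Timeout waiting for connection from pool");
        }
    }
}
```

Translating the generic TimeoutException at this boundary lets callers tell a pool wait apart from other timeouts, such as a socket read timeout.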

How does HttpClient reuse persistent connections?

In the previous section, we saw that HttpClient obtains connections through a connection pool, taking one from the pool whenever a connection is needed.

This corresponds to the pool design points listed earlier:

Establish a connection the first time it is needed
When the exchange ends, do not close the connection; return it to the pool
The next request to the same destination takes an available connection from the pool
Periodically clean up expired connections

Section 4 showed how HttpClient handles points 1 and 3, so what about point 2?

How does HttpClient decide whether a connection should be closed after use or put back into the pool for reuse? Take another look at the MainClientExec code:

// Send the request over the connection
response = requestExecutor.execute(request, managedConn, context);
// Decide through the connection reuse strategy whether the connection can be reused
if (reuseStrategy.keepAlive(response, context)) {
    // Get the validity period of the connection; a timeout given in the response takes precedence
    final long duration = keepAliveStrategy.getKeepAliveDuration(response, context);
    if (this.log.isDebugEnabled()) {
        final String s;
        if (duration > 0) {
            s = "for " + duration + " " + TimeUnit.MILLISECONDS;
        } else {
            s = "indefinitely";
        }
        this.log.debug("Connection can be kept alive " + s);
    }
    // Set the validity period of the connection
    connHolder.setValidFor(duration, TimeUnit.MILLISECONDS);
    // Mark the connection as reusable
    connHolder.markReusable();
} else {
    // Mark the connection as not reusable
    connHolder.markNonReusable();
}

As you can see, once a request has been executed over a connection, the connection reuse strategy decides whether the connection should be reused. If it should, the connection is eventually handed back to HttpClientConnectionManager and returned to the pool.
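
The validity period used above typically comes from the timeout parameter of the response’s Keep-Alive header. A minimal sketch of that parsing follows. KeepAliveHeader is a hypothetical helper, not the library’s keep-alive strategy class, and it assumes the common "timeout=5, max=100" form with the timeout given in seconds.

```java
import java.util.Locale;

final class KeepAliveHeader {

    private KeepAliveHeader() {}

    /**
     * Parses a Keep-Alive header value such as "timeout=5, max=100".
     * Returns the timeout in milliseconds, or -1 when none is given,
     * which a caller can treat as "keep alive indefinitely".
     */
    static long durationMillis(String headerValue) {
        if (headerValue == null) {
            return -1L;
        }
        for (String part : headerValue.split(",")) {
            String[] pair = part.trim().split("=", 2);
            if (pair.length == 2 && pair[0].trim().toLowerCase(Locale.ROOT).equals("timeout")) {
                try {
                    return Long.parseLong(pair[1].trim()) * 1000L;
                } catch (NumberFormatException ignored) {
                    // Malformed number: keep scanning the remaining parameters
                }
            }
        }
        return -1L;
    }
}
```

A non-positive result matches the "indefinitely" branch in the debug message of the code above.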

So what is the logic of a connection reuse strategy?

public class DefaultClientConnectionReuseStrategy extends DefaultConnectionReuseStrategy {

    public static final DefaultClientConnectionReuseStrategy INSTANCE = new DefaultClientConnectionReuseStrategy();

    @Override
    public boolean keepAlive(final HttpResponse response, final HttpContext context) {
        // Get the request from the context
        final HttpRequest request = (HttpRequest) context.getAttribute(HttpCoreContext.HTTP_REQUEST);
        if (request != null) {
            // Get the Connection headers of the request
            final Header[] connHeaders = request.getHeaders(HttpHeaders.CONNECTION);
            if (connHeaders.length != 0) {
                final TokenIterator ti = new BasicTokenIterator(new BasicHeaderIterator(connHeaders, null));
                while (ti.hasNext()) {
                    final String token = ti.nextToken();
                    // A Connection: close header means the request does not intend to keep
                    // the connection open, so the response's wishes are ignored
                    if (HTTP.CONN_CLOSE.equalsIgnoreCase(token)) {
                        return false;
                    }
                }
            }
        }
        // Otherwise fall back to the reuse strategy of the parent class
        return super.keepAlive(response, context);
    }
}

Take a look at the parent class reuse strategy

if (canResponseHaveBody(request, response)) {
    final Header[] clhs = response.getHeaders(HTTP.CONTENT_LEN);
    // If the Content-Length of the response is not set correctly, the connection is not reused.
    // On a persistent connection the end of one response and the start of the next can only
    // be told apart by Content-Length, so it is needed to find the message boundaries
    if (clhs.length == 1) {
        final Header clh = clhs[0];
        try {
            final int contentLen = Integer.parseInt(clh.getValue());
            if (contentLen < 0) {
                return false;
            }
        } catch (final NumberFormatException ex) {
            return false;
        }
    } else {
        return false;
    }
}
if (headerIterator.hasNext()) {
    try {
        final TokenIterator ti = new BasicTokenIterator(headerIterator);
        boolean keepalive = false;
        while (ti.hasNext()) {
            final String token = ti.nextToken();
            // If the response has a Connection: close header, the connection is to be closed
            if (HTTP.CONN_CLOSE.equalsIgnoreCase(token)) {
                return false;
            // If the response has a Connection: keep-alive header, it explicitly asks to persist
            } else if (HTTP.CONN_KEEP_ALIVE.equalsIgnoreCase(token)) {
                keepalive = true;
            }
        }
        if (keepalive) {
            return true;
        }
    } catch (final ParseException px) {
        return false;
    }
}
// If the response has no Connection header, reuse the connection
// for every version higher than HTTP/1.0
return !ver.lessEquals(HttpVersion.HTTP_1_0);

To sum up:

If the request header contains Connection: close, the connection is not reused
If the Content-Length of the response is not set correctly, it is not reused
If the response header contains Connection: close, it is not reused
If the response header contains Connection: keep-alive, it is reused
Otherwise, it is reused if the HTTP version is higher than 1.0
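
The order of these checks matters, because an earlier rule overrides a later one. The sketch below restates the five rules as one function. It is a simplification under stated assumptions: ReuseDecision is a hypothetical name, each Connection header holds a single token, the response is assumed to carry a body, and chunked transfer encoding is ignored.

```java
import java.util.Locale;

final class ReuseDecision {

    private ReuseDecision() {}

    /**
     * @param requestConnection  Connection header of the request, or null if absent
     * @param responseConnection Connection header of the response, or null if absent
     * @param contentLength      Content-Length of the response, or null if missing or invalid
     * @param minorVersion       minor version of an HTTP/1.x response (0 or 1)
     */
    static boolean canReuse(String requestConnection, String responseConnection,
                            Long contentLength, int minorVersion) {
        if (is(requestConnection, "close")) {
            return false;            // 1. the request asked to close
        }
        if (contentLength == null || contentLength < 0) {
            return false;            // 2. the response body is not properly delimited
        }
        if (is(responseConnection, "close")) {
            return false;            // 3. the response asked to close
        }
        if (is(responseConnection, "keep-alive")) {
            return true;             // 4. the response asked to persist
        }
        return minorVersion > 0;     // 5. default by protocol version
    }

    private static boolean is(String headerValue, String token) {
        return headerValue != null
                && headerValue.trim().toLowerCase(Locale.ROOT).equals(token);
    }
}
```

For example, a request carrying Connection: close is never reused, even if the response answers with Connection: keep-alive.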

As the code shows, the implementation strategy is consistent with the protocol-level constraints described in the HTTP/1.0+ keep-alive and HTTP/1.1 persistent connection sections.

How does HttpClient clean up stale connections

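Before HttpClient 4.4, a connection taken from the pool for reuse was checked at that moment to see whether it had gone stale, and cleaned up if so.

In later versions, a separate thread can scan the connections in the pool instead and close those that have expired or whose time since last use exceeds the configured limit. The default of 2 seconds applies to the idle period after which a pooled connection is re-validated before it is leased again (validateAfterInactivity); the cleanup thread itself sleeps 10 seconds between scans unless a maximum idle time is configured, as the builder code shows.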

public CloseableHttpClient build() {
    // ... (the rest of build() is omitted from this extract)

    // Cleanup of expired and idle connections is only started if it was enabled; by default it is off
    if (evictExpiredConnections || evictIdleConnections) {
        // Create the connection pool cleanup thread
        final IdleConnectionEvictor connectionEvictor = new IdleConnectionEvictor(cm,
                maxIdleTime > 0 ? maxIdleTime : 10, maxIdleTimeUnit != null ? maxIdleTimeUnit : TimeUnit.SECONDS,
                maxIdleTime, maxIdleTimeUnit);
        // Make sure the cleanup thread is shut down when the client is closed
        closeablesCopy.add(new Closeable() {

            @Override
            public void close() throws IOException {
                connectionEvictor.shutdown();
                try {
                    connectionEvictor.awaitTermination(1L, TimeUnit.SECONDS);
                } catch (final InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                }
            }

        });
        // Start the cleanup thread
        connectionEvictor.start();
    }

    // ... (the rest of build() is omitted from this extract)
}

You can see that during HttpClientBuilder.build(), if cleanup is enabled, a connection pool cleanup thread is created and started.

public IdleConnectionEvictor(
        final HttpClientConnectionManager connectionManager,
        final ThreadFactory threadFactory,
        final long sleepTime, final TimeUnit sleepTimeUnit,
        final long maxIdleTime, final TimeUnit maxIdleTimeUnit) {
    this.connectionManager = Args.notNull(connectionManager, "Connection manager");
    this.threadFactory = threadFactory != null ? threadFactory : new DefaultThreadFactory();
    this.sleepTimeMs = sleepTimeUnit != null ? sleepTimeUnit.toMillis(sleepTime) : sleepTime;
    this.maxIdleTimeMs = maxIdleTimeUnit != null ? maxIdleTimeUnit.toMillis(maxIdleTime) : maxIdleTime;
    this.thread = this.threadFactory.newThread(new Runnable() {
        @Override
        public void run() {
            try {
                // Endless loop: the thread keeps running until it is interrupted
                while (!Thread.currentThread().isInterrupted()) {
                    // Sleep before each pass, 10 seconds by default
                    Thread.sleep(sleepTimeMs);
                    // Clean up expired connections
                    connectionManager.closeExpiredConnections();
                    // If a maximum idle time is specified, clean up idle connections as well
                    if (maxIdleTimeMs > 0) {
                        connectionManager.closeIdleConnections(maxIdleTimeMs, TimeUnit.MILLISECONDS);
                    }
                }
            } catch (final Exception ex) {
                exception = ex;
            }
        }
    });
}

To sum up:

Cleanup of expired and idle connections is enabled only if it is switched on explicitly in HttpClientBuilder (evictExpiredConnections() and evictIdleConnections(...))
Once enabled, a thread is started that runs in a loop: on each pass it sleeps for a while, then calls the HttpClientConnectionManager cleanup methods to close expired and idle connections.
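
The sleep-then-clean loop is easy to reproduce outside HttpClient. The following is a rough sketch under stated assumptions: PeriodicEvictor is a hypothetical class, the cleanup work is passed in as a plain Runnable, and the thread is a daemon so that it never keeps the JVM alive.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical evictor: sleeps, runs a cleanup task, and repeats until interrupted. */
final class PeriodicEvictor {
    private final Thread thread;

    PeriodicEvictor(Runnable cleanup, long sleepTime, TimeUnit unit) {
        long sleepMillis = unit.toMillis(sleepTime);
        this.thread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(sleepMillis);
                    cleanup.run();
                }
            } catch (InterruptedException ex) {
                // Interrupted while sleeping: restore the flag and let the thread end
                Thread.currentThread().interrupt();
            }
        }, "evictor-sketch");
        this.thread.setDaemon(true);
    }

    void start() {
        thread.start();
    }

    /** Interrupts the loop and waits up to the given time for the thread to finish. */
    void shutdown(long waitMillis) throws InterruptedException {
        thread.interrupt();
        thread.join(waitMillis);
    }

    boolean isRunning() {
        return thread.isAlive();
    }
}
```

Interrupting the thread is what ends the loop, which is why shutdown must be called when the owning client is closed.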

Summary

The HTTP protocol relieves the cost of opening too many connections in its early design through persistent connections
Persistent connections come in two forms: HTTP/1.0+ keep-alive and HTTP/1.1’s default persistent connections
HttpClient manages persistent connections through a connection pool, which is split into two levels: the overall pool and a pool for each route
HttpClient obtains a pooled connection through an asynchronous Future<CPoolEntry>
The default connection reuse strategy is consistent with the HTTP protocol constraints. Based on the response, it first checks for Connection: close (do not reuse), then for Connection: keep-alive (reuse), and finally reuses the connection if the version is higher than 1.0
Connections in the pool are cleaned up only if cleanup of expired and idle connections is explicitly enabled in HttpClientBuilder
HttpClient 4.4 and later clean up expired and idle connections with a thread that runs an endless loop, sleeping for a while on each pass to achieve periodic execution

The analysis above is my personal understanding of the HttpClient source code. If anything is wrong, please leave a comment so we can discuss it.