A brief record of what I recently learned about HTTPS, TLS, and related protocols.
1. Definition of HTTPS
HyperText Transfer Protocol Secure (HTTPS), often called HTTP over TLS, HTTP over SSL, or HTTP Secure, is a protocol for secure communication over a computer network. HTTPS still communicates using HTTP, but uses SSL/TLS to encrypt the packets. HTTPS was developed to authenticate web servers and to protect the privacy and integrity of the data exchanged.
2. The difference between HTTPS and HTTP
-
An HTTP URL starts with http:// and uses port 80 by default, while an HTTPS URL starts with https:// and uses port 443 by default.
-
HTTP is insecure because it transmits data in plaintext. Attackers can obtain website account credentials and other sensitive information through eavesdropping and man-in-the-middle attacks. HTTPS is designed to prevent these attacks and is secure when configured correctly.
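As a small illustration of the default ports, here is a minimal Python sketch. The helper name effective_port and the example.com URLs are made up for this post; HTTP_PORT and HTTPS_PORT are constants from the standard library.

```python
from http.client import HTTP_PORT, HTTPS_PORT
from urllib.parse import urlsplit

# Scheme defaults as defined by the standard library (80 and 443).
DEFAULT_PORTS = {"http": HTTP_PORT, "https": HTTPS_PORT}

def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL (hypothetical helper)."""
    parts = urlsplit(url)
    # An explicit port in the URL wins; otherwise fall back to the scheme default.
    return parts.port or DEFAULT_PORTS[parts.scheme]

print(effective_port("http://example.com/"))
print(effective_port("https://example.com/"))
print(effective_port("https://example.com:8443/"))
```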
3. Encryption algorithm
Before introducing HTTPS, let's talk about encryption algorithms. They can be divided into two-way and one-way algorithms. With a two-way algorithm, plaintext and ciphertext can be converted into each other; a one-way algorithm (a hash function, for example) only converts plaintext into ciphertext and cannot be reversed. The following figure shows the main algorithms in the two categories.
This post focuses on symmetric encryption and asymmetric encryption, as well as the combination of the two.
3.1 Symmetric Encryption
It can be summarized by a set of formulas:
f1(k, data) = encrypted_data
f2(k, encrypted_data) = data
f1 and f2 are the encryption and decryption algorithms respectively, k is the key, data is the plaintext, and encrypted_data is the ciphertext.
Process: The client and server agree on a key and a symmetric encryption algorithm. When sending data, one end encrypts it with the key and the encryption algorithm. After receiving the ciphertext, the other end uses the same key and the decryption algorithm to recover the original plaintext.
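To make f1 and f2 concrete, here is a toy sketch in Python. It is not a real cipher such as AES: the keystream construction (SHA-256 over the key and a counter) is my own assumption, chosen only so that the example runs with the standard library, and it must not be used to protect real data.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (SHA-256 over key + counter)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def f1(k: bytes, data: bytes) -> bytes:
    """Encrypt: XOR the plaintext with the keystream derived from k."""
    return bytes(a ^ b for a, b in zip(data, keystream(k, len(data))))

def f2(k: bytes, encrypted_data: bytes) -> bytes:
    """Decrypt: XOR with the same keystream again, which undoes f1."""
    return f1(k, encrypted_data)

k = b"shared secret key"
data = b"GET /index.html HTTP/1.1"
encrypted_data = f1(k, data)

assert encrypted_data != data
assert f2(k, encrypted_data) == data          # same key recovers the plaintext
assert f2(b"wrong key", encrypted_data) != data
```

The point is only the shape: one shared key k drives both directions, so whoever holds k can both encrypt and decrypt.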
3.2 Asymmetric encryption
Asymmetric encryption uses a public key (public_key) and a private key (private_key), which correspond to each other one to one. The public key is generated together with the private key.
The formula is summarized as follows:
f(pub_key, data1) = encrypted_data1
f(pri_key, encrypted_data1) = data1
-------------------------------------
f(pri_key, data2) = encrypted_data2
f(pub_key, encrypted_data2) = data2
pub_key and pri_key are the public and private keys respectively. Data can be encrypted with either key, and it can then be decrypted only with the other key of the pair.
Process: The client sends a request to the server for the public key, and the server returns the public key. The client encrypts the data with the public key and sends it to the server. The server decrypts the data with the private key. In the other direction, the server encrypts data with the private key and sends it to the client, and the client decrypts it with the public key. Note that anyone who holds the public key can decrypt data sent in this second direction, so it proves where the data came from rather than keeping it secret.
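The two pairs of formulas can be tried out with textbook RSA. This is a minimal sketch with tiny primes chosen for readability; real RSA uses very large primes and padding, so treat this purely as an illustration.

```python
# Toy RSA with tiny primes: illustrative only, trivially breakable.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: modular inverse of e (Python 3.8+)

pub_key = (e, n)
pri_key = (d, n)

def f(key: tuple, value: int) -> int:
    """The same modular exponentiation is used with either key."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

# Encrypt with the public key, decrypt with the private key.
data1 = 65
encrypted_data1 = f(pub_key, data1)
assert f(pri_key, encrypted_data1) == data1

# Encrypt with the private key, decrypt with the public key.
data2 = 123
encrypted_data2 = f(pri_key, data2)
assert f(pub_key, encrypted_data2) == data2
```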
3.3 Symmetric Encryption + Asymmetric Encryption
Asymmetric encryption is inefficient, and symmetric encryption has a security weakness: the shared key has to reach the other side safely. Combining the two lets each make up for the other's disadvantage.
Process: The client sends a request to the server for the public key, and the server returns the public key. The client generates a random number num, encrypts it with the public key, and sends it to the server. The server decrypts the message with the private key to obtain num. From then on, both the client and the server use num as the symmetric encryption key with which the transmitted data is encrypted and decrypted.
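The combined process can be sketched end to end. Everything here is a toy under stated assumptions: the RSA key uses tiny primes, the symmetric cipher is an XOR keystream built from SHA-256, and the helper names xor_cipher and session_key are invented for this example.

```python
import hashlib
import secrets

# Server side: toy RSA key pair (tiny primes, illustration only).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)
pub_key, pri_key = (e, n), (d, n)

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 based keystream.
    Applying it twice with the same key returns the original data."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def session_key(num: int) -> bytes:
    """Both sides turn the shared number into symmetric key bytes."""
    return hashlib.sha256(num.to_bytes(2, "big")).digest()

# Client: pick num and send it encrypted with the server's public key.
num = secrets.randbelow(n - 2) + 2               # 2 <= num < n
encrypted_num = pow(num, pub_key[0], pub_key[1])

# Server: recover num with the private key.
server_num = pow(encrypted_num, pri_key[0], pri_key[1])
assert server_num == num

# Both sides now hold the same symmetric key.
message = b"hello over the hybrid channel"
ciphertext = xor_cipher(session_key(num), message)
assert xor_cipher(session_key(server_num), ciphertext) == message
```

The slow asymmetric step runs once, on a small number; all the bulk data goes through the fast symmetric cipher.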
4. Digital certificate and signature
When a packet arrives, the client has to determine whether it was really sent by the intended server: in a man-in-the-middle attack, the attacker impersonates the server and sends its own packets, including its own public key, to the client. "Digital signature" and "digital certificate" technologies address this by verifying the integrity of the content and the identity of the server (and, in more secure scenarios, the identity of the client). Digital signature algorithms include RSA (hash algorithm + RSA encryption algorithm), DSA, and ECDSA.
-
Xiao Ming first generates a public and private key pair and hashes the file contents to obtain the file digest (the file's hash value). He then encrypts the digest with his private key using the RSA algorithm to obtain the digital signature.
-
Xiao Ming sends the file, the public key, and the digital signature to Xiao Hong. After receiving them, she decrypts the digital signature with the public key to obtain digest 1, and hashes the received file to obtain digest 2. If digest 1 and digest 2 are the same, the signature is valid; otherwise it is invalid. Comparing the two digests ensures that the file was not tampered with in transit: if the file had been modified, hashing it would give a digest 2 that differs from the digest 1 recovered from the signature.
-
This leads to a question: how can Xiao Hong be sure that the public key she received was really generated by Xiao Ming?
Xiao Ming sends his public key and some identity information to a Certificate Authority (CA). After verifying his identity, the CA issues a digital certificate that binds the public key to Xiao Ming. Xiao Ming then sends the digital certificate along with the file, and Xiao Hong can use it to determine whether the public key she received belongs to Xiao Ming.
-
How can we ensure that the digital certificate itself has not been tampered with?
The CA has its own public key and private key. The CA hashes Xiao Ming's identity information and public key and signs the digest with its private key (hash digest + RSA). The resulting CA signature is stored in Xiao Ming's digital certificate. The CA root certificates are built into the operating system or browser of the client and server, so an attacker cannot substitute a forged one in transit.
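The sign-and-verify steps from the Xiao Ming and Xiao Hong example can be sketched as follows. This is a toy: the key is built from two small known primes, and the digest is reduced modulo n so that it fits the key, whereas real RSA signatures pad the full digest. The file contents are made up.

```python
import hashlib

# Xiao Ming's toy RSA key pair, built from two known primes (illustration only).
p, q, e = 2**61 - 1, 2**89 - 1, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

def digest(content: bytes) -> int:
    """Hash the file and reduce the result so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % n

def sign(content: bytes) -> int:
    """Signer: encrypt the digest with the private key."""
    return pow(digest(content), d, n)

def verify(content: bytes, signature: int) -> bool:
    """Receiver: recover digest 1 from the signature with the public key,
    compute digest 2 from the received file, and compare the two."""
    digest1 = pow(signature, e, n)
    digest2 = digest(content)
    return digest1 == digest2

file_content = b"contract: pay Xiao Hong 100 yuan"
signature = sign(file_content)

assert verify(file_content, signature)
assert not verify(b"contract: pay Xiao Hong 900 yuan", signature)
```

A CA signing a certificate is the same operation, with the certificate data in place of the file and the CA's private key in place of Xiao Ming's.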
In practice, the certificate sent by the server usually consists of several certificates that form a chain, for example root CA certificate > intermediate CA certificate > server certificate. To validate the server certificate, the client first looks up the certificate of its issuer, which is an intermediate CA certificate. It then looks up the issuer of that intermediate certificate, which may be another intermediate CA certificate or the root CA certificate, and continues until it reaches the root CA certificate. The digital signatures are then verified from the root CA certificate downward: the public key of the root CA certificate is used to verify the signature on the intermediate certificate, and the public key of the intermediate certificate is used to verify the signature on the server certificate. If any verification fails, the certificate is considered invalid.
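The two phases described above (walk up to the root, then verify downward) can be modeled in a few lines. This is a simplified sketch, not real X.509 handling: the Cert class, the toy RSA keys built from known primes, and the names "Root CA", "Intermediate CA" and "*.example.com" are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by every toy key

def make_key(p: int, q: int):
    """Toy RSA key from two known primes: returns (modulus, private exponent)."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))

@dataclass
class Cert:
    subject: str
    issuer: str
    pub_n: int        # subject's public modulus
    signature: int    # issuer's signature over the three fields above

def digest(subject: str, issuer: str, pub_n: int, modulus: int) -> int:
    data = f"{subject}|{issuer}|{pub_n}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

def issue(subject, subject_n, issuer, issuer_n, issuer_d) -> Cert:
    """The issuer signs the digest of the certificate data with its private key."""
    sig = pow(digest(subject, issuer, subject_n, issuer_n), issuer_d, issuer_n)
    return Cert(subject, issuer, subject_n, sig)

def signed_by(child: Cert, parent: Cert) -> bool:
    expected = digest(child.subject, child.issuer, child.pub_n, parent.pub_n)
    return pow(child.signature, E, parent.pub_n) == expected

def verify_chain(leaf: Cert, sent: list, trusted_roots: dict) -> bool:
    by_subject = {c.subject: c for c in sent}
    by_subject.update(trusted_roots)           # local trust anchors override anything sent
    # Phase 1: follow issuer links upward until a trusted root certificate is reached.
    path = [leaf]
    while trusted_roots.get(path[-1].subject) != path[-1]:
        parent = by_subject.get(path[-1].issuer)
        if parent is None or len(path) > len(by_subject):
            return False                       # missing issuer, or a loop
        path.append(parent)
    # Phase 2: verify signatures from the root downward.
    chain = path[::-1]
    return all(signed_by(child, parent) for parent, child in zip(chain, chain[1:]))

root_n, root_d = make_key(2**61 - 1, 2**89 - 1)
inter_n, inter_d = make_key(2**107 - 1, 2**127 - 1)
server_n, _ = make_key(2**89 - 1, 2**107 - 1)

root = issue("Root CA", root_n, "Root CA", root_n, root_d)            # self-signed
inter = issue("Intermediate CA", inter_n, "Root CA", root_n, root_d)
server = issue("*.example.com", server_n, "Intermediate CA", inter_n, inter_d)
trusted = {root.subject: root}

assert verify_chain(server, [server, inter], trusted)
assert not verify_chain(server, [server], trusted)                     # intermediate missing
forged = Cert(server.subject, server.issuer, root_n, server.signature)  # swapped public key
assert not verify_chain(forged, [forged, inter], trusted)
```

The design choice worth noting is that trust comes only from the locally stored root: a certificate that merely claims the root's name is not accepted as an anchor.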
5. The TLS handshake
After the TCP three-way handshake completes and the connection is established, the client initiates a TLS connection and the TLS handshake phase begins.
Purpose of TLS handshake:
-
Negotiate the encryption key
This key encrypts the content of application protocols such as HTTP. It is also called the master key and serves as the key for the encryption algorithm.
-
Negotiate the encryption algorithm
For efficiency, symmetric encryption is used. Symmetric ciphers are built from bit operations, so they are fast and can even be hardware-accelerated. Asymmetric encryption such as RSA relies on multiplying large numbers and is generally much slower. Symmetric encryption is also very secure as long as the key is not compromised, and keeping the key secret is what the rest of the SSL handshake protocol ensures.
-
Verify identity and data integrity
Normally only the server needs to be authenticated. In special cases, such as high-security applications, the client's identity is authenticated as well. The server returns a certificate chain that includes the CA certificates, and this chain of trust lets the client confirm whether the server can be trusted. In addition, once the public key has been delivered during the handshake, certain handshake data is digitally signed to ensure that it has not been tampered with and that the sender is who it claims to be.
General TLS handshake process (the handshake is not fixed; depending on the application scenario there are three main variants):
-
Only the server is authenticated
The handshake completes in three stages, as in the Wireshark capture. Ordinary network requests only go this far.
-
Both the server and the client are authenticated
In scenarios with high security requirements, the server also authenticates the client. The client proves its identity the same way the server does, by presenting a certificate.
-
Resume a previous session
This falls under HTTPS optimization. The Session Ticket or Session ID mechanism is used to resume a session established by an earlier handshake, and this works across different TCP connections. Because the cryptographic parameters from the earlier handshake have been saved, the session can be resumed directly and data transfer can begin. With Session Ticket the client stores this information; with Session ID the server stores it. Session Ticket is not widely supported on Android clients, however; support depends on the device model and the version of the built-in OpenSSL.
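In code, the first two variants mostly differ in how each side's TLS context is configured. Here is a minimal sketch with Python's ssl module; the certificate file paths in the comments are hypothetical, and no connection is actually opened.

```python
import ssl

# Client side, server-only authentication (the common case).
client_ctx = ssl.create_default_context()            # loads the system's trusted CA roots
assert client_ctx.verify_mode == ssl.CERT_REQUIRED   # the server certificate must validate
assert client_ctx.check_hostname                     # and must match the host name

# Server side, mutual authentication: also demand a certificate from the client.
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.verify_mode = ssl.CERT_REQUIRED

# A real deployment would additionally load key material (hypothetical paths):
#   server_ctx.load_cert_chain("server.crt", "server.key")
#   server_ctx.load_verify_locations("clients-ca.crt")
#   client_ctx.load_cert_chain("client.crt", "client.key")
#
# Session resumption on the client: keep `tls_sock.session` from a finished
# connection and pass it as `session=` to the next `client_ctx.wrap_socket(...)`.
```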
TLS handshake mechanism illustration
Client Hello
The packets of a visit to a Zhihu web page were captured in Wireshark, and the first TLS packet was obtained by following the TCP stream.
-
type
-
length
-
version
TLS 1.2 is identified on the wire as version 3.3 (SSL 3.0 was 3.0 and TLS 1.0 was 3.1). This is the highest version that the client can support.
-
Random number
A random 32-byte value (random number 1 in the figure). The client and the server ultimately need to negotiate the master key used to encrypt data. The server also generates a random number in its Server Hello, and both values are used to calculate the master key.
-
Session ID
The session ID can be reused, depending on server resources and support. To reuse a session ID, the SSL server needs to keep the state of the connection and the cryptographic information left over from the last successful handshake. On the first visit to a site the session ID is empty, because no session has been created yet. If the client has saved a session ID, it sends that ID the next time it initiates an SSL request.
-
Cipher suites
The list of cipher suites that the client supports, sorted by priority. Each suite represents one combination of algorithms. A suite name starts with "TLS", followed by the key exchange algorithm, then the word "WITH", which joins it to the encryption algorithm and the authentication algorithm. A cipher suite thus consists of the key exchange algorithm, the encryption algorithm (with its key length), the authentication algorithm, and the encryption mode. The server ultimately decides which cipher suite to use and reports its choice back in Server Hello.
-
Compression methods
The value is 0, meaning no compression is used.
-
Extensions
Additional information, such as SNI, ALPN, and so on.
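Several of these Client Hello fields can be inspected from Python without opening a connection. This is a rough sketch: the values are whatever the local OpenSSL build offers, not the ones from the Zhihu capture, and the suite names come back in OpenSSL's own spelling rather than the TLS_..._WITH_... form.

```python
import secrets
import ssl

ctx = ssl.create_default_context()

# Cipher suites this client would offer, in preference order.
offered = [c["name"] for c in ctx.get_ciphers()]
print(len(offered), "suites offered, first one:", offered[0])

# The client random: 32 fresh bytes for every handshake.
client_random = secrets.token_bytes(32)

# On a first visit there is no session to resume, so the session ID is empty.
session_id = b""

# Extensions are set through the context or the socket, for example:
#   ctx.set_alpn_protocols(["h2", "http/1.1"])            # ALPN
#   ctx.wrap_socket(sock, server_hostname="example.com")  # SNI
```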
Server Hello
-
type
-
version
TLS 1.2 is selected for this connection.
-
Random number
As in Client Hello above, a 32-byte random number is generated here as well, and it also takes part in creating the master key. (It corresponds to random number 2 in the diagram.)
-
The session ID
-
Cipher suite
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256. This is selected from the cipher suites offered in Client Hello. Following the cipher suite naming format, it breaks down as follows:
- The key exchange algorithm is ECDHE (ephemeral Elliptic Curve Diffie-Hellman). RSA indicates that the Server Key Exchange message, which carries the server's DH public key, is signed with RSA.
- The encryption algorithm is AES with a 128-bit key, in GCM mode.
- The authentication (hash) algorithm is SHA256. With a mode such as GCM, record integrity comes from GCM itself, and SHA256 is the hash used for key derivation.
-
Compression method
This is 0, which means no compression algorithm is used
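The naming format of the cipher suite chosen above can be taken apart mechanically. A minimal sketch follows; the function name and the dictionary keys are my own, and it only handles names of the TLS_..._WITH_... shape.

```python
def parse_cipher_suite(name: str) -> dict:
    """Split a suite name of the form
    TLS_<key exchange>_<signature>_WITH_<cipher and mode>_<hash>."""
    prefix, _, rest = name.partition("_WITH_")
    if not prefix.startswith("TLS_") or not rest:
        raise ValueError(f"unexpected suite name: {name}")
    kx_parts = prefix[len("TLS_"):].split("_")
    cipher_parts = rest.split("_")
    return {
        "key_exchange": kx_parts[0],
        # Suites like TLS_RSA_WITH_... use one algorithm for both roles.
        "signature": kx_parts[1] if len(kx_parts) > 1 else kx_parts[0],
        "cipher": "_".join(cipher_parts[:-1]),
        "hash": cipher_parts[-1],
    }

suite = parse_cipher_suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256")
assert suite == {
    "key_exchange": "ECDHE",
    "signature": "RSA",
    "cipher": "AES_128_GCM",
    "hash": "SHA256",
}
```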
Certificate
The server sends its certificate. The client uses it to authenticate the server and extracts the public key it carries, which is the public key used in the key exchange (with ECDHE_RSA, it verifies the signature on the Server Key Exchange message). The ECDHE (EC Diffie-Hellman) algorithm specified in the Server Hello phase is also commonly referred to simply as DH.
The Certificate message delivers the certificate chain, consisting of the server's own certificate (which carries its public key) and the CA certificates, in the Certificates field:
According to the field information, there are three digital certificates in the chain: server (zhihu.com) certificate -> intermediate CA certificate -> root CA certificate. First look at the server certificate.
It begins with the signedCertificate field, which holds the certificate data:
-
version
The corresponding standard is X.509 V3
-
Serial number
serialNumber, a number that uniquely identifies the certificate among those issued by its issuer.
-
Signature algorithm ID
A signature algorithm that uses SHA-256 for the digest and RSA for the signature.
-
Certificate issuer
issuer: information about the CA that issued the certificate, given as the CA's Distinguished Name (DN). For example, the country is US (the United States), the organization is DigiCert Inc., and the name is GeoTrust CN RSA CA G1. To authenticate the current certificate, we next need to find this CA's certificate in the certificate chain (specifically, by looking for a certificate whose Subject field has the same country, organization, and name).
-
Validity period
validity: the start time and end time of the certificate's validity.
-
Subject name
subject: the name of the certificate holder and other identifying information. For example, the country is CN (China), the organization is Sage Four Seas (Beijing) Technology Co., LTD., and the name is *.zhihu.com, which is also the domain name the certificate applies to.
-
Subject public key information
subjectPublicKeyInfo. Since this is the server certificate, this public key is used later in the master key exchange, and you can see that it is an RSA public key.
-
Extensions
Additional information, such as alternative names for the subject. If this is a CDN server certificate, there will be a great many alternative names.
Then the certificate authority’s signature information:
-
Signature algorithm
algorithmIdentifier: the signature algorithm, here a SHA-256 digest signed with RSA. This is the algorithm the issuing CA used to sign this certificate.
-
Signature value
encrypted: the signature itself. The CA computes a SHA-256 digest of the certificate data described above and encrypts it with its RSA private key. During certificate validation, the verifier obtains the public key from the CA certificate, decrypts the signature, computes the digest of the data with the same algorithm, and checks whether the two match.
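Besides the signature, a client also checks fields such as the validity period and the subject names. Here is a minimal sketch of those two checks, working on a dict in the shape that Python's ssl.SSLSocket.getpeercert() returns. The dates are invented for the example and the wildcard matching is deliberately simplified.

```python
import ssl
from datetime import datetime, timezone

# Hypothetical certificate data, shaped like the result of getpeercert().
cert = {
    "subject": ((("commonName", "*.zhihu.com"),),),
    "notBefore": "Jan  1 00:00:00 2020 GMT",
    "notAfter": "Jan  1 00:00:00 2021 GMT",
    "subjectAltName": (("DNS", "*.zhihu.com"), ("DNS", "zhihu.com")),
}

def is_within_validity(cert: dict, now: float) -> bool:
    """Check `now` (seconds since the epoch) against the validity period."""
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    return not_before <= now <= not_after

def name_matches(pattern: str, hostname: str) -> bool:
    """Simplified matching: '*' may stand for exactly one leftmost label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels

def covers_host(cert: dict, hostname: str) -> bool:
    """True if any DNS entry among the alternative names matches the host."""
    return any(kind == "DNS" and name_matches(value, hostname)
               for kind, value in cert.get("subjectAltName", ()))

now = datetime(2020, 6, 1, tzinfo=timezone.utc).timestamp()
assert is_within_validity(cert, now)
assert covers_host(cert, "www.zhihu.com")
assert covers_host(cert, "zhihu.com")
assert not covers_host(cert, "a.b.zhihu.com")
```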
Server Key Exchange
The key exchange phase. This is an optional step that supplements the Certificate phase and occurs only in these scenarios:
- RSA is negotiated, but the server certificate does not provide an RSA public key.
- EC Diffie-Hellman (DH) is negotiated, but the server certificate does not provide the DH parameters.
- Fortezza_kea is negotiated, but the server certificate does not provide the parameters.
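The DH parameters carried in this message are what let both sides arrive at the same secret without ever sending it. The following toy sketch shows the idea with classic finite-field Diffie-Hellman; the handshake captured here uses the elliptic-curve variant, and the prime and generator below are chosen only to keep the example short.

```python
import secrets

# Public parameters, known to everyone (toy values).
p = 2**127 - 1      # a prime modulus
g = 3               # base

# Server: private value stays local, public value goes into Server Key Exchange.
server_private = secrets.randbelow(p - 3) + 2
server_public = pow(g, server_private, p)

# Client: private value stays local, public value goes into Client Key Exchange.
client_private = secrets.randbelow(p - 3) + 2
client_public = pow(g, client_private, p)

# Each side combines its own private value with the other's public value.
client_secret = pow(server_public, client_private, p)
server_secret = pow(client_public, server_private, p)

# Both arrive at the same shared secret, which was never transmitted.
assert client_secret == server_secret
```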
Server Hello Done
Notifies the client that the server's part of the version and cipher suite negotiation is complete.
In the Wireshark capture, Server Key Exchange and Server Hello Done are carried in the same SSL record, because the SSL record protocol can combine several messages. When the client receives such a record, it splits it back into two separate protocol messages; this is the grouping function of the SSL record protocol.
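Splitting a combined record back into messages is possible because every handshake message starts with a 1-byte type and a 3-byte length. A minimal sketch of that split follows; the payload is a made-up example with a dummy body, not bytes from the capture.

```python
# Handshake message type codes from the TLS 1.2 specification.
HANDSHAKE_TYPES = {
    1: "client_hello",
    2: "server_hello",
    11: "certificate",
    12: "server_key_exchange",
    14: "server_hello_done",
    16: "client_key_exchange",
    20: "finished",
}

def split_handshake_messages(payload: bytes) -> list:
    """Split a record payload into (message name, body) pairs.
    Each message is: 1-byte type, 3-byte big-endian length, then the body."""
    messages = []
    offset = 0
    while offset < len(payload):
        if offset + 4 > len(payload):
            raise ValueError("truncated handshake header")
        msg_type = payload[offset]
        length = int.from_bytes(payload[offset + 1:offset + 4], "big")
        body = payload[offset + 4:offset + 4 + length]
        if len(body) != length:
            raise ValueError("truncated handshake message")
        messages.append((HANDSHAKE_TYPES.get(msg_type, f"unknown({msg_type})"), body))
        offset += 4 + length
    return messages

# Hypothetical payload: a Server Key Exchange with a dummy 5-byte body,
# followed by a Server Hello Done, whose body is empty.
payload = bytes([12, 0, 0, 5]) + b"\x01\x02\x03\x04\x05" + bytes([14, 0, 0, 0])
messages = split_handshake_messages(payload)
assert [name for name, _ in messages] == ["server_key_exchange", "server_hello_done"]
```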
Client Key Exchange
In this step the client does not generate the encryption key directly. Instead, the client and server random numbers are combined with a third value, the "premaster secret", which the client establishes with the server through the negotiated key exchange algorithm (EC Diffie-Hellman here).
With RSA key exchange, the client encrypts the premaster secret with the server's public key and the server decrypts it with its private key; with Diffie-Hellman, each side computes it from its own private value and the other side's public value. Either way, after this stage both the server and the client hold three values: the client random number, the server random number, and the premaster secret. Once the server has received the Client Key Exchange message, both ends generate the master key from these values using the agreed algorithm, and the key exchange is complete. Since each end computes the master key on its own, how can they be sure both arrived at the same one? That is the job of the next stage. The client and server each compute a SHA digest of the handshake messages, encrypt it with the AES algorithm and the master key, and send it to the other side for verification. This approach is also known as message authentication. This is the process:
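For TLS 1.2, the "agreed algorithm" is the PRF defined in RFC 5246, which for this cipher suite is built on HMAC-SHA256. The sketch below shows how the three values become the master key, and how the digest of the handshake messages becomes the check value carried in the Finished message. The random inputs and the transcript bytes are placeholders, not data from the capture.

```python
import hashlib
import hmac
import secrets

def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion from RFC 5246, instantiated with HMAC-SHA256."""
    out = b""
    a = seed                                                 # A(0)
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()     # A(i)
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]

def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    return p_sha256(secret, label + seed, length)

def master_secret(premaster: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """48-byte master key derived from the three shared values."""
    return prf(premaster, b"master secret", client_random + server_random, 48)

def client_verify_data(master: bytes, handshake_messages: bytes) -> bytes:
    """12-byte check value carried in the client's Finished message."""
    transcript_hash = hashlib.sha256(handshake_messages).digest()
    return prf(master, b"client finished", transcript_hash, 12)

# Placeholder inputs standing in for a real handshake.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)
premaster = secrets.token_bytes(32)
handshake_messages = b"all handshake messages exchanged so far (placeholder)"

# Each end computes the master key independently from the same three values.
client_master = master_secret(premaster, client_random, server_random)
server_master = master_secret(premaster, client_random, server_random)
assert client_master == server_master and len(client_master) == 48

# The server recomputes the check value and compares it with what the client sent.
sent = client_verify_data(client_master, handshake_messages)
assert hmac.compare_digest(sent, client_verify_data(server_master, handshake_messages))
```

If the two ends had derived different master keys, or if any handshake message had been altered in transit, the recomputed value would not match and the handshake would fail.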
Change Cipher Spec (Client)
The client notifies the server that subsequent packets will be encrypted.
Encrypted Handshake Message (Client)
This is the Client Finished message. It is also the first encrypted message sent to the server during the SSL process. After receiving it, the server computes the digest of the handshake messages exchanged so far in the same way and compares it with the value obtained by decrypting the message with the master key. If they match, both ends have generated the same master key and the key exchange is complete.
Change Cipher Spec (Server)
The server notifies the client that subsequent packets are encrypted.
Encrypted Handshake Message (Server)
This is the Server Finished message. Like the client's Encrypted Handshake Message above, it is the first encrypted message sent by the server. After the client decrypts it and verifies it using the negotiated master key, the SSL handshake is complete.
Application Data
Application data messages. Since this is HTTPS, the HTTP protocol data is encrypted and then transferred.