HTTPS concept
HTTPS is HTTP carried over a secure channel. HTTP is transmitted in plaintext: its messages are ASCII text that a human can read directly. In transit, HTTP messages are therefore easy for attackers to eavesdrop on, tamper with, and forge. That is why a protocol like HTTPS, which adds a security layer, is needed.
So how does HTTPS ensure secure communication? What kind of communication can be called secure, and what is the definition of security?
Generally, a communication process is considered “secure” if it has four characteristics: confidentiality, integrity, authentication, and non-repudiation. HTTPS implements these four features and is therefore considered “secure.”
The difference between HTTPS and HTTP:
- HTTP is the Hypertext Transfer Protocol, and it transmits information in plaintext. HTTPS is its secure, encrypted counterpart.
- HTTP runs directly on TCP: once the TCP three-way handshake completes, HTTP communication starts. HTTPS inserts an SSL/TLS layer between HTTP and TCP, so after the TCP handshake a TLS handshake is also required before communication starts.
- HTTP has no identity authentication, which is a security risk. HTTPS uses the certificate system for identity authentication, so websites that use HTTPS need to obtain a certificate from a CA.
- The default HTTP port is 80, and the default HTTPS port is 443.
Basic concepts of HTTPS:
SSL: Secure Sockets Layer, which sits at layer 5 (the session layer) of the OSI model. TLS: in 1999, SSL was officially standardized and renamed TLS (Transport Layer Security), so TLS 1.0 is effectively SSL 3.1. TLS 1.2 remains very widely deployed, alongside its successor TLS 1.3.
TLS uses many cutting-edge cryptography technologies such as symmetric encryption, asymmetric encryption and identity authentication.
When establishing a connection using TLS, browsers and servers need to select an appropriate set of encryption algorithms to secure communication. The combination of these algorithms is called a “cipher suite.”
A TLS cipher suite name has the form: key exchange algorithm + signature algorithm + symmetric encryption algorithm + digest algorithm.
For example: ECDHE-RSA-AES256-GCM-SHA384
Interpretation: ECDHE is used for key exchange during the handshake; RSA is used for signatures and identity authentication; after the handshake, communication uses the AES symmetric algorithm with a 256-bit key in GCM mode; and SHA384 is used for message authentication and random number generation.
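To make the naming scheme concrete, here is a minimal sketch that splits a suite name of this five-field form into its parts. The `CipherSuite` type and `parse_cipher_suite` function are hypothetical helpers written for this illustration, not part of any TLS library, and real suite names do not always follow this exact layout.

```python
from typing import NamedTuple


class CipherSuite(NamedTuple):
    key_exchange: str
    signature: str
    cipher: str
    key_bits: int
    mode: str
    digest: str


def parse_cipher_suite(name: str) -> CipherSuite:
    """Split a name such as ECDHE-RSA-AES256-GCM-SHA384 into its fields."""
    parts = name.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(parts)}")
    key_exchange, signature, cipher_and_bits, mode, digest = parts
    # Separate the algorithm name from its trailing key length, e.g. AES256.
    cipher = cipher_and_bits.rstrip("0123456789")
    bits = cipher_and_bits[len(cipher):]
    if not bits:
        raise ValueError(f"no key length found in {cipher_and_bits!r}")
    return CipherSuite(key_exchange, signature, cipher, int(bits), mode, digest)


suite = parse_cipher_suite("ECDHE-RSA-AES256-GCM-SHA384")
print(suite)
```

Each field of the result corresponds to one part of the interpretation given above.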
Confidentiality implementation mechanism
Background knowledge
Why confidentiality?
Because HTTP is transmitted in plaintext, header fields and all other information travel directly as ASCII text, which can easily be intercepted and tampered with. If we used HTTP for monetary transactions, there would be no security at all.
The most common method of achieving confidentiality is “encryption”, in which messages are somehow converted into unintelligible gibberish that only those with special “keys” can convert back into the original text.
The secret used for this conversion is called the “key”, the message before encryption is called the “plaintext”, and the unintelligible output after encryption is called the “ciphertext”. Restoring the plaintext with the key is called “decryption”, which is the reverse of encryption, and the procedure that performs encryption and decryption is called the “encryption algorithm”.
Encryption can be divided into two categories based on how the key is used: symmetric encryption and asymmetric encryption.
Symmetric encryption
Symmetric encryption: encryption and decryption use the same key, hence “symmetric”. As long as the key is kept secure, the whole communication can be considered confidential.
For example, if you want to log in to a website and have agreed with it in advance on a symmetric key, everything you exchange is ciphertext encrypted with that key, and only you and the website can decrypt it. Even if an attacker eavesdrops, all he sees is gibberish. Without the key the plaintext cannot be recovered, which achieves confidentiality.
Currently, the most common symmetric encryption algorithm is AES (Advanced Encryption Standard), whose key length can be 128, 192, or 256 bits.
Symmetric algorithms also have the concept of a “block mode” (mode of operation), which lets the algorithm encrypt plaintext of any length with a fixed-length key, turning a small secret (the key) into a large secret (the ciphertext). Common choices are GCM and CCM; Poly1305 is usually listed alongside them, although strictly speaking it is a message authentication code paired with the ChaCha20 stream cipher.
Put the above together and you get the symmetric encryption algorithm named in a TLS cipher suite. For example, AES128-GCM means the AES algorithm with a 128-bit key in GCM mode, and ChaCha20-Poly1305 means the ChaCha20 algorithm combined with Poly1305.
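The idea that one short key can protect a message of any length, and that the same key both encrypts and decrypts, can be sketched with a toy cipher. This is an assumption-laden illustration only: it is NOT AES or GCM, the `keystream` and `xor_crypt` helpers are made up for this example, and it provides no integrity protection. Real code should use a vetted cryptography library.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + nonce + counter. Not AES, not for real use."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


key = secrets.token_bytes(16)    # 128-bit key shared by both sides
nonce = secrets.token_bytes(12)  # per-message value, may travel in the clear
plaintext = b"POST /login HTTP/1.1 user=alice&password=secret"

ciphertext = xor_crypt(key, nonce, plaintext)
recovered = xor_crypt(key, nonce, ciphertext)                   # same key: works
wrong = xor_crypt(secrets.token_bytes(16), nonce, ciphertext)   # other key: gibberish

print(ciphertext.hex())
print(recovered == plaintext, wrong == plaintext)
```

The 16-byte key stays the same size no matter how long the plaintext is, which is the property the block mode provides for AES.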
Asymmetric encryption
Symmetric encryption has one weakness: it cannot solve the “key exchange” problem, because anyone who holds the key can decrypt. If the key you agreed on with a website is stolen in transit by an attacker, he can decrypt everything you send and receive at will, and the communication is no longer confidential.
As a result, asymmetric encryption, also known as public key encryption, emerged. It has two keys, a public key and a private key. The two keys are different, “asymmetric,” and the public key can be made public for anyone to use, while the private key must be kept strictly secret.
The public key and private key have a special “one-way” property: both can be used to encrypt and decrypt, but data encrypted with the public key can only be decrypted with the private key, and conversely data encrypted with the private key can only be decrypted with the public key. Asymmetric encryption can therefore solve the “key exchange” problem. The website keeps the private key secret and distributes the public key freely on the Internet. When you want to log in, you encrypt with the public key, and the ciphertext can only be decrypted by the holder of the private key. Attackers cannot recover the plaintext because they do not have the private key.
RSA is the best-known asymmetric encryption algorithm. Its security rests on the mathematical problem of “integer factorization”: the product of two very large prime numbers is used as the material for generating the keys.
ECC is a “rising star” in asymmetric encryption. It is based on the mathematical problem of the “elliptic curve discrete logarithm” and uses specific curve equations and base points to generate public and private keys. Its sub-algorithm ECDHE is used for key exchange, and ECDSA is used for digital signatures.
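The public/private relationship can be seen in a textbook RSA example with deliberately tiny primes. This is a sketch for intuition only: the primes 61 and 53 and the exponent 17 are illustrative choices, there is no padding, and real keys use primes that are hundreds of digits long.

```python
# Textbook RSA with tiny numbers (requires Python 3.8+ for pow(e, -1, phi)).
p, q = 61, 53                # two secret primes (far too small for real use)
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent: modular inverse of e

message = 65                 # a message encoded as an integer smaller than n

# Encrypt with the public key (e, n); only the private key (d, n) decrypts.
ciphertext = pow(message, e, n)
recovered = pow(ciphertext, d, n)

# The reverse direction also works: transform with the private key,
# undo with the public key. This is the basis of digital signatures.
signed = pow(message, d, n)
unsigned = pow(signed, e, n)

print(n, d, ciphertext, recovered, unsigned)
```

Anyone who could factor `n` back into `p` and `q` could recompute `d`, which is why RSA depends on factorization being hard for large numbers.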
HTTPS encryption mode
Mixed encryption
Asymmetric encryption does not suffer from the “key exchange” problem, but because it is based on hard mathematical problems, it is slow, several orders of magnitude slower than symmetric encryption. If asymmetric encryption were used alone, security would be guaranteed but communication would be too slow to be practical.
Therefore, TLS uses “hybrid encryption”:
- At the start of communication, an asymmetric algorithm such as RSA or ECDHE solves the key exchange problem. In the RSA case, one side uses random numbers to generate the session key for the symmetric algorithm and encrypts it with the peer’s public key. Because the session key is short, usually only 16 or 32 bytes, the slowness of asymmetric encryption does not matter. The peer decrypts the ciphertext with its private key and extracts the session key. In this way the two sides complete a secure exchange of the symmetric key.
- All subsequent communications use symmetric encryption.
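The two steps above can be sketched end to end with toy building blocks. Everything here is an assumption made for illustration: the RSA key is built from two known Mersenne primes (structurally weak and far too small), there is no padding, and `xor_stream` is a made-up stand-in for a real symmetric cipher such as AES.

```python
import hashlib
import secrets

# Toy RSA key pair for the server (illustration only, not secure).
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, kept by the server


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Step 1 (asymmetric, slow, done once): the client picks a 16-byte session
# key and wraps it with the server's public key.
session_key = secrets.token_bytes(16)
wrapped = pow(int.from_bytes(session_key, "big"), E, N)

# The server unwraps it with its private key.
server_session_key = pow(wrapped, D, N).to_bytes(16, "big")

# Step 2 (symmetric, fast, used for all traffic): both sides now share a key.
request = b"GET /balance HTTP/1.1"
ciphertext = xor_stream(session_key, request)
decrypted = xor_stream(server_session_key, ciphertext)

print(server_session_key == session_key, decrypted)
```

Only the 16-byte key goes through the slow public-key operation; the request itself is protected by the fast symmetric step.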
Integrity implementation mechanism
Integrity is used to ensure that communication data has not been tampered with.
Why integrity? Although attackers cannot obtain the session key or crack the ciphertext, they can collect enough ciphertext through eavesdropping, then modify and recombine it and send it to the website. Without an integrity guarantee, the server will accept everything as-is, and the attacker can study the server’s responses for further clues and may eventually recover the plaintext.
Digest algorithm
The main means of achieving integrity is the digest algorithm, commonly known as a hash function.
A digest algorithm can be roughly thought of as a special compression algorithm that “compresses” data of arbitrary length into a fixed-length, practically unique “digest” string, like a digital “fingerprint” of the data. It can also be understood as a special “one-way” encryption algorithm: it has an algorithm but no key, the output cannot be decrypted, and the original text cannot be derived from the digest.
The best-known digest algorithms are MD5 and SHA-1, which generate 16-byte and 20-byte digests respectively. However, both have low security strength and are no longer considered secure, so they have been banned from use in TLS.
TLS currently recommends SHA-2, the successor of SHA-1. SHA-2 is actually a family of digest algorithms, including SHA224, SHA256, and SHA384, which generate 28-, 32-, and 48-byte digests respectively.
The digest algorithm ensures that the digest effectively stands in for the original text: any change to the text changes the digest. Therefore, as long as we attach a digest to the original text, we can check the integrity of the data.
For example, you send a message that says “Transfer 1000 yuan”, followed by its SHA-2 digest. The website computes the digest of the message it receives and compares the two “fingerprints”. If they match, the message is complete and has not been modified. If an attacker had changed even one punctuation mark along the way, the digest would be completely different, and the website’s comparison would reveal that the message had been altered and cannot be trusted.
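The fingerprint comparison can be tried directly with Python's standard `hashlib` module. This is a small sketch using SHA-256 as the SHA-2 variant; the `digest` helper and the message strings are illustrative.

```python
import hashlib


def digest(message: bytes) -> str:
    """Return the SHA-256 digest of a message as a hex string."""
    return hashlib.sha256(message).hexdigest()


original = "Transfer 1000 yuan".encode("utf-8")
tampered = "Transfer 9000 yuan".encode("utf-8")

sent_digest = digest(original)       # attached to the message by the sender

# The receiver recomputes the digest and compares the two fingerprints.
print(digest(original) == sent_digest)   # unmodified message matches
print(digest(tampered) == sent_digest)   # a one-character change does not
print(len(bytes.fromhex(sent_digest)))   # digest length in bytes
```

A bare digest only detects changes if the attacker cannot also replace the digest, so on its own it is not enough.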
Authentication and non-repudiation
Digital signature
A digital signature is a digest encrypted with a private key.
Why encrypt the digest with the private key?
- Because an attacker can modify the data and replace the digest at the same time, a bare digest proves nothing. Encrypting the digest prevents the content and the digest from both being replaced.
- The advantage of encrypting with the private key and decrypting with the public key is that the signature cannot be forged. The private key is confidential, so a man in the middle does not have it. Even if he intercepts the digital signature and modifies it, the public key will no longer yield a matching digest, and the client will discover that the message has been tampered with.
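Signing and verifying can be sketched as “hash, then apply the private key” and “apply the public key, then compare”. The toy RSA key below (two Mersenne primes, no padding, digest reduced modulo `N`) and the `sign` and `verify` helpers are assumptions made for this illustration; real signatures use large keys and a padding scheme from a vetted library.

```python
import hashlib

# Toy RSA key pair (illustration only, not secure).
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent


def digest_as_int(message: bytes) -> int:
    """SHA-256 digest as an integer, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signer: transform the digest with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verifier: undo the signature with the public key and compare digests."""
    return pow(signature, E, N) == digest_as_int(message)


message = b"Transfer 1000 yuan"
signature = sign(message)

print(verify(message, signature))                 # genuine message
print(verify(b"Transfer 9000 yuan", signature))   # content was modified
print(verify(message, (signature + 1) % N))       # signature was modified
```

Without the private exponent `D`, an attacker who changes the message cannot produce a signature that passes `verify`.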
Digital certificate
Another problem is that an attacker can impersonate a website and hand the client a fake public key. If you accept the fake public key, hybrid encryption fails completely. Therefore, to solve the problem of trusting public keys, digital certificates are introduced.
A digital certificate contains information about a server together with its public key, and is issued by a trusted authority (CA). The server presents the certificate to the client to prove that it really is the website it claims to be. The CA public key is public, so after obtaining the certificate the client can use the CA public key to verify it and extract the server public key.
Operating systems and browsers ship with the root certificates of the major CAs built in. When you access the Internet, as long as the server sends its certificate, you can verify the signature in it, checking layer by layer along the certificate chain until you reach a root certificate.
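Python's standard `ssl` module exposes this trust setup. The sketch below only inspects a default client context and makes no network connection. How many root certificates `get_ca_certs()` reports depends on the platform, and on some systems they are loaded lazily, so the count may be zero before the first connection.

```python
import ssl

# A default client context trusts the system's root CA store and is
# configured to reject servers whose certificate chain cannot be verified.
context = ssl.create_default_context()

# The server certificate must chain up to a trusted root.
print(context.verify_mode == ssl.CERT_REQUIRED)

# The domain name in the certificate must match the host being contacted.
print(context.check_hostname)

# Root certificates loaded so far (platform dependent, may be 0 if lazy).
print(len(context.get_ca_certs()))
```

Wrapping a socket with this context performs the chain verification described above during the TLS handshake.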
In the course of communication, the server must provide:
- Data content (encrypted with the session key)
- Digital signature (the digest encrypted with the server private key)
- Digital certificate (the server public key signed with the CA private key)
After receiving them, the client:
- Verifies the certificate with the CA public key and extracts the server public key
- Decrypts the digital signature with the server’s public key to obtain the digest
- Computes the digest of the content with the digest algorithm and compares it with the one just decrypted. If they are exactly the same, the data is complete and has not been tampered with
Authentication: a digital certificate issued by the CA proves that the public key really is the server’s public key.
Non-repudiation: if the digital signature can be unlocked with the server’s public key, the message can only have been sent by the server and not by anyone else, because only the paired public key can unlock data encrypted with the unique private key.
TLS handshake process
In HTTP, after the TCP three-way handshake succeeds, the browser immediately sends a request packet. HTTPS, however, requires another handshake (the TLS handshake) to establish a secure connection over TCP before sending or receiving a message. The primary purpose of the TLS handshake is to use asymmetric encryption to exchange a symmetric key, which is generated from three random numbers.
The TLS handshake has four steps:
1. The client sends a request
- Supported protocol versions, such as TLS 1.0
- A client-generated random number that is later used to generate the “session key”
- Supported cipher suites (supported encryption methods)
2. The server responds
- Confirmation of the protocol version to use for encrypted communication, such as TLS 1.0. If the version offered by the browser does not match any version the server supports, the server closes the encrypted communication.
- A server generated random number that is later used to generate the “session key”
- Confirm the encryption method used, such as RSA public key encryption
- Server certificate
3. The client responds
After receiving the server’s response, the client verifies the authenticity of the certificate step by step along the certificate chain. If the certificate was not issued by a trusted authority, if the domain name in the certificate does not match the actual domain name, or if the certificate has expired, a warning is shown to the visitor, who can choose whether to continue.
If the certificate is valid, the client extracts the server public key from it and sends the following to the server:
- A random number (the pre-master key), encrypted with the server’s public key to prevent eavesdropping
- Change Cipher Spec notification: indicates that subsequent messages will be sent using the agreed encryption method and key
- Finished notification: indicates that the client’s part of the handshake is complete. This item is a hash of all previously sent data, encrypted so that the server can verify it
The random number in the first item is the third random number of the handshake, also known as the “pre-master key”. With it, the client and the server both hold the same three random numbers, and each uses the agreed method to derive the same “session key” for this session.
4. Final response from the server
After receiving the third random number (the pre-master key) from the client, the server computes the session key for this session.
It then sends the following to the client:
- Change Cipher Spec notification: indicates that subsequent messages will be sent using the agreed encryption method and key
- Finished notification: indicates that the server’s part of the handshake is complete. This item is a hash of all previously sent data, encrypted so that the client can verify it
After the TLS handshake is complete, the two parties use the symmetric session key for encrypted communication.
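How both sides arrive at the same secret from the three random numbers can be sketched with the TLS 1.2 key derivation function (the PRF from RFC 5246), built from the standard `hmac` and `hashlib` modules. This is a simplified illustration under stated assumptions: it uses HMAC-SHA256 (a SHA384 suite would use SHA-384 instead), it stops at the 48-byte master secret from which the actual session keys are expanded, and the random values are generated locally rather than exchanged over a network.

```python
import hashlib
import hmac
import secrets


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion from RFC 5246 section 5, using HMAC-SHA256."""
    out = b""
    a = seed                                                  # A(0)
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()      # A(i)
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Combine the three random values into the 48-byte master secret."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master, seed, 48)


client_random = secrets.token_bytes(32)   # step 1, sent in the clear
server_random = secrets.token_bytes(32)   # step 2, sent in the clear
pre_master = secrets.token_bytes(48)      # step 3, sent encrypted

# Each side runs the same computation independently on the same inputs.
client_side = master_secret(pre_master, client_random, server_random)
server_side = master_secret(pre_master, client_random, server_random)

print(client_side == server_side, len(client_side))
```

An eavesdropper sees both plaintext random values but not the pre-master key, so he cannot repeat this computation.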
Pros and cons of HTTPS
Advantages
- HTTPS authenticates users and servers to ensure that data is sent to the right clients and servers.
- HTTPS is a network protocol that combines SSL/TLS with HTTP to encrypt transmission and authenticate identity. It is more secure than HTTP and protects data from theft and alteration in transit, ensuring data integrity.
- HTTPS is the most secure solution under the current architecture. It is not absolutely secure, but it significantly raises the cost of man-in-the-middle attacks.
- Google adjusted its search algorithm in August 2014, saying that “HTTPS encrypted sites will rank higher in search results than comparable HTTP sites.”
Disadvantages
- The HTTPS handshake is time-consuming: it lengthens page load time by nearly 50% and increases power consumption by 10% to 20%.
- HTTPS connection caching is not as efficient as HTTP caching, which increases data overhead and power consumption.
- SSL certificates cost money, and more capable certificates cost more, so personal and small websites generally do not use them unless necessary.
- HTTPS protects only a limited scope and does little against hacking, denial-of-service attacks, or server hijacking. Most importantly, the SSL certificate trust chain is not fully secure: where a state can control a CA root certificate, man-in-the-middle attacks remain feasible.