HTTPS
The short answer to why HTTPS exists is: because HTTP is not secure.
What is security?
Communications are generally considered “secure” if they have four features: confidentiality, integrity, identity authentication, and non-repudiation.
- Secrecy/Confidentiality means that data is accessible only to trusted parties and hidden from everyone else. In plain terms, people who are not involved should not be able to see data that is none of their business.
- Integrity (also known as consistency) means that the data is not tampered with during transmission: nothing added, nothing removed, everything intact.
- Authentication means verifying the other party's true identity, so that messages are sent only to trusted parties.
- Non-repudiation (also called non-deniability) means that a party cannot later deny an action that has taken place.
What is HTTPS?
HTTPS adds these four security features to HTTP.
HTTPS is a “very simple” protocol; its RFC document (RFC 2818) is only seven pages long. It specifies the new scheme name “https” and the default port 443 — everything else (the request-response mode, message structure, request methods, URIs, header fields, connection management, and so on) follows HTTP unchanged. There is nothing new.
So how does HTTPS achieve security features such as confidentiality and integrity?
The secret lies in the “S” in the name HTTPS. It changes the underlying transport from TCP/IP to SSL/TLS — from “HTTP over TCP/IP” to “HTTP over SSL/TLS” — making HTTP run on top of secure SSL/TLS. Instead of calling the Socket API directly, incoming and outgoing messages go through a dedicated security interface.
SSL/TLS
SSL, or Secure Sockets Layer, sits at Layer 5 (the session layer) of the OSI model. It was invented by Netscape in 1994 and exists in versions v2 and v3; v1 was never released because of serious flaws.
By v3, SSL had proven itself a very good secure communication protocol, so in 1999 the Internet engineering group IETF renamed it TLS (Transport Layer Security) and officially standardized it, restarting the version numbering at 1.0. TLS 1.0 is therefore actually SSL v3.1.
Since then, TLS has gone through three more versions: 1.1 in 2006, 1.2 in 2008, and 1.3 in 2018. Each new version tracks advances in cryptography and the state of the Internet, continuously strengthening security and performance, and TLS has become the authoritative standard in information security.
The most widely used version is TLS 1.2. The earlier protocols (TLS 1.1/1.0, SSL v3/v2) have been deemed unsafe and were scheduled to be dropped by browsers around 2020, so the rest of this discussion focuses on TLS 1.2.
TLS is composed of several sub-protocols — the record protocol, handshake protocol, alert protocol, change cipher spec protocol, and extension protocol — and draws on many cutting-edge cryptographic techniques, such as symmetric encryption, asymmetric encryption, and identity authentication.
When a browser and server establish a TLS connection, they need to agree on an appropriate set of encryption algorithms for secure communication, known as a cipher suite. TLS cipher suite names are highly standardized, with a fixed format: “key exchange algorithm + signature algorithm + symmetric encryption algorithm + digest algorithm”. For example, ECDHE-RSA-AES256-GCM-SHA384 means: the handshake uses ECDHE for key exchange and RSA for signatures and identity authentication; after the handshake, communication uses the AES symmetric algorithm with a 256-bit key in GCM block mode; and SHA384 is used for message authentication and random number generation.
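The fixed naming format can be pulled apart mechanically. A toy sketch (real TLS libraries negotiate suites by numeric IDs, not by splitting strings — this is purely illustrative):

```python
# Toy parser for an OpenSSL-style cipher suite name.
def parse_suite(name: str) -> dict:
    # e.g. "ECDHE-RSA-AES256-GCM-SHA384"
    kx, sig, cipher, mode, digest = name.split("-")
    return {
        "key_exchange": kx,   # handshake key exchange (ECDHE)
        "signature": sig,     # signature / identity authentication (RSA)
        "cipher": cipher,     # symmetric cipher + key bits (AES256)
        "mode": mode,         # block mode (GCM)
        "digest": digest,     # digest algorithm / PRF (SHA384)
    }

print(parse_suite("ECDHE-RSA-AES256-GCM-SHA384"))
```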
OpenSSL
OpenSSL is a well-known open-source cryptography library and toolkit — a concrete implementation of SSL/TLS. It supports almost all public encryption algorithms and protocols and has become a de facto standard; many applications use it as the underlying library for TLS, including common web servers such as Apache and Nginx.
Symmetric and asymmetric encryption
The most common way to achieve confidentiality is encryption: converting a message into a form no one can understand, so that only those holding a special “key” can recover the original text.
That “key” is simply called the key; the message before encryption is the plaintext (or clear text), and the scrambled result after encryption is the ciphertext. Turning ciphertext back into plaintext with the key is called decryption, the reverse of encryption, and the whole procedure is the encryption algorithm.
All encryption algorithms are public and open to analysis by anyone; it is the keys they use that must be kept secret. Since HTTPS and TLS run on computers, a “key” is just a long string of bits — and key length is conventionally measured in bits, not bytes. A 128-bit key is a 16-byte binary string; a 1024-bit key is a 128-byte binary string.
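The bits/bytes relationship is easy to check directly — a minimal sketch using Python's standard `secrets` module to generate random keys:

```python
import secrets

# A 128-bit key occupies 128 / 8 = 16 bytes; a 1024-bit key, 128 bytes.
key128 = secrets.token_bytes(128 // 8)
key1024 = secrets.token_bytes(1024 // 8)
print(len(key128), len(key1024))   # 16 128
```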
Symmetric encryption
Symmetric encryption means that encryption and decryption use the same key. As long as the security of the key is ensured, the whole communication process can be said to be confidential.
TLS offers many symmetric encryption algorithms to choose from, such as RC4, DES, 3DES, AES, and ChaCha20. However, the first three are considered insecure and are generally prohibited; today only AES and ChaCha20 are in common use.
AES stands for Advanced Encryption Standard, with key lengths of 128, 192, or 256 bits. It is the successor to DES, offers very high security strength and excellent performance — some hardware even optimizes for it specially — and is the most widely used symmetric encryption algorithm.
ChaCha20 is another encryption algorithm, designed by Daniel J. Bernstein and promoted by Google. Its key length is fixed at 256 bits, and its pure-software performance is better than AES's.
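The defining property of symmetric encryption — the same key both encrypts and decrypts — can be shown with a toy XOR “cipher”. This is an illustration only, not AES or ChaCha20, and it is not secure:

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the (repeating) key; applying the same
    # operation with the same key a second time restores the data.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = secrets.token_bytes(16)              # a 128-bit shared secret
plaintext = b"hello, symmetric world"
ciphertext = xor_cipher(plaintext, key)    # encrypt
print(xor_cipher(ciphertext, key) == plaintext)  # True: same key decrypts
```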
Block cipher modes
Symmetric algorithms also have the concept of a “block mode” (mode of operation), which lets a cipher that works on fixed-size blocks encrypt plaintext of any length — turning a small secret (the key) into a large secret (the ciphertext).
The newest modes are the AEAD (Authenticated Encryption with Associated Data) modes, which add authentication to encryption. GCM, CCM, and Poly1305 are the common ones.
Put the above together and you get the symmetric encryption algorithm defined in the TLS cipher suite.
For example, AES128-GCM means the AES algorithm with a 128-bit key in GCM block mode, and ChaCha20-Poly1305 means the ChaCha20 algorithm with the Poly1305 authenticator.
Asymmetric encryption
Symmetric encryption may seem like a perfect way to achieve confidentiality, but there is a big problem: how do you safely deliver the key to the other party? This is called “key exchange.”
As a result, asymmetric encryption (also known as public-key cryptography) emerged. It uses two keys — a public key and a private key. The two are different (“asymmetric”): the public key can be published for anyone to use, while the private key must be kept strictly secret.
Public and private keys have a special “one-way” property: although either can be used to encrypt, data encrypted with the public key can only be decrypted with the private key, and vice versa.
Asymmetric encryption solves the key-exchange problem: the website keeps its private key secret and distributes its public key freely on the Internet. To log in, you only need to encrypt with the public key, and the resulting ciphertext can be decrypted only by the private-key holder.
Designing an asymmetric encryption algorithm is much harder than designing a symmetric one, so TLS has only a handful: DH, DSA, RSA, ECC, and so on.
RSA is probably the most famous of these — almost synonymous with asymmetric encryption. Its security rests on the mathematical problem of integer factorization: the key is generated from the product of two very large prime numbers, and deducing the private key from the public key is extremely difficult.
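The mechanics can be shown with textbook RSA using tiny primes (the classic p=61, q=53 example). Real keys are 2048+ bits and use padding such as OAEP; this is illustration only:

```python
# Textbook RSA with tiny primes -- never use numbers this small in practice.
p, q = 61, 53
n = p * q              # 3233: the modulus, part of the public key
e = 17                 # public exponent
d = 2753               # private exponent: (e * d) % ((p-1)*(q-1)) == 1

m = 65                 # the message (must be smaller than n)
c = pow(m, e, n)       # encrypt with the public key (e, n)
print(c)               # 2790
print(pow(c, d, n))    # 65 -- only the private key (d, n) recovers m
```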
ECC (Elliptic Curve Cryptography) is a “rising star” among asymmetric algorithms. It is based on the mathematical problem of the elliptic-curve discrete logarithm, using particular curve equations and base points to generate the public and private keys. Its sub-algorithm ECDHE is used for key exchange, and ECDSA for digital signatures.
Compared with RSA, ECC has clear advantages in both security strength and performance: a 160-bit ECC key corresponds roughly to 1024-bit RSA, and a 224-bit ECC key to 2048-bit RSA. Because the keys are shorter, the computation, memory, and bandwidth costs are lower and encryption/decryption is faster — very attractive for today's mobile Internet.
Mixed encryption
Asymmetric encryption has no key-exchange problem, but because it rests on hard mathematical problems it is very slow — even ECC is several orders of magnitude slower than AES. Using asymmetric encryption alone would guarantee security, but communication would be far too slow to be practical.
Hybrid encryption combines symmetric and asymmetric encryption, taking the best of each — efficient encryption and decryption, plus a safe way to exchange keys:
At the start of communication, an asymmetric algorithm such as RSA or ECDHE first solves the key-exchange problem.
A random “session key” for the symmetric algorithm is then generated and encrypted with the public key. Session keys are short — usually only 16 or 32 bytes — so the slowness of asymmetric encryption does not matter here.
The peer decrypts the ciphertext with its private key and extracts the session key. The symmetric key has now been exchanged securely; from this point on, asymmetric encryption is dropped and symmetric encryption takes over.
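The handshake steps above can be sketched end to end, reusing the tiny textbook RSA numbers (n=3233, e=17, d=2753) in place of a real 2048-bit key — an illustration of the flow, not a secure implementation:

```python
import secrets

# Toy hybrid-encryption key exchange.
n, e, d = 3233, 17, 2753         # server's (toy) public and private key

# 1. The client generates a random session key for the symmetric cipher
#    (kept tiny here so it fits under the toy modulus n).
session_key = secrets.randbelow(n - 2) + 2
# 2. It wraps the session key with the server's public key and sends it.
wrapped = pow(session_key, e, n)
# 3. The server unwraps it with its private key; both sides now share a
#    secret, and all further traffic uses fast symmetric encryption.
unwrapped = pow(wrapped, d, n)
print(unwrapped == session_key)  # True
```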
Digital signature and certificate
Encryption achieves confidentiality, but it still does not guarantee integrity, identity authentication, or non-repudiation.
Digest algorithms
The main tool for achieving integrity is the Digest Algorithm, commonly known as a hash function.
You can roughly think of a digest algorithm as a special compression algorithm that “compresses” data of any length into a fixed-length, unique “digest” string — a digital “fingerprint” of the data. It can also be understood as a special “one-way” encryption algorithm: there is only an algorithm, no key, and the output can be neither decrypted nor used to recover the original text.
Because it is one-way and exhibits an “avalanche effect” — a tiny difference in input causes a drastic change in output — a digest algorithm is also used to generate pseudo-random numbers (the PRF, pseudo-random function).
MD5 (Message Digest 5) and SHA-1 (Secure Hash Algorithm 1) used to be the two most common digest algorithms, producing 16-byte and 20-byte digests respectively. But their security strength is too low, and both have been banned from use in TLS.
TLS currently recommends SHA-2, the successor to SHA-1. SHA-2 is a family of digest algorithms, including SHA-224, SHA-256, and SHA-384, which produce 28-byte, 32-byte, and 48-byte digests respectively.
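The fixed-length output and the avalanche effect are both visible with Python's standard `hashlib` module:

```python
import hashlib

# SHA-256 "compresses" input of any length into a fixed 32-byte digest;
# changing a single character yields a completely different digest
# (the avalanche effect).
d1 = hashlib.sha256(b"hello world").hexdigest()
d2 = hashlib.sha256(b"hello worlD").hexdigest()
print(len(d1) // 2)   # 32 -- digest size in bytes, regardless of input
print(d1 == d2)       # False -- a one-character change scrambles it all
```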
integrity
The digest algorithm guarantees that the digest is “equivalent” to the original text, so by attaching a digest to the message we can check the data's integrity.
However, the digest algorithm itself provides no confidentiality. If the message travels in plaintext, a hacker can modify it and simply recompute the digest. True integrity must therefore be built on top of confidentiality: in the hybrid encryption system, the session key encrypts both the message and its digest, so the hacker never sees the plaintext.
This mechanism has a formal name: HMAC (Hashed Message Authentication Code).
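Python's standard `hmac` module implements exactly this keyed-digest scheme — without the shared key, an attacker who tampers with the message cannot produce a valid tag:

```python
import hashlib
import hmac

key = b"session-secret"                  # shared secret from the handshake
msg = b"transfer 100 to alice"
tag = hmac.new(key, msg, hashlib.sha256).digest()

# The receiver recomputes the tag and compares in constant time.
ok = hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest())
# A tampered message no longer matches the original tag.
bad = hmac.compare_digest(
    tag, hmac.new(key, b"transfer 9999 to mallory", hashlib.sha256).digest()
)
print(ok, bad)   # True False
```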
Digital signatures
With encryption combined with digest algorithms, our communication can be called reasonably secure. But a loophole remains at the two endpoints of the communication: a hacker can masquerade as the website to steal your messages, or masquerade as you and send payment or transfer requests to the website.
So how do you prove your identity in the digital world? Since the private key in asymmetric encryption is held only by its owner, the private key plus a digest algorithm can implement the “digital signature” — and with it, authentication and non-repudiation.
The principle of the digital signature is simple: use the public and private keys in reverse. Before, we encrypted with the public key and decrypted with the private key; now we encrypt with the private key and decrypt with the public key. Because asymmetric encryption is so slow, the private key encrypts only the digest of the message — the computation stays small, and the signature itself is compact and easy to store and transmit.
Like the public key, the signature is completely public and visible to anyone. But it can only be “unlocked” with the public key that matches the signer's private key. Once the digest is recovered, the message's integrity can be verified — just as a handwritten signature proves a document really came from you.
These two operations also have special names: “signing” and “verification.”
As long as you and the website have exchanged public keys, signing and verification can establish the authenticity of messages: since the private key is secret, a hacker cannot forge the signature, and the identity of the communicating party is assured.
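Sign-the-digest can be sketched with the same tiny textbook RSA numbers used earlier — the private key “encrypts” the SHA-256 digest, and anyone with the public key can check it. Real signatures use 2048+ bit keys and padding schemes such as PSS; this is an illustration only:

```python
import hashlib

n, e, d = 3233, 17, 2753   # toy RSA key pair (never this small in practice)

def sign(message: bytes) -> int:
    # Hash the message, reduce the digest under the toy modulus,
    # then apply the private-key operation.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    # Recover the digest with the public key and compare.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

sig = sign(b"pay bob 10 dollars")
print(verify(b"pay bob 10 dollars", sig))        # True: genuine signature
print(verify(b"pay bob 10 dollars", (sig + 1) % n))  # False: forged value
```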
Digital certificates and CAs
Using the combination of symmetric encryption, asymmetric encryption, and digest algorithms, we have achieved all four security features. But one problem remains: trusting the public key. Anyone can publish a public key, and we still lack a way to stop hackers from forging one.
We could try to solve public-key authentication the same way as key exchange — by signing the public key with yet another private key — but that obviously falls into “infinite recursion.”
This time there is genuinely no trick left. To end the endless cycle, an “external force” must be brought in: a universally recognized, trusted third party that serves as the starting point of trust — the end of the recursion — and anchors the public-key trust chain.
That third party is the CA (Certificate Authority). It is the network world's equivalent of a public security bureau, an education bureau, or a notary office: an institution of high credibility that signs each public key, staking its own reputation on the guarantee that the key is genuine and cannot be forged.
The CA's signature over a public key also follows a format. It is not simply bound to the holder's identity; a serial number, intended usage, issuer, validity period, and so on are packaged together with it and then signed as a whole, completely attesting to all the information associated with the key. The result is a “digital certificate.”
There are only a handful of well-known CAs in the world, such as DigiCert, VeriSign, Entrust, and Let's Encrypt. The certificates they issue come in three grades — DV, OV, and EV — which differ in level of trust.
DV is the lowest grade: only the domain is verified, and who stands behind it is unknown. EV is the highest, requiring rigorous legal and auditing checks that prove the site owner's identity (the company name appears in the browser address bar — for example, Apple or GitHub).
But how does a CA prove itself?
It all comes down to the chain of trust. A smaller CA can have its certificate signed by a bigger CA, but at the end of the chain, the root CA can only prove itself — this is called a self-signed certificate or root certificate. You simply have to trust it, or the whole chain of certificate trust cannot get started.
With this certificate system in place, operating systems and browsers ship with the root certificates of the major CAs built in. When you browse the web, the server only needs to send its certificate; you verify the signature on it and follow the certificate chain upward, level by level, until you reach a root certificate — at which point the certificate, and the public key inside it, can be trusted.
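The level-by-level walk can be sketched with a toy data model (hypothetical structures — real verification parses X.509, checks each signature, validity dates, revocation, and more):

```python
# Toy model: each certificate records who issued it.
certs = {
    "site.example": {"issuer": "Level2 CA"},
    "Level2 CA":    {"issuer": "Level1 CA"},
    "Level1 CA":    {"issuer": "Root CA"},
    "Root CA":      {"issuer": "Root CA"},   # self-signed root
}
trusted_roots = {"Root CA"}                  # built into the OS/browser

def chain_trusted(subject: str) -> bool:
    seen = set()
    while subject not in seen:               # guard against issuer loops
        seen.add(subject)
        issuer = certs[subject]["issuer"]
        if issuer in trusted_roots:
            return True                      # reached a trusted root
        subject = issuer                     # climb one level up the chain
    return False                             # chain never met a trusted root

print(chain_trusted("site.example"))         # True
```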
Weaknesses in the certification system
Although the certificate system (PKI, Public Key Infrastructure) is the security infrastructure of the entire networked world, absolute security does not exist. It too has weaknesses — and the key word is “trust.”
If a CA makes a mistake or is tricked into issuing a wrong certificate, the certificate itself is real, yet the website it vouches for is fake.
More dangerous still is a CA that is compromised by hackers, or that is itself malicious: because it (via its root certificate) is the source of trust, every certificate in its trust chain becomes untrustworthy.
So the certificate system needs patches of its own.
For the first problem, the CRL (Certificate Revocation List) and OCSP (Online Certificate Status Protocol) were developed to revoke problematic certificates promptly.
For the second, because so many certificates are affected, the operating system or browser has to uproot the CA itself — withdraw trust and blacklist it — so that every certificate it ever issued is treated as unsafe.
Q&A
Q: Can you tell the difference between HTTPS and HTTP?
A: Compared with HTTP, the most important addition in HTTPS is security, and that security comes from a different underlying protocol: HTTPS inserts the SSL/TLS protocol between the application layer and the transport layer, adding a dedicated layer of data-security machinery beneath otherwise unchanged HTTP. Also, HTTP's default port is 80, while HTTPS's is 443.
Q: Do you know of any ways to implement security features such as confidentiality and integrity?
A: Confidentiality is provided by symmetric encryption such as AES; integrity by digest algorithms such as SHA-384; identity authentication and non-repudiation by asymmetric encryption such as RSA, via digital signatures.
Q: The name “key” in an encryption algorithm is very figurative. Can you compare it to a lock and key in real life?
A: An unlocked gate is not secure — anyone can walk in and take what they want, just like HTTP packets running naked across the network. Security demands that we encrypt the plaintext, just as it demands that we lock the gate: encrypted plaintext becomes ciphertext that cannot be read without decryption, just as a locked door cannot be opened without the key. One key for one lock, with no other copies — that's symmetric encryption. If one key locks the door and the other keys can all open it, that's asymmetric encryption. And what use is the key that locks the door? It proves that I own the yard, because I'm the one who put the lock on it.
Q: Public key encryption is used in hybrid encryption because it can only be decrypted with a private key. So conversely, if the private key is encrypted and anyone can decrypt it with the public key, what’s the use of that?
A: Public-key encryption with private-key decryption achieves confidentiality; private-key signing with public-key verification achieves identity authentication and non-repudiation.
Q: Why can a public key establish a trust chain, but not a symmetric key line in a symmetric encryption algorithm?
A: Asymmetric algorithms can expose the public key so that anyone can use it to encrypt messages or verify signatures. A symmetric algorithm has only one key; disclosing it would destroy the secrecy entirely, so a symmetric key can never anchor a public trust chain.
Q: Suppose there is a three-level certificate system (Root CA=> Level 1 CA=> Level 2 CA), can you explain the verification process of certificate trust chain in detail?
A: The client sees that the site's certificate was issued by the Level 2 CA, which is not in its list of trusted issuers, so it fetches the certificate of the Level 2 CA's issuer and checks that. The issuer turns out to be the Level 1 CA — still not in the trusted list — so it then looks up the issuer of the Level 1 CA's certificate, finds that it is a trusted Root CA, and the verification is complete. If the CA at the last level is still not trusted, the user is warned.