The third part provides a series of techniques and tricks you can use to track identity, perform security checks, and control access to content.

Client recognition and cookie mechanism

A Web server may be talking to thousands of different clients at the same time. Servers often want to keep track of who they are talking to, rather than treating every request as coming from an anonymous client.

The HTTP header

  • The HTTP header that carries user-related information

IP address of the client

The use of client IP addresses to identify users has many disadvantages that limit its effectiveness as a user identification technology.

  • The client IP address describes the machine being used, not the user. If multiple users share the same computer, it is impossible to distinguish between them.
  • Many Internet service providers dynamically assign IP addresses to users when they log on.
  • To improve security and conserve scarce address resources, many users browse the Internet through Network Address Translation (NAT) firewalls. These NAT devices hide the actual client IP addresses behind the firewall, translating them into a shared firewall IP address (with different port numbers).
  • HTTP proxies and gateways typically open new TCP connections to the origin server, so the Web server sees the IP address of the proxy, not of the client. Some proxies work around this problem by adding a special Client-IP or X-Forwarded-For header that carries the original IP address, but not all proxies support this behavior.

The user login

Instead of passively guessing a user's identity from the IP address, a Web server can explicitly ask who the user is by requiring authentication (a login) with a username and password.

To make logging into Web sites easier, HTTP includes a built-in mechanism to pass user information to Web sites, using the WWW-Authenticate and Authorization headers.

Fat URL

Some Web sites generate a specific version of the URL for each user to track their identity. Typically, the real URL is extended to add some status information at the beginning or end of the URL path. When the user views the site, the Web server dynamically generates some hyperlinks to continue to maintain the state information in the URL.

A modified URL that contains user state information is called a fat URL. Users can be identified by their fat URLs as they browse a site. But this technique has several serious problems.

  • Ugly URLs. The fat URLs displayed in the browser can be confusing for new users.

  • Can't share URLs. Fat URLs contain state information about a particular user and session. If you send the URL to someone else, you may inadvertently share your accumulated personal information.

  • Breaks caching. Generating a user-specific version of every URL means there are no longer commonly accessed URLs to cache.

  • Extra server load. The server needs to rewrite HTML pages to fatten the URLs.

  • Escape hatches. It is too easy to inadvertently "escape" from a fat URL session when the user jumps to another site or requests a particular URL. Fat URLs work only if the user strictly follows the premodified links; if the user leaves the links, he loses his accumulated state (perhaps an already full shopping cart) and has to start over.

  • Not persistent across sessions. All the information is lost when the user logs out, unless he bookmarks a particular fat URL.

Cookies

Cookies are currently the best way to identify users and implement persistent sessions.

Cookies are important, and they define several new HTTP headers, so we'll cover them in more detail than the earlier techniques. Cookies also affect caching: most caches and browsers disallow caching of any cookie-related content.

Cookie type

Cookies can be broadly divided into two categories: session cookies and persistent cookies.

  • Session cookie: a temporary cookie that records a user's settings and preferences while visiting a site. A session cookie is deleted when the user exits the browser.

  • Persistent cookie: lives longer; it is stored on disk and survives browser exits and computer restarts. Persistent cookies are typically used to retain a configuration profile or login name for a site that a user visits periodically.

The only difference between session cookies and persistent cookies is their expiration time.

How cookies work

When a user first visits a Web site, the Web server knows nothing about the user. The server expects the user to return, so it wants to "slap" a unique cookie onto the user so it can recognize that user later. The cookie contains an arbitrary list of name=value information, and it is attached to the user via the Set-Cookie or Set-Cookie2 HTTP response (extension) headers.
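The first-visit exchange can be sketched in a few lines of Python. This is a minimal, hypothetical illustration; the sessions dict and handle_request function are invented for this sketch, not part of any real server:

```python
import uuid

# Hypothetical in-memory "database" of users the server has seen before,
# keyed by the unique id handed out in the Set-Cookie header.
sessions = {}

def handle_request(request_headers):
    """Return the extra response headers: a Set-Cookie on the first visit."""
    cookie = request_headers.get("Cookie", "")
    if cookie.startswith("id=") and cookie[3:] in sessions:
        return {}                      # returning visitor: already recognized
    new_id = uuid.uuid4().hex          # arbitrary unique token for this user
    sessions[new_id] = {"visits": 1}
    return {"Set-Cookie": f"id={new_id}"}

first = handle_request({})                                # first visit: cookie is set
second = handle_request({"Cookie": first["Set-Cookie"]})  # browser sends it back
```

On the second request the browser presents the cookie it was given, and the server recognizes the user without issuing a new Set-Cookie.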

Cookie jar: State of the client

The basic idea behind cookies is that the browser accumulates a set of server-specific information that is presented to it each time it visits the server. Because the browser is responsible for storing cookie information, this system is called client-side state. The official name of this cookie specification is HTTP State Management Mechanism.

Different sites use different cookies

There can be hundreds or thousands of cookies in the cookie jar inside the browser, but the browser doesn’t send every cookie to every site. In fact, they typically only send 2 or 3 cookies per site. Here’s why:

  • Transferring all those cookie bytes would seriously degrade performance; the browser could end up transmitting more cookie bytes than actual content bytes!
  • Cookies contain server-specific name-value pairs, so for most sites, most cookies are just unrecognized garbage.
  • Sending all cookies to all sites raises potential privacy issues, as sites you don’t trust get information you only want to send to other sites.

Many Web sites have agreements with third-party vendors to manage advertising. These ads are made to look like an integral part of a Web site, and they do send persistent cookies. When a user visits another site served by the same advertising agency, the browser (since the domain is matched) sends back the persistent cookie that was set earlier. Marketing firms could combine this technology with the Referer header to secretly build a detailed dataset of users’ profiles and browsing habits. Modern browsers allow users to set privacy features to limit the use of third-party cookies.

  1. The cookie-generating server can add a Domain attribute to the Set-Cookie response header to control which sites see the cookie. For example:

Set-Cookie: user="mary17"; domain="airtravelbargains.com"

  2. The cookie specification even lets a cookie be associated with part of a Web site, via the Path attribute: the cookie is sent only for URLs under that path prefix. For example:

Set-Cookie: pref=compact; domain="airtravelbargains.com"; path=/autos/
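Both Set-Cookie headers above can be produced with Python's standard http.cookies module; a small sketch:

```python
from http.cookies import SimpleCookie

# Build the two Set-Cookie examples from the text.
cookie = SimpleCookie()
cookie["user"] = "mary17"
cookie["user"]["domain"] = "airtravelbargains.com"   # any airtravelbargains.com site

cookie["pref"] = "compact"
cookie["pref"]["domain"] = "airtravelbargains.com"
cookie["pref"]["path"] = "/autos/"                   # only sent for URLs under /autos/

user_header = cookie["user"].OutputString()
pref_header = cookie["pref"].OutputString()
print("Set-Cookie:", user_header)
print("Set-Cookie:", pref_header)
```

The Domain and Path attributes end up serialized into the header exactly as the cookie specification describes.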

Composition of the cookie

There are two different versions of the cookie specification in use today:

  • Cookies Version 0 (sometimes referred to as Netscape cookies)
  • Cookies Version 1 (RFC 2965).

Cookies version 1 is an extension of Cookies version 0 and is not as widely used.

Cookies and session tracking

Basic Authentication Mechanism

Authentication

Authentication means giving some proof of identity.

HTTP challenge/response authentication framework

HTTP provides a native challenge/response framework to simplify user authentication. The HTTP authentication model is shown in figure 1.

Authentication protocol and header

HTTP provides an extensible framework for different authentication protocols through a set of customizable control headers.

HTTP defines two official authentication protocols: basic authentication and digest authentication.

Security realms

The Web server groups protected documents into security realms. Each realm can have a different set of authorized users.

  • A security realm on a Web server

Basic authentication

Basic authentication is the most popular HTTP authentication protocol. Almost every major client and server implements a basic authentication mechanism. Basic authentication was first proposed in the HTTP/1.0 specification, but has since been moved to RFC 2617, which details HTTP authentication mechanisms.

Basic Authentication Instance

The following figure shows a detailed basic authentication example.

  • In Figure A, the user requested a private family photo /family/jeff.jpg.
  • In Figure B, the server responds with a 401 Authorization Required password challenge, along with a WWW-Authenticate header requesting basic authentication for the realm Family.
  • In Figure C, the browser receives the 401 challenge and pops up a dialog box asking for the username and password for the Family realm. When the user enters them, the browser joins them with a colon, encodes them into a "scrambled" base-64 representation, and sends them back in the Authorization header.
  • In Figure D, the server decodes the username and password, verifies that they are correct, and then returns the requested message with an HTTP 200 OK message.

Note that the basic authentication protocol does not use the Authentication-Info header shown in the table.

Base-64 username/password encoding

HTTP basic authentication packages a (colon-separated) user name and password together and encodes them in Base-64 encoding.
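The packaging is easy to reproduce; the username/password pair below is the well-known example from RFC 2617:

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    # Join the pair with a colon, then base-64 encode the result.
    pair = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(pair).decode("ascii")

header = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header)
```

Note that base-64 is an encoding, not encryption: anyone who captures the header can trivially reverse it, which is one reason basic authentication is insecure on its own.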

Proxy authentication

The intermediate proxy server can also implement the authentication function. Some organizations use proxy servers to authenticate users before they access a server, LAN, or wireless network. Access policies can be centrally managed on a proxy server. Therefore, it is convenient to use a proxy server to provide unified access control over internal resources in an organization. The first step in this process is proxy authentication.

  • The differences between the status codes and headers used by Web servers and proxies for authentication

Security flaws in basic authentication

Basic authentication is simple and convenient, but not secure. It can only be used to prevent unintentional access by non-malicious users or in conjunction with encryption technologies such as SSL.

Basic authentication has the following security flaws:

  • Basic authentication sends user names and passwords across the network in a form that can be easily decoded.
  • Even if the password were encoded in a scheme harder to decode, a third party could still capture the scrambled username and password and replay them to the origin server, over and over, to gain access.
  • Even if basic authentication is used for less important applications, such as access control over a company’s internal network or access to personalized content, bad habits can make it dangerous. Many users, frustrated by the number of password-protected services, use the same username and password across them. For example, some crafty villain could capture a user name and password in plain text from a free Internet mail site, and then discover that you can access an important online banking site with the same user name and password!
  • Basic authentication offers no protection against proxies or intermediaries acting as middlemen that leave the authentication headers intact but modify the rest of the message, drastically changing the nature of the transaction.
  • It’s easy to fool basic authentication with a fake server. If the user can be convinced that he is connecting to a legitimate host protected by basic authentication when he is actually connecting to a malicious server or gateway, the attacker can ask the user for a password, store it for future use, and then send a fabricated error message to the user.

Digest authentication

Basic authentication is convenient and flexible, but extremely insecure. The user name and password are transmitted in plain text, and no measures are taken to prevent packet tampering. The only way to use basic authentication safely is to use it in conjunction with SSL.

The improvements of digest authentication

Digest authentication is another HTTP authentication protocol that tries to fix the most serious flaws of basic authentication. In particular, digest authentication improves on basic authentication as follows.

  • Passwords are never sent over the network in clear text.
  • Prevents malicious users from capturing and replaying the authentication handshake process.
  • You can selectively prevent packet content tampering.
  • Defend against several other common attacks.

Protect passwords with digests

The motto of digest authentication is "never send the password over the network." Instead of sending the password, the client sends a "fingerprint" or "digest" of the password, an irreversible scrambling of the password.

One-way digests

A digest is a "condensation of the information body." Digests act as one-way functions, converting an infinite set of possible input values into a finite range of condensed outputs. A common digest function, MD5, converts an arbitrarily long sequence of bytes into a 128-bit digest.
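Python's hashlib makes this easy to see: inputs of any length condense to the same fixed-size output, and nothing about the output reveals the input.

```python
import hashlib

# Inputs of any length condense to the same fixed-length, 128-bit output:
short = hashlib.md5(b"hi").hexdigest()
long_ = hashlib.md5(b"x" * 1_000_000).hexdigest()
assert len(short) == len(long_) == 32    # 32 hex digits = 128 bits

# The function is one-way: the digest does not reveal the input.
print(hashlib.md5(b"password").hexdigest())
```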

Use random numbers to prevent replay attacks

But hiding the password alone is not safe: a malicious party can intercept the digest and replay it to the server over and over, even without knowing the password. The digest is just as good as the password.

To prevent such replay attacks, the server can send a special token to the client called a random number (nonce), which changes frequently (maybe every millisecond, or every authentication). The client appends this random number token to the password before calculating the digest.
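A toy sketch of the idea, with MD5 standing in for the digest function; the function names here are invented for illustration:

```python
import hashlib
import secrets

def issue_nonce():
    # Server side: a fresh, unguessable token for every challenge.
    return secrets.token_hex(16)

def client_digest(password, nonce):
    # Client side: append the nonce to the password before digesting,
    # so the digest is valid only for this one challenge.
    return hashlib.md5(f"{password}:{nonce}".encode()).hexdigest()

def server_check(stored_password, nonce, digest):
    return client_digest(stored_password, nonce) == digest

nonce1 = issue_nonce()
d1 = client_digest("open sesame", nonce1)
ok_first = server_check("open sesame", nonce1, d1)    # fresh digest: accepted

nonce2 = issue_nonce()                                # next challenge, new nonce
ok_replay = server_check("open sesame", nonce2, d1)   # replayed digest: rejected
```

Because the nonce changes on each challenge, a captured digest is useless against the next challenge.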

The digest authentication handshake

The HTTP digest authentication protocol is an enhanced version of basic authentication, using a similar set of headers.

  • Syntax comparison between basic authentication and digest authentication

Digest calculations

At the core of digest authentication is a one-way digest of a combination of public information, secret information, and a time-limited random value (nonce).

The input data of the algorithm

The digest is computed from the following three components.

  • A pair of functions: the one-way hash function H(d) and the digest function KD(s,d), where s stands for the secret and d for the data.
  • A chunk of data containing security information, including the password, called A1.
  • A chunk of data containing non-secret attributes of the request message, called A2.

H and KD process the two chunks of data, A1 and A2, to produce a digest.

Algorithms H(d) and KD(s,d)

Digest authentication supports a choice of digest algorithms. RFC 2617 suggests two: MD5 and MD5-sess ("sess" stands for session). MD5 is the default if no other algorithm is specified.
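Putting the pieces together, here is a sketch of the MD5 response computation from RFC 2617, checked against the worked example in the RFC (username "Mufasa", realm "testrealm@host.com", password "Circle Of Life"):

```python
import hashlib

def H(data):
    return hashlib.md5(data.encode()).hexdigest()

def KD(secret, data):
    # RFC 2617: KD(secret, data) = H(secret ":" data)
    return H(f"{secret}:{data}")

def digest_response(user, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    A1 = f"{user}:{realm}:{password}"     # the secret data block
    A2 = f"{method}:{uri}"                # the non-secret request attributes
    return KD(H(A1), f"{nonce}:{nc}:{cnonce}:{qop}:{H(A2)}")

# The worked example from RFC 2617, section 3.5:
resp = digest_response(
    "Mufasa", "testrealm@host.com", "Circle Of Life",
    "GET", "/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001", cnonce="0a4f113b")
print(resp)   # matches the example response given in the RFC
```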

Secure HTTP

The previous sections discussed some of the HTTP features that help identify and authenticate users. These techniques work well in a friendly community, but they are not strong enough to protect important transactions in a world full of motivated and hostile adversaries.

Secure HTTP

The previous sections discussed some lightweight methods that provide authentication (basic and digest authentication) and packet integrity checking (digest Qop =”auth-int”). These methods are great for many network transactions, but not powerful enough for large-scale shopping, banking, or access to confidential data. These more important transactions require a combination of HTTP and digital encryption to be secure.

A secure version of HTTP should be efficient, portable, and easy to manage, not only able to adapt to changing situations, but also to meet the needs of society and government. We need an HTTP security technology that provides the following capabilities.

  • Server authentication (where clients know they are talking to a real server, not a fake one).
  • Client authentication (servers know they are talking to a real client, not a fake one).
  • Integrity (client and server data will not be modified).
  • Encryption (conversations between the client and server are private without fear of eavesdropping).
  • Efficiency (an algorithm that runs fast enough to be used by low-end clients and servers).
  • Universality (virtually all clients and servers support these protocols).
  • Managed extensibility (anyone, anywhere, can communicate securely immediately).
  • Adaptability (ability to support the best known security methods of the day).
  • Feasibility in society (to meet the political and cultural needs of society).

HTTPS

HTTPS is the most popular form of HTTP security. It was pioneered by Netscape and is supported by all major browsers and servers.

With HTTPS, all HTTP request and response data is encrypted before being sent across the network. HTTPS provides a transport-level cryptographic security layer (see the figure below) underneath HTTP, using either SSL or its successor, Transport Layer Security (TLS). Because SSL and TLS are very similar, we use the term SSL loosely here to refer to both.

Most of the hard work of encoding and decoding is done in the SSL libraries, so Web clients and servers don't need to change their protocol-processing logic much to use secure HTTP. For the most part, it is a matter of replacing TCP input/output calls with SSL calls and adding a few other calls to configure and manage the security information.

Digital encryption

This section describes some background information about the encryption and encoding technologies used by SSL and HTTPS.

  • Cipher: an algorithm for encoding text so that eavesdroppers cannot read it.
  • Key: a digitized parameter that changes the behavior of a cipher.
  • Symmetric key cryptosystem: an algorithm that uses the same key for encoding/decoding.
  • Asymmetric key cryptosystem: Algorithms that encode/decode with different keys.
  • Public-key encryption system: a system that enables millions of computers to easily send confidential messages.
  • Digital signature: checksum used to verify that packets are not forged or tampered with.
  • Digital certificate: Identifying information verified and issued by a trusted organization.

The mechanism and technique of cryptography

Cryptography is the art and science of encoding and decoding messages. People have been sending secret messages in encrypted form for thousands of years. But cryptography can do more than just encrypt messages to prevent them from being read; it can also be used to prevent tampering with messages, and even to prove that a message or transaction really came from you, like a handwritten signature on a check or an embossed wax seal on an envelope.

Ciphers

Cryptography is based on secret codes called ciphers. A cipher is a coding scheme: a particular way of encoding a message, combined with a complementary way of decoding it later. The original message before encryption is often called plaintext or cleartext. A message encoded with a cipher is called ciphertext.

Cipher machine

As technology improved, people began to build machines that could encode and decode messages quickly and accurately, using much more complicated ciphers. These cipher machines could do more than simple rotations: they could substitute characters, transpose the order of characters, and slice and dice messages, making codes much harder to crack.

Keyed ciphers

Because both the algorithms and the machines could fall into enemy hands, most machines had dials that could be set to a large number of different values, changing how the cipher worked. Even if a machine were stolen, the decoder would not function without the correct dial settings (key values).

These cipher parameters are called keys. The right key must be entered into the cipher machine for decoding to proceed correctly. Cipher keys make a single cipher machine act like a set of many virtual cipher machines, each of which behaves differently because it has a different key value.

Digital ciphers

A digital key is just a number, in contrast to the metal keys and dial settings of mechanical devices. These digital key values are inputs to the encoding/decoding algorithms, which are functions that take a chunk of data and encode or decode it based on the algorithm and the key value.

Symmetric key encryption technology

Many digital encryption algorithms are called symmetric-key encryption algorithms because they use the same key value for encoding and decoding (e = d). Let's just call the shared key k.
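A toy illustration of the e = d property, using a repeating-XOR "cipher" (deliberately insecure, for illustration only): the very same key decodes what it encoded.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the repeating key; applying it twice undoes it.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"K"                                  # the shared secret key k
ciphertext = xor_cipher(b"meet at noon", key)
plaintext = xor_cipher(ciphertext, key)     # decoding uses the very same key
```

Real symmetric ciphers (DES, AES, RC4 and friends) are far more elaborate, but share this defining property: one shared secret key for both directions.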

Key length and enumeration attacks

A good encryption algorithm forces the attacker to try every possible key before cracking the code. Trying all key values by brute force is called an enumeration attack.

  • Longer keys take more effort to crack

Establishing a Shared key

One of the drawbacks of symmetric key encryption is that the sender and receiver must have a shared secret key before they can talk to each other.

Each pair of communicating entities needs its own private key. With N nodes, each of which talks securely with the other N-1 nodes, roughly N^2 secret keys are needed in total: an administrative nightmare.
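The "roughly N^2" count is exactly the number of distinct pairs, N(N-1)/2, which can be checked directly:

```python
from math import comb

def keys_needed(n):
    # One secret key for each pair of communicating nodes.
    return comb(n, 2)       # n * (n - 1) / 2

pairs_10 = keys_needed(10)
pairs_1000 = keys_needed(1000)
print(pairs_10, pairs_1000)
```

Ten nodes already need 45 keys; a thousand nodes need nearly half a million.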

Public key encryption

Instead of using a separate encryption/decryption key for each host pair, public-key encryption uses two asymmetric keys: one for encoding and one for decoding host messages. The encoding key is well known (hence the name public-key encryption), but only the host is aware of the private decryption key. This way, everyone can find the public key for a particular host, making key creation much easier. But the decoding key is secret, so only the receiver can decode the message sent to it.

With public-key cryptography, computer users around the world can use secure protocols. Standardizing a public-key technology suite is very important, which is why large-scale Public Key Infrastructure (PKI) standards efforts have been under way for over a decade.

Hybrid encryption system and session key

Anyone who knows the public key can send secure messages to a public server, so asymmetric public-key encryption systems are very useful. Two nodes do not have to exchange private keys in order to communicate securely.

But public-key cryptography algorithms tend to be computationally slow, so in practice a mixture of symmetric and asymmetric schemes is used. For example, it is common to use public-key cryptography to establish secure communication between two nodes, and then use that secure channel to generate and exchange a temporary, random symmetric key; the rest of the data is encrypted with the much faster symmetric cipher.

A digital signature

In addition to encrypting and decrypting the message, the encryption system can also be used to sign the message to show who wrote the message and to prove that the message has not been tampered with. This technology is called digital signing.

Signatures are cryptographic checksums

A digital signature is a special encryption check code attached to a packet. There are two benefits to using digital signatures.

  • The signature proves that the author composed the message. Only the author has the secret private key, so only the author could have computed the checksum. The checksum acts as a personal "signature" from the author.

  • The signature prevents packet tampering. If a malicious attacker modifies the packet during transmission, the checksum no longer matches the packet. The checksum can be generated only by the author’s private key. Therefore, an attacker cannot forge a correct checksum for a tampered packet.

Digital signatures are usually generated using asymmetric public key techniques. Because only the owner knows its private key, the author’s private key can be used as a kind of “fingerprint.”

The following figure shows an example of how node A sends A message to node B and signs it.

  • Node A extracts the variable-length message into A fixed-length summary.
  • Node A applies a "signature" function to the digest, with the user's private key as a parameter. Because only the user knows the private key, a correct signature function proves the signer is the owner. In the figure, we use the decoder function D as the signature function, because it involves the user's private key.
  • Once the signature is calculated, node A appends it to the end of the message and sends both the message and the signature to node B.
  • At the receiving end, node B can check the signature if it needs to be sure that the message was actually written by node A and has not been tampered with. Node B applies the inverse function, using the public key, to the scrambled signature. If the unpacked digest does not match node B's own version of the digest, either the message was tampered with in transit, or the sender did not have node A's private key (and therefore was not node A).
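The four steps can be sketched with a toy RSA key pair built from textbook primes p = 61 and q = 53 (so n = 3233); it is instantly breakable and for illustration only, since real signatures use 2048-bit keys and padding schemes:

```python
import hashlib

# Toy RSA key pair: public key (n, e), private key d.
n, e, d = 3233, 17, 2753

def digest(message: bytes) -> int:
    # Step 1: condense the variable-length message to a fixed-length digest.
    return int.from_bytes(hashlib.md5(message).digest(), "big") % n

def sign(message: bytes) -> int:
    # Step 2: apply the private-key "decode" function to the digest.
    return pow(digest(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    # Receiver applies the inverse (public-key) function and compares digests.
    return pow(signature, e, n) == digest(message)

msg = b"transfer $10 to Bob"
sig = sign(msg)                                    # steps 3-4: send msg + sig
genuine = verify(msg, sig)
tampered = verify(b"transfer $1000 to Bob", sig)   # tampering breaks the check
```

Because only the private exponent d can produce a signature that the public exponent e unwraps to the right digest, anyone can verify, but only the key's owner can sign.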

The digital certificate

This section introduces digital certificates, the “ID card” of the Internet.

The main contents of the certificate

A digital certificate contains a set of information, all of it digitally signed by an official "certificate authority."

  • Typical digital certificate format

Digital certificates usually also include the public key of the subject, along with descriptive information about the subject and the signature algorithm used. Anyone can create a digital certificate, but not everyone can get a respected authority to vouch for the certificate's information and sign it with its private key.

X.509 v3 certificates

Unfortunately, there is no single, universal standard for digital certificates. Just as not all printed ID cards carry the same information in the same places, digital certificates come in many slightly different forms. The good news is that most certificates in use today store their information in a standard form, X.509 v3.

Authenticate the server with a certificate

When a secure Web transaction is established over HTTPS, modern browsers automatically fetch the digital certificate of the server being connected to. If the server does not have a certificate, the secure connection fails. The server certificate contains many fields, including:

  • The name and host name of the Web site;
  • A public key for a Web site;
  • The name of the issuer of the signature;
  • A signature from a signature issuing authority.

Verify that the signature is genuine

HTTPS — Details

HTTPS is the most common secure form of HTTP, available in all major commercial browsers and servers. HTTPS combines the HTTP protocol with a powerful set of symmetric, asymmetric, and certificate-based cryptographic techniques, making it not only secure but also flexible and easy to administer across the anarchic, decentralized global Internet.

Summary of the HTTPS

HTTPS is just HTTP sent over a secure transport layer. HTTP messages are passed to a security layer, which encrypts them, before they are sent to TCP. See figure (b).

HTTPS scheme

Secure HTTP is now optional. When a client (such as a Web browser) is asked to perform a transaction on a Web resource, it checks the URL scheme.

  • If the URL scheme is http, the client opens a connection to server port 80 (by default) and sends it plain old HTTP commands (see figure (a)).

  • If the URL scheme is https, the client opens a connection to server port 443 (by default), "shakes hands" with the server to exchange some SSL security parameters in a binary format, and then sends the encrypted HTTP commands (see figure (b)).

SSL is a binary protocol, quite different from HTTP, and its traffic is carried on a different port (SSL usually runs on port 443). If both SSL and HTTP traffic arrived on port 80, most Web servers would interpret the binary SSL traffic as erroneous HTTP and close the connection.

Establishing secure transmission

This procedure is slightly more complicated in HTTPS, because of the SSL security layer. In HTTPS, the client first opens a connection to port 443 (the default port for secure HTTP) on the Web server. Once the TCP connection is established, the client and server initialize the SSL layer, negotiating cryptography parameters and exchanging keys. When the handshake completes, SSL initialization is done, and the client can send requests to the security layer, which encrypts them before they are sent to TCP. The figure below illustrates this process.
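With Python's standard ssl module, the whole sequence (TCP connect to port 443, SSL handshake, then ordinary HTTP over the secure channel) takes only a few lines. A sketch, with the host name as a placeholder:

```python
import socket
import ssl

def fetch_https(host: str, path: str = "/", timeout: float = 5.0) -> bytes:
    """Open TCP to port 443, run the SSL handshake, then speak ordinary HTTP."""
    context = ssl.create_default_context()   # verifies the server's certificate
    with socket.create_connection((host, 443), timeout=timeout) as tcp:
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            # wrap_socket() completes the SSL handshake; from here on,
            # everything written to 'tls' is encrypted before hitting TCP.
            request = (f"GET {path} HTTP/1.1\r\n"
                       f"Host: {host}\r\n"
                       "Connection: close\r\n\r\n")
            tls.sendall(request.encode("ascii"))
            chunks = []
            while chunk := tls.recv(4096):
                chunks.append(chunk)
            return b"".join(chunks)

# e.g. fetch_https("www.example.com") returns the raw HTTP response bytes.
```

Note how the HTTP request itself is unchanged: the SSL layer sits transparently between the HTTP text and the TCP socket.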

The SSL handshake

Before sending an encrypted HTTP packet, the client and server perform an SSL handshake, during which they do the following:

  • Exchange protocol version numbers;
  • Select a cipher that both sides understand;
  • Authenticate the identity of each side;
  • Generate temporary session keys to encrypt the channel.

(Simplified version) SSL handshake

Server certificate

SSL supports bidirectional authentication by sending the server certificate back to the client and sending the client certificate back to the server. Nowadays, client certificates are not often used for browsing. Most users don’t even have their own client certificates. The server can require client certificates, but this is rarely the case in practice.

The server certificate is an X.509 v3-derived certificate showing the organization's name, address, server DNS domain name, and other information (see figure below).

  • HTTPS certificates are X.509 certificates with site information

The validity of the site certificate

SSL itself does not require the user to check the Web server certificate, but most modern browsers do a simple integrity check on the certificate and provide the user with the means to perform further thorough checks. A Web server certificate validity algorithm developed by Netscape is the basis of most browser validation technologies. The verification steps are as follows:

  • Date of inspection
  • Signature issuer credibility check
  • Signature detection
  • Site identity detection

Virtual hosts and certificates

Handling secure traffic on virtual hosting sites (multiple host names on one server) can be tricky. Some popular Web server programs support only one certificate. If the user requests a virtual host name that does not exactly match the certificate name, the browser displays a warning box.

HTTPS client instance

SSL is a complex binary protocol. Fortunately, it is not too difficult to write SSL clients and servers with the help of commercial or open source libraries.

OpenSSL

OpenSSL is the most common open source implementation of SSL and TLS. The OpenSSL project was developed by a group of volunteers with the goal of developing a robust, fully functional, commercial-grade toolset to implement SSL and TLS protocols and a fully functional common cryptographic library.

The secure traffic is transmitted in tunnel mode by proxy

Clients often use Web proxy servers to access Web servers on their behalf. For example, many companies place a proxy at the security perimeter between the corporate network and the public Internet. The proxy is the only device permitted by the firewall routers to exchange HTTP traffic, and it may perform virus checking or other content controls.

But once the client starts encrypting the data sent to the server with the server's public key, the proxy can no longer read the HTTP headers; and if it cannot read the headers, it cannot tell where to forward the request.

To make HTTPS work with proxies, a common technique is the HTTPS SSL tunneling protocol. Using the tunneling protocol, the client first tells the proxy the secure host and port to which it wants to connect. It does this in plain text, before encryption starts, so the proxy can understand the message.
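The plain-text message the client sends is an HTTP CONNECT request; a sketch of building one (the host name is a placeholder):

```python
def connect_request(host: str, port: int = 443) -> bytes:
    # The CONNECT request is sent in the clear, so the proxy can read it
    # and learn which secure host and port to tunnel to.
    return (f"CONNECT {host}:{port} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "\r\n").encode("ascii")

req = connect_request("www.example.com")
print(req.decode())
```

After the proxy replies with a 200 Connection Established response, the client starts the SSL handshake over that same connection, and from then on the proxy blindly relays the (now opaque) encrypted bytes in both directions.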