Stateless HTTP protocol
Before we explore credentials, we need to understand one feature of HTTP: statelessness.
HTTP is stateless: the protocol keeps no memory of earlier transactions, so the server does not know the state of the client. When we send a request, the server parses it and returns the corresponding response, and that exchange is complete in itself. Each exchange is fully independent, and the server records no state between one request and the next. This means that if later processing needs earlier information, that information must be retransmitted, so earlier requests end up being repeated just to obtain a later response, which is obviously wasteful.
Two techniques for maintaining state across HTTP requests have emerged: Sessions and Cookies. Sessions live on the server side, that is, on the website’s server, and store each user’s Session information. Cookies live on the client side (in the browser). The next time the browser visits the website, it automatically attaches the Cookies and sends them to the server. The server identifies the user through the Cookies, determines whether the user is logged in, and then returns the corresponding response.
Cookies hold the login credentials. With Cookies, the next request only needs to carry them; there is no need to re-enter the user name, password, and other information to log in again. In crawlers, therefore, the Cookies obtained after a successful login are generally placed in the request header and the request is sent directly, without simulating the login again.
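As a minimal sketch of the crawler case, the Python snippet below attaches previously obtained cookies to a request header instead of logging in again. The cookie names, values, and URL are made-up placeholders, and the request is only built, not sent.

```python
from urllib.request import Request


def build_request(url, cookies):
    """Return a Request that carries the given cookies in its Cookie header."""
    # A Cookie header is a list of name=value pairs joined by "; ".
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": cookie_header})


# Hypothetical cookies captured after a successful login.
saved_cookies = {"sessionid": "abc123", "theme": "dark"}
request = build_request("https://example.com/profile", saved_cookies)
print(request.get_header("Cookie"))  # sessionid=abc123; theme=dark
```

Passing the request to urllib.request.urlopen would send it with the cookies attached.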
Why is there a session?
First of all, HTTP is stateless, meaning that the server treats your hundredth visit to a page exactly like your first, because it doesn’t remember you.
So what about situations where the server really does need to remember the current user? The session was proposed to solve this problem. It is not a new technology in itself, and it cannot exist apart from HTTP and other existing Web technologies.
The principle is simple. Visiting a website is like going to a bathhouse. When you first arrive you have no key; you pay at the reception desk and are handed one. The key is your unique identifier: you use it to open your own locker and store your clothes, go for your swim, then use the key again to open the locker and take your clothes out. When you leave, you return the key. Your whole visit is one session.
So how do you implement sessions in a Web Server?
The example above makes this easy to understand. There are two problems to solve: the key, and where to store the user’s information. For the first: what is automatically sent to the server with every request? If you are familiar with HTTP, the answer is the cookie. To establish a session for a user, the server gives the user a cookie holding a session ID, which must of course be unique, once the user is successfully authorized. PHP, for example, by default sets a cookie named PHPSESSID to establish the session. Its value is a random-looking string, and the next time a request arrives carrying that cookie, the server knows: oh, this customer was just here. That leaves the second problem, how to store the user’s information. The server knows that the user with session ID ABC has arrived, and ABC wants to keep private information such as a shopping cart; where does it go? One simple answer is a file on the server, for example /tmp/phpsess_abc, read when the data is needed and written back when it changes. The data needs to be serialized to a persistent format at write time.
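The "key" half of the problem can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names are hypothetical, the cookie name follows PHP’s default, and real frameworks handle these steps for you.

```python
import secrets
from http.cookies import SimpleCookie

SESSION_COOKIE_NAME = "PHPSESSID"  # PHP's default name; any name would work


def new_session_id():
    """Generate an unpredictable, effectively unique session ID."""
    return secrets.token_hex(16)  # 32 hexadecimal characters


def set_cookie_header(session_id):
    """Build the Set-Cookie header value that hands the key to the browser."""
    cookie = SimpleCookie()
    cookie[SESSION_COOKIE_NAME] = session_id
    cookie[SESSION_COOKIE_NAME]["path"] = "/"
    cookie[SESSION_COOKIE_NAME]["httponly"] = True
    return cookie[SESSION_COOKIE_NAME].OutputString()


def session_id_from_request(cookie_header):
    """Extract the session ID from an incoming Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(SESSION_COOKIE_NAME)
    return morsel.value if morsel is not None else None
```

On the first authorized request the server would send set_cookie_header(new_session_id()); on later requests it would call session_id_from_request on the Cookie header to recognise the returning customer.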
How to implement session sharing?
If your site is hosted on one machine, there is no problem, because the session data is on that machine. But what if you use load balancing to distribute requests across different machines? The session ID on the client is still fine. However, if two requests from the same user are sent to two different machines and the session data is stored on only one of them, the other machine cannot find it, so session sharing becomes a problem.
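The problem can be reproduced with a toy model. In this sketch the WebServer class and the plain dictionaries standing in for session stores are purely illustrative.

```python
class WebServer:
    """A toy server that looks sessions up in whatever store it is given."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # dict mapping session ID -> session data

    def handle(self, session_id):
        return self.store.get(session_id)


# Case 1: each machine keeps its own local session store.
server_a = WebServer("A", store={})
server_b = WebServer("B", store={})
server_a.store["abc"] = {"cart": ["book"]}  # the login was handled by A
print(server_b.handle("abc"))  # None: B has never seen this session

# Case 2: both machines use one shared store.
shared = {}
server_c = WebServer("C", store=shared)
server_d = WebServer("D", store=shared)
server_c.store["abc"] = {"cart": ["book"]}
print(server_d.handle("abc"))  # {'cart': ['book']}
```

Each of the sharing schemes that follow is a different way of providing that shared store.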
1. NFS-based Session sharing
NFS, short for Network File System, was first developed by Sun to solve the problem of sharing directories between networked Unix hosts.
This solution is the simplest to implement and needs little additional development: just mount the directory exported by the shared-directory server at the local session directory of each web server. The disadvantage is that NFS relies on complex security mechanisms and on the file system, so its concurrency is poor, especially for small files such as sessions that are read and written at high concurrency. I/O wait on the shared-directory server becomes too high, which eventually slows down the front-end Web application.
2. Database-based Session sharing
The usual choice is of course the well-known MySQL database, and the in-memory HEAP (MEMORY) table engine is recommended to speed up session reads and writes. This scheme is practical and widely used. Its disadvantage is that the concurrent read and write capacity for sessions depends on the performance of the MySQL database. Session elimination logic must also be implemented, so that expired session records are regularly updated and deleted from the table. Although a table engine with row-level locking can be chosen, there is no denying that using a database to store sessions is a bit of a dead end.
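A rough sketch of the database scheme, including the elimination logic, might look like the following. An in-memory SQLite database stands in for MySQL here so that the example is self-contained; the table layout and function names are assumptions, not a prescribed schema.

```python
import json
import sqlite3
import time


def open_store():
    """Create the session table; in-memory SQLite stands in for MySQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions ("
        "id TEXT PRIMARY KEY, data TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    return conn


def save_session(conn, session_id, data, ttl_seconds, now=None):
    """Insert or overwrite a session row with a fresh expiry time."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, data, expires_at) VALUES (?, ?, ?)",
        (session_id, json.dumps(data), now + ttl_seconds),
    )


def load_session(conn, session_id, now=None):
    """Return the session data, or None if it is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT data FROM sessions WHERE id = ? AND expires_at > ?",
        (session_id, now),
    ).fetchone()
    return json.loads(row[0]) if row else None


def purge_expired(conn, now=None):
    """The elimination job: delete expired rows and return how many went."""
    now = time.time() if now is None else now
    cursor = conn.execute("DELETE FROM sessions WHERE expires_at <= ?", (now,))
    return cursor.rowcount
```

purge_expired is the part that has to run regularly, and it is one source of the extra query load this scheme places on the table.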
3. Cookie-based Session sharing
This solution may be unfamiliar, but it is commonly used on large sites. The principle is to serialize and encrypt each user’s Session information and set it as a cookie on the root domain (for example, .host.com). When the browser visits any second-level domain under that root domain, it sends all the Cookie content for the domain, so the cookie-based Session is shared among multiple services.
The advantage of this solution is that no additional server resources are required. The disadvantage is that, because of limits on HTTP header length, only a small amount of user information can be stored. The cookie-based Session content also needs to be encrypted and decrypted (for example, encrypting the plaintext with DES or RSA, then using a digest such as MD5 or SHA-1 to guard against forgery). It also consumes bandwidth, because the browser attaches the Cookie in the HTTP header of every request for any resource under the domain.
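The anti-forgery half of this scheme can be sketched with the standard library. Note what this sketch does and does not do: it signs the serialized session with HMAC-SHA256 (a substitute for the MD5 and SHA-1 digests named above) so that tampering is detected, but it does not encrypt, so the client can still read the payload. The secret key and function names are placeholders.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key shared by all servers


def encode_session(data):
    """Serialize the session and append an HMAC so tampering can be detected."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def decode_session(cookie_value):
    """Return the session data, or None if the cookie is malformed or altered."""
    payload, sep, signature = cookie_value.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time to avoid leaking where a mismatch occurs.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Any server that holds the same key can verify the cookie without contacting a shared store, which is why this scheme needs no extra server resources.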
4. Memcache-based Session sharing
Memcache is an in-memory caching system built on libevent’s multiplexed asynchronous I/O. Its simple key-value storage model keeps the code small and efficient, which gives it a clear advantage in concurrent processing capacity. A project I have worked on reached an average of 2,000 queries per second while server CPU consumption stayed below 10%.
It is also worth mentioning that the Expires mechanism of Memcache’s in-memory hash table matches the expiration mechanism of Sessions exactly, which removes the code needed to delete expired Session data. In the database-based storage scheme, by comparison, that logic alone puts a lot of query pressure on the table.
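To illustrate why built-in expiry simplifies the session code, here is a tiny in-process stand-in for such a cache. It is not a Memcache client; the class name and the injectable clock are assumptions made for the sake of a runnable example.

```python
import time


class ExpiringStore:
    """A tiny in-process stand-in for a cache whose entries expire on their own."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expiry time)

    def set(self, key, value, ttl_seconds):
        """Store a value that becomes unavailable after ttl_seconds."""
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        """Return the value, or None if the key is missing or has expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily drop the expired entry
            return None
        return value
```

The calling code only ever sets a lifetime when writing; it never has to run a separate cleanup query.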
Where is the SESSION data stored?
On the server side, of course; not in memory, however, but in a file or a database. (PHP is used as the example below.)
Session storage in PHP
By default, php.ini sets the SESSION save handler to files (session.save_handler = files). The directory for SESSION files is specified by session.save_path. Each file name has the prefix sess_ followed by the SESSION ID, for example sess_a69278cf61b8e35d0fe35cdb3f79c71b. The data in the file is the serialized SESSION data.
If the number of visits is large, a large number of SESSION files may be generated. In this case, you can configure tiered directories for the SESSION files to improve efficiency, by setting session.save_path = "N;/save_path", where N is the number of directory levels and save_path is the starting directory.
When writing SESSION data, PHP takes the SESSION ID sent by the client and looks for the corresponding SESSION file in the configured save directory. If the file does not exist, PHP creates it, and finally writes the serialized data to the file. Reading SESSION data works similarly: the data that is read is deserialized to produce the corresponding SESSION variables.
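The effect of tiered directories on the file location can be sketched as follows. This assumes a save path of the form "N;/dir" and assumes that each directory level is named after one leading character of the session ID; the function name is hypothetical and the code only computes a path, it does not touch the file system.

```python
import posixpath


def session_file_path(save_path, session_id):
    """Map a session ID to its file path for a save path like 'N;/dir' or '/dir'."""
    depth, sep, base_dir = save_path.partition(";")
    if not sep:
        # No 'N;' prefix: all files sit directly in the directory.
        depth, base_dir = "0", save_path
    levels = list(session_id[: int(depth)])  # one directory per leading character
    return posixpath.join(base_dir, *levels, "sess_" + session_id)


sid = "a69278cf61b8e35d0fe35cdb3f79c71b"
print(session_file_path("/tmp", sid))    # /tmp/sess_a69278cf61b8e35d0fe35cdb3f79c71b
print(session_file_path("2;/tmp", sid))  # /tmp/a/6/sess_a69278cf61b8e35d0fe35cdb3f79c71b
```

Spreading files over subdirectories keeps any single directory from growing too large to scan quickly.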
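The write and read paths described here can be mimicked in Python. This is a sketch under assumptions: JSON stands in for PHP’s own serialization format, a temporary directory stands in for the save directory, and the function names are invented for the example.

```python
import json
import os
import tempfile


def session_path(save_dir, session_id):
    """Build the file name: the sess_ prefix followed by the session ID."""
    return os.path.join(save_dir, "sess_" + session_id)


def write_session(save_dir, session_id, data):
    """Serialize the session variables and write them, creating the file if needed."""
    with open(session_path(save_dir, session_id), "w", encoding="utf-8") as f:
        json.dump(data, f)


def read_session(save_dir, session_id):
    """Deserialize the session file into variables; empty if no file exists."""
    try:
        with open(session_path(save_dir, session_id), encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


with tempfile.TemporaryDirectory() as save_dir:
    sid = "a69278cf61b8e35d0fe35cdb3f79c71b"
    write_session(save_dir, sid, {"user": "alice", "cart": [3, 7]})
    print(read_session(save_dir, sid))  # {'user': 'alice', 'cart': [3, 7]}
```

A real implementation would also validate the session ID before using it in a file name.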
So when is the Session created?
It is created while the server-side program is running. Different languages create sessions in different ways; in Java, a session is created by calling the getSession method of HttpServletRequest (with true as the argument). When a Session is created, the server generates a unique Session ID for it, and this Session ID is used to retrieve the Session in subsequent requests. After a Session is created, you can call session-related methods to add content to it. The content is stored on the server; only the Session ID is sent to the client. When the client sends another request, it brings the Session ID with it, and the server uses the Session ID to find the corresponding Session and use it again.
Create: the session ID is first produced when the server-side program reaches a statement such as HttpServletRequest.getSession(true); no session exists before that.
Delete: the program calls HttpSession.invalidate(), or the program is shut down.
Where the session is stored: in memory on the server side. Sessions can, however, be persisted by other means (Memcache, Redis).
Where does the session ID come from, and how is it used? When a client requests a session object for the first time, the server creates a session for that client and computes a session ID with a special algorithm to identify the session object.
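The create, look up, and delete lifecycle can be modelled in a few lines of Python. This is not the Java Servlet API; the SessionManager class and its method names are hypothetical and only loosely mirror the getSession and invalidate calls mentioned above.

```python
import secrets


class SessionManager:
    """Keeps sessions in server memory, keyed by session ID."""

    def __init__(self):
        self._sessions = {}

    def get_session(self, session_id=None, create=True):
        """Return (session_id, session); create a session only when asked to."""
        if session_id in self._sessions:
            return session_id, self._sessions[session_id]
        if not create:
            return None, None
        # Unknown or missing ID: issue a fresh, server-generated one.
        new_id = secrets.token_hex(16)
        self._sessions[new_id] = {}
        return new_id, self._sessions[new_id]

    def invalidate(self, session_id):
        """Delete the session, as an explicit logout would."""
        self._sessions.pop(session_id, None)


manager = SessionManager()
sid, session = manager.get_session()  # first request: a session is created
session["user"] = "alice"             # content stays on the server
_, same = manager.get_session(sid)    # later request: found by its ID
print(same)                           # {'user': 'alice'}
```

Only sid would travel to the client; the dictionary holding the content never leaves the server.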
Common misconceptions:
Session cookies and persistent cookies
It is commonly said that session cookies are stored in the browser’s memory and become invalid once the browser is closed, while persistent cookies are saved to the client’s hard disk so that the user stays logged in the next time the browser is opened.
This distinction is not a strict one: a Cookie’s lifetime is determined by its Max-Age or Expires attribute. A persistent Cookie is simply one given a long validity period, so that the next visit still carries the earlier Cookie and the login state is maintained directly.
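The difference comes down to one attribute on the Set-Cookie header, as this small sketch with Python’s standard http.cookies module shows. The cookie name, value, and one-week lifetime are arbitrary examples.

```python
from http.cookies import SimpleCookie


def make_cookie(name, value, max_age=None):
    """Return a Set-Cookie header value; max_age=None yields a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        cookie[name]["max-age"] = max_age  # lifetime in seconds
    return cookie[name].OutputString()


print(make_cookie("sid", "abc"))                         # sid=abc
print(make_cookie("sid", "abc", max_age=7 * 24 * 3600))  # sid=abc; Max-Age=604800
```

The first cookie carries no lifetime and lasts only for the browser session; the second is kept by the browser for a week.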
The Session misconception
There is a common misconception when talking about the Session mechanism: just close the browser and the Session disappears. This understanding is wrong. In the case of sessions, the server keeps the Session until the application tells the server to remove it.
But when we close the browser, it does not notify the server before closing, so the server has no way of knowing that the browser has gone. The illusion arises because most Session mechanisms store the Session ID in a session Cookie; when you close the browser the Cookie disappears, and the original Session cannot be found when you connect to the server again. If the Cookie set by the server is saved to the hard disk, or the browser’s HTTP request headers are rewritten by some means so that the original Cookie is sent to the server, the original Session ID can still be found when the browser is opened again, and the login state is kept. Closing the browser does not delete the Session. For this reason, the server needs to set an expiration time for the Session. When that time has passed since the client last used the Session, the server considers the client inactive and deletes the Session to save storage space.
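The idle-timeout rule in the last two sentences can be sketched like this. The class, its method names, and the injectable clock are assumptions for illustration; real servers implement the same idea inside their session handling.

```python
import time


class SessionStore:
    """Server-side sessions that expire after a period of inactivity."""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._sessions = {}  # session ID -> (data, time of last use)

    def create(self, session_id, data):
        """Store a new session and record the current time as its last use."""
        self._sessions[session_id] = (data, self._clock())

    def get(self, session_id):
        """Return the data and restart the timer, or None if missing or expired."""
        entry = self._sessions.get(session_id)
        if entry is None or self._clock() - entry[1] > self._idle_timeout:
            self._sessions.pop(session_id, None)
            return None
        self._sessions[session_id] = (entry[0], self._clock())
        return entry[0]

    def sweep(self):
        """Delete every session idle longer than the timeout; return the count."""
        now = self._clock()
        stale = [sid for sid, (_, last_used) in self._sessions.items()
                 if now - last_used > self._idle_timeout]
        for sid in stale:
            del self._sessions[sid]
        return len(stale)
```

Because every successful get restarts the timer, an active user keeps the session alive, while the session of a user who simply closed the browser is removed by sweep once the timeout has passed.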