Reprinted with permission from the author, link to the original article by Roronoa Zoro

This article looks at data encryption schemes for common scenarios and at the prospects for future encryption technology. First, a few news items:

Facebook stores user passwords in plain text:

Hundreds of millions of Facebook users had their account passwords stored in plain text and searchable by thousands of Facebook Employees — in some cases going back to 2012, KrebsOnSecurity has learned. Facebook says an ongoing investigation has so far found no indication that employees have abused access to this data.

In other words, Facebook kept the passwords of hundreds of millions of user accounts in plain text, in some cases since 2012, where thousands of Facebook employees could search them at will.

Facebook Stored Hundreds of Millions of User Passwords in Plain Text for Years

CSDN 6 million user account passwords leaked:

On the evening of December 21 (Beijing time), CSDN, a Chinese online community for developers, issued a statement about the “6 million user account passwords leaked” incident. It admitted that some user accounts were at risk, said it would temporarily shut down user login, and asked that “users whose accounts were registered before April 2009 and who have not changed their passwords since September 2010” change their passwords immediately.

CSDN explains how 6 million users’ passwords were leaked: Temporarily shut down logins

Why can’t passwords be stored in plain text

Many novice programmers store passwords like this:

username   phone        password
Xiao Ming  18888888888  asd123456
daming     17777777777  123abc!@#

Why is it unsafe to do so?

First, if the data is breached, plain-text passwords directly expose users’ privacy: anyone can log in to an exposed account and change it at will. Second, even without a breach, internal employees can easily read users’ plain-text passwords. As a company grows, you cannot guarantee that no one inside it will look up certain users’ passwords and violate their privacy. Storing passwords in plain text is therefore never secure.

Even a password as complex as ppnn13%dkstFeb.1st offers no security if it is stored in plain text.

No matter how complex the password is, it can’t compete with CSDN plaintext

From Zhihu user: Right Here

Off topic: What’s the most famous computer password in history?

password                                      meaning
FLZX3000cY4yhx9day                            The torrent plunges three thousand feet, as if the Milky Way had fallen from the ninth heaven
hanshansi.location()!∈[gusucity]              Beyond the city walls, from Temple of Cold Hill
hold?fish:palm                                You can’t have your cake and eat it too
Tree_0f0=sprintf("2_Bird_ff0/a")              Two golden orioles sing amid the green willows
csbt34.ydhl12s                                Three or four patches of green moss on the pond, one or two oriole calls beneath the leaves
for_$n(@RenSheng)_$n+="die"                   Since ancient times, no one has escaped death
while(1)Ape1Cry&&Ape2Cry                      With monkeys’ sad adieus the riverbanks are loud
doWhile(1){LeavesFly();YangtzeRiverFlows()};  The boundless forest sheds its leaves shower by shower. The endless river rolls its waves hour after hour.
dig?F*ckDang5                                 Hoeing the grain under the midday sun

How to store & check passwords

Since passwords cannot be stored in plain text, how can they be stored safely, and how do we check that the password a user enters is correct?

Some related information must be stored for verification. Is there a mechanism that stores only part of the information in a password yet still allows verification? Then, even if the database leaks, the attacker cannot work backward from that information to the user’s password, and the user’s account stays protected.

Hash functions can solve this problem.

A hash function is one-way and irreversible. As the figure above shows, it discards some information as the input passes through, much like this algorithm:

Algorithm: when storing a user’s name, discard the surname and randomly shuffle the remaining characters. The input Zhao Ritian produces the output Tian Ri.

Even knowing the algorithm and the output, it is impossible to deduce the name Zhao Ritian, because some of the information has been lost.


$$h = \mathrm{hash}(p)$$

Here $h$ is the value finally stored in the database and $p$ is the user’s original password. When the user logs in and enters a password $p_1$, the server computes $h_1 = \mathrm{hash}(p_1)$ and checks whether $h_1$ equals the $h$ recorded in the database to decide whether the entered password is correct.
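The store-and-check flow just described can be sketched in a few lines of Python. This is a minimal illustration, not a recommended scheme: SHA-256 is chosen only as an example hash, and the names `hash_password` and `check_password` are made up for this sketch.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Return h = hash(p) as a hex string (SHA-256 is only an example choice)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def check_password(entered: str, stored_h: str) -> bool:
    """Login: compute h1 = hash(p1) and compare it with the stored h."""
    return hmac.compare_digest(hash_password(entered), stored_h)

# Registration: only h goes into the database, never p itself.
stored_h = hash_password("asd123456")

print(check_password("asd123456", stored_h))    # the correct password matches
print(check_password("wrong-guess", stored_h))  # any other input does not
```

The database row now holds a 64-character hex digest instead of the password, and login never needs the original value.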

All hash functions share one property: if two h values differ, the inputs p must also differ, because the same input always produces the same output. The converse does not hold, however. Inputs and outputs are not in one-to-one correspondence, so different p values can produce the same h value (a collision).

Is the hash function safe?

No. Because of the property above, two users who both choose 123456abc end up with the same h in the database (and different passwords may occasionally produce the same h, which is the basis of collision attacks). An attacker can therefore compute the hashes of all likely passwords by brute force in advance and build a table; given a leaked h, the table reveals that the password is 123456abc. This is called a rainbow table attack.
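The precomputation idea can be sketched as follows. This is a plain lookup dictionary rather than a true rainbow table (real rainbow tables compress the data with hash chains), and the candidate list and user names are hypothetical.

```python
import hashlib

def h(password: str) -> str:
    # Unsalted hash, as in the scheme being attacked
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical list of common passwords the attacker hashes in advance
candidates = ["123456", "password", "123456abc", "qwerty"]
lookup = {h(p): p for p in candidates}  # hash -> password

# Hypothetical leaked rows: two users picked the same password
leaked = {
    "user_a": h("123456abc"),
    "user_b": h("123456abc"),
    "user_c": h("ppnn13%dkstFeb.1st"),
}

# One dictionary lookup per row recovers every password that was in the list
cracked = {user: lookup.get(value) for user, value in leaked.items()}
print(cracked)
```

The identical hashes for `user_a` and `user_b` are what make one precomputed table reusable against every account.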

For example, MD5, a once commonly used hash function, is now insecure because of ever-increasing computing power:

Since 1996, MD5 has been proved to have weaknesses and can be cracked. For data requiring high security, experts generally recommend switching to other algorithms, such as SHA-2. In 2004, MD5 was proved to be unable to prevent collision attacks and therefore not suitable for security authentication such as SSL public key authentication or digital signatures

Security reinforcement: add salt

Rainbow tables can be countered with a salted hash: store a random salt value for each user’s password, and compute h from this salt value and the password p:


$$h = \mathrm{hash}(\mathit{salt}, p)$$

The database stores both the salt and h. To recover a user’s password, an attacker now has to build a separate rainbow table for that user’s salt, which raises the cost of the attack.
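A minimal sketch of salted hashing, assuming the salt is simply prepended to the password bytes and SHA-256 is the hash (both are illustrative choices, as are the function names):

```python
import hashlib
import hmac
import os

def salted_hash(salt: bytes, password: str) -> bytes:
    """h = hash(salt, p): here the salt is prepended to the password bytes."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()

def register(password: str):
    """Create a fresh random salt and return (salt, h) for storage."""
    salt = os.urandom(16)
    return salt, salted_hash(salt, password)

def verify(password: str, salt: bytes, stored_h: bytes) -> bool:
    """Recompute h with the stored salt and compare in constant time."""
    return hmac.compare_digest(salted_hash(salt, password), stored_h)

# Two users with the same password now get different stored values
salt_a, h_a = register("123456abc")
salt_b, h_b = register("123456abc")
print(h_a != h_b)
print(verify("123456abc", salt_a, h_a))
```

Because each row has its own salt, a table precomputed for one user is useless against another.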

But even salted SHA-2 is not secure enough: as computing power grows and the cost of an attack falls over time, well-funded groups can still build these rainbow tables and steal users’ passwords.

Security reinforcement: improve calculation strength

Suppose we could fix the time of each hash computation at, say, 1 second, no matter what machine or how powerful a CPU is used. An attacker computing a rainbow table would then need about 115 days for 10 million combinations (and the hash space is far larger than 10 million), which makes cracking very hard.
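The 115-day figure and the idea of a tunable cost can be checked with a short sketch. PBKDF2 from the Python standard library is used here only as a stand-in for an adjustable-cost hash; it is not the algorithm this article goes on to recommend, and the function names are illustrative.

```python
import hashlib

SECONDS_PER_DAY = 24 * 60 * 60

def days_to_try(combinations: int, seconds_per_hash: float) -> float:
    """Total time to hash every candidate once, in days."""
    return combinations * seconds_per_hash / SECONDS_PER_DAY

# 10 million candidates at 1 second each
print(round(days_to_try(10_000_000, 1.0), 1))  # 115.7

def slow_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """PBKDF2-HMAC-SHA256: raising `iterations` raises the time per guess."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

# Doubling the iteration count roughly doubles the work for each guess
h_low = slow_hash("asd123456", b"example-salt-16b", 1_000)
h_high = slow_hash("asd123456", b"example-salt-16b", 2_000)
```

The iteration count is stored next to the hash, so it can be raised later as hardware gets faster.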

bcrypt is a cryptographic hash function designed by Niels Provos and David Mazières, based on the Blowfish cipher and presented at USENIX in 1999 [1]. bcrypt incorporates a salt to defend against rainbow table attacks. It is also an adaptive function: the number of iterations can be increased over time, so it stays resistant to brute-force cracking as computing power grows.

Note that a separate file-encryption utility is also named bcrypt. In addition to encrypting your data, that utility by default overwrites the original input file three times with random data before deleting it, to frustrate attempts by anyone with access to your computer to recover it. If you do not want this feature, you can disable it.

bcrypt resists the growing CPU power available to attackers by letting the computational cost be tuned. The scrypt algorithm goes further and also consumes memory: each computation must occupy a certain amount of it. bcrypt, however, has more mature implementations and is more widely used in practice; Spring Security, for example, uses it to hash passwords.
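To make the memory-cost idea concrete, here is a small sketch using `hashlib.scrypt` from the Python standard library (available when Python is built against OpenSSL 1.1 or later). The parameter values are common illustrative choices, not a recommendation from this article.

```python
import hashlib
import os

def scrypt_hash(password: str, salt: bytes) -> bytes:
    # n = CPU/memory cost, r = block size, p = parallelism.
    # Memory use is roughly 128 * n * r bytes, about 16 MiB with these values.
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )

salt = os.urandom(16)
digest = scrypt_hash("asd123456", salt)
print(len(digest))
```

Raising `n` increases both the time and the memory an attacker must spend on every guess.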

For example, a password hashed with bcrypt looks like this:

$2a$07$woshiyigesaltzhi$$$$$.lrU488y7E1Xw.JA4uizIu.PBSSe7t4y

2a is the version of the bcrypt algorithm and 07 is the cost factor: the number of iterations is 2^7 = 128, so the higher the number, the longer each computation takes. The woshiyigesaltzhi$$$$$ that follows is the salt value used for hashing. The database can store this field directly, for example:

name       phone  pwd_hash
Xiao Ming  1234   $2a$07$woshiyigesaltzhi$$$$$.lrU488y7E1Xw.JA4uizIu.PBSSe7t4y
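The stored field is self-describing, which a small parser can show. This sketch assumes the usual bcrypt layout of a 22-character salt followed by a 31-character hash after the third `$`, so the salt field it extracts also includes the trailing dot; `parse_bcrypt` is a name made up for this example.

```python
def parse_bcrypt(field: str) -> dict:
    """Split a bcrypt string laid out as $<version>$<cost>$<salt><hash>."""
    # Split on the first three "$" only, because this sample salt contains "$"
    _, version, cost, rest = field.split("$", 3)
    return {
        "version": version,
        "cost": int(cost),
        "iterations": 2 ** int(cost),  # the cost is a base-2 logarithm
        "salt": rest[:22],
        "hash": rest[22:],
    }

info = parse_bcrypt("$2a$07$woshiyigesaltzhi$$$$$.lrU488y7E1Xw.JA4uizIu.PBSSe7t4y")
print(info)
```

Because version, cost, and salt travel with the hash, the cost can be raised for new passwords without changing the table schema.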

This is the password storage method recommended in this article: hash + salt + computational cost, which protects user passwords much better. Since logging in is not a frequent operation, it matters little if users wait about a second each time they log in.

User data password encryption scheme: double hash

Password information can be hashed to make it irreversible, but some user data must be recoverable and therefore needs encryption. What then? For example, a user’s online documents are encrypted and decrypted with a user-defined password.

It’s easy to think of symmetric encryption like AES256:


$$e = \mathrm{AES256}(\mathit{salt}, \mathit{text})$$

Here the server stores the user’s password and uses it as the salt value to encrypt and decrypt the document.

But storing the password in plain text is not safe, so a double-hash method can be used to keep it secure:

Implementation scheme:

Denote the value stored for password verification as $h_1$. When the user asks to encrypt data, the user supplies the encryption password. We verify the password against $h_1$, then compute a second hash of the same password, denoted $h_2$. $h_2$ is a simple hash with no salt. We then use $h_2$ as the key to encrypt and decrypt the user’s documents:


$$e = \mathrm{AES256}(h_2, \mathit{text})$$

Remember that $h_2$ is never stored (in a database or anywhere else); it is used only to encrypt and decrypt data and is discarded after use. The hash function used for $h_2$ can be kept private for additional security.

Why do we need $h_2$? First, symmetric encryption algorithms such as AES require a fixed-length key. Second, hashing adds a further layer of protection: if the database leaks, an attacker who knows a user’s password but not the private hash function still cannot decrypt the data.
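The key-handling part of the double-hash scheme might look like the sketch below. Several things here are assumptions: PBKDF2 stands in for the slow salted hash that produces the stored value h1, plain SHA-256 stands in for the (possibly private) hash that produces h2, and the AES step itself is left out because the Python standard library has no AES. All function names are illustrative.

```python
import hashlib
import hmac
import os

def make_h1(password: str, salt: bytes) -> bytes:
    # Stored verifier: salted and deliberately slow (PBKDF2 as a stand-in)
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def make_h2(password: str) -> bytes:
    # Unsalted hash used only as the encryption key; never written to storage
    return hashlib.sha256(password.encode("utf-8")).digest()

# Registration: salt and h1 go into the database
salt = os.urandom(16)
stored_h1 = make_h1("doc-password", salt)

def key_for_request(password: str):
    """Verify the password against h1, then derive h2 for this request only."""
    if not hmac.compare_digest(make_h1(password, salt), stored_h1):
        return None
    return make_h2(password)

key = key_for_request("doc-password")
print(len(key))  # 32 bytes, the key length AES-256 expects
```

The 32-byte output of SHA-256 matches the fixed key length AES-256 needs, whatever the length of the user's password.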

Encryption of other user data

The encryption scheme above is controlled by a password, and its data security is very high: only the user knows the password, and the data cannot be recovered if the password is lost. However, this method does not suit many scenarios. Routine user data such as mobile phone numbers, social accounts, addresses, names, and other frequently accessed data is a poor fit for password-based encryption, which is too inefficient. How can such data be protected?

To understand this, you need to know how user data is transferred:

Data that the user generates in client software (such as a browser) is encrypted with HTTPS and transmitted to the back-end server, processed by server software (such as a Java application), and then written to a storage device (such as a hard disk) by database software (such as MySQL) through the database interface.

Data can be encrypted at four stages, and the closer the encryption point is to the user, the more secure the data. Client-side encryption is discussed at the end of this article; the other three are:

  • Server software encryption: data is encrypted in memory as soon as it reaches the server and before it is stored, for example with AES-256 in Java
  • Database software encryption: encryption is performed by calling the database’s API, for example MySQL’s AES encryption functions
  • Storage-side disk encryption: the storage device is encrypted with hardware encryption technology, such as the cloud disk encryption offered by cloud service providers

Encryption                    Prevents internal leaks  Prevents database leaks  Prevents physical machine leaks
Server software encryption    √ (most scenarios)       √                        √
Database software encryption  -                        √                        √
Storage-side disk encryption  -                        -                        √

The first two methods can keep even a database administrator from viewing user data directly. The last method adds little protection by itself, but it is still required by law in some countries or regions, or by customers who insist on disk encryption. Database software encryption still carries a risk of internal leakage, however. Take MySQL’s binlog: even if AES-256 is used, the key ends up recorded in the binlog used for data synchronization, which opens a path for leakage.

If the user data does not need to be searched (only read and written), server-side or database encryption can protect it. But if the data must support search, the problem becomes tricky.

Searchable encryption

The paper “Practical Techniques for Searches on Encrypted Data”, published in 2000, opened a new research direction, searchable encryption, and proposed the first practical searchable encryption scheme, SWP. The idea is as follows: each word is encrypted, and a hash value is embedded in the ciphertext. The server extracts the hash value and checks whether the ciphertext has the expected special format, which confirms whether it matches the search.
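The flavor of searching without revealing words can be shown with a much simpler stand-in. This sketch is not the SWP scheme: it stores one keyed HMAC token per word, so the server can match a search token without learning the word, but equal words always yield equal tokens, which leaks more than SWP does. The key, documents, and function names are all hypothetical.

```python
import hashlib
import hmac

def token(key: bytes, word: str) -> str:
    """Keyed, deterministic token for one word; only the key holder can create it."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(key: bytes, docs: dict) -> dict:
    """Client side: map each document id to the set of tokens of its words."""
    return {doc_id: {token(key, w) for w in text.split()} for doc_id, text in docs.items()}

def server_search(index: dict, search_token: str) -> list:
    """Server side: match tokens only; the server never sees plaintext words."""
    return sorted(doc_id for doc_id, tokens in index.items() if search_token in tokens)

key = b"client-secret-key"  # stays on the client
docs = {"d1": "salary report for march", "d2": "holiday photos"}
index = build_index(key, docs)  # this is all the server stores

print(server_search(index, token(key, "salary")))
```

Note that splitting on whitespace already assumes the text segments cleanly into words, which is exactly the kind of assumption that makes real deployments hard.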

The idea is elegant, but it runs into many difficulties in practice, such as its reliance on fixed-size words. The most important part of a search system is its word segmentation engine, and how well it segments text in multiple languages directly determines search quality. Many search systems therefore still use the following structure:

MySQL data is synchronized one-way to Elasticsearch. Today, more than 20 years later, searchable encryption is still not practical. One can even say that if a software vendor offers a search function, the data is stored unencrypted (disk encryption is invisible to the software layer), and that encrypted data cannot support search.

The premise of the conspiracy

If I were a bad actor, an important premise for carrying out a plot would be that the operation is simple enough and that few enough people know about it. When the complexity is too high or too many people must be involved, the plot cannot be carried out.

With this in mind, we can easily conclude that the “American moon landings were faked” conspiracy theory is wrong: the moon landing project involved too many people and was too complicated for such a plot to succeed.

This conclusion forms the basis of what follows.

What do some vendors’ security claims actually mean?

As noted above, searchable encryption cannot yet be used in practice. Yet some software vendors (including some big companies) offer search and still claim that they encrypt user data and that it is safe. What are they talking about?

First, they may mean disk encryption rather than software-layer encryption, which protects only against disk theft, not against data leaks or internal risks. Second, they may mean encryption in transit, such as HTTPS or a proprietary encrypted communication protocol. Third, they may have sound internal management processes that control internal and external risks.

Given the premise about conspiracies above, user data can be kept secure even where the technology falls short, as long as procedures are added that make operations open and transparent and raise the cost of doing damage. For example:

  • Database operation log review: other people check whether the DBA’s database operations are compliant, for example whether a user’s data was viewed in secret
  • Multiple passwords: decrypting data requires several passwords, which raises the complexity of the operation
  • Open and transparent process: confidential operations are recorded, for example that xx needs to decrypt data for development and testing
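One way to realize the “multiple passwords” item above is to split a data key into two shares held by different people, so that neither can decrypt alone. The XOR split below is a technique chosen for this sketch, not something the article prescribes, and the function names are hypothetical.

```python
import os

def split_key(key: bytes):
    """Split `key` into two shares; either share alone is just random bytes."""
    share_a = os.urandom(len(key))
    share_b = bytes(x ^ y for x, y in zip(key, share_a))
    return share_a, share_b

def combine(share_a: bytes, share_b: bytes) -> bytes:
    """Both custodians must contribute their share to rebuild the key."""
    return bytes(x ^ y for x, y in zip(share_a, share_b))

data_key = os.urandom(32)  # e.g. a 256-bit data encryption key
share_a, share_b = split_key(data_key)

print(combine(share_a, share_b) == data_key)
```

Requiring two people for every decryption raises the number of participants a misuse would need, which is the point of the conspiracy premise above.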

A good process also makes users more comfortable storing private data.

The future of encryption: No encryption

Client-side encryption, which was not covered above, belongs here. The closer the encryption point is to the user, the more secure the data. If users encrypt their own data (this is not the same as HTTPS transport encryption) and control the password themselves, the service provider no longer needs to do any encryption of its own; that is the “no encryption” in the title. Service providers would not have to spend so much effort and money on data security, and ownership of the data would return to the users.

One technique that client-side encryption of this kind requires is fully homomorphic encryption, an important topic in cryptography. In September 2009, Dr. Craig Gentry of IBM published the paper “Fully Homomorphic Encryption Using Ideal Lattices”, which proposed a solution to a hard problem in cryptography. Homomorphic encryption can be understood simply as follows:


$$f(\mathit{data}) = \mathrm{DE}(f(E(\mathit{data})))$$

Here $f$ is any computation, $E$ is the encryption function, and $\mathrm{DE}$ is the decryption function. In other words, performing a computation on the ciphertext and then decrypting the result is equivalent to performing the same computation on the plaintext.
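The identity can be seen in a toy setting. Textbook RSA is homomorphic for multiplication only, so it is partially rather than fully homomorphic, and the tiny primes below make it completely insecure; the sketch only shows ciphertexts being multiplied without ever being decrypted. It needs Python 3.8 or later for the modular inverse.

```python
# Toy textbook RSA with tiny primes: insecure, for illustration only
p, q = 61, 53
n = p * q                # public modulus
phi = (p - 1) * (q - 1)
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent (modular inverse of e)

def E(m: int) -> int:
    return pow(m, e, n)

def DE(c: int) -> int:
    return pow(c, d, n)

def f_plain(a: int, b: int) -> int:
    return (a * b) % n   # the computation on plaintext

def f_cipher(ca: int, cb: int) -> int:
    return (ca * cb) % n  # the same computation, done on ciphertexts only

a, b = 7, 6
result = DE(f_cipher(E(a), E(b)))
print(result == f_plain(a, b))
```

A fully homomorphic scheme extends this to both addition and multiplication, which is enough to evaluate arbitrary computations on encrypted data.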

This is awesome! Suppose I have some financial data that needs statistical analysis by a third party, but I do not want to hand over the data directly. If they tell me they support fully homomorphic encryption, I can encrypt the data before giving it to them; they run the statistics and return the results, and I decrypt the results myself. Only I ever see the real data.

These properties of homomorphic encryption address data security and trust very well. Craig Gentry gave one implementation, and many cryptographers have since proposed others, but the technology is not yet mature: a key can be 100 MB in size, for example, which is impractical in today’s network environment.

Conclusion

There is no absolute security, only relative security. Fully homomorphic encryption is expected to achieve a breakthrough in commercial use.
