1. Cryptography

Origin: Ancient wars

  • In ancient times, messages were carried by couriers on horseback, and the senders constantly worried that the courier would be caught and the message read.

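Transposition encryption: cipher rod

  • An early form of encryption

  • A strip of cloth is wrapped around the cipher rod and the message is written along the rod; once unwrapped, the letters appear scrambled. Sender and receiver each hold a rod of the same size.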

Substitution encryption

  • A code table can also be used: each character is replaced by another one according to the table

2. Modern cryptography

  • It can be used not only for textual content, but also for a variety of binary data.

Symmetric encryption:

  • It’s very much like substitution encryption

Principle:

  • Encryption: the key and the encryption algorithm transform the data; the meaningless-looking result is the ciphertext.
  • Decryption: the key and the decryption algorithm reverse the transformation and recover the original data.

Process:

  • The sender uses the encryption algorithm and the key to turn the original data into unreadable ciphertext.
  • The receiver gets the ciphertext. He holds the key, and nobody else does.
  • He applies the decryption algorithm with the key and recovers the original data.

  • This is useful for computers. Why?

    • Because the network we communicate over is completely untrustworthy.
    • There are probably many intermediate nodes on the way from my house to yours, and any of them can easily read the data passing through. Why is that?
      • When I send you a message online, it does not travel down a single private road. It is relayed from node to node, much like a broadcast, so everyone along the way can see it.
  • How is it different from traditional substitution encryption?

    • It can encrypt binary data, not just text
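The encrypt/decrypt round trip above can be sketched with a toy cipher. This is not AES or DES: it is a repeating-key XOR, chosen only because it fits in a few lines and works on arbitrary bytes. The key and the sample data are made up for illustration.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR every byte with the repeating key.
    NOT secure; it only illustrates that the same key works both ways."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"secret-key"                                     # hypothetical shared key
original = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])  # arbitrary binary data
ciphertext = xor_cipher(original, key)                  # sender side
recovered = xor_cipher(ciphertext, key)                 # receiver side, same key
```

Because XOR is its own inverse, the same function serves as both the encryption and the decryption algorithm here; real symmetric ciphers use separate, far more elaborate steps.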

Classic algorithms:

  • DES (deprecated because its 56-bit key is too short)
  • AES

Why are short keys abandoned?

  • Because a short key can be found by trying every possible key in a practical amount of time.

What does cracking mean? What does it mean to crack a symmetric encryption key?

  • In the ancient war letters, cracking meant learning what your cipher rod or your code table was. Once I have it, I can restore any encrypted letter of yours that I intercept.
  • For a modern key: the attacker obtains a matching pair of original text and ciphertext, then tries candidate keys on the ciphertext. If a key restores the ciphertext to the known original text, the key has been found and the crack has succeeded.

Where there is cracking, there is also anti-cracking.

  • The idea of anti-cracking is straightforward: make the attacker’s method of cracking more expensive.
  • The best encryption algorithm is one whose key can only be found by brute force.
  • You cannot make cracking impossible, but you can make it so costly, say a thousand or ten thousand years of computation, that in practice nobody can crack it.
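To get a feel for why key length matters, here is a rough back-of-the-envelope calculation. The attacker speed of 10^12 guesses per second is an assumption picked purely for illustration, not a measured figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
GUESSES_PER_SECOND = 10**12   # assumed attacker speed, purely illustrative

def years_to_search(key_bits: int) -> float:
    """Years needed to try every possible key of the given length."""
    return 2**key_bits / GUESSES_PER_SECOND / SECONDS_PER_YEAR

for bits in (56, 128, 256):
    print(f"{bits}-bit key: about {years_to_search(bits):.3g} years")
```

Under this assumption a 56-bit key space is exhausted in well under a year, while every extra bit doubles the work, which is why longer keys push brute force out of reach.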

Asymmetric encryption:

Principle:

  • Encrypt the data with the public key to get the ciphertext;
  • Decrypt the ciphertext with the private key to get the original data back.

Classic algorithms:

  • RSA (can be used for encryption and decryption as well as for signing and verification)
  • DSA (designed specifically for signatures; its advantage is speed. The elliptic-curve variant is called ECDSA)

  • How does it work?
    • It uses math. I can explain it briefly, but not enough to fully explain asymmetric encryption, because even the simplest asymmetric algorithms are complicated. RSA is one of the simpler ones and it is still hard to understand. So let me give a simple example. The example is flawed, but it conveys the idea.
      • Suppose the two of us communicate,
      • and our messages use only 10 symbols, the digits 0-9.
      • Say I am sending you the message 110. If it is intercepted as is, I am in trouble, so I transform it first.
      • What is my algorithm? What are my encryption key and my decryption key? This is where they become useful.
      • My encryption algorithm adds a number to each digit and keeps only the last digit of the result. The number that is added is my key.
      • I choose 4 as my encryption key and 6 as my decryption key. Encryption: 110 –> 554, send.
      • To decrypt, the algorithm is still addition, but it adds 6: 554 –> 110, restored.
    • The example is flawed (anyone who knows the key 4 can work out 10 - 4 = 6), but it shows how asymmetric encryption works. The trick is overflow: 5 + 6 = 11, and dropping the leading digit leaves 1. Without overflow there is no asymmetric encryption; discarding the overflowed part is the key point.
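The toy scheme above, written out as a sketch. The function name is made up; the keys 4 and 6 and the message 110 come from the example.

```python
def add_mod10(message: str, key: int) -> str:
    """Add `key` to every digit and keep only the last digit (overflow is dropped)."""
    return "".join(str((int(digit) + key) % 10) for digit in message)

ENCRYPT_KEY = 4   # the key that may be published
DECRYPT_KEY = 6   # the key that stays private; 4 + 6 = 10 overflows back to 0

ciphertext = add_mod10("110", ENCRYPT_KEY)       # "554"
plaintext = add_mod10(ciphertext, DECRYPT_KEY)   # "110"
```

Both directions use the same operation with different keys, and it is the modulo step (the dropped overflow) that lets a second addition undo the first.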

  • Isn’t symmetric encryption safe enough? Why do we need asymmetric encryption?
    • One very important reason: how do you get the key to the other party?
    • This is one of the biggest benefits of asymmetric encryption. The encryption key can be sent directly over the Internet without any security risk.
    • Let’s see why there is no risk.
      • First, A sends A’s encryption key to B, and B sends B’s encryption key to A.
      • Now check that A and B can communicate. If A wants to send a message to B, A encrypts the original data with B’s encryption key. The result is the ciphertext.
      • B decrypts it with B’s decryption key and gets the original data. The same happens in reverse when B sends a message to A.
      • An eavesdropper C cannot read the ciphertext on its own. But suppose C also captured the two encryption keys while they were being exchanged. Can C read the ciphertext now?
      • The ciphertext was sent from A to B and encrypted with B’s encryption key. Which key could C use to decrypt it? Neither captured key works.
      • A defining property of asymmetric encryption is that encryption and decryption use different keys, so holding the two encryption keys is useless.
      • The decryption key must never leave its owner’s hands. It is not given to anyone and is never sent anywhere.
      • The encryption key, by contrast, is out in the open. The published encryption key is called the public key, and the decryption key is called the private key.
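The public/private split described above can be illustrated with textbook RSA on tiny numbers. This is a sketch under loud assumptions: the primes 61 and 53 and the exponent 17 are classroom values, real keys are thousands of bits long and use padding, and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Textbook RSA with toy numbers; for illustration only, never for real use.
p, q = 61, 53               # hypothetical small primes, kept secret
n = p * q                   # 3233, shared by both keys
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent: the public key is (n, e)
d = pow(e, -1, phi)         # private exponent: the private key is (n, d)

def encrypt(m: int) -> int:
    """Anyone can do this with the public key."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Only the holder of the private key can do this."""
    return pow(c, d, n)

message = 65
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
wrong_key_attempt = pow(ciphertext, e, n)   # C applying the public key again
```

An eavesdropper who holds only `(n, e)` can encrypt but not decrypt: applying the public key a second time does not give the message back.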

Extension of asymmetric encryption: digital signature

Can the public key undo what the private key did?

  • What the public key encrypts, the private key can decrypt. Does it also work the other way around, with the private key transforming and the public key restoring? Yes.
  • Note, however, that the public and private keys cannot swap roles. In many algorithms the public key can be calculated from the private key. Elliptic curve cryptography, which is used in Bitcoin, is one example: the public key is computed from the private key, so whoever has the private key has both. The two keys are equivalent as far as the data transformation goes, but the public key may be computable by others. That is harmless for a key that is meant to be public, as long as the private key stays secret and cannot be worked out.
  • RSA is another case. One part of an RSA public key, the public exponent, is very often the same value for everyone; as far as I remember it is 65537. A key containing such a well-known value could never serve as the secret one. So public and private keys are not interchangeable, even though the transformation itself works in both directions: what one key does, the other can undo.
  • Because of this property, signing and verification are possible.

Signing and verification:

  • A common use today is the digital signature, or electronic signature, on the Internet. Say I have data stating that I owe $100 to a small bank. I could sign it by hand, or I can sign it electronically. How? I transform the data with my private key. Anyone who receives the result applies my public key to it; if that yields the original data, it proves the data really came from me, because only my private key could have produced it.

  • In practice there is one more step: the original data is sent along with the signature.
  • Holding only the signature data is inconvenient, because you would have to apply the public key every time you wanted to read the original. So the original data and the signature data are usually sent together. You read the original data directly, and when you want to verify it, you check that the signature data restores to the same original data.
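A minimal sketch of signing and verifying, reusing toy textbook-RSA numbers (n = 3233, e = 17, d = 2753). These values are assumptions for illustration and far too small for real use; the message is simply the number 100 from the IOU example.

```python
# Toy RSA key pair: (n, e) is public, (n, d) is private. Illustration only.
n, e, d = 3233, 17, 2753

def sign(m: int) -> int:
    """Transform the data with the private key."""
    return pow(m, d, n)

def verify(m: int, signature: int) -> bool:
    """Apply the public key to the signature and compare with the data sent alongside."""
    return pow(signature, e, n) == m

iou = 100                       # "I owe $100"
signature = sign(iou)
genuine = verify(iou, signature)
tampered = verify(101, signature)   # someone changed the amount
```

The original data and the signature travel together, as described above; verification is just the comparison inside `verify`.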

3. Base64

Definition:

  • An encoding algorithm that converts binary data into a string made up of 64 characters (A-Z, a-z, 0-9, +, /).
  • What is binary data?
    • In the broad sense, all computer data is binary data. Electrical signals are only 1s and 0s, so everything a computer stores, whether text documents, movies, Word documents, or pictures, is binary data.
    • Text data is a special case: plain text such as strings or the contents of a TXT file, also called character data. Everything other than text data is what we usually call binary data; that is the narrow sense. So in the narrow sense there are two kinds of data, text data and binary data.

Base64 code table:

2 to the sixth power is 64, so each Base64 character carries 6 bits.

  • How does the conversion work?
    • It re-slices your data. Normally data is grouped 8 bits to a byte, where each bit is either 0 or 1. To turn the data into a string, Base64 cuts the bit stream into 6-bit groups and maps each group to one character of the code table. A concrete conversion: Man in Base64 is TWFu.
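The 6-bit slicing can be reproduced by hand for the Man example. The helper below is a sketch that handles exactly three bytes (no padding logic) and is compared against the standard library's `base64.b64encode`.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_3_bytes(chunk: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters (4 x 6 bits)."""
    bits = int.from_bytes(chunk, "big")
    # Take the 24 bits six at a time, most significant group first.
    return "".join(ALPHABET[(bits >> shift) & 0b111111] for shift in (18, 12, 6, 0))

manual = encode_3_bytes(b"Man")                        # "TWFu"
stdlib = base64.b64encode(b"Man").decode("ascii")      # "TWFu"
```

The three bytes 01001101 01100001 01101110 regroup into 010011 010110 000101 101110, that is 19, 22, 5, 46, which index T, W, F, u in the code table.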

Base64 makes the data bigger after conversion.

  • How can the data avoid growing after conversion?
    • Then don’t use Base64; you would need something like “Base256”. But Base64 exists precisely because there are not 256 commonly usable characters; if there were, Base64 would not be needed. Its purpose is to convert binary data (in the narrow sense) into text data, that is, into a string.

Uses:

  • Give the original data the properties of a string, so that it can be carried in a URL, saved to a text file, or sent as text through ordinary chat software. (Convert a non-string to a string.)
  • Turn a string that people can read at a glance into one they cannot, to reduce the risk of casual peeping.

When will you need Base64?

  • An early example: when email was invented, it could not send pictures. But computers could store pictures, so how could I send you one? I Base64-encode it. It gets bigger, but that doesn’t matter; it just sends more slowly. The receiver then runs the Base64 decoding algorithm to get the picture back. So the picture is sent as text.
  • Another example: suppose we build a new chat program with limited technology that cannot transmit pictures. No problem: convert the image to Base64, send the Base64 text, and the receiver decodes it locally to get the image. In short, Base64 is for when a channel, like the old email, can only carry text and you still want to send something that is not text.

Can Base64 make image transmission more efficient and secure?

  • Secure? No. Security comes only from encryption, and Base64 is not encryption;
  • Efficient? No. Base64 output is about 1/3 longer than the input. Is that efficient?
    • Whether you are storing, transferring, or reading it, it will be slower and take more bandwidth. A transfer that would already have finished needs about a third more time because it went through Base64. There is nothing efficient about Base64; it is inefficient. If you don’t have to use Base64, don’t. It grows every time you apply it.
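The one-third growth is easy to check with the standard library. The input below is arbitrary sample data.

```python
import base64

raw = bytes(range(256)) * 3              # 768 bytes of arbitrary binary data
encoded = base64.b64encode(raw)          # every 3 input bytes become 4 characters
growth = len(encoded) / len(raw)         # 4/3
encoded_twice = base64.b64encode(encoded)    # applying it again grows the data again
decoded = base64.b64decode(encoded)      # still fully reversible
```

Nothing is hidden by this transformation: `b64decode` needs no key, which is why Base64 provides no security.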

Base58

  • Base58 is a variant of Base64. It removes the digit 0, capital O, capital I, and lowercase l, which are easy to confuse, as well as the two symbols + and /. It is used for the addresses of Bitcoin and other virtual currencies. What is special about such an address? It can be copied by hand. Removing + and / also means a double click selects the whole string for copying.
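A sketch of a Base58 encoder, assuming the alphabet order commonly used for Bitcoin addresses. Real addresses also carry a version byte and a checksum, which are left out here; the sample input is arbitrary.

```python
# 58 characters: digits and letters without 0, O, I, l, and without + and /.
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    """Treat the bytes as one big number and write it in base 58."""
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = B58[rem] + out
    # By convention each leading zero byte is written as the first symbol, '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out

sample = b58encode(b"hello")
```

Unlike Base64, 58 is not a power of two, so the data cannot be cut into fixed bit groups; the encoder has to do repeated division on the whole number instead.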

4. URL encoding

  • URL encoding is often mentioned together with Base64, but it is a different scheme (not to be confused with the URL-safe variant of Base64):
    • Reserved characters in the URL are encoded using the percent sign “%” followed by the byte value in hexadecimal

Example: the reserved character & is converted to %26.

  • Purpose: to remove ambiguity and avoid parsing mistakes
  • It is also used for Chinese characters

The address bar displays Chinese, but the copied URL is percent-encoded:

https://www.google.com/search?q=%E6%A4%AD%E5%9C%86%E6%9B%B2%E7%BA%BF%E7%AE%97%E6%B3%95&oq=%E6%A4%AD%E5%9C%86%E6%9B%B2%E7%BA%BF%E7%AE%97%E6%B3%95&aqs=chrome..69i57j0l2.7739j0j7&sourceid=chrome&ie=utf-8
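Percent-encoding can be tried with Python's standard `urllib.parse`. The sketch below encodes the reserved character from the example and decodes the `q` parameter of the URL above.

```python
from urllib.parse import quote, unquote

encoded_amp = quote("&", safe="")        # "%26"

# The q= value from the search URL above, decoded back to Chinese text.
query = "%E6%A4%AD%E5%9C%86%E6%9B%B2%E7%BA%BF%E7%AE%97%E6%B3%95"
decoded = unquote(query)                 # "椭圆曲线算法" (elliptic curve algorithm)
re_encoded = quote(decoded)              # each character becomes three %XX bytes (UTF-8)
```

Each Chinese character is first turned into its UTF-8 bytes, and each byte is then written as `%` plus two hexadecimal digits.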

5. Compression and decompression

  • Compression: To store data in a different way to reduce storage space
  • Decompression: Restore the compressed data to its original form for use

Common compression algorithms:

  • DEFLATE (the compression algorithm used by ZIP)
  • JPEG
  • MP3

Does compression count as encoding?

  • What exactly is encoding? Converting format A to format B in such a way that B can be converted back, with no information lost and none added. By that definition, (lossless) compression is also a kind of encoding.
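A quick round trip with Python's `zlib` module, which implements DEFLATE, shows the "convert and convert back with nothing lost" property. The sample data is made up.

```python
import zlib

original = b"a" * 100 + b"b" * 20        # highly repetitive sample data
compressed = zlib.compress(original)     # DEFLATE inside a zlib wrapper
restored = zlib.decompress(compressed)
```

The compressed form is much shorter than the 120 original bytes, and decompression returns exactly the same bytes.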

6. Encoding and decoding media data

  • What is media data?

    • Pictures, video, audio and the like. What does encoding them mean?
    • A media codec converts the raw data into an encoded form that can be stored. For an image, that means turning the image data into a file.
  • For example, how do you encode an image?

    • Say I have a 64×64 image that is pure white. In memory it is probably a bitmap. When I want to encode it and save it, how do I write it out?
      • FFFFFF is one white pixel
      • ffffffffffff…… ×64
      • ffffffffffff…… ×64
      • ffffffffffff…… ×64
      • …… 64 rows in total
    • This is the encoded image, but it is annoyingly big. We can compress while we encode. How? There are many compression methods. What do DEFLATE and all the other algorithms essentially do?
  • For example: aaaaaaaaaaaaaaaaa… aaaaaaaaaaaaaaaaaaabbbbbbbb… bbbbbb

    • After some rough compression: text:a=100; b=20
    • The picture above works the same way:
      • image:64*64;
      • FFFFFF = [0, 0] – [63, 63]
    • The same is true of audio and video compression. The above is only an example and not rigorous. A well-designed lossless algorithm can still make some inputs slightly bigger, but it keeps that worst-case growth to a small overhead.
  • Lossy and lossless compression:

    • I can lower the image’s resolution or reduce the number of colors. The data gets smaller, but some information is lost after compression. That is lossy compression; lossless compression restores the data exactly.
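The rough "a=100; b=20" idea above is essentially run-length encoding. Here is a minimal sketch; the function names are made up, and real algorithms such as DEFLATE are considerably more sophisticated.

```python
from itertools import groupby

def rle_encode(items):
    """Collapse each run of equal items into an (item, run_length) pair."""
    return [(item, len(list(run))) for item, run in groupby(items)]

def rle_decode(pairs):
    """Expand the pairs back into the original text (lossless)."""
    return "".join(item * count for item, count in pairs)

text = "a" * 100 + "b" * 20
text_pairs = rle_encode(text)                    # [("a", 100), ("b", 20)]

white_image = ["FFFFFF"] * (64 * 64)             # the pure white 64x64 bitmap
image_pairs = rle_encode(white_image)            # [("FFFFFF", 4096)]
```

Data with no repeated runs would gain nothing here and the pair list would be longer than the input, which is why this is only a rough illustration.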

7. Serialization

  • Say I have a data object in Java memory, and it has several properties.

  • The object lives in memory, and I need to get it out, either onto the network or into local storage on the phone. But objects in memory are scattered all over the place. How can such a structure be stored? I take it out and turn it into a linear form that can be parsed in a standard way. This is serialization.
  • The object above could be serialized to JSON. JSON is just one option; you could also serialize to XML and so on. All that matters is that the result is linear and can be stored and transmitted.

  • Serialization:
    • The process of converting a data object (typically in memory, such as in the JVM) into a sequence of bytes
  • Deserialization:
    • Converts a sequence of bytes back into an object in memory.
  • Purpose: to communicate with the outside world
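A small sketch of the round trip, using Python and JSON rather than the JVM. The `User` class and its fields are hypothetical stand-ins for "an object with several properties".

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class User:            # hypothetical in-memory object
    name: str
    age: int
    tags: list

user = User("alice", 30, ["admin", "dev"])

# Serialization: in-memory object -> linear sequence of bytes.
payload = json.dumps(asdict(user)).encode("utf-8")

# Deserialization: bytes -> a new, equal object in memory.
restored = User(**json.loads(payload.decode("utf-8")))
```

`payload` is what would be written to disk or sent over the network; the receiving side only needs to know the format (JSON here) to rebuild the object.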

Does serialization count as encoding?

  • Serialization is not strictly an encoding. Encoding converts A to B, where both A and B are already well-defined formats. The source of serialization is not a format, only a lump of objects in memory. So it is not encoding in the strict sense, though there is no strict rule about what counts as encoding.

8. Hash

Definition:

  • Converts data of any size into a value within a specified (usually small) size range
  • For example, we have 200 students and give each student a number: 001, 002, 003, …. This process is hashing, and each person’s number is a hash value.

Function:

  • Used as a digest, a digital fingerprint of the data

  • For example:

    • Hash a string by its length
      • “haha”–>4
      • “pa”–>2
      • The number is the hash value. This is a bad hash, because
      • “hehe”–>4
    • A hash should identify its input, so a good hash has a very low collision rate. The input to a hash algorithm may be very large or very small; either way the result should be computed quickly and should rarely collide.
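The length "hash" from the example, next to a real hash function from Python's `hashlib`, as a small illustration of collisions.

```python
import hashlib

def length_hash(s: str) -> int:
    """The bad hash from the example: many different strings share a length."""
    return len(s)

def sha256_hash(s: str) -> str:
    """A real hash: fixed-size output (64 hex characters) for any input."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

collision = length_hash("haha") == length_hash("hehe")      # both are 4
distinct = sha256_hash("haha") != sha256_hash("hehe")
```

Both functions map any string to a small fixed range, but only the second spreads inputs out well enough to serve as a fingerprint.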

Classic algorithms:

  • MD5 (largely abandoned for security purposes because it is so easy to break)
  • SHA1
  • SHA256

Practical use:

Data integrity verification:

  • For example, when we download an installation package, the publisher may list a hash value for verification below the download link and say whether it is a SHA1, SHA256, or MD5 value, sometimes giving several. What is that for? Say the publisher’s source file is 5 GB and its MD5 value is 7788. Your downloaded copy may be damaged: the download tool may have made a mistake, or someone may have tampered with your network traffic. So you hash the downloaded file yourself. If its MD5 is also 7788, the download is intact. If you get 2567, the download failed.
  • A hash is like extracting a characteristic value from your data. Because the same data gives the same result every time it is calculated, the value can be treated as a fingerprint.
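A sketch of the check described above, using SHA-256 from `hashlib`. The temporary file and its contents stand in for a real download, and `published` stands in for the value the publisher would list on the download page.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that even a multi-gigabyte file fits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Stand-in for a downloaded file (hypothetical contents).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"installer bytes")
    download_path = tmp.name

published = hashlib.sha256(b"installer bytes").hexdigest()   # value from the publisher
download_ok = file_sha256(download_path) == published
os.remove(download_path)
```

If even one byte of the file differed, the computed value would not match the published one.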

Quick lookup:

hashCode() and HashMap

  • When a class is used with hash-based collections in Java, override hashCode() whenever you override equals()

  • hashCode() is used as a quick pre-check of whether two objects might be equal

Privacy protection:

  • A couple of years ago a website had its database stolen.
    • It was a technical blog site. After the database dump, many people pointed out that it had stored passwords in plain text, and that this was why users’ privacy was compromised. Why does that matter?
    • The danger of stolen user data is easy to understand. If passwords are stored in plain text, the thief will try your user name and password on other sites, and much more user information will be stolen.
    • So what is non-plaintext storage? When the server receives the user’s account and password, it hashes the password with MD5 and stores only the hash. At the next login the submitted password is hashed again and compared with the stored value. That way, stolen data cannot be used directly to log in to other websites.
  • There is also something called adding salt. What does that mean?
    • Hackers keep getting better. Why did we think hashed passwords were hard to crack? Because a hash is not reversible:
    • One, the stolen value cannot be typed directly into a login form, because the site would hash it again and get a different value;
    • Two, the original password cannot be computed backwards from it. But hackers have time. They hash common passwords one by one and store the mapping. For example, if the MD5 of 123456 is DDDDD, their table records that DDDDD corresponds to 123456. After stealing a database they simply look each hash up. This precomputed mapping table, called a rainbow table, can partially crack passwords stored as plain hashes.
    • Adding salt defeats this. Each website defines its own salt and keeps it strictly secret; it is bad news if the salt is taken along with the database dump. What is salt? When computing the MD5 or SHA1, instead of hashing 123456 you append, say, 333 and hash 123456333. 333 is the salt, and the resulting hash is completely different. Each site uses a different salt, which makes the rainbow table useless. A real salt is not 333 but a long string of unusual characters.
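A sketch of the lookup-table attack and of how a salt defeats it. MD5 and the salt 333 are used only to mirror the example above; the password list is made up, and a real system would use a long random salt and a slower, purpose-built password hash.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# The attacker's precomputed table: hash -> original password.
common_passwords = ["123456", "password", "qwerty"]
lookup_table = {md5_hex(p): p for p in common_passwords}

SALT = "333"   # example salt from the text; far too short for real use

stored_unsalted = md5_hex("123456")          # what a site without salt stores
stored_salted = md5_hex("123456" + SALT)     # what a site with salt stores

cracked_unsalted = lookup_table.get(stored_unsalted)   # found in the table
cracked_salted = lookup_table.get(stored_salted)       # not in the table
```

The salted value is missing from the table because the attacker precomputed hashes of the bare passwords, not of password plus this particular salt.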

Is hashing encoding?

  • No. What is encoding? The data is converted and can be converted back without losing anything. What is hashing? It extracts a feature of the data and cannot be reversed.

  • Is hashing encryption? People say MD5 is “irreversible encryption”?

    • It is not encryption. The Baidu encyclopedia entry is wrong; it is an irreversible transformation.
    • Someone coined the term “irreversible encryption”, which is understandable. But encryption means converting A into a form B that others cannot understand and that can be restored. If you stretch the word “encryption” to mean only “others cannot understand it”, then fine, MD5 is a kind of irreversible encryption. But the word has its own definition and should not be used loosely.

9. Hash and asymmetric encryption

  • The signing scheme described earlier has a disadvantage: the signature data is the same size as the original data, because the signature can be restored to the original. For a small file that doesn’t matter, but for a 10 GB video the signature is also 10 GB, which is very bloated. What is actually done is to hash the data first and then sign the hash value.
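A sketch of hash-then-sign with toy textbook-RSA numbers (n = 3233, e = 17, d = 2753, assumed for illustration only). Reducing the digest modulo n is a hack needed only because the toy key is tiny; real signature schemes pad the full digest instead.

```python
import hashlib

# Toy RSA key pair: (n, e) public, (n, d) private. Illustration only.
n, e, d = 3233, 17, 2753

def digest_as_int(data: bytes) -> int:
    """SHA-256 digest squeezed into the toy key's range (assumption of this sketch)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    return pow(digest_as_int(data), d, n)       # sign the hash, not the data

def verify(data: bytes, signature: int) -> bool:
    return pow(signature, e, n) == digest_as_int(data)

video = b"\x00" * 1_000_000     # stand-in for a large file
signature = sign(video)         # a single small number, whatever the file size
valid = verify(video, signature)
```

The signature stays the same small size no matter how large the input is, because only the fixed-size hash value is signed.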

10. Character sets

  • Meaning: a map from integers to the written symbols of the real world

Branches:

ASCII:

  • 128 characters, one byte (a byte can represent 256 values, but ASCII uses only 128 of them)

ISO-8859-1:

  • An extension of ASCII, one byte

Unicode:

  • About 130,000 characters, multiple bytes
    • UTF-8: an encoding branch of Unicode
    • UTF-16: an encoding branch of Unicode
  • What is an encoding branch?
    • For example, take the three characters 中国人 (“Chinese person”; made-up values for illustration):
      • “中” may correspond to 00000001,
      • “国” to 00001111,
      • “人” to 11111111
    • These are their code values, but they need not be written out that way
      • 01 中
      • 001111 国
      • 11111111 人
      • This is a little shorter. It might be UTF-8
    • Another way might be
      • 0001 中
      • 1111 国
      • 11111111 人
      • This could be UTF-16
    • The character set is the same, but the way the values are written out differs; these are different encodings. One clear difference between UTF-8 and UTF-16: UTF-16 works in 16-bit units, so no character takes fewer than 16 bits.
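The difference between a character set and its encodings can be seen with real values in Python. The bit patterns in the outline above are made up; the numbers below are the actual ones.

```python
text = "中国人"

# One character set: every character has a single Unicode code point.
code_points = [hex(ord(ch)) for ch in text]

# Two encodings of the same code points.
utf8_bytes = text.encode("utf-8")        # 3 bytes per character here
utf16_bytes = text.encode("utf-16-be")   # 2 bytes per character here

# UTF-8 can be short for ASCII; UTF-16 never goes below 16 bits.
a_utf8 = "A".encode("utf-8")             # 1 byte
a_utf16 = "A".encode("utf-16-be")        # 2 bytes
```

Decoding either byte sequence with the matching codec gives back the same three characters.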

GBK

  • GBK / GB2312 / GB18030: standards developed in China, multi-byte, each defining both a character set and an encoding