Hello, I’m Yue Chuang.
Recently I was thinking about how to combine the crawler basics into a larger hands-on crawler project. I used pickle along the way, which led to this article.
An informal definition
Some people find it hard to concentrate on dry, formal definitions, so I’ll start with a short conversation that gives a quick overview of how the pickle and json libraries differ.
One of my classmates, Mi, came over and asked me: what is JSON for, and what is the difference between pickle and JSON? My answer, I think, was pretty down to earth:
Let me put it this way: Python travels long distances by high-speed train (the various databases) and bought a cheap little car (pickle) for short trips.
In recent years ride-hailing apps have taken off and everyone on the street is taking “Didi” (JSON), so Python installed the app too (the json module) and hails a Didi (JSON) when needed. And more and more people keep using Didi (JSON).
I don’t know. What do you think of my answer?
It isn’t very technical, but it gives you a rough idea of how the two libraries relate. An analogy alone isn’t enough, though, so let’s go through them properly.
The json library
Let’s start with something you will run into in real work.
JSON (JavaScript Object Notation) is a lightweight data interchange format. Its design idea is to represent everything as strings, which makes it convenient both for passing information over the Internet and for people to read (compared with binary protocols). JSON is widely used on the Web today, and it is a skill every Python programmer should have.
Imagine you want to buy a certain number of shares through an exchange. You need to submit the stock code, direction (buy/sell), order type (market/limit), price (for a limit order), quantity, and a series of other parameters. These values are strings, integers, floating-point numbers, and even booleans, all mixed together, which makes them awkward for the exchange to unpack.
So what can we do?
This is exactly the scenario JSON solves. You can think of it simply as two black boxes:
- The first takes all this information in, for example as a Python dictionary, and outputs a string;
- The second takes that string as input and outputs a Python dictionary containing the original information.
The specific code is as follows:
import json

params = {
    'symbol': '123456',
    'type': 'limit',
    'price': 123.4,
    'amount': 23
}

# Serialize the dictionary to a JSON string
params_str = json.dumps(params)
print('after json serialization')
print('type of params_str = {}, params_str = {}'.format(type(params_str), params_str))

# Deserialize the JSON string back into a dictionary
original_params = json.loads(params_str)
print('after json deserialization')
print('type of original_params = {}, original_params = {}'.format(type(original_params), original_params))

########## output ##########

after json serialization
type of params_str = <class 'str'>, params_str = {"symbol": "123456", "type": "limit", "price": 123.4, "amount": 23}
after json deserialization
type of original_params = <class 'dict'>, original_params = {'symbol': '123456', 'type': 'limit', 'price': 123.4, 'amount': 23}
For in-memory strings, the functions are dumps and loads.
Easy, right?
But remember to add error handling. If you do not catch the exception that json.loads() raises, the program will crash as soon as someone sends it an invalid string. For example, if a user passes in an invalid string when registering, should your application crash and stop running? Obviously not.
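Here is a minimal sketch of that error handling. The function name parse_params is my own, made up for illustration; the only real API used is json.loads() and the json.JSONDecodeError exception it raises on malformed input.

```python
import json

def parse_params(raw):
    """Return the decoded object, or None if raw is not valid JSON.

    parse_params is a hypothetical helper, not part of the json module.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # JSONDecodeError is a subclass of ValueError and reports where parsing failed
        print('invalid JSON: {}'.format(exc))
        return None

print(parse_params('{"symbol": "123456", "amount": 23}'))
print(parse_params("{'symbol': '123456'}"))  # single quotes are not valid JSON
```

Whether to return None, fall back to a default, or report the error to the user depends on the application. The point is only that the bad string no longer stops the program.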
At this point you might be wondering: what if I want to write the string to a file, or read a JSON string from a file?
You can still use open() with read()/write(): read the string into memory or write it out, and encode/decode the JSON yourself, but that is a bit of a hassle. json.dump() and json.load() work on file objects directly:
import json

params = {
    'symbol': '123456',
    'type': 'limit',
    'price': 123.4,
    'amount': 23
}

# json.dump writes straight to the file object and returns None
with open('params.json', 'w') as fout:
    json.dump(params, fout)

with open('params.json', 'r') as fin:
    original_params = json.load(fin)

print('after json deserialization')
print('type of original_params = {}, original_params = {}'.format(type(original_params), original_params))

########## output ##########

after json deserialization
type of original_params = <class 'dict'>, original_params = {'symbol': '123456', 'type': 'limit', 'price': 123.4, 'amount': 23}
For file streams, the functions are dump and load.
This makes reading and writing JSON simple and clear. When developing a third-party application, you can use JSON to write the user’s personal settings to a file so that they are read back automatically the next time the application starts. This is a mature and widely used practice.
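As a rough sketch of that settings-file idea: the names DEFAULT_SETTINGS, save_settings and load_settings, and the keys inside the dictionary, are all invented for this example. I also assume that falling back to defaults is acceptable when the file is missing or corrupted, which may not suit every application.

```python
import json
import os
import tempfile

# Hypothetical defaults, used when no settings file exists yet
DEFAULT_SETTINGS = {'theme': 'light', 'font_size': 12}

def save_settings(settings, path):
    """Write the settings dictionary to path as JSON."""
    with open(path, 'w', encoding='utf-8') as fout:
        json.dump(settings, fout, indent=2)

def load_settings(path):
    """Read settings from path, falling back to a copy of the defaults."""
    if not os.path.exists(path):
        return dict(DEFAULT_SETTINGS)
    try:
        with open(path, 'r', encoding='utf-8') as fin:
            return json.load(fin)
    except json.JSONDecodeError:
        # A damaged file should not stop the application from starting
        return dict(DEFAULT_SETTINGS)

# A temporary directory stands in for the application's data folder
with tempfile.TemporaryDirectory() as tmp:
    settings_path = os.path.join(tmp, 'settings.json')
    first_run = load_settings(settings_path)      # no file yet: defaults
    save_settings({'theme': 'dark', 'font_size': 14}, settings_path)
    second_run = load_settings(settings_path)     # reads what was saved

print(first_run)
print(second_run)
```

Because the file is plain JSON, the user can also open it in a text editor and change it by hand, which is part of why this approach is so common.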
So is JSON the only option? Obviously not; it is just one of the most convenient options for lightweight applications. As far as I know, Google has a similar tool called Protocol Buffers, which it has fully open-sourced, so you can learn how to use it on your own.
Its advantage is that it produces an optimized binary encoding and therefore performs better than JSON. The trade-off is that the resulting binary sequence is not directly readable. It is widely used in performance-critical systems such as TensorFlow.
With JSON out of the way, let’s talk about pickle.
The pickle library
Next we’ll talk about serialization and deserialization, so let me first share the definitions of both. The pickle module implements binary serialization and deserialization of a Python object structure.
- The process of converting a variable from memory into something that can be stored or transferred is called serialization. After serialization, the serialized contents can be written to disk or transferred over the network to another machine.
- Conversely, reading the contents of a variable back into memory from a serialized object is called deserialization, or unpickling.
In practice, a running Python program often needs to persist its data objects, such as strings, lists, dictionaries, or even instances of custom classes. That means storing them on disk, so that data held only in memory is not lost to a power failure or a similar event.
This is where the pickle module comes in: it converts objects into a format that can be transferred or stored. Python’s pickle module implements basic data serialization and deserialization. By serializing with pickle, we can save the state of objects in a running program to a file for permanent storage; by deserializing, we can recreate from that file the objects that an earlier run of the program saved.
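To make the custom-class case concrete, here is a small sketch. The CrawlTask class, its fields, and the save_task/load_task helpers are invented for illustration; only pickle.dump() and pickle.load() come from the standard library.

```python
import os
import pickle
import tempfile

class CrawlTask:
    """Hypothetical custom class, used only for illustration."""

    def __init__(self, url, retries=0):
        self.url = url
        self.retries = retries

def save_task(task, path):
    # Pickle is a binary format, so the file is opened in 'wb' mode
    with open(path, 'wb') as f:
        pickle.dump(task, f)

def load_task(path):
    with open(path, 'rb') as f:
        return pickle.load(f)

# A temporary directory stands in for wherever the program keeps its state
with tempfile.TemporaryDirectory() as tmp:
    task_path = os.path.join(tmp, 'task.pik')
    save_task(CrawlTask('https://example.com/page/1', retries=2), task_path)
    restored = load_task(task_path)

print(type(restored).__name__, restored.url, restored.retries)
```

One thing to keep in mind: pickle stores the instance’s attributes together with a reference to the class by name, not the class’s code, so the class definition must be importable in the program that loads the file.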
Comparison with the json module
As you can see, the json module and the pickle module look similar, but they are fundamentally different:
- JSON is a text serialization format (it outputs Unicode text, although most of the time it is then encoded as UTF-8), while pickle is a binary serialization format;
- JSON is human-readable, while pickle is not (much as base64 output is unreadable);
- JSON is interoperable and widely used outside the Python ecosystem, while pickle is Python-specific.
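The differences listed above are easy to see in a few lines. This is only a sketch with made-up sample data; it also shows one practical consequence of pickle being Python-specific, namely that it can handle Python-only types such as set, which the json module rejects.

```python
import json
import pickle

record = {'name': 'bob', 'scores': [1, 2, 3]}

as_json = json.dumps(record)      # text: a str you can read directly
as_pickle = pickle.dumps(record)  # binary: a bytes object
print(type(as_json), as_json)
print(type(as_pickle))

# A set is a Python-specific type with no JSON equivalent
with_set = {'major': {'english', 'math'}}
try:
    json.dumps(with_set)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print('json failed:', exc)

# pickle round-trips the same object without trouble
roundtrip = pickle.loads(pickle.dumps(with_set))
print(roundtrip == with_set)
```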
Serialization and deserialization
Open the target file in binary mode, then serialize the data object with dump and read it back with load:
import pickle

D = {
    'name': 'bob',
    'major': {'english', 'math'},  # a set
    'd': [1, 2, 3, 4, 5, 6, 7]
}

# Pickle is a binary format, so the file is opened with 'wb' / 'rb'
with open('D.pik', 'wb') as f:
    pickle.dump(D, f)

with open('D.pik', 'rb') as f:
    D = pickle.load(f)

print(type(D))
print(D)
Example results:
<class 'dict'>
{'name': 'bob', 'major': {'english', 'math'}, 'd': [1, 2, 3, 4, 5, 6, 7]}
Of course, we can also serialize to memory (as a bytes object, not a string), after which the result can be handled in any way we like, for example sent over the network:
pik = pickle.dumps(D)
print(pik)
D = pickle.loads(pik)
print(type(D))
print(D)
Example results:
b'\x80\x04\x95E\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x04name\x94\x8c\x03bob\x94\x8c\x05major\x94\x8f\x94(\x8c\x07english\x94\x8c\x04math\x94\x90\x8c\x01d\x94]\x94(K\x01K\x02K\x03K\x04K\x05K\x06K\x07eu.'
<class 'dict'>
{'name': 'bob', 'major': {'english', 'math'}, 'd': [1, 2, 3, 4, 5, 6, 7]}
cPickle
In Python 2, the cPickle package has almost exactly the same functionality and usage as the pickle package (where they differ, the differing features are rarely used). The difference is that cPickle is written in C and performs better, so it was recommended for most applications; for the example above, changing the import statement to import cPickle as pickle was enough. In Python 3, cPickle no longer exists as a separate module: the standard pickle module automatically uses its C implementation when it is available, so a plain import pickle already gets the faster version.
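If the same code has to run on both Python 2 and Python 3, a commonly used pattern is to try cPickle first and fall back to pickle. This is a sketch of that pattern, not something the example above requires:

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: there is no cPickle module

data = pickle.dumps({'name': 'bob'})
print(pickle.loads(data))
```

Either way, the rest of the code keeps calling pickle.dump(), pickle.load(), pickle.dumps() and pickle.loads() unchanged.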
AI Yue Chuang: V: Jiabcdefh