In a recent interview, the interviewer asked whether Python's main data types are ordered. I said that dictionaries went from unordered to ordered in Python 3.6, because I remembered reading about it. But the interviewer kept asking me to go deeper, and because I did not really understand the underlying principle, I could not give a convincing answer. So afterwards I worked through the principle and wrote it down.

Dictionary changes in Python 3.6

In Python 3.5 and earlier, dictionaries were not guaranteed to preserve order. If key-value pair A was inserted first and key-value pair B later, printing the dictionary's keys might still show B before A.

Since Python 3.6, dictionaries are ordered. If you insert key-value pair A and then key-value pair B, the list of keys always shows B after A. (In CPython 3.6 this was an implementation detail; the language itself guarantees it from Python 3.7 onward.)
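A quick way to see the behavior described above is to insert two keys and list them. This is only a small check, assuming it runs on CPython 3.6 or any Python 3.7+; the key names A and B are placeholders.

```python
# Insert A first, then B, and read the keys back.
my_dict = {}
my_dict["A"] = 1
my_dict["B"] = 2

keys = list(my_dict)
print(keys)  # ['A', 'B'] on CPython 3.6 and on any Python 3.7+
```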

In addition, since Python 3.6 the following three kinds of traversal are more efficient than they were in Python 3.5 and earlier:

for key in dict
for value in dict.values()
for key, value in dict.items()

Also, since Python 3.6 a dictionary occupies only 30% to 95% of the memory it used before, depending on the number of key-value pairs in it.

How dictionaries work in Python 3.5 and earlier

Dictionary insertion

When we initialize an empty dictionary, CPython allocates a two-dimensional array with eight rows and three columns, shown schematically below:

my_dict = {}

'''
Memory layout at this point:
[[---, ---, ---],
 [---, ---, ---],
 [---, ---, ---],
 [---, ---, ---],
 [---, ---, ---],
 [---, ---, ---],
 [---, ---, ---],
 [---, ---, ---]]
'''

Now, let's add a key-value pair to the dictionary:

my_dict['name'] = 'kingname'

'''
Memory layout at this point:
[[---, ---, ---],
 [---, ---, ---],
 [---, ---, ---],
 [---, ---, ---],
 [---, ---, ---],
 [1278649844881305901, pointer to name, pointer to kingname],
 [---, ---, ---],
 [---, ---, ---]]
'''

So why does memory look like this after adding one key-value pair? First, Python's hash function is called to compute the hash value of the string name for the current run. (The value is guaranteed not to change within a single run, but it may change after you close Python and open it again.)

>>> hash('name')
1278649844881305901

Suppose hash('name') is 1278649844881305901 in some run. We take this value modulo 8, and the remainder is 5, so the entry goes into the row with index 5 of the two-dimensional array we just initialized. Since name and kingname are two strings, the underlying C code stores them as two string objects and obtains a pointer to each. So, in the row with index 5, the first value is the hash of name, the second is the memory address of the string name (a pointer is a memory address), and the third is the memory address of the string kingname.

Each row has three columns, and each column occupies 8 bytes of memory space, so each row occupies 24 bytes of memory space.
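The insertion step above can be sketched in a few lines of Python. This is only an illustration of the idea, not CPython's code: the function name insert_old is made up, the hash value is the one from the example run in this article, and collisions are ignored (the real implementation probes for another free slot).

```python
def insert_old(table, key, value, key_hash):
    """Store [hash, key, value] in the row chosen by hash % number_of_rows."""
    row = key_hash % len(table)
    # Assumption: no collision handling; the real table probes for a free row.
    table[row] = [key_hash, key, value]
    return row


# Eight empty rows; a filled row holds [hash, key, value].
table = [None] * 8

# Hash value copied from the article's example run; yours will differ.
row = insert_old(table, "name", "kingname", 1278649844881305901)
print(row)  # 5
```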

Dictionary reading

Reading the value of a specified key

Suppose we add two more key-value pairs:

my_dict['age'] = 26
my_dict['salary'] = 999999

'''
Memory layout at this point:
[[-4234469173262486640, pointer to salary, pointer to 999999],
 [1545085610920597121, pointer to age, pointer to 26],
 [---, ---, ---],
 [---, ---, ---],
 [---, ---, ---],
 [1278649844881305901, pointer to name, pointer to kingname],
 [---, ---, ---],
 [---, ---, ---]]
'''

Suppose we want to read the value of age. Python computes the hash of age for the current run and takes it modulo 8. The remainder is 1, so the row with index 1 in the two-dimensional array holds the key-value pair we need. Python returns the object that the third pointer in that row refers to, which is 26.

Iterating over the dictionary's keys

Under the hood, CPython walks through the whole two-dimensional array. If the current row holds data, it returns the object that the key pointer refers to; if not, it skips the row. So it always has to visit every row of the array.

Because the remainder of a hash value can be large or small, dictionary keys are not stored in the order in which they were inserted.
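The lookup and the key iteration described above can be sketched together. Again this is a simplified model under stated assumptions: the HASHES table stands in for hash() and holds the values from this article's example run, the helper names lookup and iter_keys are made up, and collisions are not handled.

```python
# Hash values copied from the article's example run (stand-in for hash()).
HASHES = {
    "name": 1278649844881305901,
    "age": 1545085610920597121,
    "salary": -4234469173262486640,
}

# Build the old-style table: row index = hash % 8.
table = [None] * 8
for key, value in [("name", "kingname"), ("age", 26), ("salary", 999999)]:
    table[HASHES[key] % 8] = [HASHES[key], key, value]


def lookup(key):
    """Jump straight to the row given by hash % 8 and return its value."""
    row = table[HASHES[key] % 8]
    if row is None or row[1] != key:
        raise KeyError(key)
    return row[2]


def iter_keys():
    """Visit all eight rows, skipping the empty ones."""
    for row in table:
        if row is not None:
            yield row[1]


print(lookup("age"))      # 26
print(list(iter_keys()))  # ['salary', 'age', 'name']
```

The keys come back in row order (0, 1, 5), not in the order name, age, salary in which they were inserted.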

How dictionaries work since Python 3.6

Dictionary insertion

Since Python 3.6, the underlying data structure of the dictionary is different. When you initialize an empty dictionary, it now looks like this:

my_dict = {}

'''
Memory layout at this point:
indices = [None, None, None, None, None, None, None, None]
entries = []
'''

CPython creates a one-dimensional array of length 8 (indices) and an empty two-dimensional array (entries). Now let's add a key-value pair to the dictionary:

my_dict['name'] = 'kingname'

'''
Memory layout at this point:
indices = [None, 0, None, None, None, None, None, None]
entries = [[-5954193068542476671, pointer to name, pointer to kingname]]
'''

So why is memory the way it is?

First, we get the hash value of 'name' for the current run, here -5954193068542476671. Taken modulo 8, this gives 1. So we set the element at index 1 of the one-dimensional indices array to 0. That 0 is the index of the corresponding row in the two-dimensional entries array.
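Here is a minimal sketch of that insertion step. The names indices and entries follow the diagram above, while insert_compact is a made-up helper; the hash value is the one from this article's example run, and collisions and resizing are left out.

```python
def insert_compact(indices, entries, key, value, key_hash):
    """Append the entry, then record its position in the index array."""
    slot = key_hash % len(indices)
    # Assumption: no collision handling or resizing, unlike the real dict.
    entries.append([key_hash, key, value])
    indices[slot] = len(entries) - 1
    return slot


indices = [None] * 8
entries = []

# Python's % always returns a non-negative result for a positive modulus.
slot = insert_compact(indices, entries, "name", "kingname", -5954193068542476671)
print(slot)     # 1
print(indices)  # [None, 0, None, None, None, None, None, None]
```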

With the old layout, a two-dimensional array of eight rows takes up 8 * 24 = 192 bytes even if only three rows hold valid data. With the new layout, three rows of valid data mean that entries has only three rows and takes 3 * 24 = 72 bytes. The indices array is one-dimensional and takes only 8 bytes (one byte per slot for a table this small), so the total is 80 bytes. That is about 42% of the memory used before.
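The arithmetic above, written out. The constant names are mine, and the one-byte index slot is an assumption that matches the 8 bytes quoted for indices.

```python
ROW_BYTES = 3 * 8   # hash + key pointer + value pointer, 8 bytes each
SLOTS = 8           # size of the hash table
LIVE_ENTRIES = 3    # rows that actually hold data
INDEX_BYTES = 1     # assumption: one byte per slot in indices

old_bytes = SLOTS * ROW_BYTES                               # 8 * 24
new_bytes = LIVE_ENTRIES * ROW_BYTES + SLOTS * INDEX_BYTES  # 72 + 8
ratio = new_bytes / old_bytes

print(old_bytes, new_bytes, round(ratio * 100, 1))  # 192 80 41.7
```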

Dictionary reading

Reading the value of a specified key

Suppose we add two more key-value pairs:

my_dict['address'] = 'xxx'
my_dict['salary'] = 999999

'''
Memory layout at this point:
indices = [1, 0, None, None, None, None, 2, None]
entries = [[-5954193068542476671, pointer to name, pointer to kingname],
           [9043074951938101872, pointer to address, pointer to xxx],
           [7324055671294268046, pointer to salary, pointer to 999999]]
'''

Suppose I want to read the value of salary. I first compute the hash of salary and take it modulo 8, which gives 6. Then I read the element at index 6 of indices, which is 2. Finally I read the row with index 2 in entries, which is the salary entry.

Iterating over the dictionary

With the new layout, new data is always appended to the end of entries, so insertion order is preserved. To iterate over the dictionary's keys and values, we simply walk through entries. Every row in it holds useful data, so nothing has to be skipped and fewer rows are visited.
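The lookup and iteration in the new layout can be sketched the same way. As before, this is a simplified model: HASHES stands in for hash() with the values from this article's example run, lookup is a made-up helper, and collisions, deletions and resizing are ignored.

```python
# Hash values copied from the article's example run (stand-in for hash()).
HASHES = {
    "name": -5954193068542476671,
    "address": 9043074951938101872,
    "salary": 7324055671294268046,
}

indices = [None] * 8
entries = []

# Insert in order: always append to entries, then point indices at the new row.
for key, value in [("name", "kingname"), ("address", "xxx"), ("salary", 999999)]:
    entries.append([HASHES[key], key, value])
    indices[HASHES[key] % 8] = len(entries) - 1


def lookup(key):
    """hash % 8 -> slot in indices -> row in entries -> value."""
    position = indices[HASHES[key] % 8]
    if position is None or entries[position][1] != key:
        raise KeyError(key)
    return entries[position][2]


print(indices)                                # [1, 0, None, None, None, None, 2, None]
print(lookup("salary"))                       # 999999
print([entry_key for _, entry_key, _ in entries])  # ['name', 'address', 'salary']
```

Iteration touches only the three rows of entries and returns the keys in insertion order, whereas the old layout had to scan all eight rows.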

Conclusion

Before 3.5

After 3.6