This is the fourth installment of the Python Learning Go series. In this installment, we will take a look at advanced data structures in Go. Other articles in this series can be viewed at the beginning of “Learning Python by Comparison Go,” where we begin today’s sharing.

As mentioned earlier, Go and Python’s data structures can be divided into arrays and hash structures. In this article we will look at the types of hash structures.

Hash structure

A hash table is an extension of an array. It uses a hash function to map the key value of an element to the index of the array, and then stores the data at the corresponding index position in the array. When we query an element by key, we use the same hash function to convert the key to the array index, fetching the data from the corresponding array index position. By using hash functions, we can locate data directly, like array subscripts, in order (1) time.

In hash structure, the two most important knowledge points are “hash function construction” and “hash conflict resolution”.

The construction of hash function directly affects the performance of data structure. If hash keys are evenly distributed, the occurrence of hash conflicts will be reduced. Hash conflicts are inevitable in hash structures. There are two main methods to solve hash conflicts, which are “Open Addressing” and “chaining”.

Develop addressing methods that use algorithms to find the next empty array location. The list method adds extra space to the current key’s array location in the form of a linked list.

For more information on hashing, please refer to my notes on hash data structures and algorithms – hash tables.

With the basics of hash structures listed above, let’s take a look at how Go and Python hash structures look.

Go

The hash structure of Go language is map structure. According to the source code analysis of MAP, the underlying structure of Map is roughly as follows:

The outermost layer is an HMAP structure, and a [] BMAP array is used to store the ADDRESS of Bmap, which is used to store data. Each Bmap can store up to 8 KV pairs. In addition, there is an overflow to store the address of the last Bmap. The oldbuckets array address is used to store the bmap address that has not been moved to the new buckets array during capacity expansion. Mapextra is used to store non-pointer data for optimized storage and access.

The expansion of MAP memory is mainly the expansion of [] BMAP array. As the data increases, more bmaps will be attached to the [] Bmap array. The more Bmaps there are, most of the time will be spent searching the linked list. There is a standard that triggers the expansion of the [] BMAP array when the loading factor (number of elements filled in the table/length of the hash table) is greater than 6.5, usually twice the size of the source array. After capacity expansion, data will not be migrated immediately, but the old [] BMAP array will be hung on oleBuckets until new updates or insertions are made.

According to our analysis of the memory structure of Go Map, combined with the knowledge of hash table, we can know that Go uses the “linked list method” to resolve hash conflicts. However, the nodes in the linked list are not values, but rather a bMAP structured block of storage, which reduces the number of object blocks on a single chain to facilitate memory management and GC operations.

In terms of hash function, low hash is used to determine BMAP, and high hash is used to determine whether there is a stored key, which improves the efficiency of hash comparison search.

In bMAP, the key-value pair is not stored. Instead, the keys that occupy relatively small space are stored in one block, and the values are stored in the same order. This takes advantage of memory alignment and saves space.

The design of Go Map speaks volumes about the extreme pursuit of performance, and I strongly recommend studying it.

Let’s take a look at some of the common operations of Go Map:


/ / initialization
// Use the make function
myMap := make(map[string]int)
fmt.Println(myMap) // map[]

// Use literals
myResume := map[string]string{"name": "DeanWu"."job": "SRE"}
fmt.Println(myResume)  // map[job:SRE name:DeanWu]

// Declare an empty map
//var myResume1 map[string]string
//myResume1["name"] = "DeanWu" //panic: assignment to entry in nil map
// Empty map, the system does not allocate memory, and can assign values.

// The type of the key can be either a built-in type or a structural type, as long as the value can be compared using the == operator
// Slice, function, and structure types that contain slices. These types, because of reference semantics, can be changed by other references and cannot be used as mapping keys.
//myMap1 := map[[]string]int{}
//fmt.Println(myMap1) // invalid map key type []string

// Update and assign keys
myResume["job"] = "web deployment"
fmt.Println(myResume)  // map[job:web deployment name:DeanWu]

// Get the value of a key
value, exists := myResume["name"]
if exists {
  fmt.Println(value) // DeanWu
}
value1 := myResume["name"]
ifvalue1 ! =""{
  fmt.Println(value1) // DeanWu
  // This is recommended, because even if the map does not have this key, it will return the corresponding zero value. Need to make the corresponding judgment according to the data type, not as unified as the above, convenient.
}

// Delete key-value pairs
delete(myResume, "job")
delete(myResume, "year")  // When the map does not have this key, nothing is executed.
fmt.Println(myResume)  // map[name:DeanWu]


Copy the code

Maps can also be nested.

/ / map nested
myNewResume := map[string]map[string]string{
  "name": {
    "first": "Dean"."last":"Wu",
  },
}
fmt.Println(myNewResume) // map[name:map[first:Dean last:Wu]]
Copy the code

Python

In Python, there are two types of hash structures: dictionaries and collections.

The dictionary

The underlying structure of the dictionary is as follows:

The outermost layer is PyDictObject, which defines the global control fields for some dictionaries. There is a PyDictKeysObject that defines some fields of the dictionary hash table. There are two arrays, Dk_indices [] and Dk_entries [], which are real arrays for storing data. Kv data is stored in dk_entries[] array. Dk_indices [] is used to store the indices of KV data in DK_enties array. Each KV data is stored in entry data structure as follows:

typedef struct {
    /* Cached hash code of me_key. */
    Py_hash_t me_hash;
    PyObject *me_key;
    PyObject *me_value; /* This field is only meaningful for combined tables */
} PyDictKeyEntry;
Copy the code

Me_hash The me_hash cache stores the hash value of the key to prevent double calculation of hash values. Me_key and me_value are the real data for key and value.

The minimum size of a dictionary is 8. Python uses a “double the size” strategy. According to the rule of thumb, hash table array, when the loading factor is 2/3, is a critical value of hash conflict. So, when the hash array Dk_entries is filled to 2/3, it expands.

To save memory, Python separates indexes and hash table arrays into DK_indices and Dk_entries. The former saves the index of the data, occupies a small space, and can apply for the space of all the elements. The latter can apply for only 2/3 of the original size. Because it expands after 2/3, which is available at DK_indices.

By analyzing the underlying structure of the Python dictionary, we can know that Python uses “open addressing” to resolve hash conflicts according to the knowledge of hash tables.

Common operations on Python dictionaries:

# initialization
myDict1 = dict()
myDict2 = {}   # Recommended use
print(myDict1, myDict2)  # {} {}

# assignment
myDict3 = {'name': 'Tim'.'age': 18}
myDict3['job'] = 'student'
print(myDict3)  # {'name': 'Tim', 'age': 18, 'job': 'student'}

# values
print(myDict3['name'])  # Tim
# print(myDict3['phone']) # KeyError: 'phone'
print(myDict3.get('phone'))  # None Without a key, the get method does not throw an error
print(myDict3.get('phone'.'136xxxxxxx'))  # 136XXXXxxx for no key, with default value

# remove
del[myDict3['job']]
print(myDict3)  # {'name': 'Tim', 'age': 18}

# Dictionaries provide rich built-in methods
# radiansdict.clear() removes all elements in the dictionary
# radiansdict.copy() returns a shallow copy of the dictionary, returning a reference to the original dictionary
# radiansdict.fromkeys() create a new dictionary with seq as the keys and val as the initial values for all keys in the dictionary
# radiansdict.get(key, default=None) returns the value of the specified key, or default if the value is not in the dictionary
Return true if the key is in the dict, false otherwise
# radiansdict.items() returns a traversable array of (key, value) tuples as a list
# radiansdict.keys() returns a list of all the keys in a dictionary
# radiansdict.setdefault(key, default=None) is similar to get(), but if the key does not exist in the dictionary, the key is added and the value is set to default
# radiansdict. Update (dict2) Update the dictionary dict2 key/value pair to the dict
# radiansdict.values() returns all the values in the dictionary in a list
# pop(key[,default]) delete the value of the dictionary given key, return the value of the deleted value. The key value must be given. Otherwise, the default value is returned.
# popitem() randomly returns and removes a pair of keys and values from the dictionary (usually removing the trailing pair).
Copy the code

A collection of

A set has a hash structure, just like a dictionary. According to Python3 source code, the general structure is as follows:

Collections are a lot simpler than dictionaries. An array of data is stored directly in PySetObject.

According to the analysis of the underlying data structure of the collection, it also uses the “development addressing method” to resolve hash conflicts.

Some common operations on collections:

# initialization
s1 = {'1'.'2'.'3'}  # not recommended, an error will be reported when there is a dictionary in the element
s2 = set(['1'.'4'.'5'])
print(s1)  # {' 3 ', '1', '2'}
print(s2)  # {' 3 ', '1', '2'}

# intersection
print(s1&s2)  # {} '1'
# and set
print(s1|s2)  # {'3', '5', '4', '2', '1'}
# difference set
print(s1 - s2)  # {' 3 ', '2'}
# Determine subsets and supersets
s2.issubset(s1)   Is s2 a subset of S1
s1.issuperset(s2)  # whether s1 is a superset of S2

# some built-in methods for collections
# set.add(obj) adds collection elements
# set.remove(obj) removes collection elements
# set.update(set) merge sets
# set.pop() randomly removes an element and returns it
Copy the code

Unique data structure

In addition to class arrays and hash structures, Go has some unique structures of its own.

Pointer to the Go –

The Go language has Pointers. A pointer holds the memory address of a variable. Type *T is a pointer to a value of type T. Its zero value is nil.

  • The ampersand produces a pointer to its object.
  • The * symbol indicates the underlying value to which the pointer points.

Unlike C, Go has no pointer operation.

i, j := 42.2701

p := &i         // point to i
fmt.Println(*p) // read i through the pointer
*p = 21         // set i through the pointer
fmt.Println(i)  // see the new value of i

p = &j         // point to j
*p = *p / 37   // divide j through the pointer
fmt.Println(j) // see the new value of j
Copy the code

There is no concept of Pointers in Python; addresses in memory are called “references”. Similar to the pointer here, but only in logical analysis, no syntax support.

Go – structure

In Go, a struct is a collection of fields that can be accessed through a structure pointer.

// Define struct, automatic name must start with uppercase, as public variable
type Vertex struct {
  X int
  Y int
}
/ / initialization
var ver Vertex
ver.X = 4  // You can use. To assign and access structures
fmt.Println(ver.X)  / / 4

// Can be accessed using Pointers
v := Vertex{1.2}
p := &v
p.X = 1e9
fmt.Println(v)  / / {1000000000}

Copy the code

A structure can be nested, and when nested, all the fields of the nested structure are inherited.

type NewVertex struct {
  Vertex
  Z int
}
var v1 NewVertex
v1.X = 12
v1.Z = 13
fmt.Println(v1.X) / / 12
fmt.Println(v1)  / / {0} {12} 13
Copy the code

Because of these features, and because the Go language has no concept of classes, constructs are used as “classes” in many Web frameworks.

conclusion

In this paper, I learned the advanced data structures of Go and Python. Both of them follow certain data structures at the bottom, but both have their own characteristics. Combine the characteristics of their own language, clever design. In short, no matter what kind of language, when we use it, we should not only understand the basic usage of the structure, but also understand its underlying logical structure, in order to avoid some inexplicable pit in use.