Preface
The road of front-end development is long, and there is no turning back. This article summarizes what a linked list is, its characteristics and usage scenarios, and some design ideas it leads to, exploring the linked list from the shallow to the deep.
What is a linked list
- Like arrays, a linked list is a linear list.
- Viewed from memory, a linked list occupies non-contiguous memory: it is a data structure that links a group of scattered memory blocks together to store data.
- Each memory block in the list is called a node. Besides storing data, each node also records the address of the next node in the chain, known as the successor pointer, next.
Features of linked lists
- Insertion and deletion are efficient, O(1) (only pointers need to change).
- Random access is slow, O(n) (the chain must be traversed from the head).
- Compared with arrays, linked lists consume more memory, because every node needs extra space to store its successor pointer.
Common linked list structure
Singly linked list, doubly linked list, circular linked list, and doubly circular linked list.
Singly linked list
Features:
- Each node contains only one pointer, the successor pointer next.
- A singly linked list has two special nodes: the head node and the tail node. Why special? The address of the head node represents the entire list, and the successor pointer of the tail node points to a null address.
- Performance: inserting and deleting a node take O(1) time; finding a node takes O(n) time.
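The points above can be sketched in a few lines of Python. This is a minimal illustration, not a production container; the class and method names (`Node`, `SinglyLinkedList`, `insert_at_head`) are my own, not from the article.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None  # successor pointer; None plays the role of the null address

class SinglyLinkedList:
    def __init__(self):
        self.head = None  # the head node's address represents the whole list

    def insert_at_head(self, data):
        """O(1): only pointers change, no elements move."""
        node = Node(data)
        node.next = self.head
        self.head = node

    def find(self, data):
        """O(n): must walk the chain from the head."""
        cur = self.head
        while cur is not None:
            if cur.data == data:
                return cur
            cur = cur.next
        return None

    def to_list(self):
        """Collect the values in chain order (for inspection)."""
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.data)
            cur = cur.next
        return out
```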
Doubly linked list
Features:
- In addition to storing data, each node has two pointers: one to the address of the previous node (prev) and one to the address of the next node (next).
- The prev pointer of the head node and the successor pointer of the tail node both point to a null address.
- Performance characteristics:
  Compared with a singly linked list, it takes more storage space to hold the same data.
  Insertion and deletion can be more efficient than in a singly linked list. Take deletion as an example; there are two cases: deleting the node whose value equals a given value, and deleting the node a given pointer points to. In the first case, both singly and doubly linked lists must traverse from the head to find the matching node, so the time complexity is O(n). In the second case, deletion requires the predecessor node: a singly linked list has to traverse from the head until p->next == q, which is O(n), whereas a doubly linked list can reach the predecessor directly through prev, which is O(1).
  For an ordered linked list, a doubly linked list can also query by value faster than a singly linked list. By recording the position p reached in the last search, each query can compare the target value with p and decide whether to search forward or backward, so on average only half of the list needs to be scanned.
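The second deletion case is where prev earns its keep: given a pointer to the node itself, the splice needs no traversal. Below is a rough sketch (names like `DNode` and `delete_node` are illustrative, not from the article).

```python
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None  # pointer to the previous node
        self.next = None  # pointer to the next node

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, data):
        """Add a node at the tail and return it, so the caller holds a pointer to it."""
        node = DNode(data)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def delete_node(self, node):
        """O(1): the prev pointer gives the predecessor directly, no search needed."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.data)
            cur = cur.next
        return out
```

A singly linked list given the same node pointer would still have to walk from the head to find the predecessor before it could splice.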
Circular linked list
Features:
- Identical to a singly linked list, except that the tail node's successor pointer points to the address of the head node.
- Suitable for storing data with cyclic characteristics, such as the Josephus problem.
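The Josephus problem fits a circular list naturally: n people stand in a circle, every k-th person is eliminated until one survives, and the tail's successor pointer wrapping back to the head models the circle directly. A sketch (function and class names are mine):

```python
class CNode:
    def __init__(self, data):
        self.data = data
        self.next = None

def josephus(n, k):
    """Return the 1-based position of the survivor among n people, counting by k."""
    # Build the circle 1..n.
    head = CNode(1)
    cur = head
    for i in range(2, n + 1):
        cur.next = CNode(i)
        cur = cur.next
    cur.next = head  # close the circle: tail's successor is the head

    # Repeatedly advance k-1 steps and unlink the k-th node.
    prev = cur  # start at the node just before the head
    while prev.next is not prev:
        for _ in range(k - 1):
            prev = prev.next
        prev.next = prev.next.next  # eliminate the k-th person, O(1)
    return prev.data
```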
Doubly circular linked list
Features:
- The predecessor pointer of the head node points to the tail node, and the successor pointer of the tail node points to the head node.
Array or linked list?
- Time complexity of insertion, deletion, and random access:
  - Array: insertion and deletion are O(n); random access is O(1).
  - Linked list: insertion and deletion are O(1); random access is O(n).
- Drawbacks of arrays:
  - If a large block of memory is requested, say 100 MB, the allocation fails when no contiguous 100 MB block exists, even if the total free memory exceeds 100 MB.
  - The storage size is fixed. If the space is insufficient, the array must be expanded, and expansion requires copying all existing data, which takes a lot of time.
- Drawbacks of linked lists:
  - Memory consumption is higher, because each node needs extra space to store pointer information.
  - Frequent insertions and deletions cause frequent memory allocations and releases, which can easily produce memory fragmentation; in Java, they may also trigger frequent GC (garbage collection) activity.
In actual development, the choice between arrays and linked lists is a case-by-case trade-off that depends on the project.
Usage scenarios of linked lists
Common examples include CPU caches, database caches, browser caches, and so on.
The size of a cache is limited. When the cache is full, which data should be evicted and which kept? This is determined by the cache eviction policy.
Three common policies are First In, First Out (FIFO), Least Frequently Used (LFU), and Least Recently Used (LRU).
Implementing the LRU eviction policy with a linked list
When the accessed data is not in the cached linked list, it is inserted directly at the head of the list, O(1). When the accessed data is already in the list, its node is found, unlinked, and reinserted at the head, O(n) (finding the node requires traversal). When the cache is full, eviction starts from the tail of the list.
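The policy above can be sketched with a plain singly linked list (class and method names are my own). One caveat: in this simple sketch the tail eviction is written as its own traversal for clarity; an O(1) eviction would require keeping a pointer near the tail or using a doubly linked list.

```python
class LNode:
    def __init__(self, data):
        self.data = data
        self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.size = 0
        self.head = None  # head = most recently used, tail = least recently used

    def access(self, data):
        # Case 1: data already cached -> unlink it and move it to the head, O(n).
        prev, cur = None, self.head
        while cur is not None:
            if cur.data == data:
                if prev is not None:        # not already at the head
                    prev.next = cur.next
                    cur.next = self.head
                    self.head = cur
                return
            prev, cur = cur, cur.next
        # Case 2: not cached. If full, evict the tail (least recently used).
        if self.size == self.capacity:
            self._evict_tail()
        node = LNode(data)                  # insert at the head, O(1)
        node.next = self.head
        self.head = node
        self.size += 1

    def _evict_tail(self):
        prev, cur = None, self.head
        while cur.next is not None:         # walk to the tail
            prev, cur = cur, cur.next
        if prev is None:
            self.head = None
        else:
            prev.next = None
        self.size -= 1

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.data)
            cur = cur.next
        return out
```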
Implementing the LRU eviction policy with an array
Method 1: The first position holds the most recent data, and the last position is evicted first
When the accessed data is not in the cached array, it is inserted at the first position, and all existing elements shift back by one position, O(n). When the accessed data is already in the array, it is found and moved to the first position, which also shifts elements, O(n). When the cache is full, the trailing element is cleared, O(1).
Method 2: The first position is evicted first, and the last position holds the most recent data
When the accessed data is not in the cached array, it is appended directly as the newest element, O(1). When the accessed data is already in the array, it is found and moved to the last position, which also shifts elements, O(n). When the cache is full, the first element is cleared and the remaining elements shift forward by one position, O(n). (Optimization: evict a batch of elements at a time to reduce the number of evictions and improve performance.)
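Method 1 maps almost directly onto a Python list, which makes the O(n) shifts visible: `insert(0, ...)` moves every element back one slot, while `pop()` on the tail is O(1). A sketch (the helper name `lru_access` is mine):

```python
def lru_access(cache, capacity, data):
    """Update the array `cache` in place for an access to `data` (method 1)."""
    if data in cache:             # hit: O(n) to find it...
        cache.remove(data)
        cache.insert(0, data)     # ...and O(n) to shift it back to the front
    else:                         # miss
        if len(cache) == capacity:
            cache.pop()           # evict the last slot (least recently used), O(1)
        cache.insert(0, data)     # O(n): every element shifts one slot back
```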
Design idea
The space-time trade-off: "space for time" and "time for space".
When memory is plentiful and we care more about execution speed, we can choose algorithms and data structures with relatively high space complexity but relatively low time complexity; caching is a classic example of trading space for time.
When memory is scarce, for example code running on a mobile phone or an MCU, the idea is reversed: trade time for space.
Analysis
"Arrays are easy to use: they are implemented on contiguous memory, so the CPU's caching mechanism can pre-read array data, making access more efficient. Linked lists are not stored contiguously in memory, so they are not cache-friendly and cannot be pre-read effectively."
What does the CPU caching mechanism mean here? Why is an array better?
When the CPU reads data from memory, it first loads the data into the CPU cache.
Instead of reading only the bytes at the requested address, the CPU reads a whole block (a cache line) at a time and stores it in the cache. The next time the program accesses memory, the CPU looks in the cache first; on a hit, there is no need to go back to main memory.
This gives an access path much faster than memory, which is exactly what CPU caches are for: they bridge the gap between slow memory access and fast CPU execution.
With an array, the storage is contiguous, so loading one element also brings its neighbors into the CPU cache, and subsequent accesses run faster than they would with non-contiguous storage.
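A back-of-the-envelope sketch makes the difference concrete. Assuming a 64-byte cache line and 4-byte integers (both assumptions; real sizes vary by hardware), 16 consecutive array elements share one cache line, so one memory fetch effectively pre-loads the next 15 accesses, whereas scattered linked-list nodes can each require their own fetch in the worst case.

```python
CACHE_LINE_BYTES = 64   # typical today, but hardware-dependent (an assumption)
INT_BYTES = 4           # e.g. a C int (an assumption)

def cache_lines_touched(n_elements, contiguous):
    """Count distinct cache lines fetched when reading n elements sequentially."""
    if contiguous:
        # Array: elements are packed back to back in memory.
        total_bytes = n_elements * INT_BYTES
        return -(-total_bytes // CACHE_LINE_BYTES)  # ceiling division
    # Linked list, worst case: every node lands on its own cache line.
    return n_elements
```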
References
- The beauty of data structures and algorithms