May 18, 2020
Preface
Let’s first discuss a classic linked list application scenario, which is the LRU cache elimination algorithm.
Cache is a technology to improve the performance of data reading. It is widely used in hardware design and software development, such as CPU cache, database cache, browser cache and so on.
The size of a cache is limited. When the cache is full, which data should be purged and which retained? That is decided by a cache elimination strategy. There are three common strategies: First In, First Out (FIFO), Least Frequently Used (LFU), and Least Recently Used (LRU).
So, back to business, our opening question today is: How to implement LRU cache elimination with linked lists? With that question in mind, let’s get started!
Let’s go!
All kinds of linked list structures
A linked list is a slightly more complex data structure than an array. It’s also a little harder for beginners to grasp than arrays. These two very basic, very common data structures will often be compared side by side. So let’s see what the difference is.
Let’s look at the underlying storage structure first.
I drew a picture to help you visualize it. As the figure shows, an array needs a contiguous block of memory to store its data, which is a fairly demanding requirement. If we request a 100MB array, the request will fail whenever memory has no contiguous free block that large, even if the total free memory exceeds 100MB.
A linked list, on the other hand, does not require contiguous memory. It uses “pointers” to link a set of scattered memory blocks together, so requesting a 100MB linked list poses no problem.
There are many kinds of linked list structures. Today I will focus on the three most common: the singly linked list, the doubly linked list, and the circular linked list. Let’s start with the simplest and most commonly used, the singly linked list.
As we said earlier, a linked list is a series of scattered memory blocks connected by pointers. We call each memory block a “node” of the list. To chain all the nodes together, each node must store, in addition to the data, the address of the next node in the chain. As shown in the figure, we call the pointer that records the address of the next node the next pointer.
Arrays and linked lists are both linear structures. The opposite concept is the nonlinear structure, such as the binary tree, heap, or graph. It is called nonlinear because its elements do not stand in a simple sequential, before-and-after relationship.
As you can see from my drawing of a singly linked list, two nodes are special: the first and the last. We habitually call the first node the head node and the last node the tail node. The head node records the base address of the list; starting from it, we can traverse the whole list. The tail node is special because its pointer points not to another node but to an empty address, NULL, indicating that it is the last node on the list.
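As a minimal sketch in Python (the class and field names here are my own, not from any library), a node, a head pointer, and a NULL-terminated traversal might look like this:

```python
class Node:
    """A singly linked list node: the data plus a next pointer."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next  # None plays the role of the NULL tail pointer

# Build the list 1 -> 2 -> 3; `head` records the base address of the list.
head = Node(1, Node(2, Node(3)))

def traverse(head):
    """Walk from the head node until we reach NULL, collecting each value."""
    values = []
    node = head
    while node is not None:
        values.append(node.data)
        node = node.next
    return values
```

Note how the tail node’s next field is None, which is exactly the NULL terminator described above.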
Like arrays, linked lists support lookup, insertion, and deletion of data.
As we know, to keep the data contiguous in memory, an array has to move large amounts of data during insertion and deletion, so those operations take O(n) time. To insert or delete data in a linked list, we do not need to move any nodes to preserve contiguity, because the storage was never contiguous in the first place. So inserting or deleting a piece of data in a linked list is very fast.
For your convenience I drew a diagram. As it shows, for linked list insertions and deletions we only need to change the pointers of the adjacent nodes, so the corresponding time complexity is O(1).
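The pointer changes described above can be sketched as follows; this is an illustrative Python version, with a toy Node class and helper names of my own:

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def insert_after(node, data):
    """Insert a new node right after `node`: two pointer updates, O(1)."""
    node.next = Node(data, node.next)

def delete_after(node):
    """Unlink the node that follows `node`: one pointer update, O(1)."""
    if node.next is not None:
        node.next = node.next.next

def to_list(head):
    values = []
    while head is not None:
        values.append(head.data)
        head = head.next
    return values

head = Node(1, Node(3))
insert_after(head, 2)    # 1 -> 2 -> 3
delete_after(head.next)  # removes 3: 1 -> 2
```

Notice that neither operation touches any node other than its immediate neighbors, which is exactly why no data movement is needed.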
However, everything has two sides. A linked list is not as efficient as an array at randomly accessing the k-th element. Because the data in a linked list is not stored contiguously, we cannot compute the element’s memory address directly from the base address and the subscript with an addressing formula, as an array can. Instead, we have to traverse from the start, following the pointers node by node, until we reach the one we want.
You can think of a linked list as a queue in which each person only knows who stands behind them, so to find out who the k-th person is, we have to start with the first person and count. Hence the random access performance of a linked list is worse than an array’s: O(n) time complexity.
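A hypothetical Python sketch of that counting process (names are my own):

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def get_kth(head, k):
    """Return the k-th value (1-based) by counting from the head: O(k) steps."""
    node = head
    for _ in range(k - 1):
        node = node.next
    return node.data

# The "queue" a -> b -> c: each node only knows who comes after it.
queue = Node('a', Node('b', Node('c')))
```

There is no shortcut to the k-th node; the loop has to take k − 1 pointer hops.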
Okay, so much for singly linked lists. Next come two slightly more complicated variants: the circular linked list and the doubly linked list.
A circular linked list is a special kind of singly linked list. It is actually quite simple; the only difference from a singly linked list is at the tail. We know that the tail pointer of a singly linked list points to an empty address, NULL, indicating the last node. The tail pointer of a circular list instead points back to the head of the list. As you can see from my diagram, it is connected end to end like a ring, which is why it is called a “circular” list.
The advantage of a circular list is that it is easier to get from the tail back to the head than in a singly linked list. Circular linked lists are especially suitable when the data to be processed has a cyclic structure. Take the famous Josephus problem: it can be solved with a singly linked list, but the code is much cleaner with a circular one.
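As an illustration, here is one common formulation of the Josephus problem (n people in a circle, every m-th person eliminated) solved with a circular singly linked list; the function name and formulation details are my own:

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def josephus(n, m):
    """People numbered 1..n stand in a circle; every m-th person is
    eliminated until one remains. Returns the full elimination order
    (the survivor comes last). Uses a circular singly linked list."""
    # Build the circle 1 -> 2 -> ... -> n -> back to 1.
    head = Node(1)
    tail = head
    for i in range(2, n + 1):
        tail.next = Node(i)
        tail = tail.next
    tail.next = head  # close the ring

    order = []
    prev, cur = tail, head
    while cur.next is not cur:        # more than one person left
        for _ in range(m - 1):        # count off m - 1 people
            prev, cur = cur, cur.next
        order.append(cur.data)        # the m-th person leaves the circle
        prev.next = cur.next
        cur = cur.next
    order.append(cur.data)            # the survivor
    return order
```

Because the list is a ring, the counting simply wraps around with no special case at the “end” of the list.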
Singly and circular linked lists are easy enough, right? Next, let’s look at a slightly more complex structure that is more commonly used in real software development: the doubly linked list.
A singly linked list goes in only one direction: each node has a single successor pointer, next, to the next node. A doubly linked list, as the name implies, supports two directions: each node has not only a successor pointer next to the next node, but also a predecessor pointer prev to the previous node.
As you can see from my diagram, a doubly linked list needs two extra pointer fields to store the addresses of the successor and predecessor nodes. So to store the same amount of data, a doubly linked list takes up more memory than a singly linked one. Although the two pointers cost storage space, they support traversal in both directions, which makes doubly linked list operations more flexible. So what kinds of problems is a doubly linked list better suited to solve than a singly linked one?
Structurally, a doubly linked list supports finding a node’s predecessor in O(1) time, and this is what makes insertion and deletion simpler and more efficient than on a singly linked list in some cases.
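A hedged Python sketch of what the prev pointer buys us; the DNode class and insert_before helper are hypothetical names of my own:

```python
class DNode:
    """A doubly linked list node: data plus successor (next) and
    predecessor (prev) pointers."""
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

def insert_before(node, data):
    """Insert a new node immediately before `node` in O(1) time:
    the predecessor is reachable directly through node.prev."""
    new_node = DNode(data)
    new_node.prev = node.prev
    new_node.next = node
    if node.prev is not None:
        node.prev.next = new_node
    node.prev = new_node
    return new_node

a, c = DNode('a'), DNode('c')
a.next, c.prev = c, a
b = insert_before(c, 'b')   # a <-> b <-> c
```

On a singly linked list, insert_before would first have to walk from the head to find the predecessor of the target node.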
You might say: didn’t you just tell me that insertion and deletion in a singly linked list already take O(1) time? How much more efficient could a doubly linked list be? Don’t worry; that analysis, like the one in many books on data structures and algorithms, is somewhat theoretical, and strictly speaking it is inaccurate, or at least comes with preconditions. Let me walk you through the two operations again.
Let’s look at the delete operation first.
In real software development, removing a piece of data from a linked list is one of two things:
- Delete a node whose value is equal to a given value;
- Delete the node that a given pointer points to.
In the first case, whether it is a singly or a doubly linked list, to find a node whose value equals the given value we have to traverse from the head, comparing node by node, until we find it, and then delete it using the pointer operations I described earlier.
Although the deletion itself is a simple O(1) operation, the traversal is the main cost, taking O(n). By the addition rule of time complexity analysis, the total time to delete a node whose value equals a given value is O(n).
In the second case, we have already found the node to delete; call it q. But to delete q we also need to know its predecessor, and a singly linked list does not support obtaining a node’s predecessor directly. So we still have to traverse from the head node until p->next == q, at which point p is q’s predecessor.
For a doubly linked list, however, this case is exactly where it shines. Each node already stores a pointer to its predecessor, so no traversal is needed. Thus, for the second case, deletion takes O(n) time on a singly linked list but only O(1) on a doubly linked one!
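To make the O(1) deletion concrete, here is an illustrative sketch; the DNode class and delete_node helper are my own names:

```python
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

def delete_node(node):
    """Unlink `node` from a doubly linked list in O(1) time: its
    predecessor is node.prev, so no traversal from the head is needed."""
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev

# Build a <-> b <-> c, then delete b directly through its pointer.
a, b, c = DNode('a'), DNode('b'), DNode('c')
a.next, b.prev = b, a
b.next, c.prev = c, b
delete_node(b)   # a <-> c
```

With a singly linked Node, the same operation would need a scan from the head to locate a, the predecessor of b.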
Similarly, if we want to insert a node in front of a specified node, a doubly linked list has a great advantage over a singly linked one: it can do so in O(1) time, whereas a singly linked list needs O(n). You can analyze this yourself along the lines of the delete operations I just described.
Beyond insertion and deletion, for a sorted linked list, searching by value is also more efficient on a doubly linked list. We can remember the position p found by the previous search; on each new search we compare the target value with the value at p and decide whether to search forward or backward, so on average we only need to traverse half the data.
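A sketch of that remembered-position search on a sorted doubly linked list (the names and the exact comparison policy are my own assumptions):

```python
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

def search_from(last, target):
    """Search a sorted doubly linked list for `target`, starting at the
    node found by the previous search and walking forward or backward
    as the comparison dictates. Returns the node, or None."""
    node = last
    if node.data < target:
        while node is not None and node.data < target:
            node = node.next
    else:
        while node is not None and node.data > target:
            node = node.prev
    return node if node is not None and node.data == target else None

# Build the sorted list 1 <-> 2 <-> 3 <-> 4 <-> 5.
nodes = [DNode(i) for i in range(1, 6)]
for left, right in zip(nodes, nodes[1:]):
    left.next, right.prev = right, left

last = nodes[2]  # suppose the previous search stopped at value 3
```

The backward walk is only possible because each node carries a prev pointer; a singly linked list would have to restart from the head.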
Now do you see why a doubly linked list can be more efficient than a singly linked one? This is why, in practical software development, doubly linked lists are used more widely than singly linked lists even though they cost more memory. If you are familiar with the Java language, you have probably used the LinkedHashMap container; if you dig into its implementation, you will find that a doubly linked list is one of the data structures it uses.
In fact, there is a more important idea for you to grasp here: the design principle of trading space for time. When memory is plentiful and we care more about execution speed, we can choose an algorithm or data structure with relatively high space complexity but low time complexity. Conversely, when memory is scarce, say for code running on a phone or a microcontroller, we reverse the principle and trade time for space.
Once you understand the circular list and the doubly linked list, combining the two gives yet another variant: the doubly circular linked list. I won’t go into detail, but can you picture what one looks like? Try drawing it on paper yourself.
Linked list vs. array performance
The contrast between arrays and linked lists is not limited to time complexity, and in real software development, complexity analysis alone cannot decide which structure to use.
Arrays are simple to use. Because they are implemented on a contiguous block of memory, they can exploit the CPU cache mechanism, which prereads array data, so access is more efficient. Linked lists are not stored contiguously in memory, so they are not cache-friendly and cannot be preread effectively.
When the CPU reads data from memory, it loads what it reads into the CPU cache. It does not fetch just the specific address; it reads a whole block of data (a cache line, typically 64 bytes) at a time. On the next access, the CPU looks in the cache first, and only goes back to memory on a miss. This is much faster than a memory access, which is why CPU caches exist: they were introduced to bridge the gap between slow memory access and fast CPU execution.
For arrays, the storage is contiguous, so when the element at one subscript is loaded, the elements at the following subscripts are brought into the CPU cache along with it, and accessing them is faster than when the storage is scattered.
The disadvantage of arrays is that they are fixed in size and occupy the entire contiguous memory space once declared. If the declared array is too large, the system may not have enough contiguous memory to allocate to it, resulting in “out of memory.” If the declared array is too small, you may run out of space. Then you have to apply for a larger memory space and copy the original array into it, which is very time-consuming. Linked lists have no size limit and naturally support dynamic expansion, which I think is the biggest difference between them and arrays.
You might say: but we have the ArrayList container in Java, which supports dynamic expansion, right? As we explained in the previous lesson, when we insert data into a dynamically expanding array that has no free space left, it must request a larger block of memory and copy the data over, which is very time-consuming.
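To illustrate the expansion-and-copy cost, here is a toy dynamic array in Python. This is not how ArrayList is actually coded, just a sketch of the idea; the doubling growth factor and all names are my own assumptions:

```python
class DynamicArray:
    """A toy dynamic array: when the backing block is full, allocate a
    block twice as large and copy everything over. The copy is what
    makes a worst-case append O(n)."""
    def __init__(self):
        self.capacity = 2
        self.size = 0
        self.items = [None] * self.capacity
        self.copies = 0  # total elements copied during expansions

    def append(self, value):
        if self.size == self.capacity:
            self._grow()
        self.items[self.size] = value
        self.size += 1

    def _grow(self):
        self.capacity *= 2
        new_items = [None] * self.capacity
        for i in range(self.size):   # the expensive copy step
            new_items[i] = self.items[i]
            self.copies += 1
        self.items = new_items

arr = DynamicArray()
for v in range(10):
    arr.append(v)
# Growing 2 -> 4 -> 8 -> 16 copied 2 + 4 + 8 = 14 elements in total.
```

Most appends are cheap; the occasional expansion is what a linked list, which grows one node at a time, never has to pay for.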
In addition, if your code is very sensitive to memory use, arrays suit you better, because every node in a linked list consumes extra storage for the pointer to the next node, so memory consumption can double. Moreover, frequent insertion and deletion in a linked list means frequent memory allocation and release, which easily causes memory fragmentation and, if you are using Java, more frequent garbage collection (GC).
Therefore, in our actual development, for different types of projects, there are trade-offs between arrays and linked lists.
Solving the opening problem
Okay, that’s it for linked lists themselves. Let’s go back to the question I left you with at the beginning: how do we implement an LRU cache elimination algorithm based on a linked list?
The idea is this: we maintain a singly linked list ordered by access time, where nodes closer to the tail were accessed earlier. When a new piece of data is accessed, we traverse the list sequentially, starting from the head.
- If the data is already cached in the list, we traverse to its node, remove it from its original position, and insert it at the head of the list.
- If the data is not in the cache, there are two cases:
  - if the cache is not full, we insert the new node directly at the head of the list;
  - if the cache is full, we delete the tail node and then insert the new node at the head of the list.
And that is how we implement an LRU cache with a linked list. Not hard, right? Note that whether or not the cache is full, we have to traverse the list on every access, so this cache has O(n) access time.
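Putting the steps above together, here is an illustrative Python sketch of the linked list based LRU cache (all class and method names are my own):

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

class LRUCache:
    """LRU cache on a singly linked list: most recently used data sits
    at the head, least recently used at the tail. Every access scans
    the list, so each operation is O(n)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.head = None
        self.size = 0

    def access(self, value):
        # Case 1: already cached -- unlink the node and move it to the head.
        prev, node = None, self.head
        while node is not None:
            if node.data == value:
                if prev is not None:        # not already at the head
                    prev.next = node.next
                    node.next = self.head
                    self.head = node
                return
            prev, node = node, node.next
        # Case 2: not cached and the cache is full -- delete the tail node.
        if self.size == self.capacity:
            prev, node = None, self.head
            while node.next is not None:
                prev, node = node, node.next
            if prev is None:
                self.head = None
            else:
                prev.next = None
            self.size -= 1
        # Either way, insert the new value at the head.
        self.head = Node(value, self.head)
        self.size += 1

    def contents(self):
        values, node = [], self.head
        while node is not None:
            values.append(node.data)
            node = node.next
        return values

cache = LRUCache(3)
for v in [1, 2, 3, 1, 4]:   # accessing 4 evicts the least recently used, 2
    cache.access(v)
```

Replacing the list with a doubly linked list plus a hash table (the LinkedHashMap approach) brings the access cost down to O(1), at the price of extra memory.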
Summary
Today we looked at a data structure that is the “opposite” of the array: the linked list. Like the array, it is a very basic, very common data structure, though slightly more complex. From the common singly linked list, several other structures are derived, such as the doubly linked list, the circular linked list, and the doubly circular linked list.
Compared with arrays, linked lists are better suited to frequent insertion and deletion, at the cost of higher query time complexity. In concrete software development, though, you should weigh the performance of both and choose accordingly.
Food for thought
How to tell whether a string is a palindrome is a question I’m sure you have seen before; today’s question is a variant of it. If the string is stored in a singly linked list, how do you determine whether it is a palindrome? Do you have a good solution? And what are the corresponding time and space complexities?