Why data structures and algorithms?
Our goal is to build awareness of time and space complexity, write high-quality code, learn to design systems, sharpen programming skills, and train logical thinking. These skills pay off in your work, help you realize your value, and improve your life.
With data structures and algorithms, you can look at problems in a very different way.
Benefits of learning data structures and algorithms:
- The immediate benefit is the ability to write better-performing code.
- Algorithms are ideas and methods for solving problems, and those ideas can often be applied to other parts of life and career.
- Algorithms are one of the few effective ways to train the brain's ability to think, which in the long run is an individual's most important core competence.
How to grasp the key points and efficiently learn data structures and algorithms
Broadly speaking, a data structure is a way of storing a group of data, and an algorithm is a set of methods for manipulating that data.
In a narrow sense, the terms refer to well-known data structures and algorithms such as queues, stacks, heaps, binary search, dynamic programming, and so on.
What do data structures have to do with algorithms? Why do most books put these two things together?
This is because data structures and algorithms are mutually reinforcing: data structures serve algorithms, and algorithms operate on specific data structures. So we cannot discuss algorithms in isolation from data structures, or data structures in isolation from algorithms.
For example, binary search requires an array to store the data, because it relies on the array's support for random access. If we chose a data structure like a linked list instead, binary search would not work, because linked lists do not support random access.
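As an illustrative sketch (the article itself gives no code), a standard iterative binary search over a sorted array looks like this in Python:

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent.

    Each step jumps straight to the middle index. That jump is O(1)
    only because arrays support random access; on a linked list,
    merely reaching the middle element would itself cost O(n).
    """
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

assert binary_search([1, 3, 5, 7, 9], 7) == 3
assert binary_search([1, 3, 5], 4) == -1
```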
Key topics to learn
- Complexity analysis. Data structures and algorithms solve the problem of how to store and process data more cheaply and quickly, so we need a way to measure efficiency and resource consumption: the complexity analysis method. It accounts for almost half of the subject, so to speak, and is the essence of data structures and algorithms.
- 10 data structures: array, linked list, stack, queue, hash table, binary tree, heap, skip list, graph, trie.
- 10 algorithms: recursion, sorting, binary search, search algorithms, hash algorithms, greedy algorithms, divide and conquer, backtracking, dynamic programming, string matching.
- For each one, learn its origin, its characteristics, the problems it is suited to solve, and its practical application scenarios.
Learning skills
- Practice while you learn; do a moderate number of exercises.
- Ask more questions, think more, and interact more.
- Treat learning like leveling up in a game, and keep yourself motivated.
- Knowledge needs time to sink in; don't try to master it all at once.
Complexity analysis
What is complexity analysis
- Data structures and algorithms address the question of how to make a computer solve a problem in less time and with less space.
- Therefore, the performance of a data structure or algorithm is evaluated along two dimensions: execution time and space consumption.
- Time complexity and space complexity describe these two kinds of performance respectively, and are collectively called complexity.
- Complexity describes how an algorithm's execution time (or space) grows with the size of the data.
Why complexity analysis
- Compared with performance testing, complexity analysis is independent of the execution environment, low-cost, efficient, easy to perform, and highly instructive.
- Mastering complexity analysis lets you write better-performing code and helps reduce the cost of system development and maintenance.
How to conduct complexity analysis
1. Big O notation
- Origin
The execution time of an algorithm is proportional to the number of times each line of code executes. This is written T(n) = O(f(n)), where T(n) is the total execution time of the algorithm, f(n) is the total number of times each line of code executes, and n represents the size of the data. The O in the formula means that the execution time T(n) grows in proportion to the expression f(n).
- Characteristics
Taking time complexity as an example: since time complexity describes the growth trend of execution time relative to data size, constant terms, low-order terms, and coefficients do not decisively affect this trend, so they are ignored in the analysis.
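A quick numeric sketch (with a hypothetical operation count, not taken from the article) shows why the dropped terms do not matter:

```python
def total_ops(n):
    # Hypothetical operation count: T(n) = 2n^2 + 3n + 10.
    return 2 * n * n + 3 * n + 10

# As n grows, the constant 10, the low-order term 3n, and even the
# coefficient 2 become irrelevant to the growth trend: T(n) / n^2
# approaches a constant, so T(n) = O(n^2).
assert total_ops(10) / 10**2 == 2.4          # small n: other terms visible
assert total_ops(10_000) / 10_000**2 < 2.001 # large n: n^2 dominates
```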
2. Rules of complexity analysis
- A single piece of code: look at its highest-frequency part, such as a loop body.
- Multiple pieces of code: take the maximum. For example, if a function contains both a single loop and a nested double loop, the overall complexity is that of the double loop.
- Nested code: take the product, as with recursion or nested loops.
- Multiple input sizes: take the sum. For example, if a method has two parameters that each control a loop, add the two complexities together.
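The addition and product rules can be sketched as follows (illustrative Python; the function names are made up for the example):

```python
def process(a, b):
    """Addition rule: two independent loops driven by different
    input sizes give O(m + n), where m = len(a) and n = len(b)."""
    total = 0
    for x in a:          # runs m times
        total += x
    for y in b:          # runs n times
        total += y
    return total

def pairs(a, b):
    """Product rule: a loop nested inside another loop gives
    O(m * n) pairs of iterations."""
    return [(x, y) for x in a for y in b]

assert process([1, 2], [3, 4, 5]) == 15
assert len(pairs([1, 2], [3, 4, 5])) == 6   # 2 * 3 iterations
```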
Common complexity levels
Polynomial orders
As the data size increases, the algorithm's execution time and space grow in proportion to a polynomial.
- O(1) (constant order)
- O(log n) (logarithmic order)
- O(n) (linear order)
- O(n log n) (linearithmic order)
- O(n^2) (quadratic order)
Non-polynomial orders
As the data size increases, the algorithm's execution time and space grow explosively; algorithms of these orders perform very poorly.
- O(2^n) (exponential order)
- O(n!) (factorial order)
Arrays
An array is a linear-table data structure. It uses a contiguous block of memory to store a sequential set of data of the same type.
Array features:
- All elements in an array must be of the same type (all int, double, etc.).
- The elements of an array are contiguous in memory.
- Arrays support both sequential and random access; random access by subscript has O(1) time complexity.
- Arrays are fast to read but slow to insert into and delete from.
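A small sketch of these trade-offs, using Python lists (which are array-backed):

```python
arr = [10, 20, 30, 40]

# Random access by index: one address computation, O(1).
assert arr[2] == 30

# Inserting at the front must shift every existing element one
# position to the right, so it costs O(n).
arr.insert(0, 5)
assert arr == [5, 10, 20, 30, 40]

# Deleting from the front likewise shifts everything back: O(n).
arr.pop(0)
assert arr == [10, 20, 30, 40]
```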
Why are arrays numbered from 0 instead of 1 in most programming languages?
From the memory model of array storage, the best definition of "subscript" is "offset". If a[0] has offset 0, then a[k] has offset k * type_size, so the memory address of a[k] is:
a[k]_address = base_address + k * type_size
However, if the array were numbered from 1, the memory address of element a[k] would be computed as:
a[k]_address = base_address + (k-1)*type_size
Comparing the two formulas, it is easy to see that numbering from 1 adds a subtraction to every random access, which costs the CPU one extra subtraction instruction each time.
Arrays are a very basic data structure, and random access by subscript is a very common operation, so its efficiency should be optimized as far as possible. To eliminate that one subtraction, arrays are numbered from 0 instead of 1.
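The two addressing formulas can be sketched directly (the base address 1000 and the 4-byte element size below are hypothetical values chosen for illustration):

```python
def element_address(base_address, k, type_size):
    """0-based indexing: address of a[k] with no extra arithmetic."""
    return base_address + k * type_size

def element_address_1_based(base_address, k, type_size):
    """1-based indexing: every access pays one extra subtraction."""
    return base_address + (k - 1) * type_size

# With a hypothetical base address of 1000 and 4-byte ints,
# a[3] (0-based) and a[4] (1-based) name the same memory slot.
assert element_address(1000, 3, 4) == 1012
assert element_address_1_based(1000, 4, 4) == 1012
```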
But I think the main reason is historical: the designers of C started array subscripts at 0, and later high-level languages followed C's example, so the habit of counting from 0 persists.
Linked lists
A linked list is a data structure that is not contiguous in physical memory. It uses pointers to string together a group of scattered memory blocks.
Linked list features:
- Each element of a linked list stores the address of the next element, stringing together a series of scattered memory addresses.
- Adding an element to a linked list is easy: just place it anywhere in memory and store its address in the previous element.
- Linked lists support only sequential access.
- Linked lists are fast to insert into and delete from.
- Linked list lookups are slow, because elements must be visited one at a time.
- Compared with arrays, each linked-list element also stores the address of another element, which adds memory overhead.
Runtimes for common array and linked list operations
| | Array | Linked list |
|---|---|---|
| Read | O(1) | O(n) |
| Insert | O(n) | O(1) |
| Delete | O(n) | O(1) |
Singly linked lists
A linked list in which each node contains only one pointer field is called a singly linked list. The next pointer of the tail node of a singly linked list points to an empty address (null).
Common terms:
- Head pointer: a pointer to the first node (which may be the head node or the first element node). A linked list can lack a head node, but it never lacks a head pointer.
- Head node: sometimes a node, called the head node, is placed before the first element node of a singly linked list. Its data field can store nothing, or extra information such as the list's length.
- First element node: the first node in the list that holds an actual element.
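A minimal singly linked list with a sentinel head node might look like this in Python (an illustrative sketch; the names are made up, and references play the role of pointers):

```python
class Node:
    """One node of a singly linked list: a value plus a reference
    (pointer) to the next node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

# Build 1 -> 2 -> 3 behind a head node; the head node's data field
# stores no element, it merely simplifies insertions at the front.
head = Node(None)
head.next = Node(1, Node(2, Node(3)))

# Sequential access: follow next pointers until the empty address
# (None) at the tail.
values = []
node = head.next            # the first element node
while node is not None:
    values.append(node.value)
    node = node.next
assert values == [1, 2, 3]
```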
Circular linked lists
A circular linked list is a special singly linked list: the next pointer of its tail node points back to the head of the list, so the rest of the list can be reached from any node.
Circular linked lists are especially suitable when the data to be processed has a circular structure.
Doubly linked lists
A doubly linked list supports traversal in two directions: each node has not only a successor pointer next to the next node, but also a predecessor pointer prev to the previous node.
A doubly linked list needs two extra spaces per node to store the addresses of the successor and predecessor, so for the same amount of data it takes more memory than a singly linked list. Although the two pointers cost storage space, they support bidirectional traversal, which makes doubly linked list operations more flexible.
Advantages of doubly linked lists over singly linked lists:
Structurally, a doubly linked list can find a node's predecessor in O(1) time, which makes insertion and deletion simpler and, in some cases, more efficient than in a singly linked list.
In real software development, deleting data from a linked list usually means one of two things:
- Deleting the nodes whose value equals a given value;
- Deleting the node a given pointer points to.
In the first case, whether the list is singly or doubly linked, to find a node whose value equals the given value we have to start from the head and traverse node by node until we find it, then delete it with the pointer operations described earlier. Although the deletion itself is O(1), the traversal dominates, so the total time complexity is O(n).
In the second case, we have already found the node q to delete, but to delete it we need to know its predecessor, and a singly linked list does not support getting the predecessor directly. So we still have to traverse from the head node until p->next == q, which tells us that p is q's predecessor.
For a doubly linked list, this situation is where it shines: each node already stores a pointer to its predecessor, so no traversal is needed. For the second case, then, deletion in a singly linked list takes O(n) time, while a doubly linked list needs only O(1)!
Similarly, to insert a node in front of a specified node, a doubly linked list has a great advantage: it can do it in O(1) time, while a singly linked list needs O(n).
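A sketch of the O(1) deletion that doubly linked lists make possible (illustrative Python; references play the role of pointers, and the node names are made up):

```python
class DNode:
    """Doubly linked node: prev (predecessor) and next (successor)."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def delete(node):
    """Unlink a node given only a pointer to it: O(1), because the
    predecessor is reachable via node.prev without any traversal."""
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev
    node.prev = node.next = None

# Build a <-> b <-> c, then delete b in O(1).
a, b, c = DNode('a'), DNode('b'), DNode('c')
a.next, b.prev = b, a
b.next, c.prev = c, b
delete(b)
assert a.next is c and c.prev is a
```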
Beyond insertion and deletion, for a sorted list, searching by value can also be more efficient in a doubly linked list: we can record the position p found in the last search and, on each new search, compare the target value with p's value to decide whether to search forward or backward, so on average only half the data needs to be examined.
How to implement an LRU cache eviction policy with a linked list
Caching is a technique for improving the performance of data reads. It is widely used in both hardware design and software development, for example CPU caches, database caches, and browser caches.
The size of a cache is limited. When the cache is full, which data should be purged and which retained? That decision is made by a cache eviction policy. Three common policies are First In, First Out (FIFO), Least Frequently Used (LFU), and Least Recently Used (LRU).
Implementation idea:
We maintain an ordered singly linked list in which nodes closer to the tail were accessed earlier. When new data is accessed, we traverse the list sequentially, starting from the head.
- If the data is already cached in the list, we traverse to its node, remove the node from its original position, and insert it at the head of the list.
- If the data is not yet in the cache list, there are two cases: if the cache is not full, insert a new node directly at the head of the list; if the cache is full, delete the tail node and then insert the new data node at the head.
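The steps above can be sketched as a minimal LRU cache (illustrative Python; a plain list stands in for the singly linked list, with index 0 as the head and the last index as the tail, so both lookup and eviction are O(n) as in the analysis):

```python
class LRUCache:
    """Minimal LRU cache. Index 0 is the head (most recently used),
    the last index is the tail (least recently used)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []   # list of (key, value) pairs

    def get(self, key):
        """Return the cached value, moving the hit to the head;
        return None on a cache miss."""
        for i, (k, v) in enumerate(self.items):       # O(n) scan
            if k == key:
                self.items.insert(0, self.items.pop(i))
                return v
        return None

    def put(self, key, value):
        for i, (k, _) in enumerate(self.items):
            if k == key:
                self.items.pop(i)     # already cached: drop old position
                break
        else:
            if len(self.items) >= self.capacity:
                self.items.pop()      # cache full: evict the tail
        self.items.insert(0, (key, value))

cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
assert cache.get('a') == 1     # 'a' moves to the head
cache.put('c', 3)              # evicts 'b', the least recently used
assert cache.get('b') is None
```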
So we implement an LRU cache using linked lists. Isn’t that easy?
Now let’s see what the time complexity of this cache access is. Because we need to iterate through the linked list regardless of whether the cache is full or not, the time complexity of cache access is O(n).
The idea of trading time for space
Caching is in fact a space-for-time design idea. If we store the data only on disk, we save memory, but every lookup must read the disk and is slow. If instead we use caching to load the data into memory in advance, we consume more memory, but each lookup becomes much faster.
When memory is plentiful and we care most about execution speed, we can choose algorithms or data structures with relatively high space complexity but low time complexity. Conversely, when memory is scarce, for example for code running on a phone or a microcontroller, we reverse the idea and trade time for space.
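One classic instance of trading space for time is memoization; here is a minimal sketch in Python:

```python
from functools import lru_cache

# Trading space for time: cache each Fibonacci result so every value
# is computed once (O(n) time, O(n) extra space) instead of the
# exponential-time naive recursion.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

assert fib(30) == 832040   # fast, despite the naive-looking recursion
```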
How to write correct linked list code with ease
- Understand what pointers (or references) mean: assigning a variable to a pointer really assigns the variable's address to the pointer. In other words, the pointer stores the variable's memory address, points to the variable, and the variable can be found through the pointer.
- Watch out for pointer loss and memory leaks.
- Use a sentinel (head node) to simplify the implementation.
- Pay special attention to boundary conditions:
  - Does the code work if the list is empty?
  - Does the code work if the list contains only one node?
  - Does the code work if the list contains only two nodes?
  - Does the logic work correctly at the head and tail nodes?
- Draw pictures and work through examples to help you think.
- Write and practice; there is no shortcut.
Five common linked list operations
- Reverse a singly linked list
- Detect a cycle in a linked list
- Merge two sorted linked lists
- Delete the Nth node from the end of a list
- Find the middle node of a linked list
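Two of these operations, reversal and finding the middle node, can be sketched as follows (illustrative Python; iterative reversal and the fast/slow-pointer technique are standard approaches, not code from the article):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def reverse(head):
    """Iteratively reverse a singly linked list: O(n) time, O(1) space."""
    prev = None
    while head:
        head.next, prev, head = prev, head, head.next
    return prev

def middle(head):
    """Fast/slow pointers: when fast reaches the end, slow is at the middle."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow

def to_list(head):
    out = []
    while head:
        out.append(head.value)
        head = head.next
    return out

head = Node(1, Node(2, Node(3, Node(4, Node(5)))))
assert to_list(reverse(head)) == [5, 4, 3, 2, 1]
assert middle(Node(1, Node(2, Node(3, Node(4, Node(5)))))).value == 3
```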