When it comes to data structures and algorithms, everyone runs away. This was originally a professional basic course, but most of the people did not learn it well, let alone me this kind of half-way code farmers. To tell you the truth, I still envy the programmers with professional training, because you only need to review it in your daily work or interview, while I learned it completely from scratch. Fortunately, it’s not too late. Here, we’ll use PHP as an example code to really learn the horror of data structures and algorithms from scratch.

The data structure

What is a data structure?

A data structure is a collection of data elements with a “structure,” which is the relationship that exists between the data elements

This is Professor Yan Weimin’s definition of data structure in the second edition of Data Structure. In fact, it’s a combination of data. It’s like when you go to a bookstore, or a library, or a bookcase in your home. How should these books be arranged? The way the book is placed is the data structure. You can put it at random, can also be placed in categories, can also be put according to their own interests and hobbies, you can also put the most commonly used books at hand, will not often read the book in the depths of the cabinet. These are data structures.

In the programming world, data structures have two forms, one is logical and the other is physical.

Logical structure

Even if you’ve never been exposed to data structures, if you’ve ever studied programming, you’ve probably heard some of these terms: sets, linear tables, trees, graphs, which refer to logical structures within a data structure. In other words, it is a mathematical model abstractly derived from a specific problem to describe data from a logical relationship. For example, we divide books into categories, and under each category there are corresponding books, which is a kind of tree structure. Or we can index the book by pinyin, and then put an index label on the shelf, which is a hash structure.

Logical structures will be a focus throughout our study, as various algorithms are implemented for operations against these structures. This will be explained in more detail in the algorithm explanation below.

The physical structure

Physical structure is mainly the physical storage of data, also known as storage structure. This is pretty easy to remember, but it only comes in two forms, sequential storage and chained storage. Normally, sequential storage structures are represented by arrays, and chained storage structures are represented by Pointers to structures in C, but in PHP, chained storage structures are represented by classes.

All the logical structures mentioned above can be implemented in a sequential or chain way. No matter which way is used, the algorithm operation of the corresponding logical structure can be completed, but different forms or algorithms have different efficiency. And efficiency is the core of the whole data structure and algorithm learning core.

algorithm

Now let’s see what an algorithm is.

An algorithm is a finite set of instructions that takes some input (in some cases no input), produces an output, and must terminate after a finite number of steps

This is the definition of algorithm by Chen Yue in the second edition of “Data Structure” of Zhejiang University. In fact, if we simply understand the above data structure, a series of operations, is the algorithm. Let’s say we define a tree, how do we traverse the tree? So this is an algorithm, you can traverse a tree in order, in order, in order, in order, you can also traverse a tree in order, there are so many ways, which one is better? Which is bad? What kind of physical structure? Linear or chained? These conclusions are based on the execution efficiency of the algorithm. It can be said that there are many kinds of algorithms and their efficiency varies greatly. However, we can’t just say that an algorithm is not good. Each algorithm also has its different application scenarios. That’s why we’re looking at algorithms.

Regarding the algorithm, the most important thing we care about is its efficiency, which is defined here in terms of time complexity and space complexity.

Time complexity

Time complexity is generally expressed in terms of O(n), which is concerned with problem size and statement frequency, and is generally expressed in terms of n (note that this n is unknown, and if this n is known, it is constant order). This n could be a constant, which is O(1). That’s the best case, or you can grow linearly, like order n. It could be logarithmic or exponential, O(logN), O(N^2), and worst of all, O(2^N), in which case you’ll probably never see the results of your calculations in your lifetime.

We can look at a simple piece of code to analyze its time complexity:

echo $a++, PHP_EOL; // O(1) $n = 10; $I = 0; $I = 0; $I = 0; $I = 0; $I = 0; $I = 0; $I = 0; $i<$n; $i++){ echo $i, PHP_EOL; } // O(n) for($i = 0; $i<$n; $i*=2){ echo $i, PHP_EOL; } // O(logN) for($i = 0; $i<$n; $i++){ for($j = 0; $j<$n; $j++){ echo $i, $j, PHP_EOL; } } // O(N^2)

From the above code, we can see that the number of nested loops has a great relationship with the growth of n. In addition, the operation inside the loop also affects the growth of n. For example, if we are a piece of code like this:

$n = 10; $m = 3; $m = 3; $m = 3; $I = 0; $I = 0; $I = 0; $i<$n; $i++){ for($j = 0; $j<$m; $j++){ echo $i, $j, PHP_EOL; }}

So it’s not O(N^2), it’s O(NM), because both of our cycles are not operations on the largest N, they’re operations on N * M. However, if M is large or even equal to N, then this algorithm becomes N^2.

In fact, the analysis of the time complexity needs some mathematical skills, but for a code farmer like me who is half-finished, the time complexity of an algorithm can be roughly analyzed by grasping the cycle level and the operation situation within the cycle. Of course, for some of the more difficult interview questions, or need to use mathematical methods to decompose the correct time complexity.

In addition, in an algorithm or a function, the time complexity shall be whichever is the largest. At the same time, the best time complexity and the worst time complexity shall also be considered, because the time complexity may be worse and worse when the data is large based on the data scale. At this time, we will take the worst time complexity as the final time complexity of this section of algorithm.

Regarding the problem of time complexity, you can refer to all kinds of algorithm books. Of course, it is best to focus on university textbooks and exercises. If you do more problems, you can master the problem more deeply.

Spatial complexity

The spatial complexity is less of a concern in data structures and algorithms than the temporal complexity, because most of the time when we only rely on a third party variable, the spatial complexity is O(1). If you need an array or a linked list to implement the algorithm, the space complexity of the algorithm is O(n).

In general, we don’t care too much about space complexity, because most algorithms tend to stay at the O(1) or O(n) level. Of course, there are some algorithms that take up a lot of space and complexity, so they can burst your memory in minutes. When we come across this type of algorithm, we’ll talk about the space complexity separately.

conclusion

The first article is based on the theoretical aspects of things, this is the actual situation of learning data structure and algorithm, must be a combination of theory and practice of learning. Let us start from now on, towards this bottomless super pit!!

References:

Data Structures, 2nd Ed., Weimin Yan

Data Structure, 2nd Edition, Yue Chen

“Notes on Data Structure with High Score”, 2020 edition, Tianqin postgraduate entrance examination

= = = = = = = = = = =

Search for “Hardcore Project Manager” on all media platforms