Today’s hero is arrays, arrays are also linear data structures. Unlike the linked lists in the previous section, in an array, accessing data is simple, and adding and deleting data is time-consuming. This is similar to the phonebook in which names are arranged in alphabetical order in the what is a Data structure article.
An array of
Blue, Yellow, and Red are stored as data in the array, where a is the name of the array, and the number after [] indicates the number of data in the array. This number is also the subscript of the array **, whose subscript starts from 0 and counts **. For example, Red is the second data in array A.
So why are arrays numbered from 0 in many programming languages? Don’t worry, you can think about it first, will be explained at the end of the article.
As you can see from the figure, the array data is stored sequentially in a contiguous space of memory.
Since data is stored in contiguous space, the memory address (location in memory) of each data item can be calculated using the array subscript, which allows direct access to the target data, i.e., random access.
For example, if we want to access Red in a linked list, we can only look it up from scratch using Pointers, but in an array, we can access Red directly by specifying a[2].
However, if you want to add or delete data anywhere, an array is much more complicated than a linked list. Here we try to add Green to the second position.
First, the amount of storage you need to add is secured at the end of the array.
To make room for the new data Green, move the existing data one by one, first moving Red back.
And then I’m going to move Yellow back.
Finally, write Green in the empty space.
The operation of adding data is complete.
Conversely, what if you want to delete Green?
First, delete the target data Green.
And then move the data one by one to the empty space, first move Yellow forward.
Next move Red.
Finally, delete the extra space, and the Green is deleted.
supplement
Here’s how much time it takes to operate on an array. Let’s say there are n items in the array. Since the data is accessed randomly (the memory address is computed by subscript), the running time is only constant O (1).
The addressing formula for the memory address calculated by array subscript is as follows:
a[i]_address = base_address + i * data_type_sizeCopy the code
Where baseAddress is the first address of the memory block, datatyPE_size is the size of each element in the array.
On the other hand, when you want to add new data to the array, you must remove the data after the target location one by one. So, if you add data to the header of the array, it takes O (n) time, and the same goes for deleting.
In both linked lists and arrays, data is arranged in a linear column. Accessing data in linked lists is more complicated, adding and deleting data is simpler; While accessing data in an array is simple, adding and deleting data is more complex.
We can decide which data structures to use based on which operations are more frequent.
Finally, let’s consider the question we started with: why do arrays start numbering at 0 in many programming languages?
To reassure
From the memory model of array storage, the best definition of “subscript” is “offset”. If a[0] represents the beginning of the array, a[0] represents the beginning of the array, and a[k] represents the beginning of the array. If a[0] represents the beginning of the array, a[k] represents the beginning of the array.
a[k]_address = base_address + k * type_sizeCopy the code
However, if the array counts from 1, then we calculate the memory address of the array element a[k] as:
a[k]_address = base_address + (k-1)*type_sizeCopy the code
Comparing the two formulas, we can see that numbering from 1 increases the number of subtractions for each random access to an array element, which in the case of the CPU, increases the number of subtractions.
Array is a very basic data structure, and random access to array elements by subscript is a very basic programming operation, so efficiency optimization should be as extreme as possible. So to eliminate one subtraction, the array is numbered from 0 instead of 1.
In addition, there are historical reasons. The designers of C language started counting array subscripts with 0, and later high-level languages such as Java and JavaScript followed the example of C language. In other words, in order to reduce the learning cost of Java for PROGRAMMERS of C language to some extent, So the custom of counting from zero continues. In fact, arrays are not counted from zero in many languages, such as Matlab. There are even languages that support negative subscripts, such as Python.
conclusion
This article mainly introduces the commonly used array in data structure. Array uses a piece of continuous memory space to store a group of data of the same type. The biggest characteristic of array is that it supports random access, but the insertion and deletion operations become relatively inefficient, and the average time complexity is O(N). An efficient search algorithm is binary search, which takes advantage of random array access.
In general, arrays are good for scenarios where there are many operations and few writes, as opposed to linked lists in our last article.
reference
My First Algorithm Book
The beauty of data structures and algorithms