This is the 27th day of my participation in Gwen Challenge.
Almost every programming language has an array as a data type. It is not only a data type, but also a data structure.
Array uses a contiguous memory space to store a group of data of the same type. Its biggest feature is that it supports random access, but the efficiency of insert and delete operations is very low, and the average time complexity is less than O(n).
1. How can arrays be randomly accessed
1.1 Array features
An array is a linear table data structure. A contiguous set of memory Spaces to store the same type of data.
Random access capability is supported because it has two features:
- Linear tables, that is, rows of data like a linear mechanism
- Contiguous memory space and the same type of data
1.2 Implement random access
Let’s take an array int[] a = new int[10] of length 10. In the figure I drew, the computer allocates a contiguous memory space of 1000 ~ 1039 to array A [10], where the beginning address of the memory block is base_address = 1000.
We know that the computer assigns an address to each memory location, and the computer accesses the data in memory through the address. When a computer needs to randomly access an element in an array, it first calculates the memory address where the element is stored using the following addressing formula:
A [I]_address = base_address + I * data_type_size Copy codeCopy the code
Where data_type_size represents the size of each element in the array. In our example, the array stores data of type int, so the data_type_size is 4 bytes.
Inefficient inserts and deletions
2.1 Operations related to Insert and Delete
Array inserts and deletions can involve moving other array elements, which is a root cause of inefficiency.
Let’s say the length of the array is n. Now, if we need to insert a piece of data into the KTH position in the array. In order to make room for the KTH position, for the new data, we need to move the k to n elements one bit back.
If you insert elements at the end of the array, you don’t need to move the data, which is O(1). But if you insert elements at the beginning of the array, all the data has to be moved one bit back, so the worst time complexity is O(n). Since we insert elements at each location with the same probability, the average case time complexity is (1+2+… N)/n = O (n).
2.2 Insert and delete optimizations for specific cases
2.2.1 Optimization of Insert Operations
If the data in the array is ordered, when we insert a new element somewhere, we must move the data after K in the same way as before. However, if the data stored in the array is random, the array is just treated as a collection of stored data. In this case, if we want to insert an array into the KTH location, we can simply move the KTH location to the end of the array element and put the new element directly into the KTH location to avoid massive data migration.
To better understand this, let’s take an example. Suppose array A [10] stores the following five elements: A, B, C, D, e.
We now need to insert element X into position 3. We simply put C into a[5] and assign a[2] to x. Finally, the elements in the array are as follows: A, B, x, D, e, c.
Using this technique, the time complexity for inserting an element at the KTH position in a given scenario is reduced to O(1). This idea is also used in quicksort
2.2.2 Optimization of the Deletion Operation
Similar to inserting data, if we want to delete the KTH location, we also need to move the data for the sake of memory continuity, otherwise there will be a hole in the middle, memory will not be continuous.
Similar to insertion, if the data at the end of the array is deleted, the best-case time complexity is O(1); If the initial data is deleted, the worst-case time complexity is O(n); The average time complexity is also O(n).
In fact, in some special cases, it is not necessary to pursue continuity of data in an array. Wouldn’t it be much more efficient if we did multiple deletions all at once?
Let’s continue with the example. Array A [10] contains eight elements: A, B, C, D, E, F, G, h. Now, we’re going to delete elements A, B, and C.
In order to avoid data d, E, F, G, and H being moved three times, we can first record the deleted data. Each deletion does not actually move the data, only that the record data has been deleted. When there is no more space in the array to store the data, we trigger another actual delete, which greatly reduces the amount of data moved by the delete.
This operation is similar to the core idea of the JVM’s clear garbage collection algorithm
3. The most common problem with arrays: arrays out of bounds
Array out of bounds means that when accessing an array, the address space allocated to the array is out of bounds. C language array out of bounds is special and will not be warned at runtime. Other languages such as Java throws Java lang. ArrayIndexOutOfBoundsException anomalies. Many computer viruses also take advantage of the code in the array out of bounds can access illegal address vulnerability, to attack the system, so when writing code must be aware of the array out of bounds.
4. Container && array
The great advantage of an ArrayList is that it encapsulates many of the details of array operations. For example, array insertion and data removal need to move other data. In addition, it supports dynamic capacity expansion, because capacity expansion involves memory allocation and data relocation, which is time-consuming. So if you know how much data you want to store in advance, it’s a good idea to specify the data size beforehand when you create an ArrayList.
The array itself needs to be pre-sized when it is defined because contiguous memory space needs to be allocated. If you need to increase the size of the array, you can only reallocate a larger space, copy the original data over, and then insert new data.
Using arrays and containers scenarios:
- Java ArrayList cannot store basic types, such as int and Long, and needs to be packaged as Integer and Long classes. Autoboxing and Unboxing have some performance costs, so arrays are a good choice if you are particularly concerned with performance or want to use basic types.
- If the size of the data is known in advance and manipulation of the data is simple, you can use arrays without using most of the methods provided by ArrayList.
- If you want to represent multi-dimensional arrays, arrays are more intuitive.
For business development, using containers directly is sufficient, saving time and effort. After all, the loss of a loss of performance, will not affect the overall performance of the system. But if you’re doing something very low-level, like developing a network framework, where performance needs to be optimized to the extreme, arrays will be preferred over containers.
5. Array indexes start at 0, not 1
5.1 Special memory model
The best definition of subscript is “offset”. If a[0] is offset by 0, then a[k] is offset by k type_size. If a[0] is offset by 0, then a[k] is offset by k type_size.
a[k]_address = base_address + k * type_size
Copy the code
However, if the array counts from 1, then we calculate the memory address of the array element a[k] as:
a[k]_address = base_address + (k-1)*type_size
Copy the code
Comparing the two formulas, it is not difficult to see that numbering from 1 increases the subtraction operation for each random access to the array element, which is an additional subtraction instruction for the CPU.
Array is a very basic data structure, and random access to array elements by subscript is a very basic programming operation, so efficiency optimization should be as extreme as possible. So to eliminate one subtraction, the array is numbered from 0 instead of 1.
5.2 Historical Causes
C was like that from the beginning, other languages followed it, and there are other languages that didn’t, such as Matlab, Python.