Summary of linear data structures

An array of

An array is a linear data structure. When creating an array, a contiguous area of memory is divided, and data is stored in each index of the contiguous area.

Array supported operations

Query (get (int index))

First, the time complexity of fetching elements by subscript is O(1). Because an array is a contiguous area of memory, and each element is of equal size, a linear equation can quickly find the memory address that corresponds to changing the subscript. For example, if it’s an int array, the memory address at position 0 is b, and index is the index you’re looking for, then obviously the memory address you’re looking for is b+4*index, and 4 is the number of bytes of an int. So the computer calculates the subscript using b+type_size*index. Of course, not all arrays are evaluated in this way; for example, some virtual machines do not create contiguous array space.

Insert (add(int index,E E)

If you want to specify the index of the array to be added, you generally need to move the elements of the array. For example, if you have an array of ten integers and you want to insert the number A into the position of the first element, you need to move all the elements backwards and forwards and then insert A into the first position. So the average time to insert an array is order n

Delete (remove (int index))

As with insert, specify subscripts for deletion. For example, if you have ten elements and delete the first one, you need to move the array element from front to back by one index. So the average time to delete an array is O(n)

Change (set(int index,E E))

The modification is relatively simple, just get the subscript and change the value of the current element

The dynamic array

Many high-level languages have the basic structure array, but using them we cannot add more elements than the total number of elements we defined in the first place. So, when we don’t know the size of the array in the beginning that’s a little bit of a hassle, we need to define our own dynamic array, we don’t care what the initial size is.

Implement your own array

To implement dynamic array, we need to rewrite the addition and deletion of array and expand the array

increase

The difference between your own increment operation and the original one is to determine whether you need to expand the array. The condition of expansion is that when the length of the array and the number of elements are the same, the expansion needs to be doubled. Procedure for capacity expansion:

Make a copy of the original array and make it twice the size
Writes all the elements of the original array to the new array and adds the newly added elements

delete

The difference between a self-implemented delete operation and the original one is to determine whether the array needs to be reduced in size, which is necessary. The size of the array must be greater than or equal to 2, and the current number of elements is half the size of the array. Steps for reducing capacity:

We delete the elements of the original array
Make a copy of the original array and make it 1/2 the size
Writes all the elements of the original array to the new array

Oscillation caused by capacity expansion and reduction

The oscillations are related to our scaling and shrinking conditions. If we scale and shrink according to the above conditions, then if the number of elements in the array is between 8 and 9, then the array size will be between 8 and 16.

The solution to oscillating depends on specific requirements. We can change the condition to “the size of the array before the reduction should be greater than or equal to 2, and the current number of elements is 1/4 of the size of the array”.

How to expand java.util.ArrayList

Expansion has to be

Public Boolean add(E E) {// the iterator fail-fast is modCount++; Add (e, elementData, size); add(e, elementData, size);return true; } private void add(E E, Object[] elementData, int s) {private void add(E E, Object[] elementData, int s)if(s == elementData.length) elementData = grow(s+1); elementData[s] = e; size = s + 1; } private Object[] grow(int minCapacity) {// Can be seen by copying a new arrayreturnelementData = Arrays.copyOf(elementData, newCapacity(minCapacity)); } // Next is how to actually expand the capacity!! private int newCapacity(int minCapacity) { // overflow-conscious code int oldCapacity = elementData.length; // The new size is 1.5 times the old size int newCapacity = oldCapacity + (oldCapacity >> 1); // If the new size is smaller than oldCapacity+1if(newCapacity -mincapacity <= 0) {// If the array is empty, the first time you add an element, you can expand the array to the size of 10 and minCapacityif (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA)
                return Math.max(DEFAULT_CAPACITY, minCapacity);
            if (minCapacity < 0) // overflow
                throw new OutOfMemoryError();
            returnminCapacity; } // return the large one, not exceeding the maximumreturn (newCapacity - MAX_ARRAY_SIZE <= 0)
            ? newCapacity
            : hugeCapacity(minCapacity);
    }
Copy the code

It can be seen that the expansion idea is very simple, my environment is JDK1.9, as for other versions of the expansion idea I think are similar. As for scaling, I won’t explain it here, but the source code for ArrayList is relatively easy to understand.

The list

Lists and arrays are linear structures, but lists are made up of nodes one by one, and each node has a next pointer to the next node, meaning that lists do not require contiguous array space. As shown in figure:

Linked list operations

Now let’s see how to add, delete, change and check linked lists

increase

The added operation time is O(1), and there is no need to move the array elements like A->C, add B to the end of A add step:

Save the next pointer to A as temp
Set A’s next pointer to B
Point NEXT of B to temp

delete

The time complexity of delete is O(1), there is no need to move array elements like array, such as A->B->C, delete B delete steps:

Get the temp variable to hold node B
Change A’s next pointer to C

The query

The time complexity of the query is O(n) because linked lists, unlike arrays, can directly calculate the memory address by subscript. So you have to traverse to find the corresponding subscript node.

The add and delete time complexity here is O(1), which makes sense, and we might conclude that the query modification of arrays is efficient and the deletion of linked lists is efficient. But actually it depends. When we insert or delete a node we need to traverse to the specified location (because we only have the address of the head node or the address of the tail node). In a previous test, the Java structures LinkedList and ArrayList were found to be more efficient when adding about a million elements to the list, because adding elements to the list at the specified location was traversed.

Virtual head node: Finally, lists usually have a head node that points to the first node of the list. This head node is useful for removing the last element without having to determine if there is only one element to delete

Stack, queue,

Stacks and queues are very similar, so look at them together.

A stack is a FILO (in, out) structure that can be implemented by arrays or linked lists. For example, when browsing a web page in a browser, entering another web page in the current web page is equivalent to pushing, and returning is equivalent to pushing out.
A queue is a FIFO (first-in, first-out) structure that can be implemented by arrays or linked lists. Queuing also has a wide range of applications, such as queuing.

The stack

Array-based stack implementation

Structure: array array, top variable

Arrays are used to hold incoming data. The top pointer points to the position next to the last element, and top points to 0 if the stack is empty.

Here’s an example:

Empty stack: When the stack is empty, top points to the zero subscript
Stack full: when the element is full, top==array.length;
Push: if an element is pushed array[top++]= pushed element
Return array[top–];

Stack implementation based on linked list

Structure: top pointer, node

The node has data, and next points to the next node
The top pointer is used to point to the top node of the stack

As shown in figure:

Stack empty: Top points to null
Push: construct a new node node where next points to top and top points to the new node node
Unstack: Use Node to hold the node pointed by top, the top pointer points to the node pointed by the next pointer, and return node

The queue

Queues based on linked lists

Structure: virtual head node, node

Virtual head node: Front pointer to the first element, rear pointer to the last element
Node: Data data element, next next node points to

As shown in figure:

Empty queue: Front and rear point to NULL
Enqueue: Constructs a new node, the next pointer to the rear pointer points to the new node
Queue out: If it is not empty, save node where front points to, return node where front points to next node

Array-based queues

There are some differences between array-based queues and array-based stacks. Can think of, when the array queue up can be added to the array later, remove the front element out at the time of the team, then there will be a problem is the front element out of the squad will become unavailable but can’t use again, there is a solution can be put at the back of the elements in the team when the front, like the dynamic array can delete the first element, However, it would be too time consuming to move the whole element every time the team is out. So I’m going to introduce a loop queue.

Understanding loop queues

The problem with circular queues is that array queues waste space. Circular queues are not circular in physical address, but in logic.

Structure: array, front pointing to the head, rear pointing to the last element of the queue

front=front%array.length

As shown in figure:

In the figure, there are two Pointers: front and rear, where front represents the head subscript and rear represents the tail subscript.
Queue empty: Front and rear point to the same position
A, b, and c join the queue, front stays the same, rear(rear=(rear+1)%array.length) points to the next coordinate of the last element
Front =(front+1)%array.length)
Finally, when increased to (rear+1)%array.length=front as shown in figure D2, the element is full

Points to pay attention to

Team, the rear = (rear + 1) % array. The length
Out of the team, the front = (front + 1) % array. The length
In the loop queue we are wasting one element position, and the advantage of this is that we can tell the difference between an open queue and a full queue.
Empty queue: Front ==rear
Full queue :(rear+1)%array.length==front
Expansion: The expansion steps are similar to the logic of dynamic arrays, except that front and rear are reassigned

Hash table

When we want to find someone’s information in the school, we will ask the Academic affairs Office for the student number, and the academic affairs Office will give you a student’s information after obtaining the student number provided by you. Here through the student number to obtain the student information is to use the idea of the hash table, and the student number and student information corresponding relationship is the hash function, and if two student number corresponding to the same student information is the hash conflict.

So let’s look at the nouns in the hash table:

Hash table: A data structure that accesses the value of a given keyword directly
Hash function: converts the given keyword into a memory address index
Hash conflict: Different keywords are converted to the same index by the hash function

Hash table action

The search time complexity of hash table is O(1), and the idea of hash table search is to find elements directly through keywords. The time complexity of increasing the deletion is also O(1), that is, the hash table

Hash table structure

There is a Node array with the current number of elements M

This Node has a key, value, and next.
M is the number of elements in the current array

The hash function

Because we’re going to try to hash different values for different keys as much as possible. So the selection of hash functions is very important.

The following are common hash functions:

Direct addressing: Take the value of a keyword or a linear function of the keyword as a hash address. H(key)=key or H(key)= A? The key + b. A and b are constants.
Numerical analysis: Through the analysis of the data, find the less conflicting parts of the data, and construct hash addresses. For example, the id number, we can extract key numbers in the ID number, such as the date of birth and the last six digits.
Square method: take the middle bits of the keyword square as the hash address.
Random number method: a random function that takes the random value of the keyword as the hash address. This method is usually used when the keyword length is different.
H(key) = key MOD P. Key is the keyword, MOD is the MOD operation, p is the length of the hash table, p is better prime number, can achieve the minimum possible hash conflict.

Next, we set the hash functions to the remainder method for analysis.

Hash conflict

A hash conflict is when two different keys hash together to get the same result. There are also various solutions to hash conflicts

Chained address method: If A hashes the index of the array to 2 and B hashes the index of the array to 2, B follows A to form A linked list. The hash table becomes an array and a linked list.
Open address method:
- Linear detection: When a hash conflict is found, add +1 to each subscript until there are no more elements in the subscript position
- Square detection: Perform 1^2, 2^2, 3^2 subscripts respectively when a hash conflict is found. Until there are no more elements at the subscript position
- Pseudo-random sequence method: When conflicts are found, add a number to the key randomly and then take the modulus until there are no more elements in the subscript position
Rehash: This method is to construct a number of different hash functions at the same time, if the first hash conflicts with the second hash function, and so on

Next, we set the hashing conflict resolution method as chain address method for analysis.

Hash table operation procedure

Increase the operating

Such as the user to add a K, V key-value pairs, divides the hash value of K calculated, then the hash value of the length of the array subscript modulus, if there is no element, subscript will K, V is encapsulated into a Node in the subscript, if the subscript have elements, K, V in the Node to join the subscript elements behind the most.

Find operations

For example, if a user wants to look up the node by K, they hash(K) and then mod the array to get the index of the array. If the index has no elements, the search fails. If the index has elements, they traverse the list from the element to the next list, and return if there are.

The deletion and modification logic are similar

Implementation of HashMap

Java HashMap is a very well encapsulated hash table, through the analysis of its implementation can also let us improve our own hash table design

Let’s take a look at what key fields it has

// Load factor static finalfloatDEFAULT_LOAD_FACTOR = 0.75 f; // Transient Node<K,V>[] table; Static class Node<K,V> implements map. Entry<K,V> {final int; static class Node<K,V> implements map. Entry<K,V> {final inthash; final K key; V value; Node<K,V> next; / /.. Some nodes other methods}Copy the code

Let’s look at the flow of a PUT (add) operation.

// add a key, value public V put(K key, V value) {return putVal(hash(key), key, value, false.true); } // AdoptedhashAlgorithm static final inthash(Object key) { int h; // You can see herehashThe key that the algorithm useshashThe xOR operation that shifts Code and H 16 bits rightreturn(key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); } // The last call to putVal /** *hashKey is the key and value is the value onlyIfAbsent isfalseRepresents multiple keys that will override * EVict asfalse*/ final V putVal(inthash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; // Initialize the hashMap size if the hashMap has no elements to start withif((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; // Create a new list node and add it to the array if there is no hash conflictif ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null); // Then there is a hash conflictelse{ Node<K,V> e; K k; // The hash value of the subscript node must be equal to the hash value of the preceding node. And reference the same object and equals equals!! This represents the logic of judging the same object!! I've stepped in holes here before. Second, if the key is null both timesif (p.hash == hash&& ((k = p.key) == key || (key ! = null && key.equals(k)))) e = p; // This is not the same object and is a TreeNode structure.else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else{// Next add the list nodes as normalfor (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash&& ((k = e.key) == key || (key ! = null && key.equals(k))))break; p = e; }} // If the value is changed, the old value will be returnedif(e ! = null) { // existing mappingfor key
                V oldValue = e.value;
                if(! onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e);return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
Copy the code

There are a few things to note about the PUT operation in HashMap

If the key passed is null, putVal is added as usual, as above, and the resulting hash value is 0
The hashCode and equals keys must be the same or the same. A pit I ran into a long time ago was two objects whose hashCode returned the same value, and equals returned false, resulting in two elements always being added. In fact, this is also a design error of mine. A class better ensure that the overridden hashCode and quals methods are the same

The next article will summarize tree structured data types, and the next article will share the data types I implement

An array of

Array supported operations

Query (get (int index))

Insert (add(int index,E E)

Delete (remove (int index))

Change (set(int index,E E))

The dynamic array

Implement your own array

increase

delete

Oscillation caused by capacity expansion and reduction

How to expand java.util.ArrayList

The list

Linked list operations

increase

delete

The query

Stack, queue,

The stack

Array-based stack implementation

Stack implementation based on linked list

The queue

Queues based on linked lists

Array-based queues

Understanding loop queues

Hash table

Hash table action

Hash table structure

The hash function

Hash conflict

Hash table operation procedure

Increase the operating

Find operations

Implementation of HashMap

Related Posts

Layout of objects in memory

Java multithreading series — Master the interview, an insight into Synchronized lock

Use maps to improve MongoDB performance