Overview
Java's collections framework is powerful because it encapsulates commonly used data structures and algorithms, so Java developers can use the framework's API without having to implement the underlying structures themselves. Arrays, linked lists, queues, stacks, binary trees, red-black trees: the JDK wraps all of them to varying degrees, which greatly improves development efficiency. But even though we rarely touch this low-level logic directly in a project, we still need to be familiar with it. In extreme business scenarios we may have to debug or even tune the underlying data structure, so mastering it is a necessary skill.
Below is a Collection interface diagram.
As we can see from the figure, the Collection interface has three main sub-interfaces: Set, List, and Queue.
In detail – List
Let’s start with the List interface.
The characteristics of the List
- Elements are ordered: they are retrieved in the same order they were inserted.
- Duplicate elements are allowed.
- Elements can be accessed by index.
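Here is a quick sketch of my own (not from the JDK source) that shows all three properties at once:

import java.util.ArrayList;
import java.util.List;

public class ListBasics {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("a");                    // duplicates are allowed

        System.out.println(list);         // [a, b, a]  (insertion order preserved)
        System.out.println(list.get(1));  // b  (access by index)
        list.set(2, "c");                 // replace the element at index 2
        System.out.println(list);         // [a, b, c]
    }
}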
The List interface is implemented by three main classes: ArrayList, Vector, and LinkedList.
Let’s discuss the differences in more detail.
We classify them into two categories according to the underlying data structure:
- Array-backed: fast lookups, slow insertions and deletions.
- Linked-list-backed: slow lookups, fast insertions and deletions.
ArrayList and Vector both fall into the array-backed category.
What’s the difference between ArrayList and Vector?
- ArrayList: not thread-safe, but efficient; well suited to frequent lookups.
- Vector: thread-safe, but inefficient. It was also one of Java's earliest thread-safe dynamic arrays and is not recommended in real production environments.
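If you really do need a thread-safe list, a common alternative nowadays (my suggestion, not something prescribed by the article or the Vector source) is to wrap an ArrayList with Collections.synchronizedList, or to use CopyOnWriteArrayList for read-heavy workloads. A minimal sketch:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ThreadSafeLists {
    public static void main(String[] args) {
        // Wrap an ArrayList: every call goes through a synchronized wrapper
        List<String> syncList = Collections.synchronizedList(new ArrayList<>());
        syncList.add("a");

        // Copy-on-write list: writes copy the backing array, reads are lock-free
        List<String> cowList = new CopyOnWriteArrayList<>();
        cowList.add("b");

        // Iterating the synchronized wrapper still requires manual locking
        synchronized (syncList) {
            for (String s : syncList) {
                System.out.println(s);
            }
        }
        System.out.println(cowList);
    }
}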
Source code analysis
ArrayList.java
transient Object[] elementData; // non-private to simplify nested class access
private int size;

public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    }
}
As the source shows, ArrayList stores its elements in an Object[] array. Scanning the rest of the class, there is no synchronized keyword and no other lock, so ArrayList is not thread-safe.
Vector.java

public synchronized void copyInto(Object[] anArray) {
    System.arraycopy(elementData, 0, anArray, 0, elementCount);
}

public synchronized void trimToSize() {
    modCount++;
    int oldCapacity = elementData.length;
    if (elementCount < oldCapacity) {
        elementData = Arrays.copyOf(elementData, elementCount);
    }
}

public synchronized void ensureCapacity(int minCapacity) {
    if (minCapacity > 0) {
        modCount++;
        ensureCapacityHelper(minCapacity);
    }
}
Vector uses the same array-backed structure, but its methods are marked synchronized, which makes Vector inherently thread-safe. However, under high concurrency that heavy use of synchronization leaves large numbers of threads blocked and waiting, so it is inefficient.
LinkedList
Now let's take a look at LinkedList. Its underlying structure is a linked list; it is not thread-safe but is efficient.
Let’s look at the difference between ArrayList and LinkedList.
Difference between ArrayList and LinkedList
ArrayList is backed by an Object array, while LinkedList is backed by a doubly linked list.

Note: JDK 1.6 (inclusive) and earlier used a doubly circular linked list; JDK 1.7 dropped the cycle.
Reference – doubly linked list: each node holds two pointers, prev pointing to the previous node and next pointing to the next node.
Reference – doubly circular linked list: the next of the last node points to the head, and the prev of the head points to the last node, forming a cycle.

For ArrayList, the cost of insertion and deletion depends on the element's position. Appending to the end is O(1). Inserting or deleting at index i is O(n - i) (n is the array length), because the i-th and the following (n - i) elements must be shifted one slot backward or forward.

For LinkedList, the cost of insertion and deletion does not depend on the element's position. Since its data structure is a doubly linked list, add(E e), addFirst(E e), addLast(E e), removeFirst() and removeLast() all run in O(1), while inserting or deleting at index i is O(n), because the list pointer must first be moved to that position.

ArrayList supports RandomAccess (efficient random access), while LinkedList does not.
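To make the RandomAccess point concrete, here is a small sketch of my own that picks a traversal strategy depending on whether the list implements RandomAccess: indexed access for ArrayList, an iterator for LinkedList:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class TraversalDemo {
    // Indexed loops are cheap on RandomAccess lists; on a linked list each
    // get(i) walks node by node, so an iterator is the better choice there.
    static <E> void printAll(List<E> list) {
        if (list instanceof RandomAccess) {
            for (int i = 0; i < list.size(); i++) {
                System.out.println(list.get(i)); // O(1) per access on ArrayList
            }
        } else {
            for (E e : list) {                   // iterator: O(1) per step
                System.out.println(e);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> linkedList = new LinkedList<>(Arrays.asList(4, 5, 6));
        printAll(arrayList);   // takes the indexed branch
        printAll(linkedList);  // takes the iterator branch
    }
}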
Source code analysis
/**
 * Pointer to first node.
 * Invariant: (first == null && last == null) ||
 *            (first.prev == null && first.item != null)
 */
transient Node<E> first;

/**
 * Pointer to last node.
 * Invariant: (first == null && last == null) ||
 *            (last.next == null && last.item != null)
 */
transient Node<E> last;
Above is the source code for LinkedList. The LinkedList class defines two nodes, first and last, pointing to the first and last elements, respectively. It is a typical linked-list structure.
The LinkedList class does not use locks, so it is also thread-unsafe.
public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{ ... }
ArrayList implements RandomAccess, which, like Cloneable, is only a marker interface: it signals that ArrayList supports efficient random access. Implementing Cloneable indicates that ArrayList supports cloning, and implementing Serializable indicates that the class can be serialized.
public class LinkedList<E>
        extends AbstractSequentialList<E>
        implements List<E>, Deque<E>, Cloneable, java.io.Serializable
{ ... }
You can see that LinkedList does not implement RandomAccess, so it does not support efficient random access. It does implement the double-ended queue interface Deque; the rest matches ArrayList.
Expansion mechanism for ArrayList
An ArrayList is a dynamic array: unlike a traditional array, it can grow as elements are added. In the source code, the core of the expansion logic is the grow() method, which we analyze in detail here; for the full mechanism you can consult the source code or other references.
Source code analysis
// minCapacity: the minimum required capacity
private void grow(int minCapacity) {
    // oldCapacity: old capacity; newCapacity: new capacity (old * 1.5)
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    // The MAX_ARRAY_SIZE constant is Integer.MAX_VALUE - 8
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}
The key line is int newCapacity = oldCapacity + (oldCapacity >> 1);. Shifting oldCapacity right by one bit is equivalent to oldCapacity / 2, so the new capacity is roughly 1.5 times the old one. A bit shift is used instead of division because it is faster, which improves efficiency and saves resources.
The logic of grow() is therefore: compute a new capacity of 1.5 times the old one, then check whether that is still smaller than the minimum required capacity; if it is (newCapacity < minCapacity), use the minimum required capacity as the new capacity.
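To see the 1.5x rule in action, here is a small sketch of my own that traces how the capacity would evolve starting from the default capacity of 10:

public class GrowthTrace {
    public static void main(String[] args) {
        int capacity = 10; // DEFAULT_CAPACITY of ArrayList
        System.out.print(capacity);
        for (int i = 0; i < 6; i++) {
            // same formula as grow(): new = old + old/2, i.e. old + (old >> 1)
            capacity = capacity + (capacity >> 1);
            System.out.print(" -> " + capacity);
        }
        System.out.println(); // 10 -> 15 -> 22 -> 33 -> 49 -> 73 -> 109
    }
}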
If the new capacity is greater than MAX_ARRAY_SIZE, the hugeCapacity() method is called to compare minCapacity with MAX_ARRAY_SIZE. The hugeCapacity code looks like this:
private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}
That is, if minCapacity is greater than MAX_ARRAY_SIZE, the new capacity is Integer.MAX_VALUE; otherwise the new capacity is MAX_ARRAY_SIZE, i.e. Integer.MAX_VALUE - 8.
Finally, Arrays.copyOf() creates a new, larger array and assigns it back to the original elementData reference. This completes the expansion of the array.
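For reference, a tiny sketch of my own showing what Arrays.copyOf does: it allocates a new array of the requested length, copies the old contents, and pads the rest with default values:

import java.util.Arrays;

public class CopyOfDemo {
    public static void main(String[] args) {
        Object[] old = {"a", "b", "c"};
        Object[] grown = Arrays.copyOf(old, 5);    // ["a", "b", "c", null, null]
        System.out.println(Arrays.toString(grown));
        System.out.println(old == grown);          // false: a brand-new array
    }
}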
In addition, ArrayList provides a public expansion method, ensureCapacity, that can be invoked from outside the class.
public void ensureCapacity(int minCapacity) {
    int minExpand = (elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA)
        // any size if not default element table
        ? 0
        // larger than default for default empty table. It's already
        // supposed to be at default size.
        : DEFAULT_CAPACITY;

    if (minCapacity > minExpand) {
        ensureExplicitCapacity(minCapacity);
    }
}
As the source shows, ensureCapacity is declared public and can be called by developers directly. Calling ensureCapacity before adding a large number of elements lets you grow the array once up front and avoid the performance cost of repeated expansions. This is not the focus of this article; check the source code if you are interested.
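A quick usage sketch of my own: pre-sizing with ensureCapacity before a large batch of add calls, so the backing array is expanded only once:

import java.util.ArrayList;

public class EnsureCapacityDemo {
    public static void main(String[] args) {
        int n = 1_000_000;
        ArrayList<Integer> list = new ArrayList<>();
        // Grow the backing array once up front instead of a few dozen times
        list.ensureCapacity(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        System.out.println(list.size()); // 1000000
    }
}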
In detail – Set
Set is another important sub-interface of Collection. As the diagram at the beginning of the article shows, its implementations include HashSet, LinkedHashSet, TreeSet and others.
The characteristics of the Set
- Unordered: insertion order is not preserved, so elements may come out in a different order than they went in.
- Uniqueness: All elements are unique.
Set is suitable for scenarios where deduplication is required.
Sets, too, can be divided into two categories by their underlying data structure:
- Backed by a hash table, which is in fact a HashMap, as the source code below shows. The actual implementation of HashMap will be covered in the next article.
- Backed by a binary tree.
HashSet and LinkedHashSet fall into the first category, where the underlying structure is a hash table (for LinkedHashSet, a hash table plus a linked list). TreeSet falls into the binary-tree category.
HashSet uses hashCode() and equals() to ensure element uniqueness.
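To see hashCode() and equals() at work, here is a small sketch of my own with a hypothetical User class: two logically equal objects end up as a single element because both methods are overridden consistently:

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class HashSetUniqueness {
    static final class User {
        final String name;
        User(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof User && ((User) o).name.equals(this.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    public static void main(String[] args) {
        Set<User> users = new HashSet<>();
        users.add(new User("alice"));
        users.add(new User("alice")); // equal by equals()/hashCode(), so ignored
        System.out.println(users.size()); // 1
    }
}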
Source code analysis
Let’s look at two source code snippets for HashSet.
public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
As you can see from the source, a HashSet creates a HashMap object when it is initialized. When we add an element, it is actually stored in that HashMap as a key, mapped to the constant PRESENT, which is simply a final Object. Since the keys of a HashMap are unique, this explains how a HashSet guarantees element uniqueness.
HashSet is the usual choice when you only need to guarantee element uniqueness, while TreeSet is used when the elements also need to be sorted. When sorting is not required, HashSet gives better performance.
Talking about sorting, let’s talk specifically about the Comparable and Comparator interfaces.
The Comparable interface lives in the java.lang package and has a compareTo(Object obj) method used for sorting. The Comparator interface lives in the java.util package and has a compare(Object obj1, Object obj2) method used for sorting.
Comparable and Comparator interfaces
The Comparable interface is the natural-ordering interface: it is used to implement natural ordering, and compareTo is therefore also called the natural comparison method. When a developer calls the sort method in Collections, an object can be sorted by its natural order as long as it implements the Comparable interface. The Comparator interface can be understood as a custom-ordering interface: developers can use it to sort instances of a class that does not implement Comparable, in a custom order, without modifying the class's source code.
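Here is a short sketch of my own showing both styles: a hypothetical Person class with a natural order by age, plus a Comparator for an ad-hoc order by name that does not touch the class itself:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortingDemo {
    static class Person implements Comparable<Person> {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }

        @Override
        public int compareTo(Person other) {     // natural ordering: by age
            return Integer.compare(this.age, other.age);
        }

        @Override
        public String toString() { return name + "(" + age + ")"; }
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("alice", 30));
        people.add(new Person("bob", 25));

        Collections.sort(people);                 // uses compareTo (by age)
        System.out.println(people);               // [bob(25), alice(30)]

        // Custom ordering by name, without modifying the Person class
        Collections.sort(people, Comparator.comparing((Person p) -> p.name));
        System.out.println(people);               // [alice(30), bob(25)]
    }
}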
TreeSet source code analysis
The TreeSet class supports both Comparable and Comparator ordering. Let's trace through the source code to see how.
public TreeSet() {
    this(new TreeMap<E,Object>());
}

public TreeSet(Comparator<? super E> comparator) {
    this(new TreeMap<>(comparator));
}
As the source shows, TreeSet is implemented on top of TreeMap, so we have to go into TreeMap to find the actual sorting logic. TreeSet provides a no-argument constructor and a constructor that takes a Comparator: the no-argument constructor makes TreeMap use Comparable natural ordering, while the other passes a Comparator object in.
Now let’s take a look inside TreeMap.
final Entry<K,V> getEntry(Object key) {
    // Offload comparator-based version for sake of performance
    if (comparator != null)
        return getEntryUsingComparator(key);
    if (key == null)
        throw new NullPointerException();
    @SuppressWarnings("unchecked")
    Comparable<? super K> k = (Comparable<? super K>) key;
    Entry<K,V> p = root;
    while (p != null) {
        int cmp = k.compareTo(p.key);
        if (cmp < 0)
            p = p.left;
        else if (cmp > 0)
            p = p.right;
        else
            return p;
    }
    return null;
}
When the comparator is not null, getEntry delegates to getEntryUsingComparator, which applies our custom ordering; otherwise it falls back to natural ordering through the Comparable interface.
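Tying this back to TreeSet, a brief sketch of my own showing both constructors in use, natural ordering versus a custom Comparator:

import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetOrdering {
    public static void main(String[] args) {
        // No-arg constructor: natural ordering via Comparable (String.compareTo)
        TreeSet<String> natural = new TreeSet<>();
        natural.add("banana");
        natural.add("apple");
        System.out.println(natural);            // [apple, banana]

        // Comparator constructor: custom ordering, here by descending length
        TreeSet<String> byLengthDesc =
            new TreeSet<>(Comparator.comparingInt(String::length).reversed());
        byLengthDesc.add("kiwi");
        byLengthDesc.add("banana");
        System.out.println(byLengthDesc);       // [banana, kiwi]
    }
}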
HashSet, LinkedHashSet, TreeSet comparison
HashSet, LinkedHashSet and TreeSet are all implementation classes of the Set interface. All of them guarantee element uniqueness, and none of them is thread-safe.

- HashSet: the underlying data structure is a hash table, implemented on top of HashMap.
- LinkedHashSet: the underlying data structure is a hash table plus a linked list, so elements are iterated in the order they were inserted (first in, first out).
- TreeSet: the underlying data structure is a red-black tree (a self-balancing binary search tree).

HashSet fits scenarios where you neither need to preserve insertion order nor care about sorting; LinkedHashSet fits scenarios where elements must come out in first-in, first-out insertion order; TreeSet fits scenarios that require custom ordering rules.
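A sketch of my own that makes the ordering differences visible by feeding the same elements to each implementation:

import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderDemo {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("banana", "apple", "cherry", "apple");

        Set<String> hash = new HashSet<>(input);
        Set<String> linked = new LinkedHashSet<>(input);
        Set<String> tree = new TreeSet<>(input);

        System.out.println(hash);   // unique elements, but no guaranteed order
        System.out.println(linked); // [banana, apple, cherry] -- insertion order
        System.out.println(tree);   // [apple, banana, cherry] -- sorted order
    }
}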
Supplement – Queue
A Queue is a single-ended queue: elements can only be inserted at one end and removed from the other, following the FIFO (first-in, first-out) rule. It extends the Collection interface.
A Deque is a double-ended queue: elements can be inserted or removed at both ends. It extends the Queue interface and adds methods such as push and pop, which can be used to emulate a stack.
The LinkedList class described above is the most commonly used implementation of the Deque interface.
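To close this section, a sketch of my own that uses LinkedList both as a FIFO queue and as a LIFO stack through the Deque interface:

import java.util.Deque;
import java.util.LinkedList;
import java.util.Queue;

public class QueueAndStackDemo {
    public static void main(String[] args) {
        // FIFO queue: offer at the tail, poll from the head
        Queue<String> queue = new LinkedList<>();
        queue.offer("first");
        queue.offer("second");
        System.out.println(queue.poll()); // first

        // LIFO stack emulated through Deque: push/pop work on the head
        Deque<String> stack = new LinkedList<>();
        stack.push("bottom");
        stack.push("top");
        System.out.println(stack.pop());  // top
    }
}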
Conclusion
At the end of this article, I summarize a table that compares the characteristics of the objects described above.
Object | Underlying data structure | Thread-safe? |
---|---|---|
ArrayList | Array | No |
Vector | Array | Yes |
LinkedList | Doubly linked list | No |
HashSet | Hash table | No |
LinkedHashSet | Hash table + linked list | No |
TreeSet | Red-black tree | No |
The Collection framework is a frequent interview topic and is used constantly in day-to-day development, so mastering it is essential.