This article takes you through the concepts of Java containers from Java source code.

References:

  1. Summary of Java container related knowledge

  2. Java official API documentation

1 Common container inheritance diagram

Let’s start with an online inheritance diagram

Collection inheritance diagram

For example, Iterator is not a container, but a method interface that iterates through collections, so it shouldn’t be in there. And maps should not inherit from collections. So I sorted out a common inheritance diagram as follows

New collection inheritance diagram

The important interfaces and implementation classes are explained from the top down, as shown above.

2 the Collection and the Map

There are two types of collections defined in the Java container. The top-level interfaces are Collection and Map. However, neither interface can be directly implemented and represents two different types of containers.

Simply put, a Collection represents a sequence of single element objects (ordered/unordered, repeatable/unrepeatable, etc., depending on the specific subinterface Set, List, Queue, etc.). A Map represents a collection of “key-value pairs” (again, ordered/unordered, depending on the implementation)

2.1 the Collection

According to the official Java documentation for Collection

The root interface in the collection hierarchy. A collection represents a group of objects, known as its elements. Some collections allow duplicate elements and others do not. Some are ordered and others unordered. The JDK does not provide any direct implementations of this interface: it provides implementations of more specific subinterfaces like Set and List. This interface is typically used to pass collections around and manipulate them where maximum generality is desired.

It basically means

Is the top-level interface in a container inheritance relationship. Is a group of object elements. Some containers allow repeating elements and some don’t, some are ordered and some are unordered. The JDK does not provide an implementation of this interface directly, but does provide subinterfaces that inherit it, such as List sets. This interface is designed to abstract the operations of elements as much as possible.

Interface definition:

public interface Collection<E> extends 可迭代<E> {... }Copy the code

The generic type is the type of the element object in the Collection, and the inherited Iterable is a defined Iterable interface that iterates hasNext Next. Concrete implementations are still implemented in concrete classes.

Let’s look at some of the important interface methods defined

add(E e) 
 Ensures that this collection contains the specified element

clear()
 Removes all of the elements from this collection (optional operation).

contains(Object o)
 Returns true if this collection contains the specified element.

isEmpty()
 Returns true if this collection contains no elements.

iterator()
 Returns an iterator over the elements in this collection.

remove(Object o)
 Removes a single instance of the specified element from this collection, ifit is present (optional operation). retainAll(Collection<? > c) Retains only the elementsin this collection that are contained inThe specified collection (optional operation). (**ps: this interface will not return the specified collection **) size() Returns the number of elementsin this collection.

toArray()
 Returns an array containing all of the elements in this collection.

toArray(T[] a)
 Returns an array containing all of the elements in this collection; the runtime typeThe returned array is that of the specified array. (** PS: this interface can also be specified.)Copy the code

The interface defined above represents the most basic operations of a container like Collection, including insert, remove, query, etc. It can be found that all operations are performed on a single element. Collections like Collection are the storage of element objects. There are two interfaces that I haven’t used before but think are very useful

  1. retainAll(Collection
    c) Preserve the specified collection
  2. ToArray (T[] a) can be converted to an array

2.2 the Map

Map description in the Official Java documentation

An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

This interface takes the place of the Dictionary class, which was a totally abstract class rather than an interface.

The Map interface provides three collection views, which allow a map’s contents to be viewed as a set of keys, collection of values, or set of key-value mappings. The order of a map is defined as the order in which the iterators on the map’s collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not.

It basically means

An object that holds key-value mappings. Mapping A Map cannot contain duplicate keys. Each key corresponds to a maximum of one value.

This interface replaces Dictionary, an abstract class.

The Map collection provides three traversal access methods: 1. Obtain the collection of all keys and access values through keys. 2. Obtain the set of values. 3. Obtain a set of key-value pairs. A key-value pair is an object containing a key and a value respectively. The order in which a Map is accessed depends on the order in which the Map’s traversal access methods are traversed. Some maps, such as TreeMap, guarantee access order, while others, such as HashMap, do not.

The interface is defined as follows:

public interface Map<K.V> {...interface Entry<K.V> {
        K getKey(a);
        V getValue(a); . }}Copy the code

Generics represent the types of key and value respectively. At this time, notice that an internal interface Entry is also defined. In fact, each key-value pair is an instance relation object of an Entry. Therefore, Map is actually a Collection of entries, and the entries contain keys and values. Set the rule that the key does not repeat, and the Map will naturally evolve. (Personal understanding)

Here are three defined methods for traversing a Map.

  1. Set keySet()

    A Set of all keys is returned, and since keys cannot be repeated, the Set format is returned, not the List format. The difference between Set and List will be explained later. A Set of keys is a Collection, so we can iterate through all the keys through the Iterator. Then we can get the value from the get method. The following

    Map<String,String> map = new HashMap<String,String>();
    map.put("01"."zhangsan");
    map.put("."."lisi");
    map.put("3"."wangwu");
    
    Set<String> keySet = map.keySet();// Get the Set of all keys in map Set
    Iterator<String> it = keySet.iterator();// With a Set, we can get its iterators.
    
    while(it.hasNext()) {
         String key = it.next();
          String value = map.get(key);// If you have a key, you can get the corresponding value through the get method of the map collection.
         System.out.println("key: "+key+"-->value: "+value);// Get the key and value values
    }Copy the code
  2. Collection values()

    Get the set of values directly, no longer get the key. So you can use this method for scenarios that only need value. Once obtained, Iterator is used to iterate over all values. The following

    Map<String,String> map = new HashMap<String,String>();

    map.put("01", "zhangsan");

    map.put("02", "lisi");

    map.put("03", "wangwu");



    Collection<String> collection = map.values(); // Return value is a Collection of values

    System.out.println(collection);Copy the code
  3. Set< Map.Entry< K, V>> entrySet()

    The entire Entry object is returned as an element with all the data. It then iterates through the Entry and obtains the key and value using getKey and getValue, respectively. The following

    Map<String,String> map = new HashMap<String,String>();
    map.put("01"."zhangsan");
    map.put("."."lisi");
    map.put("3"."wangwu");
    
    // The entrySet() method is used to retrieve the mapping from the map collection (this relationship is of type map.Entry).
    Set<Map.Entry<String, String>> entrySet = map.entrySet();
    // Set entrySet into an iterator
    Iterator<Map.Entry<String, String>> it = entrySet.iterator();
    
    while(it.hasNext()) {
         Map.Entry<String, String> me = it.next();// Get the map.entry relationship object me
          String key = me.getKey();// Get the key from the relationship object
          String value = me.getValue();// Get value from the relational object
    }Copy the code

From the above three traversal methods we can see that if you only want to obtain the key, the recommended use of keySet. If you only want to obtain value, values are recommended. If the key value wants to traverse, entrySet is recommended. (Although keySet can be used to obtain a key and then a value indirectly, it is not recommended to use this method as it is less efficient than entrySet.)

List, Set, and Queue

In the integration chain of Collection, we introduce List, Set, and Queue. It will focus on lists and sets, as well as several commonly used implementation classes. Queue is really not used at all.

A quick overview of lists and sets. Both of them are subinterfaces that inherit a Collection, which means they are also containers that store individual elements. But the big differences are as follows

  1. A List is a container of stored elements that has an ordered List of elements that can be indexed to, and the elements can be repeated.
  2. The main difference between a Set and a List is that the elements in a Set cannot be repeated.

3.1 the List

In the Java documentation

An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.

Unlike sets, lists typically allow duplicate elements. More formally, lists typically allow pairs of elements e1 and e2 such that e1.equals(e2), and they typically allow multiple null elements if they allow null elements at all. It is not inconceivable that someone might wish to implement a list that prohibits duplicates, by throwing runtime exceptions when the user attempts to insert them, but we expect this usage to be rare.

.

The List interface provides a special iterator, called a ListIterator, that allows element insertion and replacement, and bidirectional access in addition to the normal operations that the Iterator interface provides. A method is provided to obtain a list iterator that starts at a specified position in the list.

It basically means

An ordered Collection (or sequence). Using this interface, you can accurately control the insertion of elements and get the element in its position according to index.

Unlike sets, lists allow the insertion of repeated elements. Some people want to implement a list of their own that disallows repeating elements and throws an exception when repeating elements are inserted, but we don’t recommend doing that.

.

Lists provide a special type of iterator iterator called ListIterator. This traverser allows traversal insertion, replacement, deletion, and bidirectional access. There is also an overloaded method that allows traversal from a specified location.

Then we look at the newly added interface of the List interface, and we can find that add and get have index parameters. It shows that on the basis of the original Collection, the List is an ordered container that can specify indexes. Notice the following addition of two new Iteractor methods here.

ListIterator<E> listIterator(a);

ListIterator<E> listIterator(int index);Copy the code

Let’s look at the ListIterator code

public interface ListIterator<E> extends Iterator<E> {
    // Query Operations

    boolean hasNext(a);

    E next(a);

    boolean hasPrevious(a);

    E previous(a);

    int previousIndex(a);

    void remove(a);

    void set(E e);

    void add(E e);
}Copy the code

Insertion and deletion of a collection during traversal can easily cause errors, especially for unordered queues, which cannot be performed during traversal. But the List is an ordered collection, so there’s a ListIteractor implemented that allows element manipulation during traversal and two-way access.

This is a good thing that has not been found in the previous development. mark

These are the basic concepts and rules of List. Here we introduce two common List implementation classes, ArrayList and LinkedList.

3.1.1 ArrayList

In terms of the interpretation of Java documentation, the following features are sorted out:

  1. ArrayList is a mutable array that implements the List interface
  2. You can insert null
  3. Its size, isEmpty, get, set, iterator,add methods are O(1), or O(n) if you add n data.
  4. ArrayList is not synchronized.

Then let’s take a quick look at the source code implementation of ArrayList. Here is only part of the source code analysis.

All elements are stored in an Object array, and size controls the length.

transient Object[] elementData;

private int size;Copy the code

Take a look at the code analysis of Add

public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }

    ensureExplicitCapacity(minCapacity);
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}Copy the code

CopyOf makes a copyOf the longer array and puts the previous data in it.

Let’s take a look at how the remove code is implemented.

public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}Copy the code

Arraycopy is used to move the index one step forward and remove the last index.

PS: I finally found that the data structure I learned before came into use. O. O

3.1.2 LinkedList

LinkedList is a sequence container maintained by a LinkedList. And ArrayList are sequence containers, one using arrays and one using linked lists.

Array and linked list:

  1. Find aspects. Arrays are more efficient and can be indexed directly for lookups, whereas linked lists must be looked up from scratch.
  2. Insert delete aspect. Especially in the middle of insertion and deletion, when linked lists are very convenient, you just need to break the chain where you insert or delete and then insert or remove elements, and then reassemble the chain, but you have to make a copy of the array and move all the data back and forward.
  3. On the memory side, when the array reaches its initial size, you need to reapply for a larger array and migrate the data to it. Linked lists only need to be created dynamically.

That’s the difference between LinkedList and ArrayList. Choose a List that is more appropriate for your usage scenario.

Here’s a quick snippet of the source code for LinkedList.

The first is the definition of the node of the linked list, a very simple two-way linked list.

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev; }}Copy the code

Each LinkedList then holds the head and tail Pointers to the list

transient int size = 0;

transient Node<E> first;

transient Node<E> last;Copy the code

List the most basic insert and delete operations

private void linkFirst(E e) {
    final Node<E> f = first;
    final Node<E> newNode = new Node<>(null, e, f);
    first = newNode;
    if (f == null)
        last = newNode;
    else
        f.prev = newNode;
    size++;
    modCount++;
}

void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
    else
        l.next = newNode;
    size++;
    modCount++;
}

void linkBefore(E e, Node<E> succ) {
    // assert succ ! = null;
    final Node<E> pred = succ.prev;
    final Node<E> newNode = new Node<>(pred, e, succ);
    succ.prev = newNode;
    if (pred == null)
        first = newNode;
    else
        pred.next = newNode;
    size++;
    modCount++;
}

private E unlinkFirst(Node<E> f) {
    // assert f == first && f ! = null;
    final E element = f.item;
    final Node<E> next = f.next;
    f.item = null;
    f.next = null; // help GC
    first = next;
    if (next == null)
        last = null;
    else
        next.prev = null;
    size--;
    modCount++;
    return element;
}

private E unlinkLast(Node<E> l) {
    // assert l == last && l ! = null;
    final E element = l.item;
    final Node<E> prev = l.prev;
    l.item = null;
    l.prev = null; // help GC
    last = prev;
    if (prev == null)
        first = null;
    else
        prev.next = null;
    size--;
    modCount++;
    return element;
}

E unlink(Node<E> x) {
    // assert x ! = null;
    final E element = x.item;
    final Node<E> next = x.next;
    final Node<E> prev = x.prev;

    if (prev == null) {
        first = next;
    } else {
        prev.next = next;
        x.prev = null;
    }

    if (next == null) {
        last = prev;
    } else {
        next.prev = prev;
        x.next = null;
    }

    x.item = null;
    size--;
    modCount++;
    return element;
}Copy the code

The above 6 methods are the core of the list, head to tail insertion, head to tail deletion. Other external calls operate around these methods

LinkedList also implements the Deque interface, which extends from Queue. So LinkedList also supports queue pop, push, peek operations.

conclusion

The List implementation Usage scenarios The data structure
ArrayList Array accesses List chained collection data. The elements are repeatable and the access to the elements is fast An array of
LinkedList In the linked List mode, the elements can be repeated, and the insertion and deletion of elements is fast Two-way linked list

3.2 the Set

The core concept of a Set is that none of the elements in the Set are repeated. There are no additional methods specifically implemented in Collection in the Set subinterface, just a Set concept defined. Let’s look at some common implementations of Set: HashSet, LinkedHashSet, TreeSet

3.2.1 HashSet

The core concept of HashSet. Described in the Java documentation

This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.

It basically means

HashSet implements the Set interface and is stored based on HashMap. Traversal does not guarantee order, and there is no guarantee that the next traversal will be in the same order as before. Null elements are allowed in a HashSet.

Going into the HashSet source code, we see that all the data is stored in

private transient HashMap<E,Object> map;

private static final Object PRESENT = new Object();Copy the code

That means that the set of hashsets is actually the set of keys of the HashMap, and the val of the HashMap is always PRESENT by default. A HashMap is defined as a collection of non-repeating keys. Use the HashMap implementation so that the HashSet does not need to be implemented again.

So all add and remove operations are actually add and remove operations on a HashMap. The traversal operation is essentially the traversal of the keySet of the HashMap, as shown in the following example

.public Iterator<E> iterator(a) {
    return map.keySet().iterator();
}

public boolean contains(Object o) {
    return map.containsKey(o);
}

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

public void clear(a) { map.clear(); }...Copy the code

3.2.2 LinkedHashSet

The core concept of a LinkedHashSet, as opposed to a HashSet, is a Set that preserves order. Hashsets are unordered, and linkedhashSets return a fixed collection order in traversal based on the order of add and remove operations. This order is not the size order of the elements, but ensures that the order of the two traversals is the same.

Similar to the source implementation of HashSet based on HashMap, the data structure of LinkedHashSet is based on LinkedHashMap. Too much will not say.

3.2.3 TreeSet

A TreeSet is an ordered collection that sorts naturally if no collation Comparator is specified. (Natural ordering i.e. e1.compareTo(e2) == 0 for comparison)

Note: Elements within a TreeSet must implement the Comparable interface.

TreeSet source algorithm is based on TreeMap, the specific algorithm is explained in the description of TreeMap.

conclusion

Set to implement Usage scenarios The data structure
HashSet An unordered, non-repetitive collection of data Based on a HashMap
LinkedSet Maintains a HashSet of order Based on the LinkedHashMap
TreeSet A collection that maintains the size order of elements that need to implement the Comparable interface Based on the TreeMap

4 HashMap, LinkedHashMap, TreeMap and WeakHashMap

4.1 a HashMap

A HashMap is the most basic and common type of Map. It is unordered and stored as a hash table. As mentioned earlier, a HashSet is based on a HashMap, using only the HashMap key as a single element.

Access methods of HashMap are the most basic three methods inherited from Map, as detailed above. Here I take a look at the implementation of the underlying data structure of HashMap.

Before looking at the source code for HashMap, let’s understand how it is stored – a hash table. Like we mentioned before we store it in arrays, we store it in linked lists. Hash tables are stored using a combination of arrays and linked lists. The following figure shows the storage method used in HashMap.

Hash table

Hash takes the value, puts it in an array, and hangs it down as a linked list if conflicts occur.

The storage definition of a HashMap is

transient Node<K,V>[] table;

static class Node<K.V> implements Map.Entry<K.V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}Copy the code

Array table holds elements, and if there is a conflict, hangs it on the next list of conflicting elements.

Here we can look at the get core method and put core method source

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if((tab = table) ! =null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) ! =null) {
        if (first.hash == hash && // always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
            return first;
        if((e = first.next) ! =null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    return e;
            } while((e = e.next) ! =null); }}return null;
}Copy the code

As you can see from the above code, the subscript index is calculated based on the hash value and the array length. If the hash values are identical, if not, the next list goes down to find the same hash value.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    break; p = e; }}if(e ! =null) { // existing mapping for key
            V oldValue = e.value;
            if(! onlyIfAbsent || oldValue ==null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}Copy the code

If the hash index has an element, add a Node to the table. If the hash index does not have an element, add a Node to the table. If you already have a Node, you iterate through the Node’s next and place the new element last.

A HashMap iterates over the first non-empty element of the array and then accesses all nodes under its next based on that element. Because the first element does not necessarily start at 0 in the array, the HashMap is unordered traversal.

4.2 LinkedHashMap

The difference between LinkedHashMap and HashMap is that the LinkedHashMap traverses in order, saving the insertion order (you can also set the most recently accessed elements first, namely LRU).

In fact, the storage of LinkedHashMap is still the same as HashMap, using the hash table method, but LinkedHashMap maintains a head and tail linked list.

transient LinkedHashMap.Entry<K,V> head;

transient LinkedHashMap.Entry<K,V> tail;Copy the code

When creating a new Node, place the new Node at the end of the table. Instead of looking at the first non-empty element from the array, as in a HashMap, we iterate directly from the table head. In this way, ordered traversal is satisfied.

4.3 TreeMap

TreeMap is not commonly used. TreeMap implements the SortMap interface and defines a sorting rule. In this way, when TreeMap is traversed, elements will be returned according to the specified sorting rule.

4.4 WeakHashMap

WeakHashMap. The feature of such a Map is that when there is no other reference to the key except its own reference to the key, the Map will automatically discard the value.

For example, two Map objects are declared, one is HashMap, the other is WeakHashMap, and two objects A and B are added to the two maps at the same time. When HashMap removes A and points to BOTH A and B to null, A in WeakHashMap will be automatically reclaimed. The reason for this situation is that for object A, when HashMap is removed and a is pointed to NULL, there is no pointer pointing to A except for A is saved in WeakHashMap, so WeakHashMap will automatically discard A, while for object B, although it points to NULL, But HashMap also has a pointer to B, so WeakHashMap will remain.

WeakHashMap is not used much, which is briefly mentioned here.

conclusion

Map implementation Usage scenarios The data structure
HashMap Hash table stores key-value pairs, keys do not repeat, unordered Hash table
LinkedHashMap Is a HashMap that records insertion order and access order It is stored as a hash table, but maintains a header pointer to keep track of order
TreeMap Has element sorting function Red and black tree
WeakHashMap Weak key mappings, where there are no references to keys outside the map, can be garbage collected Hash table

At the end

Above is the complete analysis and source code analysis of Java collection. Among them, ArrayList and HashMap are widely used. When considering efficiency, I remember Linded series set and WeakHashMap. Over~~

More articles follow my public account

My official account