In day-to-day business development, you occasionally need to remove duplicate data from a List. At this point some readers may ask: why not just use a Set or LinkedHashSet, so that there is no duplicate data in the first place?

I have to say that readers who ask this question are sharp and have seen the essence of the problem at a glance.

However, real business development is more complicated. The List may be a legacy of older code, it may be the type returned by an interface you call and so can only be received as a List, or the duplicates may only be discovered partway through the code when several collections are merged. In short, there are many possible causes, and they are not all listed here.

Once the problem is discovered, if you can change the code to use a Set instead of a List, simply change the collection type. But if that is not possible, or if it is too expensive, the following 6 de-duplication methods will help.

Prerequisite knowledge

Before we begin, let's clarify two pairs of concepts: unordered and ordered collections, and disorder and order. Because these concepts come up repeatedly in the method implementations that follow, it is worth clarifying them before we start in earnest.

Unordered collection

An unordered collection is one in which the order in which data is read is inconsistent with the order in which it was inserted. For example, the data is inserted in the order 1, 5, 3, 7 but read back in the order 1, 3, 5, 7.

Ordered collection

An ordered collection is the opposite of an unordered collection: its read order and its insertion order are the same. For example, if the data is inserted in the order 1, 5, 3, 7, it is also read back in the order 1, 5, 3, 7.

Order and disorder

From the unordered and ordered collections above, we can derive the concepts of order and disorder. Order means that the data is arranged and read in the order we expect, which in this article is the insertion order. Disorder means that the order in which the data is arranged and read is not the order we expect.

PS: Don't worry if the concepts of order and disorder are not yet clear; the examples that follow will make their meaning clearer.
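
To make the two concepts concrete, here is a minimal sketch that is not part of the original examples (the class name `OrderDemo` and the helper `readOrder` are made up for illustration). It inserts 1, 5, 3, 7 into a LinkedHashSet, which is read back in insertion order, and into a TreeSet, which is read back in sorted order and so counts as unordered in the sense used in this article:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class OrderDemo {
    /** Copies the values into the given (empty) set and returns the order in which the set is read. */
    static List<Integer> readOrder(Set<Integer> set, List<Integer> values) {
        set.addAll(values);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<Integer> inserted = Arrays.asList(1, 5, 3, 7);
        // Ordered: read order equals insertion order -> [1, 5, 3, 7]
        System.out.println("LinkedHashSet: " + readOrder(new LinkedHashSet<Integer>(), inserted));
        // Unordered (relative to insertion): read order is sorted -> [1, 3, 5, 7]
        System.out.println("TreeSet:       " + readOrder(new TreeSet<Integer>(), inserted));
    }
}
```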

Method 1: De-duplication with contains (ordered)

When it comes to removing duplicates, the first idea that comes to mind is to create a new collection and loop over the original one. On each iteration, if the current element does not yet exist in the new collection, insert it; if it already exists, discard it. When the loop finishes, we have a collection without duplicate elements. The implementation code is as follows:

import java.util.ArrayList;
import java.util.List;

public class ListDistinctExample {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>() {{
            add(1);
            add(3);
            add(5);
            add(2);
            add(1);
            add(3);
            add(7);
            add(2);
        }};
        System.out.println("Original set :" + list);
        method(list);
    }

    /**
     * Custom de-duplication using contains
     * @param list the list to de-duplicate
     */
    public static void method(List<Integer> list) {
        // The new collection
        List<Integer> newList = new ArrayList<>(list.size());
        list.forEach(i -> {
            if (!newList.contains(i)) {
                // Insert only if the new collection does not contain the element yet
                newList.add(i);
            }
        });
        System.out.println("De-duplicated collection: " + newList);
    }
}

The program prints the de-duplicated list [1, 3, 5, 2, 7]. The advantages of this method are that it is easy to understand and that the result is ordered, meaning that the order of the new collection is consistent with the order of the original collection. The disadvantage is that the implementation is somewhat verbose, neither concise nor elegant.
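
One point worth noting about Method 1 is that `contains` on an ArrayList scans the list linearly, so the whole loop is quadratic in the worst case. The following is a sketch of a variation, not the original method: it keeps the same order-preserving loop but tracks already-seen elements in a HashSet. The class name `SeenSetDedup` and the method name `dedupe` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SeenSetDedup {
    /** Returns a new list holding the first occurrence of each element, in the original order. */
    static <T> List<T> dedupe(List<T> list) {
        Set<T> seen = new HashSet<>();
        List<T> result = new ArrayList<>(list.size());
        for (T item : list) {
            // Set.add returns false when the element is already present
            if (seen.add(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 3, 5, 2, 1, 3, 7, 2);
        System.out.println(dedupe(list)); // [1, 3, 5, 2, 7]
    }
}
```

The trade-off is an extra set in memory in exchange for a constant-time (on average) membership check per element.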

Method 2: Iterator de-duplication (unordered)

For custom List de-duplication, besides creating a new collection as above, we can also use an iterator to check each element in turn: if the current element occurs two or more times in the collection, delete it. After one pass, we again have a collection without duplicate data. The implementation code is as follows:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ListDistinctExample {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>() {{
            add(1);
            add(3);
            add(5);
            add(2);
            add(1);
            add(3);
            add(7);
            add(2);
        }};
        System.out.println("Original set :" + list);
        method_1(list);
    }

    /**
     * De-duplicate using an iterator
     * @param list the list to de-duplicate (modified in place)
     */
    public static void method_1(List<Integer> list) {
        Iterator<Integer> iterator = list.iterator();
        while (iterator.hasNext()) {
            // Get the value of the current iteration
            Integer item = iterator.next();
            // If the same value occurs again later in the list
            if (list.indexOf(item) != list.lastIndexOf(item)) {
                // Remove the current occurrence, so the last one is kept
                iterator.remove();
            }
        }
        System.out.println("De-duplicated collection: " + list);
    }
}

The program prints [5, 1, 3, 7, 2]. This method needs less code than the previous one and does not create a new collection, but the result is unordered: because the last occurrence of each duplicated value is the one that is kept, the order differs from the order in which the values first appear in the original collection. So it is not the optimal solution.
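
The iterator code above keeps the last occurrence of each value, which is why the order changes. As a sketch of an alternative (not the original method; the names `IteratorDedup` and `dedupeInPlace` are hypothetical), the same in-place iterator loop can keep the first occurrence instead by remembering the values already seen:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class IteratorDedup {
    /** Removes later duplicates in place, keeping the first occurrence of each element. */
    static <T> void dedupeInPlace(List<T> list) {
        Set<T> seen = new HashSet<>();
        Iterator<T> iterator = list.iterator();
        while (iterator.hasNext()) {
            // add returns false if the value was seen before, so this occurrence is a duplicate
            if (!seen.add(iterator.next())) {
                iterator.remove();
            }
        }
    }

    public static void main(String[] args) {
        // The list must support Iterator.remove, so wrap Arrays.asList in an ArrayList
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 3, 5, 2, 1, 3, 7, 2));
        dedupeInPlace(list);
        System.out.println(list); // [1, 3, 5, 2, 7]
    }
}
```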

Method 3: HashSet de-duplication (unordered)

We know that a HashSet inherently removes duplicates, so we just need to convert the List to a HashSet. The implementation code is as follows:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class ListDistinctExample {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>() {{
            add(1);
            add(3);
            add(5);
            add(2);
            add(1);
            add(3);
            add(7);
            add(2);
        }};
        System.out.println("Original set :" + list);
        method_2(list);
    }

    /**
     * De-duplicate using a HashSet
     * @param list the list to de-duplicate
     */
    public static void method_2(List<Integer> list) {
        HashSet<Integer> set = new HashSet<>(list);
        System.out.println("De-duplicated collection: " + set);
    }
}

With a typical JDK the program prints [1, 2, 3, 5, 7]. The implementation is relatively simple, but the disadvantage is that a HashSet does not preserve insertion order. Its iteration order depends on the elements' hash codes (for these small integers it merely happens to look sorted), so the order of the new collection is not the same as that of the original. If the order of the collection matters, this method cannot meet the requirement.
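
Since the motivation here is that the data must stay in a List, the set usually has to be copied back into a list afterwards. The sketch below shows that round trip under the assumption that order does not matter; the names `HashSetOrderDemo` and `dedupeUnordered` are made up for illustration, and no particular output order is shown because HashSet does not specify one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class HashSetOrderDemo {
    /** Removes duplicates; the order of the returned list is unspecified. */
    static <T> List<T> dedupeUnordered(List<T> list) {
        return new ArrayList<>(new HashSet<>(list));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "fig", "apple", "pear");
        // Prints the three distinct words; their order is not guaranteed
        System.out.println(dedupeUnordered(words));
    }
}
```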

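Method 4: LinkedHashSet de-duplication (ordered)

Since a HashSet does not keep the original order, we can use a LinkedHashSet instead, which both removes duplicates and preserves insertion order. The implementation code is as follows: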

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ListDistinctExample {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>() {{
            add(1);
            add(3);
            add(5);
            add(2);
            add(1);
            add(3);
            add(7);
            add(2);
        }};
        System.out.println("Original set :" + list);
        method_3(list);
    }

    /**
     * De-duplicate using a LinkedHashSet
     * @param list the list to de-duplicate
     */
    public static void method_3(List<Integer> list) {
        LinkedHashSet<Integer> set = new LinkedHashSet<>(list);
        System.out.println("De-duplicated collection: " + set);
    }
}

The program prints [1, 3, 5, 2, 7]. As the code and the result show, LinkedHashSet is the simplest implementation so far, and the resulting collection keeps the order of the original one. It is a de-duplication method worth considering.
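
If the caller needs a List back rather than a Set, the LinkedHashSet can simply be copied into a new ArrayList. This is a minimal sketch of that step; the names `LinkedHashSetDedup` and `dedupe` are hypothetical and not from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class LinkedHashSetDedup {
    /** Returns a new list without duplicates, keeping the order of first occurrence. */
    static <T> List<T> dedupe(List<T> list) {
        return new ArrayList<>(new LinkedHashSet<>(list));
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 3, 5, 2, 1, 3, 7, 2);
        System.out.println(dedupe(list)); // [1, 3, 5, 2, 7]
    }
}
```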

Method 5: TreeSet de-duplication (unordered)

In addition to the Set implementations above, we can also use a TreeSet to remove duplicates. The implementation code is as follows:

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class ListDistinctExample {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>() {{
            add(1);
            add(3);
            add(5);
            add(2);
            add(1);
            add(3);
            add(7);
            add(2);
        }};
        System.out.println("Original set :" + list);
        method_4(list);
    }

    /**
     * De-duplicate using a TreeSet (unordered)
     * @param list the list to de-duplicate
     */
    public static void method_4(List<Integer> list) {
        TreeSet<Integer> set = new TreeSet<>(list);
        System.out.println("De-duplicated collection: " + set);
    }
}

The program prints [1, 2, 3, 5, 7]. Unfortunately, although TreeSet is relatively simple to use, it has the same problem as HashSet: the original order is lost, in this case because a TreeSet always keeps its elements sorted. It therefore does not meet our needs.
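
TreeSet is still a reasonable choice when a sorted, duplicate-free result is actually what is wanted. The sketch below, with the hypothetical names `TreeSetDedup` and `dedupeSorted`, passes an explicit Comparator to control the sort direction. Keep in mind that a TreeSet treats two elements as duplicates when the comparator returns 0 for them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDedup {
    /** Returns a new list without duplicates, sorted according to the given comparator. */
    static <T> List<T> dedupeSorted(List<T> list, Comparator<? super T> order) {
        Set<T> set = new TreeSet<>(order);
        set.addAll(list);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 3, 5, 2, 1, 3, 7, 2);
        System.out.println(dedupeSorted(list, Comparator.<Integer>naturalOrder())); // [1, 2, 3, 5, 7]
        System.out.println(dedupeSorted(list, Comparator.<Integer>reverseOrder())); // [7, 5, 3, 2, 1]
    }
}
```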

Method 6: Stream de-duplication (ordered)

JDK 8 brought with it a very useful feature called Stream that can do a lot of things, such as the following de-duplication:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ListDistinctExample {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>() {{
            add(1);
            add(3);
            add(5);
            add(2);
            add(1);
            add(3);
            add(7);
            add(2);
        }};
        System.out.println("Original set :" + list);
        method_5(list);
    }

    /**
     * De-duplicate using a Stream
     * @param list the list to de-duplicate
     */
    public static void method_5(List<Integer> list) {
        list = list.stream().distinct().collect(Collectors.toList());
        System.out.println("De-duplicated collection: " + list);
    }
}

The program prints [1, 3, 5, 2, 7]. Unlike the other methods, the Stream approach does not require us to declare a new collection explicitly: the de-duplicated result (which collect returns as a new List) can be assigned back to the same variable. The code is very concise, and the order after de-duplication is consistent with the order of the original collection, so this is our preferred de-duplication method.
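
As a small sketch of how this could be wrapped for reuse (the names `StreamDedup` and `dedupe` are hypothetical), the helper below returns the de-duplicated list and leaves the caller's list untouched. `distinct()` compares elements with `equals`, and for an ordered stream such as one created from a List it keeps the first occurrence of each element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamDedup {
    /** Returns a new list without duplicates; the input list is not modified. */
    static <T> List<T> dedupe(List<T> list) {
        return list.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("b", "a", "b", "c", "a");
        List<String> unique = dedupe(names);
        System.out.println(names);  // [b, a, b, c, a]
        System.out.println(unique); // [b, a, c]
    }
}
```

For element types you define yourself, this only works as expected if the class overrides `equals` and `hashCode` consistently.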

Conclusion

In this article we introduced six methods of collection de-duplication. Only two of them are both simple to implement and keep the order of the original collection after de-duplication: LinkedHashSet de-duplication and Stream de-duplication. The latter does not require declaring a new collection explicitly and is our preferred de-duplication method.

Judge right and wrong for yourself, leave praise and blame to others, and leave gain and loss to fate.

About the blogger: a programmer born in the 1980s who has kept up blogging for 11 years. Hobbies: reading, jogging, badminton.
