At work, we often need to compute the intersection and the difference of two collections. The ordinary retainAll() and removeAll() methods are too slow for large amounts of data, so we use a different approach here. Note: retainAll() and removeAll() are still convenient when the data volume is small.
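For reference, the small-data case mentioned in the note can be sketched as follows. This is only an illustration: the class name SmallListSetOps and its two helper methods are made up for this example, and each helper copies the first list because retainAll() and removeAll() modify the list they are called on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SmallListSetOps {
    // Intersection: copy list a, then keep only the elements that also appear in b
    static List<Integer> intersection(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>(a);
        result.retainAll(b);
        return result;
    }

    // Difference: copy list a, then drop every element that appears in b
    static List<Integer> difference(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> list1 = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> list2 = Arrays.asList(4, 5, 6, 7);
        System.out.println(intersection(list1, list2)); // prints [4, 5]
        System.out.println(difference(list1, list2));   // prints [1, 2, 3]
    }
}
```

When both arguments are ArrayLists, each element of the first list is checked with contains() on the second, and contains() on an ArrayList is a linear scan. The cost therefore grows with the product of the two sizes, which is why this approach struggles with large lists.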

1. The data is assumed to contain no duplicates. If duplicates do exist, they are overwritten when they are stored in the map; in practice, duplicate data is meaningless here.

2. When taking an intersection or a difference, one collection acts as the master data and the other as the slave data. You can check in advance which collection is larger to decide which is the master, or run the comparison twice, once in each direction, to obtain two result sets. In the following examples, list1 is the master data and list2 is the slave data.
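The two points above can be sketched on a small scale. This is a minimal illustration, not the full benchmark: the class MasterSlaveDemo and its difference() helper are hypothetical names, and the helper uses the same Map-with-string-keys idea as the examples below. It shows a duplicate value in the slave list being overwritten in the map, and the two rounds of comparison that give one difference set per direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MasterSlaveDemo {
    // Hypothetical helper: elements of master that do not appear in slave
    static List<Integer> difference(List<Integer> master, List<Integer> slave) {
        Map<String, Integer> map = new HashMap<>();
        // A repeated value produces the same key, so a later put overwrites the earlier one
        slave.forEach(s -> map.put(String.valueOf(s), s));

        List<Integer> result = new ArrayList<>();
        master.forEach(m -> {
            // A null lookup means the value is absent from slave
            if (map.get(String.valueOf(m)) == null) {
                result.add(m);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        List<Integer> list1 = Arrays.asList(1, 2, 3, 4);
        List<Integer> list2 = Arrays.asList(3, 3, 4, 5); // 3 appears twice

        // Round one: list1 is the master
        System.out.println(difference(list1, list2)); // prints [1, 2]
        // Round two: list2 is the master
        System.out.println(difference(list2, list1)); // prints [5]
    }
}
```

Note that only duplicates in the slave list are collapsed by the map. In this sketch, a duplicate in the master list that is missing from the slave list would appear twice in the result.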

Take the intersection

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IntersectionDemo {
    public static void main(String[] args) {
        // Simulate data: list1 holds 1..1000000, list2 holds 0..999999
        List<Integer> list1 = new ArrayList<>();
        List<Integer> list2 = new ArrayList<>();
        for (int i = 1; i <= 1000000; i++) {
            list1.add(i);
            list2.add(1000000 - i);
        }
        // Record the start time
        long startTime = System.currentTimeMillis();
        // Final result set
        List<Integer> resultList = new ArrayList<>();
        // Intermediate storage, keyed by the string form of each value
        Map<String, Integer> map = new HashMap<>();

        // Index the slave data
        list2.forEach(i2 -> {
            map.put(i2 + "", i2);
        });

        // Look up each master element in the index
        list1.forEach(i1 -> {
            Integer m = map.get(i1 + "");
            // If it is not null, then both list1 and list2 have the data
            if (m != null) {
                resultList.add(i1);
            }
        });
        System.out.println("Time:" + (System.currentTimeMillis() - startTime) + "ms");
        System.out.println(resultList.size());
    }
}

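The results

The program prints the elapsed time in milliseconds, which depends on the machine, followed by the size of the intersection. For this data the size is 999999, because list1 holds 1 to 1000000 and list2 holds 0 to 999999.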

Take the difference set

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DifferenceDemo {
    public static void main(String[] args) {
        // Simulate data: list1 holds 1..1000000, list2 holds 500000..1499999
        List<Integer> list1 = new ArrayList<>();
        List<Integer> list2 = new ArrayList<>();
        for (int i = 1; i <= 1000000; i++) {
            list1.add(i);
            list2.add(1500000 - i);
        }
        // Record the start time
        long startTime = System.currentTimeMillis();
        // Final result set
        List<Integer> resultList = new ArrayList<>();
        // Intermediate storage, keyed by the string form of each value
        Map<String, Integer> map = new HashMap<>();

        // Index the slave data
        list2.forEach(i2 -> {
            map.put(i2 + "", i2);
        });

        // Look up each master element in the index
        list1.forEach(i1 -> {
            Integer m = map.get(i1 + "");
            // If it is null, the data is not present in list2
            if (m == null) {
                resultList.add(i1);
            }
        });
        System.out.println("Time:" + (System.currentTimeMillis() - startTime) + "ms");
        System.out.println(resultList.size());
    }
}

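The results

The program prints the elapsed time in milliseconds, which depends on the machine, followed by the size of the difference set. For this data the size is 499999, because list1 holds 1 to 1000000 and list2 holds 500000 to 1499999.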

Conclusion

With two lists of 1 million elements each, using retainAll() and removeAll() to compute the intersection and the difference takes a long time. Storing one list in a Map and then checking the other list against it in a loop is significantly faster. The method works with any kind of data: just choose a suitable key for the intermediate map.
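As one possible way to apply the same idea to other kinds of data, the sketch below takes the key as a function. Everything named here is an assumption made for illustration: the KeyedSetOps class, its methods, and the User type with an id field do not come from the examples above. The key is also kept in its natural type instead of being converted to a String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class KeyedSetOps {
    // Hypothetical element type, used only for this illustration
    static class User {
        final int id;
        final String name;
        User(int id, String name) { this.id = id; this.name = name; }
    }

    // Build the intermediate map from the slave data; duplicate keys are overwritten
    private static <T, K> Map<K, T> indexByKey(List<T> items, Function<T, K> keyOf) {
        Map<K, T> index = new HashMap<>();
        for (T item : items) {
            index.put(keyOf.apply(item), item);
        }
        return index;
    }

    // Master elements whose key also occurs in the slave data
    static <T, K> List<T> intersection(List<T> master, List<T> slave, Function<T, K> keyOf) {
        Map<K, T> index = indexByKey(slave, keyOf);
        List<T> result = new ArrayList<>();
        for (T item : master) {
            if (index.containsKey(keyOf.apply(item))) {
                result.add(item);
            }
        }
        return result;
    }

    // Master elements whose key does not occur in the slave data
    static <T, K> List<T> difference(List<T> master, List<T> slave, Function<T, K> keyOf) {
        Map<K, T> index = indexByKey(slave, keyOf);
        List<T> result = new ArrayList<>();
        for (T item : master) {
            if (!index.containsKey(keyOf.apply(item))) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<User> list1 = Arrays.asList(new User(1, "a"), new User(2, "b"), new User(3, "c"));
        List<User> list2 = Arrays.asList(new User(2, "b"), new User(3, "c"), new User(4, "d"));
        System.out.println(intersection(list1, list2, u -> u.id).size()); // prints 2
        System.out.println(difference(list1, list2, u -> u.id).size());   // prints 1
    }
}
```

The key type only needs consistent equals() and hashCode() implementations, which Integer and String already provide.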