At work it is often necessary to compute the intersection and the difference of two collections. The ordinary retainAll() and removeAll() methods cannot cope with large data volumes, so this post tries a different approach. Note: retainAll() and removeAll() are still the convenient choice when the data volume is small. Two points to keep in mind:
1. The data is assumed to contain no duplicates. If duplicates do exist, they simply overwrite one another in the intermediate map. In practice, duplicate data is meaningless for these operations.
2. When taking an intersection or a difference, one collection acts as the master data and the other as the slave data. You can check in advance which collection holds more data to decide which is master and which is slave, or run the comparison twice to obtain two result sets. In the following examples, list1 is the master data and list2 is the slave data.
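For small inputs, the two points above can be sketched with the built-in methods. This is only an illustration under stated assumptions: the class name SmallListSetOps and the helper methods intersection and difference are made up for this sketch, and each helper copies the master list so that the inputs stay unchanged. Running the difference twice with the roles swapped shows why the master/slave choice matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class SmallListSetOps {

    // Elements of master that also occur in slave, in master order.
    public static List<Integer> intersection(List<Integer> master, List<Integer> slave) {
        List<Integer> result = new ArrayList<>(master); // copy so the input is not modified
        result.retainAll(slave);
        return result;
    }

    // Elements of master that do not occur in slave.
    public static List<Integer> difference(List<Integer> master, List<Integer> slave) {
        List<Integer> result = new ArrayList<>(master); // copy so the input is not modified
        result.removeAll(slave);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> list1 = Arrays.asList(1, 2, 3, 4);
        List<Integer> list2 = Arrays.asList(3, 4, 5);

        System.out.println(intersection(list1, list2)); // [3, 4]
        System.out.println(difference(list1, list2));   // [1, 2] (list1 is the master)
        System.out.println(difference(list2, list1));   // [5]    (list2 is the master)
    }
}
```

The intersection is the same whichever list is the master, apart from element order, but the difference is not, which is why two rounds of comparison yield two different result sets.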
Take the intersection
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The class name is arbitrary; it only makes the snippet compilable.
public class IntersectionDemo {
    public static void main(String[] args) {
        // Simulate data: list1 holds 1..1000000, list2 holds 999999..0
        List<Integer> list1 = new ArrayList<>();
        List<Integer> list2 = new ArrayList<>();
        for (int i = 1; i <= 1000000; i++) {
            list1.add(i);
            list2.add(1000000 - i);
        }
        // Record the start time
        long startTime = System.currentTimeMillis();
        // Final result set
        List<Integer> resultList = new ArrayList<>();
        // Intermediate storage: index the slave data (list2) by key
        Map<String, Integer> map = new HashMap<>();
        list2.forEach(i2 -> map.put(i2 + "", i2));
        // Look up every element of the master data (list1) in the map
        list1.forEach(i1 -> {
            Integer m = map.get(i1 + "");
            // If it is not null, then both list1 and list2 have the data
            if (m != null) {
                resultList.add(i1);
            }
        });
        System.out.println("Time:" + (System.currentTimeMillis() - startTime) + "ms");
        System.out.println(resultList.size());
    }
}
The results
Take the difference set
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The class name is arbitrary; it only makes the snippet compilable.
public class DifferenceDemo {
    public static void main(String[] args) {
        // Simulate data: list1 holds 1..1000000, list2 holds 1499999..500000
        List<Integer> list1 = new ArrayList<>();
        List<Integer> list2 = new ArrayList<>();
        for (int i = 1; i <= 1000000; i++) {
            list1.add(i);
            list2.add(1500000 - i);
        }
        // Record the start time
        long startTime = System.currentTimeMillis();
        // Final result set
        List<Integer> resultList = new ArrayList<>();
        // Intermediate storage: index the slave data (list2) by key
        Map<String, Integer> map = new HashMap<>();
        list2.forEach(i2 -> map.put(i2 + "", i2));
        // Look up every element of the master data (list1) in the map
        list1.forEach(i1 -> {
            Integer m = map.get(i1 + "");
            // If it is null, the data is not present in list2
            if (m == null) {
                resultList.add(i1);
            }
        });
        System.out.println("Time:" + (System.currentTimeMillis() - startTime) + "ms");
        System.out.println(resultList.size());
    }
}
The results
Conclusion
With two lists of 1 million elements each, computing the intersection and the difference with retainAll() and removeAll() takes a long time, because every membership check against a list is a linear scan. Storing the slave data in a Map first and then checking each master element against it in a single loop is significantly faster, since a hash lookup takes roughly constant time on average. The approach works with any data structure: just choose a suitable key for the intermediate map.
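The remark about choosing the key can be sketched as a small generic helper. This is a minimal sketch, not the code from the examples above: the class name KeyedSetOps, the method names, and the keyOf parameter are assumptions introduced here, and a HashSet of keys stands in for the HashMap because only key membership is checked. The caller supplies a function that extracts the comparison key from each element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper class, for illustration only.
public class KeyedSetOps {

    // Collect the comparison keys of the slave data once.
    private static <T, K> Set<K> keysOf(List<T> slave, Function<T, K> keyOf) {
        Set<K> keys = new HashSet<>();
        for (T item : slave) {
            keys.add(keyOf.apply(item));
        }
        return keys;
    }

    // Master elements whose key also occurs in the slave data.
    public static <T, K> List<T> intersection(List<T> master, List<T> slave, Function<T, K> keyOf) {
        Set<K> slaveKeys = keysOf(slave, keyOf);
        List<T> result = new ArrayList<>();
        for (T item : master) {
            if (slaveKeys.contains(keyOf.apply(item))) {
                result.add(item);
            }
        }
        return result;
    }

    // Master elements whose key does not occur in the slave data.
    public static <T, K> List<T> difference(List<T> master, List<T> slave, Function<T, K> keyOf) {
        Set<K> slaveKeys = keysOf(slave, keyOf);
        List<T> result = new ArrayList<>();
        for (T item : master) {
            if (!slaveKeys.contains(keyOf.apply(item))) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example key: compare strings case-insensitively.
        Function<String, String> lowerCaseKey = s -> s.toLowerCase(Locale.ROOT);
        List<String> list1 = Arrays.asList("apple", "Banana", "cherry");
        List<String> list2 = Arrays.asList("BANANA", "date");

        System.out.println(intersection(list1, list2, lowerCaseKey)); // [Banana]
        System.out.println(difference(list1, list2, lowerCaseKey));   // [apple, cherry]
    }
}
```

For a list of objects, the key function would typically return whatever field identifies a record, such as an ID. The key type needs consistent equals() and hashCode() implementations for the lookup to work.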