Arrays are a common interview topic, so I wanted to study them more deeply. There are plenty of articles on array deduplication online, but I find the analysis is often shallow. Many of the "different" implementations are really the same approach repeated — the double-loop method and the indexOf method, for example, are both double loops at heart — so I wrote this article to summarize them and deepen my understanding.
1. Double loop
The idea is straightforward: create a new empty array, then take each element from the original array in turn and compare it against the new array. If no equal element exists in the new array, push the element; if an equal element is found, skip it. Once every element of the original array has been processed, the new array holds the deduplicated data.
The time complexity of this implementation is O(n^2).
const unique = arr => {
  let res = [];
  for (let i = 0, len = arr.length; i < len; i++) {
    let j = 0, len2 = res.length;
    for (; j < len2; j++) {
      if (arr[i] === res[j]) break;
    }
    if (j === len2) res.push(arr[i]); // j === len2 means the break was never hit
  }
  return res;
}
Of course, the outer loop can be replaced with forEach(), and the inner loop with includes() or indexOf(). The time complexity is the same, but the code is cleaner.
const unique = arr => {
  let res = [];
  arr.forEach(item => {
    if (!res.includes(item)) res.push(item);
  });
  return res;
}
2. Find the first occurrence of the element
Traverse the array from back to front and check whether the first occurrence of the current element is at the current position. If it isn't, the element is a duplicate, so remove it; if it is, keep it.
We iterate backwards because removing an element (with splice) shifts the elements after it. You could iterate forwards instead, but then whenever the current element is removed, arr[i] becomes what used to be arr[i + 1], so i must not be incremented and the loop bookkeeping gets awkward.
This method does not require creating a new array, so the extra space complexity is O(1).
This method does not require the creation of a new array, and the space complexity is O(1).
const unique = arr => {
  for (let i = arr.length - 1; i >= 0; i--) {
    for (let j = 0; j < i; j++) {
      if (arr[j] === arr[i]) {
        arr.splice(i, 1); // an earlier occurrence exists, so drop the current element
        break;           // no need to keep scanning once it has been removed
      }
    }
  }
  return arr;
}
The code here is written to keep the time complexity down. As an aside, it can still be optimized: each splice() call shifts every element after the removal point, so surviving elements may be moved many times instead of being moved to their final position just once. The optimization idea is to first mark the indexes of all elements to be deleted, then walk the array from front to back, letting surviving elements overwrite the deleted slots. The original article doesn't implement this, but a sketch of the idea follows below.
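Here is a minimal sketch of that mark-and-compact idea (my own illustration, not code from the original article): one pass records which indexes hold duplicates, a second pass copies survivors forward so each is moved at most once, and the array is truncated at the end.

const uniqueCompact = arr => {
  // Pass 1: mark duplicates (an element is a duplicate if an equal one appears earlier).
  const toDelete = new Array(arr.length).fill(false);
  for (let i = arr.length - 1; i >= 0; i--) {
    for (let j = 0; j < i; j++) {
      if (arr[j] === arr[i]) { toDelete[i] = true; break; }
    }
  }
  // Pass 2: compact in place — each surviving element is moved at most once.
  let write = 0;
  for (let read = 0; read < arr.length; read++) {
    if (!toDelete[read]) arr[write++] = arr[read];
  }
  arr.length = write; // truncate the leftover tail
  return arr;
}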
The code could also be made more readable with filter() plus includes() or indexOf() (at a slight performance cost, since those methods scan the array); a common one-liner form is sketched below.
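A minimal sketch of the "keep only first occurrences" idea — note that this common form uses indexOf() rather than includes():

const unique = arr =>
  // Keep an element only if the position of its first occurrence is its own position.
  // Returns a new array instead of mutating in place.
  // Caveat: NaN is dropped entirely, because indexOf() can never find NaN.
  arr.filter((item, index) => arr.indexOf(item) === index);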
3. Sort and de-duplicate
There are many sorting algorithms; here we simply use the one built into JS. Incidentally, in older versions of V8 the sort() method used insertion sort for arrays of length 10 or less and quicksort otherwise; since V8 7.0 it uses TimSort.
After sorting, we compare adjacent elements: if the current element is not equal to the previous one, we push the current element into the new array.
const unique = arr => {
  if (arr.length < 2) return arr;
  arr.sort();
  let res = [arr[0]];
  for (let i = 1, len = arr.length; i < len; i++) {
    if (arr[i] !== arr[i - 1]) res.push(arr[i]);
  }
  return res;
}
This deduplication approach is quite limited. It does not work for objects, because objects don't sort meaningfully: the default sort() order is based on string Unicode code points, so sort() converts elements to strings before comparing, and objects generally become "[object Object]". The string-based comparison also produces an odd order for numbers, though duplicates still end up adjacent, as shown below.
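A quick illustration of both points (my own examples, using the unique() defined just above):

// Default sort() compares string representations, so 10 sorts before 2 —
// the order looks odd, but equal values still end up adjacent, so dedup works:
unique([10, 1, 2, 1]); // [1, 10, 2]

// Objects all stringify to "[object Object]", so they compare equal during
// sorting and a stable sort leaves their order unchanged — duplicate
// references separated by another element are never brought together:
const o1 = { a: 1 }, o2 = { b: 2 };
unique([o1, o2, o1]); // [o1, o2, o1] — the duplicate o1 survives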
4. Use a hash table
Hash tables, in JavaScript, are implemented with objects. Their advantage is that reading data generally takes O(1) time. Plain JS objects can only have string (or Symbol) keys, but ES6's new Map data structure allows a value of any type to be used as a key.
The following implementation uses a plain object as the hash table, which brings a major limitation: it cannot deduplicate js objects, because they are converted to strings like "[object Object]" when used as keys. In addition, object keys are strings, so a['1'] and a[1] refer to the same property (the 1 is converted to '1'); the table alone cannot distinguish 1 from '1' and would wrongly discard one of them during deduplication. So I made a simple improvement: use typeof(arr[i]) + arr[i] as the key instead of arr[i] alone.
const unique = arr => {
  let res = [];
  let map = {};
  for (let i = 0, len = arr.length; i < len; i++) {
    const item = arr[i];
    const key = typeof item + item; // e.g. 'number1' vs 'string1'
    if (!map[key]) res.push(item);
    map[key] = true;
  }
  return res;
}
This brings the time complexity down to O(n).
If you also need to deduplicate objects, consider using an ES6 Map, which accepts keys of any type.
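A minimal sketch using Map (my own illustration): Map keys can be of any type and are compared with the SameValueZero algorithm, so 1 and '1' stay distinct, distinct object references stay distinct, and even NaN is matched correctly.

const unique = arr => {
  const seen = new Map();
  const res = [];
  for (const item of arr) {
    if (!seen.has(item)) { // SameValueZero: distinguishes 1 vs '1', matches NaN
      seen.set(item, true);
      res.push(item);
    }
  }
  return res;
}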
5. The ES6 Set
ES6 provides new data structures. A Set instance treats two NaNs as equal (even though NaN !== NaN) and treats two objects as unequal (here "two objects" means two reference-type variables pointing at different memory).
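A quick check of that behavior (my own example):

new Set([NaN, NaN, {}, {}]).size; // 3 — the two NaNs collapse into one, the two object literals stay distinct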
Since I haven't studied how Set is implemented under the hood, I won't analyze its performance.
const unique = arr => {
return Array.from(new Set(arr))
}
It's pretty neat, and if your runtime supports ES6 (or you can compile down to ES5), I recommend this deduplication scheme.
Deduplicating NaN
If you want NaN to be deduplicated as well, the code needs a small change.
In simple terms: when the item is a NaN, check whether the result array already contains a NaN; if it doesn't, push the item, otherwise skip it. (Strictly speaking, includes() uses the SameValueZero algorithm and therefore already matches NaN; the explicit hasNaN flag below is what you would need with indexOf(), which relies on === and can never find NaN.)
const unique = arr => {
  let res = [];
  let hasNaN = false;
  arr.forEach(item => {
    if (!hasNaN && Number.isNaN(item)) {
      res.push(item);
      hasNaN = true;
    } else if (!res.includes(item)) {
      res.push(item);
    }
  });
  return res;
}
How Lodash implements deduplication
Let's briefly look at how lodash's uniq method is implemented.
Its behavior matches Set-based deduplication.
When the array length is 200 or more, lodash creates a Set and converts it back to an array (the fallback path for environments without Set isn't analyzed here). When the length is under 200, it uses a scheme similar to the double loop described earlier, with NaN deduplication handled as well.
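A simplified sketch of that strategy (my own illustration of the idea, not lodash's actual source — lodash's internal baseUniq is considerably more involved):

// Threshold lodash calls LARGE_ARRAY_SIZE
const LARGE_ARRAY_SIZE = 200;

const uniqSketch = arr => {
  if (arr.length >= LARGE_ARRAY_SIZE) {
    return Array.from(new Set(arr)); // Set handles NaN via SameValueZero
  }
  const res = [];
  for (const value of arr) {
    // includes() also uses SameValueZero, so duplicate NaNs are caught too
    if (!res.includes(value)) res.push(value);
  }
  return res;
}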
Conclusion
In general, the arrays we deduplicate in day-to-day development are not large enough for performance to be a concern. So in a project, to avoid complicating a simple problem, I recommend the most concise option: the ES6 Set scheme. Of course, analyze each case on its own merits and pick the deduplication scheme that genuinely fits the scenario.
In addition, "equality" can be defined in more than one way: ES6 has four equality algorithms. If you're interested, check out the article Equality In JavaScript. Again, choose the equality algorithm appropriate to the scenario.
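To make the differences concrete, here is a small comparison (my own examples) of how the four algorithms treat NaN and signed zero:

// Abstract (==) and Strict (===) Equality: NaN never equals itself
NaN == NaN;               // false
NaN === NaN;              // false

// SameValue (Object.is): NaN equals NaN, but +0 and -0 differ
Object.is(NaN, NaN);      // true
Object.is(0, -0);         // false

// SameValueZero (used by Set, Map, and includes): NaN equals NaN, +0 equals -0
new Set([NaN, NaN]).size; // 1
[0].includes(-0);         // true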
References
- Nuggets: 7 ways to de-duplicate arrays
- JavaScript topics: array deduplication
- Lodash's uniq method implementation
- Equality judgment in JavaScript